Good Day All,
I have a question regarding restoring a failed node in a three-node cluster. This is an old implementation, currently running PVE 6.4-14 with a similarly old version of Ceph, which is due to be upgraded and moved in April of this year. While planning the move, one node failed unexpectedly, and HA had not been set up to relocate two of the VMs on failure. Since these are production VMs, we had to choose between restoring from backup and manually moving the VMs. We chose the latter, following section "6.5.2. Recovering/Moving Guests from Failed Nodes" in the Proxmox documentation: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_recovering_moving_guests_from_failed_nodes
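For context, the step from that section boils down to moving the orphaned config file on a surviving (quorate) node; the sketch below uses placeholder VMIDs 100 and 101 and generic node names in place of our actual ones:

# Run on a node that is still part of the quorate cluster.
# Moving the config file is what reassigns the guest to the new node.
mv /etc/pve/nodes/node1/qemu-server/100.conf /etc/pve/nodes/node2/qemu-server/
mv /etc/pve/nodes/node1/qemu-server/101.conf /etc/pve/nodes/node3/qemu-server/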
The VMs were restored by moving the VM config files out of "/etc/pve/nodes/node1/qemu-server/" into the corresponding folders for nodes 2 and 3, which, as the documentation states, understandably violates Proxmox VE's locking principles. Node 1 was pulled and checked on the bench at our workshop; it booted properly and seems to be working fine, and we will continue to monitor it for the next 48 hours. My question is this: if we were to edit the same qemu-server folder on the offline node, deleting the config files so they do not conflict with the two remaining active cluster nodes where the manually migrated VMs now reside, and then add node 1 back in, would this pose a problem for the cluster? No attempt has been made yet to remove the node from the environment via pvecm. In theory this should work, since everything is stored in Ceph. Kindly let me know if my thinking is wrong. Thank you.
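To be concrete, what I have in mind on the bench node (still disconnected from the cluster network) is roughly the following, with the same placeholder VMIDs. My assumption is that /etc/pve is mounted read-only on a node without quorum, so we would first have to force local quorum with pvecm expected 1 before the stale files can be removed:

# On the isolated node1 only, never on an active cluster member:
pvecm expected 1    # assumption: needed to make the local /etc/pve writable
rm /etc/pve/nodes/node1/qemu-server/100.conf
rm /etc/pve/nodes/node1/qemu-server/101.conf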