When a cluster node is lost, is it possible to restart its VM on another node?

guerby

Nov 22, 2020
Hi,

I'm testing some edge cases with Proxmox VE 6.3: I have a cluster p1 with three nodes, node1, node2, and node3, which all use only a shared NFS (from another machine outside the cluster) for VM disk storage.

VM 100 is running on node1. node2 and node3 have no VM running.

Let's assume node1 fails (powers off because of a power supply failure) and won't be repaired immediately (no spare handy), so I want to manually start VM 100 on node2, but I don't want to remove node1 from the cluster (because it will be repaired in the coming days).

I noticed that node2 and node3 have a copy of the VM 100 configuration in /etc/pve/nodes/node1/qemu-server/100.conf, and the disk is on the shared NFS, so nothing is lost even if node1 is offline.

I didn't find an obvious way to do it via the Proxmox VE web UI, did I miss something?

Thanks!
 
The function you are looking for is "High Availability". You can use it as you have three nodes, which is the minimum.

Please read https://pve.proxmox.com/pve-docs/pve-admin-guide.html#chapter_ha_manager

Best regards

Hi, thanks for your answer.

I read that document, but as my time is limited I'm not sure I'll be able to test a realistic set of conditions with HA as proposed. That's why I asked how to "manually" restart a VM from a node that is known for sure to be failed/offline on another one.

On our current non-Proxmox hypervisor no VMs are set to autostart, and when a failure occurs (hypervisor hardware, network storage, network, out of resources, etc.) we can afford some time for a human to analyze the situation and decide what to do. Our users are OK with reasonable downtimes (server hardware nowadays is quite plentiful and reliable).

I know from experience that recovering from a failed automatic HA restart (because of misconfiguration, with plenty of options to get wrong, or just an unanticipated failure mode) can be really painful and cause very long downtimes (restoring from backups because of corruption, etc.).

Hence I'm looking for a simpler manual process to recover from a failure.
 
Hi,
If there are no local resources for the VM and nodeA is down, you can simply move the configuration file from /etc/pve/nodes/nodeA/qemu-server/<ID>.conf to /etc/pve/nodes/nodeB/qemu-server/<ID>.conf and then start it on node B.
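
For example, with the names from the original question (node1 failed, VM 100, recovery on node2), a minimal sketch of that manual recovery, run on node2 while the remaining cluster still has quorum, would look roughly like this:

Code:
# Only do this if node1 is definitely down and will stay down,
# otherwise the VM could end up running twice.
mv /etc/pve/nodes/node1/qemu-server/100.conf /etc/pve/nodes/node2/qemu-server/100.conf
qm start 100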

I'm not sure how corruption would occur from using HA? I think corruption is much more likely to occur when a VM or its node crashes. When HA "steals" a VM to recover it, it basically does the same thing (moving the configuration file and starting the VM on the other node).
 
Hi,

Do you plan to make this manageable through the GUI?

A failed node can occur at any time, and it would be more effective to handle this automatically, or to be able to switch the configuration through the GUI in the case of a failed node, instead of using the CLI to move the configuration file.
 
Why? HA is already doing exactly what it is supposed to do ... and doing it for YEARS.

The CLI approach is only the way to go if you have NOT set up HA properly. With HA set up properly, you don't need to worry about it ... again ... it has been like this for YEARS.
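
For what it's worth, putting a guest under HA management doesn't require much. A minimal CLI sketch, using VM 100 from this thread as the example and assuming the cluster itself is already healthy:

Code:
ha-manager add vm:100 --state started   # let the HA stack manage VM 100
ha-manager status                       # check the HA manager's view of the cluster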
 
That is good to know then.
We have HA enabled on each VM and disabled failback, as we need to mount the GFS2 shared storage on each host manually, per our policy, to pinpoint why a node actually failed prior to bringing it back into service.
Three years ago we faced a similar issue, and I thought my team had to manually move the VMs to another host.
 
I tried GFS2 years ago and it was not stable enough in our tests. We've been running a dedicated shared storage via LVM on 5 nodes for 8 years without any (storage) problems. We of course had a few node failures, but everything migrated perfectly to the other nodes and all services were up again after a few minutes.
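
As a rough illustration of that kind of setup (not the poster's exact configuration; the storage ID "san-lvm" and the volume group name "vg_san" are placeholders), a shared LVM storage on top of a SAN LUN is defined in /etc/pve/storage.cfg like this:

Code:
# /etc/pve/storage.cfg (excerpt)
lvm: san-lvm
        vgname vg_san
        content images
        shared 1

Note that plain (non-thin) LVM used as shared storage offers no snapshot support.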
 
Hi, what kind of shared storage do you refer to? Over the network, or was it SAS shared storage?
 
I would prefer a shared storage with snapshot capability in PVE, but that is still a dream.

Depending on the hardware available, I almost exclusively used LVM (so block storage) in a cluster environment:
- first NBD decades ago
- then, for a couple of years, DRBD
- also tried SAS shared storage, but that is/was limited to two machines, so no "real" cluster.
- and finally SAN (mostly FC, but also iSCSI)
(I also played around with ZFS-over-iSCSI, but this is currently not available as an HA option due to my own hardware restrictions)

On top of that I also tried GFS and OCFS2 as filesystems, but I had regular crashes with them (though that was back in 2015).
 
I've set up HA for 1 VM and 1 LXC as a test (these are also my main servers).
PVE1 is my main server, on which this VM/LXC were running. I shut down PVE1 to see what would happen.
PVE2 launched my LXC and PVE3 launched my VM. All works as expected.
Now I restarted PVE1 and expected that my LXC and VM would move back to this server, which didn't happen. I got the following log entries:

Code:
Dec 01 17:01:53 pve1 pmxcfs[1802]: [quorum] crit: quorum_initialize failed: 2
Dec 01 17:01:53 pve1 pmxcfs[1802]: [quorum] crit: can't initialize service
Dec 01 17:01:53 pve1 pmxcfs[1802]: [confdb] crit: cmap_initialize failed: 2
Dec 01 17:01:53 pve1 pmxcfs[1802]: [confdb] crit: can't initialize service
Dec 01 17:01:53 pve1 pmxcfs[1802]: [dcdb] crit: cpg_initialize failed: 2
Dec 01 17:01:53 pve1 pmxcfs[1802]: [dcdb] crit: can't initialize service
Dec 01 17:01:53 pve1 pmxcfs[1802]: [status] crit: cpg_initialize failed: 2
Dec 01 17:01:53 pve1 pmxcfs[1802]: [status] crit: can't initialize service

Not sure if this is the behaviour I should expect or not? And when will the VM/LXC move back to my main server?
 
No, that is not expected. No logic in the world can anticipate what you want to do; therefore, you have to move it back manually.
 
You need to create HA groups with different priorities (higher priority on PVE1). This way the resources will automatically fail back to their original node.
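
A minimal sketch of such a group from the CLI (the group name "prefer-pve1" and the resource ID vm:100 are placeholder examples; the same can be configured in the GUI under Datacenter -> HA -> Groups):

Code:
# higher number = higher priority, so pve1 is preferred
ha-manager groupadd prefer-pve1 --nodes "pve1:2,pve2:1,pve3:1"
# assign an existing HA resource to the group
ha-manager set vm:100 --group prefer-pve1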

This log probably appears while the node is starting and does not yet have quorum (pmxcfs is the pve-cluster service managing the /etc/pve directory).
If this log is not flooding and only appears at startup, you can ignore it.
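
To double-check, once the node has fully booted you can verify quorum and the cluster services with the standard tools (nothing specific to this setup):

Code:
pvecm status                              # shows quorum and membership information
systemctl status pve-cluster corosync     # both services should be active/running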
 
