Quorum During Disaster

tprice

New Member
Mar 27, 2025
Hi all,

I currently have a Proxmox cluster of 7 nodes: 4 are at one location and the remaining 3 are at another. In the event of a disaster where the 4 nodes are lost, how will this affect quorum? Is there anything I can do to prevent issues if the cluster is down to 3 nodes?

Thanks for the help.
 
3 of 7 votes is less than 50% (a majority needs 4), so the remaining nodes would not have quorum and the cluster would effectively be down. At that point I suppose you could remove the four lost nodes from the cluster.
https://pve.proxmox.com/pve-docs/chapter-pvecm.html#_remove_a_cluster_node

Can you set this up as two different clusters?

It's also possible to give certain servers more votes but if it's a disaster then the original four are gone...?

Note this scenario is also possible if the link between the two locations is down.
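
To make the arithmetic concrete, here is a minimal sketch of the majority rule as I understand it, assuming one vote per node and a strict-majority requirement. The function names are made up for illustration and are not part of any Proxmox tool.

```python
def quorum_threshold(total_votes: int) -> int:
    """Smallest vote count that is a strict majority of total_votes."""
    return total_votes // 2 + 1


def is_quorate(votes_present: int, total_votes: int) -> bool:
    """True if the reachable votes form a strict majority."""
    return votes_present >= quorum_threshold(total_votes)


if __name__ == "__main__":
    total = 7  # assumption: seven nodes, one vote each
    print(quorum_threshold(total))  # 4 votes needed
    print(is_quorate(3, total))     # False: the 3-node site on its own
    print(is_quorate(4, total))     # True: the 4-node site on its own
```

The same numbers apply whether the four nodes are destroyed or just unreachable because the link is down: the 3-node site cannot tell the difference.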
 
Setting up two different clusters could be possible, just a little painful.

As for giving servers more votes, are there any issues with giving one of the 3 servers 2 votes? The site with 3 nodes would then have 4 votes in total.

Something I am now thinking of is what will happen if, say, the connection between the two sites goes down with this new vote setup. Will there be any issues with each site having its own separate quorum? Will the cluster continue to operate normally when the connection is restored?

I appreciate the insight.
 
Avoid stretching a PVE cluster across sites; it isn't that good of an idea.

At the very least you will need to make sure each side has the same number of votes, and add a QDevice in a third location that helps with quorum. Be prepared to deal with the sides ending up with different vote counts (e.g. when a server is down for any reason). Also, the latency requirements between nodes [1] must be taken into account.
"Something I am now thinking of is what will happen if say the connection between the two sites goes down with this new vote setup"
Both sides of the cluster will be out of quorum, given that each now has exactly 50% of the votes (4 of 8). If you have HA enabled, both sides will fence the hosts by a reboot and all your VMs/CTs will go down [2]. A QDevice in a third location can help here, as it will give its vote to one of the sides randomly if the locations can't reach each other but both can reach the QDevice.
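
A rough sketch of that 4 vs 4 split, with and without a QDevice. It assumes the QDevice contributes exactly one vote and hands it to a single side; `split_outcome` and its parameters are hypothetical names used only for this illustration.

```python
def split_outcome(site_a_votes, site_b_votes, qdevice_winner=None):
    """Return (a_quorate, b_quorate) while the inter-site link is down.

    qdevice_winner is None (no QDevice configured), or "a" / "b" for the
    side that receives the single QDevice vote (an assumption).
    """
    if qdevice_winner not in (None, "a", "b"):
        raise ValueError("qdevice_winner must be None, 'a' or 'b'")
    qdevice_vote = 0 if qdevice_winner is None else 1
    total = site_a_votes + site_b_votes + qdevice_vote
    a = site_a_votes + (1 if qdevice_winner == "a" else 0)
    b = site_b_votes + (1 if qdevice_winner == "b" else 0)
    # Quorum requires a strict majority of all configured votes.
    return (2 * a > total, 2 * b > total)


print(split_outcome(4, 4))                      # (False, False)
print(split_outcome(4, 4, qdevice_winner="a"))  # (True, False)
print(split_outcome(4, 4, qdevice_winner="b"))  # (False, True)
```

Without the tie-breaker neither side reaches 5 of 8; with it, exactly one side reaches 5 of 9 and the other stays out of quorum.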

If you just want easier management, use PDM [3].

If you just want to manually recover quorum in case of a disaster, there's always the option to use pvecm expected, although it comes with its own requirements (i.e. make 100% sure you use it on one side of the cluster only and while the other side is out of quorum, or risk painful pmxcfs errors when the sides reach each other again).
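
To illustrate why the "one side only" rule matters, here is a small model of what lowering the expected votes does to the majority check. It only does the arithmetic and does not call pvecm; the vote counts assume the original 4 + 3 layout with one vote per node.

```python
def is_quorate(votes_present: int, expected_votes: int) -> bool:
    """Strict-majority check against the currently expected vote total."""
    return 2 * votes_present > expected_votes


# Disaster case: the 4-node site is gone, 3 nodes with one vote each remain.
survivors = 3
print(is_quorate(survivors, expected_votes=7))  # False: 3 of 7
print(is_quorate(survivors, expected_votes=3))  # True: 3 of 3 after lowering

# Link-failure case (hypothetical misuse): both sites are alive but isolated,
# and an operator lowers the expected votes on each side independently.
site_a_ok = is_quorate(4, expected_votes=4)
site_b_ok = is_quorate(3, expected_votes=3)
print(site_a_ok and site_b_ok)  # True: both sides quorate at once
```

The last line is the situation to avoid: two halves that each believe they have quorum and keep writing cluster state independently.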

[1] https://pve.proxmox.com/wiki/Cluster_Manager#pvecm_cluster_network
[2] https://pve.proxmox.com/wiki/High_Availability#ha_manager_fencing
[3] https://pve.proxmox.com/wiki/Proxmox_Datacenter_Manager_Roadmap
 
With the current setup, the site with 4 votes would continue and the site with 3 votes would be offline. Those nodes may immediately restart to try to reconnect, for instance, but would then be unable to.

Giving the 3-node site four votes up front would not help either, as you would then have 8 votes in total and neither side would have the 5 votes needed (over 50%). I was musing about giving them four votes after a disaster to get the cluster back online. I am not sure what happens if the loss turns out to be a temporary outage, though: both sides would by then have continued operating separately.

The Datacenter Manager may help as it facilitates migration between clusters, AFAIK.
 
It's becoming evident that two clusters are likely the way to go. From the looks of it, migration and replication between clusters are not supported at the moment? These are my biggest concerns about moving to a two-cluster setup.
 