What happens if a cluster splits in two?

Jan 16, 2022
Hi, let's say we have two groups of 8 nodes in two datacenters (DCs), with a bonded link between the DCs for redundancy, but for some reason that link fails.

What happens to each group of 8 nodes? Will each side remain active with its VMs alive, or ...?
 
I didn't try this myself, but the thing with "Quorum" is that you need more than 50% of the votes to be able to manage the cluster. With a 16-node cluster split 8/8, each half holds exactly 50%, which is not enough.
  • both "halves" of the cluster will lose quorum and will not accept any management tasks via the web interface
  • you have no HA enabled? In that case everything stays up and all VMs continue to run
  • with HA, "fencing" will happen and some nodes will reboot. All VMs on those nodes will stay turned off for now
If you know what you're doing, you can run "pvecm expected 8" on each half to re-enable management. Be careful not to create inconsistencies manually: you have disabled the safeguard! Pressure to "get up and running again, quick!" will lead to errors.
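The vote counting behind this can be sketched in a few lines of Python. This is a minimal model, assuming one vote per node (the default) and corosync votequorum's usual threshold of `expected_votes // 2 + 1`; it ignores options such as `wait_for_all` or `last_man_standing`, and the function names are hypothetical.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for quorum: a strict majority of the expected votes."""
    return expected_votes // 2 + 1


def is_quorate(votes_present: int, expected_votes: int) -> bool:
    """True if a partition holding `votes_present` votes has quorum."""
    return votes_present >= quorum_threshold(expected_votes)


# 16 nodes, one vote each, split 8/8 by the failed inter-DC link.
EXPECTED = 16
half_a, half_b = 8, 8

print(quorum_threshold(EXPECTED))    # 9
print(is_quorate(half_a, EXPECTED))  # False
print(is_quorate(half_b, EXPECTED))  # False

# Modeled effect of "pvecm expected 8" on each half: the threshold drops to 5,
# so BOTH halves become quorate at the same time (the safeguard is gone).
print(quorum_threshold(8))                              # 5
print(is_quorate(half_a, 8) and is_quorate(half_b, 8))  # True
```

The last line is the danger the post warns about: after the override, both halves consider themselves quorate and can accept conflicting changes.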

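You need a 17th device to break the tie. If that quorum device is reachable out-of-band from both halves, it gives its vote to exactly one half, so that half keeps quorum and continues to operate while the other half does not. This is not easy to achieve, as corosync needs very low latency between the two datacenters, and the quorum device has to sit in a third location reachable from both.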
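A rough Python model of that tie-break, assuming the quorum device runs corosync-qnetd with the `ffsplit` algorithm (intended for even-sized clusters), which hands its single vote to exactly one partition in a 50/50 split. The real daemon chooses the winner by its own rules; here the winner is simply passed in, and `Partition` and `votes_after_split` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    name: str
    node_votes: int     # one vote per node in this partition
    sees_qdevice: bool  # can this partition reach the quorum device?


def votes_after_split(partitions, winner):
    """Return total votes per partition once the quorum device has voted.

    Models ffsplit: the device's single vote goes to exactly one partition
    (`winner`), and only if that partition can reach the device.
    """
    result = {}
    for p in partitions:
        bonus = 1 if (p.name == winner and p.sees_qdevice) else 0
        result[p.name] = p.node_votes + bonus
    return result


EXPECTED = 16 + 1               # 16 nodes plus one quorum-device vote
THRESHOLD = EXPECTED // 2 + 1   # 9

dc1 = Partition("dc1", 8, sees_qdevice=True)
dc2 = Partition("dc2", 8, sees_qdevice=True)

votes = votes_after_split([dc1, dc2], winner="dc1")
for name, v in votes.items():
    print(name, v, "quorate" if v >= THRESHOLD else "NOT quorate")
# dc1 9 quorate
# dc2 8 NOT quorate
```

Note that the 17th vote does not keep both halves running; it only ensures that exactly one of them stays quorate.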

Again: I did not test this; anyone please correct me if I've described it wrong.

For reference: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_quorum -- 5.10. Corosync External Vote Support
 
I didn't try this myself, but the thing with "Quorum" is that you need more than 50% of the votes to be able to manage the cluster.
  • both "halves" of the cluster will lose quorum and will not accept any management tasks via the web interface
But active VMs will keep running on those nodes if no HA is configured, right?
  • you have no HA enabled? In that case everything stays up and all VMs continue to run
For this to happen, no HA can be enabled anywhere in the cluster, right?
  • with HA, "fencing" will happen and some nodes will reboot. All VMs on those nodes will stay turned off for now
If you know what you're doing, you can run "pvecm expected 8" on each half to re-enable management. Be careful not to create inconsistencies manually: you have disabled the safeguard! Pressure to "get up and running again, quick!" will lead to errors.
This command will restore management because each side will then have more than 50% of the expected votes, correct?
You need a 17th device to break the tie. If that quorum device is reachable out-of-band from both halves, it gives its vote to exactly one half, so that half keeps quorum and continues to operate while the other half does not. This is not easy to achieve, as corosync needs very low latency between the two datacenters, and the quorum device has to sit in a third location reachable from both.
So, as I understand it, the only way to keep one side from going down completely is not to set up HA; otherwise all running services at those multiple locations will fail and the nodes will reboot. We are trying to understand this correctly, because we don't see any cross-cluster functionality.
So it might be better to build a separate cluster for each datacenter and hope that Proxmox completes multi-cluster management in the web UI.


 
But active VMs will keep running on those nodes if no HA is configured, right?
For this to happen, no HA can be enabled anywhere in the cluster, right?
I haven't tested this, but I assume only nodes running VMs with active HA would be affected. You can see the state of each node under "Datacenter --> HA": "<node> - lrm idle" means no VMs on that node are configured for HA.
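The assumption in this reply can be written down as a small Python model: a node that has lost quorum is fenced (rebooted) only if it has HA-managed VMs, i.e. its LRM is not idle, while a node with an idle LRM keeps its VMs running but accepts no management tasks. This is a simplification of the post's untested assumption, not Proxmox's actual HA code, and `Node`, `outcome`, and the node names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    quorate: bool
    ha_vms: int  # number of VMs on this node configured for HA


def outcome(node: Node) -> str:
    """Simplified model of the untested assumption in the post."""
    if node.quorate:
        return "normal operation"
    if node.ha_vms > 0:
        # Assumed: LRM is active, so without quorum the node is fenced
        # (reboots) and its VMs stay turned off for now.
        return "fenced: reboots, all VMs stay off"
    # Assumed: "lrm idle" means no HA VMs, so nothing fences; VMs keep
    # running, but the node accepts no management tasks while inquorate.
    return "VMs keep running, management blocked"


# One half of the 8/8 split: neither half has quorum.
half = [Node("node1", quorate=False, ha_vms=2),
        Node("node2", quorate=False, ha_vms=0)]
for n in half:
    print(n.name, "->", outcome(n))
# node1 -> fenced: reboots, all VMs stay off
# node2 -> VMs keep running, management blocked
```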
This command will restore management because each side will then have more than 50% of the expected votes, correct?
Yes, but that quorum is fake. You may create conflicts that lead to trouble when you try to reunite both halves.
So it might be better to build a separate cluster for each datacenter and hope that Proxmox completes multi-cluster management in the web UI.
Yes. It is on the current roadmap; the CLI part has already been implemented, while the GUI is still missing: https://pve.proxmox.com/wiki/Roadmap
 
