Cluster Quorum Question when all nodes go offline

jose-pr

New Member
May 4, 2024
How does Proxmox handle quorum/split-brain situations when all devices go offline? Does it need to bring all devices back online to achieve quorum again and determine the latest correct cluster configuration (/etc/pve)?

In a case with 2 nodes (node-1, node-2) and a QDevice: if I lose node-1, changes then happen in the configuration (/etc/pve) on node-2, and then node-2 is turned off as well.
If the first node to come back online is node-1, would it obtain quorum with the QDevice, or would it require node-2 to come back online before initial quorum can be achieved again?
 
> How does Proxmox handle quorum/split-brain situations when all devices go offline? Does it need to bring all devices back online to achieve quorum again and determine the latest correct cluster configuration (/etc/pve)?
Proxmox prevents split-brain using quorum (more than half the nodes agree that they can reach each other): https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_quorum
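To make the vote arithmetic concrete (my own illustration, not from the linked documentation): with two nodes plus a QDevice there are three votes in total, so two votes are needed for the cluster to be quorate.

    # corosync.conf excerpt (illustrative values, not a complete config)
    quorum {
        provider: corosync_votequorum
        expected_votes: 3    # node-1 (1) + node-2 (1) + QDevice (1)
        # quorate as soon as more than half of the expected votes are present, i.e. 2 of 3
    }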
> In a case with 2 nodes (node-1, node-2) and a QDevice: if I lose node-1, changes then happen in the configuration (/etc/pve) on node-2, and then node-2 is turned off as well.
> If the first node to come back online is node-1, would it obtain quorum with the QDevice, or would it require node-2 to come back online before initial quorum can be achieved again?
Changes happen on all nodes that are part of the quorum (more than half), not just on one node. If changes cannot be made on all nodes of the quorum, then the change does not happen. Nodes that are not part of the quorum are read-only and cannot make changes.
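A quick way to verify this on a node before touching anything under /etc/pve (assuming a standard Proxmox install; the exact output wording can differ between versions):

    # Ask the cluster stack whether this node is currently in a quorate partition
    pvecm status | grep -i quorate     # Proxmox wrapper around corosync
    corosync-quorumtool -s             # the same information directly from corosync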
 
> Proxmox prevents split-brain using quorum (more than half the nodes agree that they can reach each other): https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_quorum

> Changes happen on all nodes that are part of the quorum (more than half), not just on one node. If changes cannot be made on all nodes of the quorum, then the change does not happen. Nodes that are not part of the quorum are read-only and cannot make changes.
But from what I understand, the QDevice doesn't have any data on it, so no changes would happen on it. Or does it keep a hash/checksum of the SQLite database, or a last-modified timestamp, that node-1 could read when it boots to determine that its configuration is out of sync and that it can't form a quorum with the QDevice?
 
> But from what I understand, the QDevice doesn't have any data on it, so no changes would happen on it. Or does it keep a hash/checksum of the SQLite database, or a last-modified timestamp, that node-1 could read when it boots to determine that its configuration is out of sync and that it can't form a quorum with the QDevice?
If I understand it correctly: if node-X and the QDevice see each other, then that determines the configuration. If node-Y joins later, it will use the node-X configuration.

You can probably indeed be malicious and turn off node-1, make changes on node-2 (which is fine), then turn off node-2 as well and bring only node-1 back up.
If both nodes come up, then I don't know if the QDevice prefers one over the other. I think I remember reading somewhere that a QDevice gives its vote to only one subset of the cluster if it detects multiple subsets. Maybe someone else knows, or can read the documentation and/or the source code.
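For reference, the QDevice behaviour is controlled by the algorithm setting in the device section of corosync.conf; with the ffsplit algorithm the QDevice casts its vote for at most one partition. A sketch of what that section typically looks like (the host address is a placeholder):

    quorum {
        provider: corosync_votequorum
        device {
            model: net
            votes: 1
            net {
                host: 192.0.2.10      # placeholder address of the qnetd server
                algorithm: ffsplit    # vote goes to exactly one partition
                tls: on
            }
        }
    }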

I would suggest not getting into this situation (by making important changes when one of your nodes is down). Running only two nodes (even with a QDevice) is an edge case anyway.
 
From what I understand, it will use the lowest node ID to select which node to give its vote to. But I hadn't checked what happens when it boots, restores connections, and there is a race while the nodes are coming back online.
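If that is the case, the relevant IDs are the ones defined in the nodelist section of corosync.conf, e.g. (simplified excerpt; names and addresses are placeholders):

    nodelist {
        node {
            name: node-1
            nodeid: 1              # lowest ID, favoured by a lowest-node-ID tie-break
            ring0_addr: 192.0.2.1
        }
        node {
            name: node-2
            nodeid: 2
            ring0_addr: 192.0.2.2
        }
    }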
 
> I would suggest not getting into this situation (by making important changes when one of your nodes is down). Running only two nodes (even with a QDevice) is an edge case anyway.
You are correct, the OP is describing a double fault in the cluster. I have not tried this particular situation, but if I were designing for such a condition, I would not let the system recover automatically; I would force the administrator to pick one of the sides and force it to be "primary" (i.e. have preference when rebuilding the cluster).
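The manual way to do that today, if one node is up but not quorate, is to lower the expected votes on that node (use with care; this is exactly the kind of operation that can cause a split-brain if the other node is still running somewhere):

    # On the surviving node only, while the other node stays off:
    pvecm expected 1    # tell votequorum to expect a single vote, making /etc/pve writable again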


 
I think what I want can be provided by the wait_for_all option in corosync: after a full shutdown, the initial formation of quorum requires all nodes to be online, which is what I want. Quorum is kept under the normal rules as long as the cluster has stayed up the whole time, but once everything has been down, all devices are required to be online again before quorum can form.
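For what it's worth, enabling it is a one-line addition to the quorum section (a sketch; on Proxmox, edits to corosync.conf should follow the usual procedure of bumping config_version and distributing the file):

    quorum {
        provider: corosync_votequorum
        wait_for_all: 1    # after a full cluster shutdown, quorum only forms again
                           # once all nodes have been seen at the same time
    }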

The only thing I am left wondering is whether there are any hooks that can be attached to the corosync/HA code to get called when sync is lost, so as to raise the votes of the remaining node until the missing node comes back and syncs, and then set the votes back to 1 per node.
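One thing that might already come close, without custom hooks, are corosync's votequorum options last_man_standing and last_man_standing_window: instead of raising the surviving node's votes, they lower expected_votes after a node has been down for a while. A sketch based on votequorum(5) (the window value is just an example, and I am not sure these can be combined with a QDevice, so check the man page first):

    quorum {
        provider: corosync_votequorum
        last_man_standing: 1
        last_man_standing_window: 10000    # ms a node must be unreachable before
                                           # expected_votes is recalculated downwards
        wait_for_all: 1                    # generally recommended together with LMS
    }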
 
