Why even have a voting-based quorum system?

Dec 14, 2023
We are currently in the process of moving our virtualized infrastructure from VMware to Proxmox. We have two data centers with our clusters split up 50/50 across them. One high-level question is still bothering me and I would like to clarify it in order to consolidate my deeper understanding of Proxmox clustering:
Why does the voting system in a Proxmox cluster insist on an odd number of votes?

I have noticed the following disadvantages with this voting system:
- Suppose we have 4 nodes and one of them fails, then the cluster is frozen
- Even if you have an odd number of nodes, you must not allow two to fail in quick succession (which has happened to us in the past, but was no problem because of a working failover)
- If you split the cluster into two rooms (as we do), with a Q-Device in a third room, you add two points of failure (the connections from the two rooms to the Q-Device)

These points are worrying me. vSphere clusters work on a master-slave principle, where you don't have to worry about something like this. Why has Proxmox gone for a quorum system here?

Best regards
 
A quorum-based system is the gold standard in pretty much anything, and it has always been 2n+1.
In your case, ideally you would have a third room with a quorum device, and that would solve it all.
 
Hello,

- Suppose we have 4 nodes and one of them fails, then the cluster is frozen

The quorum with 4 nodes is 3, so you can have one node down without issues.

Even if you have an odd number of nodes, you must not allow two to fail in quick succession (which has happened to us in the past, but was no problem because of a working failover)

To be precise, if you have 2n nodes, you can have at most n-1 nodes down, and quorum requires n+1 votes. With 6 nodes, 2 can fail.
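
To make the math concrete, here is a rough Python sketch (illustrative only, not Proxmox/corosync code; quorum() and tolerated_failures() are made-up helpers) of how quorum and failure tolerance follow from the total vote count:

```python
# Sketch of the majority rule behind quorum systems (illustrative only).

def quorum(total_votes: int) -> int:
    # Quorum = strictly more than half of all configured votes.
    return total_votes // 2 + 1

def tolerated_failures(total_votes: int) -> int:
    # Voters that can be lost while a majority is still reachable.
    return total_votes - quorum(total_votes)

for votes in range(2, 8):
    print(f"{votes} votes: quorum {quorum(votes)}, "
          f"tolerates {tolerated_failures(votes)} failure(s)")
```

With 4 votes you get quorum 3 and tolerate 1 failure; with 6 votes you get quorum 4 and tolerate 2, as above.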


If you split the cluster into two rooms (as we do), with a Q-Device in a third room, you add two points of failure (the connections from the two rooms to the Q-Device)
Yes, but the alternative is worse. Suppose you have 4 nodes separated into two partitions of two nodes each, and suppose the connection between the partitions breaks.

Without a QDevice, there is no way to break the tie, so the whole cluster would lose quorum instead of just 2 nodes.

If it weren't for the strict requirement of 3 nodes to have quorum, you would have the two partitions running simultaneously without communication, which allows them to run out of sync, access the same resource from two different places at the same time (which leads to data corruption), among other things.
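
If it helps, here is a small sketch of that 2/2 split (same hedges as above: plain Python with a hypothetical has_quorum() helper, not corosync internals):

```python
# What each partition "sees" when a 4-node cluster splits into 2 + 2.

def has_quorum(votes_seen: int, total_votes: int) -> bool:
    # A partition keeps running only if it sees a strict majority.
    return votes_seen > total_votes // 2

# Without a QDevice: 4 votes total, each half only sees its own 2 votes.
for name, seen in [("partition A", 2), ("partition B", 2)]:
    print(name, "quorate:", has_quorum(seen, total_votes=4))  # both False

# With a QDevice: 5 votes total, and it sides with exactly one partition.
for name, seen in [("partition A + QDevice", 3), ("partition B", 2)]:
    print(name, "quorate:", has_quorum(seen, total_votes=5))  # only one True
```

Without the tie-breaker both halves stop (no split brain, but no service either); with it, one half keeps a majority and can run safely.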
 
To be precise, if you have 2n nodes, you can have at most n-1 nodes down, and quorum requires n+1 votes. With 6 nodes, 2 can fail.
Ok. But still: Why an odd number?

If it weren't for the strict requirement of 3 nodes to have quorum, you would have the two partitions running simultaneously without communication
A split-brain scenario. And I get that I should have a Q-Device as a witness. But why does the number of nodes have to be odd? The Q-Device sees the connection to one half of the cluster fail and should just fail over to the other half, no matter if odd or even.

Maybe I don't get the voting algorithm? Wouldn't mind an example.

Sorry to be so hung up about this, but it just escapes me and I want to understand a technology fully.
 
You need an odd number of votes to prevent the split-brain scenario: with an odd total number of votes, you can't have two halves of the cluster with the same number of votes.

If you have an even number, you add a QDevice to make it odd again. The QDevice is no different than a regular node when it comes to casting a vote: if you have 5 voters (4 nodes + QDevice) and two nodes fail, the QDevice will just add its vote to the other 2, making it 3 votes, which is enough for quorum.

Note that for 4 and 5 voters the quorum requirement is the same (3), so you are not really adding a failure point.
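
One last sketch of that failure case (again just illustrative Python, not Proxmox/corosync code), 4 nodes + QDevice with two nodes down:

```python
# 4 nodes + 1 QDevice = 5 votes; quorum is 5 // 2 + 1 = 3.
total_votes = 5
quorum = total_votes // 2 + 1        # 3, same as with 4 plain nodes

surviving_nodes = 2                  # two nodes have failed
votes_seen = surviving_nodes + 1     # the QDevice votes with the surviving side
print("quorate:", votes_seen >= quorum)  # True: the cluster keeps running
```

The same two failures in a plain 4-node cluster would leave only 2 of the 3 required votes, so the surviving nodes would stop.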