Proxmox HA automatic quorum

sttal

New Member
Jan 10, 2024
Hi,

My current setup is a cluster of 3 Proxmox nodes where one of them (let's call it NODE1) has far more resources than the others.
My objective is to be able to shut down NODE1 and let HA migrate the critical CTs and VMs to the other nodes.
My question is how to handle the quorum of the Proxmox cluster (meaning not losing quorum) when I go from 3 active nodes to 2 active nodes without losing availability.
I have the option of adding an external qdevice to the quorum, but apparently I can only do that with 2 nodes in the cluster, not with 3.

Sorry if this is a dumb question, but I still could not understand how to automatically keep the quorum (meaning keep the number of voting devices odd, i.e. 2n+1) when the cluster goes from 3 to 2 nodes and back from 2 to 3 nodes.

Can anyone help me understand if this is possible?
Thanks in advance.
 
Hi,
not sure I understand the question. If you have 3 nodes with one vote each, a quorum can be reached as long as you have at least two active nodes/votes. There is no issue with quorum if you turn off just one node. You also don't need an external qdevice; that should only be used when you have an even number of nodes in total: https://pve.proxmox.com/pve-docs/chapter-pvecm.html#_supported_setups
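For reference, on a healthy 3-node cluster without a qdevice, the quorum part of pvecm status should look roughly like this (illustrative output, the addresses are placeholders):
Code:
Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.0.1
0x00000002          1 192.168.0.2
0x00000003          1 192.168.0.3 (local)

Quorum is 2, so the cluster stays quorate with any 2 of the 3 nodes up.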
 
Hi,
@fiona Your first statement seems incorrect according to the documentation. The cluster loses quorum and enters read-only mode if it breaks the 2n+1 required for the quorum (I have also tested it on my cluster and CTs went down because of this).
Nonetheless your link was useful, and with a bit of ChatGPT I was able to figure it out last night: the trick is that the external qdevice provides the automatic tie break. If the cluster enters a possible split-brain state, the external qdevice breaks the tie automatically (it adjusts its own number of votes), and in the case of a cluster of 3 nodes + 1 qdevice, if a node goes down the qdevice votes twice.
With one node down it now correctly keeps HA, as seen below:
Code:
Votequorum information
----------------------
Expected votes:   5
Highest expected: 5
Total votes:      4
Quorum:           3 
Flags:            Quorate Qdevice

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
0x00000002          1    A,V,NMW 192.168.0.1
0x00000003          1    A,V,NMW 192.168.0.2 (local)
0x00000000          2            Qdevice

The more puzzling thing is why there is a big fat warning saying that adding an external qdevice (pvecm qdevice setup <ip>) to an odd-numbered cluster is not supported, when it does work and helps in this case. Is there any technical reason? To do it I just forced it with -f.
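For anyone wanting to reproduce this, the setup was roughly the following (a sketch only; the qdevice host IP is a placeholder, and the package names are the ones from the PVE qdevice docs):
Code:
# on the external qdevice host (a machine outside the cluster)
apt install corosync-qnetd

# on every cluster node
apt install corosync-qdevice

# on one cluster node: register the qdevice, forcing past the odd-node warning
pvecm qdevice setup 192.168.0.10 -f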

That's how it solves this problem. Maybe this detail should be fleshed out a bit more in the documentation, as it's quite important.

Anyway, thanks for the help, and I hope this post helps someone out with the same questions.
 
I don't know what you mean by '2n+1'. A cluster always needs an odd number of votes; that way it is not possible to split into two sets of equal votes with no side having a majority. A qdevice is only needed if you have an even number of nodes (like 2 nodes).

You seem to have three nodes, so there is no need for a qdevice in your setup. If one of your three nodes vanishes, the remaining two nodes will have a majority and should still work. If they don't, then there has to be another problem in your configuration.
 
If NODE1 votes A and NODE2 votes B, then there is no majority; in computer science that is the definition of a split brain, and that's the reason you need 2n+1 to keep a quorum. It's always an odd number :D
You have to keep this property in some way to keep a quorum. It's clear that when the node count gets down to 2 you have to do something, and that's a reason to add an external qdevice (which is supported). My question was how you keep this property in a 3-node cluster: if you add an external qdevice, I would guess the votes become 4, which is an even number (breaking the 2n+1)? The answer is that, to keep this property, the cluster increases the vote weight of the external qdevice, meaning it votes 2 times, keeping the total number of votes odd.
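To put numbers on my 3-node case (using the 2 qdevice votes from the output I posted above, so the quorum is 3 out of 5 expected votes):
Code:
all 3 nodes up:  3 node votes + 2 qdevice votes = 5 >= 3  -> quorate
1 node down:     2 node votes + 2 qdevice votes = 4 >= 3  -> quorate (the state shown above)
2 nodes down:    1 node vote  + 2 qdevice votes = 3 >= 3  -> quorate, assuming the qdevice
                 still sides with the surviving node ('last man standing' behaviour)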
 
I think you are way overcomplicating the whole 'vote-quorum' thing. The nodes are not 'voting A' or 'voting B'; they just figure out how many other hosts they can contact and check whether together they are part of a majority (>50% of total votes) or not. If they do have the majority they stay online, otherwise they fence themselves.

In a cluster with three nodes there are three votes. In order to get a majority you need two votes. If a node fails, each of the remaining two hosts asks the same question: am I part of the majority or not? They then check how many other hosts they can contact and figure out that they are in a group of two hosts. Two out of three votes is a majority, so they stay up and running.


Failure of one host is not the critical scenario, as a dead host never causes a split brain (killed VMs yes, but no garbled global configuration). The really bad situation is when all hosts are running but the connection between the hosts fails. If all hosts just keep running, then each group does its own thing and writes its own changes into the shared configuration. So there needs to be a globally agreed way for each host to figure out on its own whether it should stay online or fence itself.

This is what the quorum mechanism does. Every host knows how many votes are needed in order to get a majority of the cluster. Each host counts how many votes (including its own) it can still reach, and if that number is at least the quorum number, it stays online. If not, it fences itself.

A 3-node cluster can only split 1-2. The single node in the first group cannot contact any other host, so it is in a 1-host group and fences itself because 1 < 2. The other two nodes in the second group can contact each other, so they are in a 2-host group. Because 2 out of 3 votes is a majority, each chooses to stay online.

A 4-node cluster works fine if it splits 1-3 (the 3 nodes stay up and the 1 node fences). But since 4 is even, it can also split into 2-2, and then we have a problem: each group has two votes, and that is not enough to get over 50% of all votes, so each group decides to fence itself, and thus every host shuts down.


TLDR:
In order to have a cluster that works in any split scenario, the number of votes in a healthy cluster needs to be odd. So a cluster with 3, 5, 7 or 31 hosts works, but not a cluster with 2, 4 or 20 hosts. If you insist on having an even number of hosts, then you need to add a qdevice, because this adds one single vote to the total vote count, making the even vote count odd. If you have an odd number of hosts, you must not add a qdevice.
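A quick way to sanity-check this: the quorum is floor(total votes / 2) + 1, so with one vote per host (illustrative numbers):
Code:
3 votes -> quorum 2: worst split is 2|1, the 2-side keeps quorum
4 votes -> quorum 3: a 2|2 split is possible, neither side reaches 3 -> everything fences
5 votes -> quorum 3: worst split is 3|2, the 3-side keeps quorum
4 hosts + qdevice = 5 votes -> quorum 3: a 2|2 host split is decided by whichever
                                         side still reaches the qdevice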
 
If NODE1 votes A and NODE2 votes B, then there is no majority; in computer science that is the definition of a split brain, and that's the reason you need 2n+1 to keep a quorum. It's always an odd number
As long as nodes are in the same quorate partition, they will vote the same way (they talk to each other ;)). It's only problematic if you have an even number of nodes in total, because then you could have two different partitions with half the votes at the same time.
 
You also don't need an external qdevice; that should only be used when you have an even number of nodes in total: https://pve.proxmox.com/pve-docs/chapter-pvecm.html#_supported_setups

This is your own documentation, but it's only valid for the default fifty-fifty (ffsplit) QD algorithm; the OP's use case is a perfect example for using LMS for the QD with a tiebreaker. I do not find the "we discourage" helpful. Sure, it cannot all be tested and guaranteed by the PVE team, but why discourage something rather than refer to extra docs for those "other" setups?
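For reference, the relevant knob is the quorum.device section of /etc/pve/corosync.conf; an LMS setup would look something like the sketch below (the host address is a placeholder, see man corosync-qdevice for the details and caveats before touching this):
Code:
quorum {
  provider: corosync_votequorum
  device {
    model: net
    net {
      # address of the corosync-qnetd server
      host: 192.168.0.10
      # 'last man standing' instead of the default ffsplit
      algorithm: lms
      tls: on
    }
  }
}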
 
You seem to have three nodes, so there is no need for a qdevice in your setup. If one of your three nodes vanishes, the remaining two nodes will have a majority and should still work. If they don't, then there has to be another problem in your configuration.

This is wrong: he can keep going with 1 node down anyway, but with an extra QD he could keep going even with 2 down. I believe this is what the OP was after.
 
TLDR:
In order to have a cluster that works in any split scenario, the number of votes in a healthy cluster needs to be odd. So a cluster with 3, 5, 7 or 31 hosts works, but not a cluster with 2, 4 or 20 hosts. If you insist on having an even number of hosts, then you need to add a qdevice, because this adds one single vote to the total vote count, making the even vote count odd.

This reasoning falls apart from 5 nodes up: when one node fails, you are left with an "even" vote count and you are back to square one.

If you have an odd number of hosts, you must not add a qdevice.

Reference to official docs?
 
@sttal I would look for a proper overview e.g. here:

https://access.redhat.com/documenta...uring-and-managing-high-availability-clusters

EDIT: (After comment #13) NB: The reason I picked the RHEL page for the overview, for the OP's sake, was to let him understand this:

The LMS algorithm allows the cluster to remain quorate even with only one remaining node, but it also means that the voting power of the quorum device is great since it is the same as number_of_nodes - 1. Losing connection with the quorum device means losing number_of_nodes - 1 votes, which means that only a cluster with all nodes active can remain quorate (by overvoting the quorum device); any other cluster becomes inquorate.
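Concretely, for a 3-node cluster with an LMS qdevice (2 qdevice votes, quorum 3 of 5 expected votes), losing the qdevice looks like this:
Code:
qdevice unreachable, 3 nodes up: 3 >= 3 -> still quorate
qdevice unreachable, 2 nodes up: 2 <  3 -> inquorate, even though 2 of 3 nodes is a majority of nodes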
 
That's documentation for RHEL, not Proxmox VE ;) Just because Corosync allows for different configurations, that doesn't mean those are supported or recommended for Proxmox VE...
No no, let's be proper here: this is about qdevice (and potentially qnetd), specifically config options which have a bearing on how the quorum is established. Proxmox is not the maintainer of that code, and I do not know whether you ship it patched; I do not believe so.

I understand that you have not tested it or do not wish to support it, but saying in the docs that something is "discouraged" without saying why, when it is in fact the most common tester (a.k.a. homelab user) setup, is not cool. It would be fair to state why something is discouraged when there is a valid reason (e.g. it is untested by PVE) and, for further details, to refer to more exhaustive sources (you do so with Debian topics elsewhere).

PVE is Debian, so I could have just posted man 8 corosync-qdevice [1].

I suppose that liking RHEL's quick summary* of something does not make for a good excuse to stop or avoid this relevant discussion.

* Similarly SUSE; they simply have nice docs. The OP may wish to find his own overview of how to configure corosync-qdevice; the link was just for his convenience.

[1] https://manpages.debian.org/testing/corosync-qdevice/corosync-qdevice.8.en.html
 
EDIT: (After comment #13) NB: The reason I picked the RHEL page for the overview, for the OP's sake, was to let him understand this:
The original question in this thread was not about keeping quorum with just one node running...
If you want to use that configuration, you're free to do so. I'm just mentioning it's not supported in Proxmox VE.
 
The original question in this thread was not about keeping quorum with just one node running...

I really do not mean to argue here, and I understand someone might find this argumentative, but the actual point to be made here is that the OP asked about:

how to handle the quorum of the Proxmox cluster (meaning not losing quorum) when I go from 3 active nodes to 2 active nodes without losing availability.

Which then turned into bizarre pieces of random advice (not from you, but it's in the thread), things like:

If you have an odd number of hosts, you must not add a qdevice.

These myths are then supported tacitly by the PVE docs by "discouraging", without reason, what corosync notably supports.

It is very relevant for the OP, because his next point of confusion was whether or not to use a QD with an even or odd number of nodes.

If you want to use that configuration, you're free to do so. I'm just mentioning it's not supported in Proxmox VE.

With all due respect, I do not buy this reply on a public forum (you do not owe us any, so do not get me wrong): what does "not supported" mean in the context of the OP being a non-subscriber? It would be a valid reply for a subscriber raising a support ticket and demanding support for something you have no experience with. Again with all due respect, I do not expect you to be testing 3-node cluster scenarios in particular, when most paying customers run many more nodes, as they should.

My issue with the "not supported" expression on a forum (it's okay in the docs) is that it makes one think it would not run, or that it runs into problems. If the latter is the case, please specify what kind. RHEL and SUSE support it just fine; PVE just uses corosync. Nobody wants to support unusual configs (i.e. different from the majority of the user base), but the software itself supports it. It is definitely a production feature. It's not like I am telling the OP to go set the algorithm to 2nodelms.
 
Hi,
not sure I understand the question. If you have 3 nodes with one vote each, a quorum can be reached as long as you have at least two active nodes/votes. There is no issue with quorum if you turn off just one node. You also don't need an external qdevice; that should only be used when you have an even number of nodes in total: https://pve.proxmox.com/pve-docs/chapter-pvecm.html#_supported_setups

Essentially, I was just saying that the above (emphasis mine) is a wrong piece of advice when put as a blanket statement; it completely overstretches even the original statement in the docs ("we currently discourage the use of QDevices") and creates myths that others believe. I do not know how to put this more nicely, and I do not mean to offend anyone, but I think it's good to understand how things work under the hood, and what can cause problems and of what sort.
 
Because a qdevice is recommended to be used with an even number of nodes. With an odd number of nodes it is discouraged.

There's another problem with that "terminology" - until I see a logical answer here, I would find it dubious:
https://forum.proxmox.com/threads/q...-discourage-from-perfectly-safe-setup.139809/

EDIT: What I meant by "terminology" is that if a user says he found something to "be discouraged" in the (PVE) docs, that's fine, but a PVE staff member cannot give this as an explanation to a valid inquiry. Why? Because it's like me publishing a book in which I state, e.g., my best practice but omit the reasoning for it (which does not mean there is none). Then, when subsequently asked about it, I refer back to my very own book as the source of my own wisdom (which means my statement is backed by nothing whatsoever). This is a problem. I accept that there are requirements, but those should have owners, and there should be no problem articulating the reasons for design choices. The reason as of today looks like "we just do not test that enough to feel comfortable encouraging it." That's fine, but it should be spelled out.
 