There's a great deal of misleading argumentation to be found in the current official PVE docs on QDevices [1], and then some conscious effort to take it even further.
Under "Supported setups" [1] the following is advised (emphasis mine):
It continues to provide an absurd piece of reasoning (only focusing on the odd, i.e. "discouraged" setup) in such case regarding that "QDevice provides (N-1) votes" and therefore "QDevice acts almost as a single point of failure" (emphasis mine).
This is, of course, blatantly wrong, it is mixing up even/odd nodes scenarios with algos (
What PVE does is that first, it makes the impression QD cannot be installed for odd number of nodes by throwing [3] a tantrum:
What it does not advise is that there's a
So unsure how this is a dangerous operation, but when executed with
And finally, going back to the docs to top it off:
First of all, HA has nothing to do with corosync (there's probably more users running clusters w/o HA than those w/HA enabled) and that the HA stack is immature when it comes to CRS is not the fault of corosync. Even if HA is used (on more remaining nodes), this is clearly known to the user (from observing the behaviour of HA even in case of single node fail) that it may overload the surviving nodes (however many) and there's absolutely no relation of this piece of advice to
I do note that one is free to decide for themselves in the conclusion of the same para of the docs [1], however currently there's literally everything done to prevent the default QD setup for odd number of nodes - you cannot get this unless manually overriding what PVE scripting did to have:
I really do not understand what was the purpose of all of these concerted efforts, especially it's completely undocumented.
[1] https://pve.proxmox.com/wiki/Cluster_Manager#_corosync_external_vote_support
[2] https://manpages.debian.org/testing/corosync-qdevice/corosync-qdevice.8.en.html
[3] https://github.com/proxmox/pve-clus...4c05a11b0f864f5b9dc/src/PVE/CLI/pvecm.pm#L136
[4] https://pve.proxmox.com/pve-docs/pvecm.1.html
Under "Supported setups" [1] the following is advised (emphasis mine):
We support QDevices for clusters with an even number of nodes and recommend it for 2 node clusters, if they should provide higher availability. For clusters with an odd node count, we currently discourage the use of QDevices.
It continues with an absurd piece of reasoning (focusing only on the odd, i.e. "discouraged", setup), namely that in such a case the "QDevice provides (N-1) votes" and therefore the "QDevice acts almost as a single point of failure" (emphasis mine).
This is, of course, blatantly wrong - it mixes up the even/odd node count scenarios with the QDevice algorithms (ffsplit and lms), of which ffsplit is the default [2]. With ffsplit, the QD definitely does not get N-1 votes; it gets exactly 1 vote, whether the number of node votes is even or odd - e.g. for a 3-node cluster that means 4 votes in total with quorum at 3, rather than 5 votes with the QD holding 2 of them. Except that it does not work that way when it comes to PVE.

What PVE does is that, first, it gives the impression that a QD cannot be installed for an odd number of nodes by throwing [3] a tantrum:
Code:
Clusters with an odd node count are not officially supported!
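For context, that message presumably comes from attempting the QDevice setup the ordinary way on an odd-node cluster, i.e. something along the lines of (the address is purely illustrative, it just mirrors the one used further below):

Code:
# plain attempt on a cluster with an odd number of nodes - gets rejected
pvecm qdevice setup 1.2.3.4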
What it does not advise is that there's a --force switch, but at least that one is (somewhat) in the docs [4]:
Code:
--force <boolean>
Do not throw error on possible dangerous operations.
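In other words, the way past the refusal presumably boils down to re-running the same setup with the flag appended (again purely illustrative):

Code:
# forces the setup through despite the odd node count
pvecm qdevice setup 1.2.3.4 --force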
So it is unclear how exactly this is a dangerous operation, but when executed with --force, PVE silently changes the algo to lms (without advising of it this time, nor is it mentioned anywhere in the docs). In that case, the QD would indeed get N-1 votes, except this has nothing to do with an odd number of nodes, let alone with the default setup. Ironically, there's no switch to tell pvecm which algo one wants.

And finally, going back to the docs to top it off:
The fact that all but one node plus QDevice may fail sounds promising at first, but this may result in a mass recovery of HA services, which could overload the single remaining node. Furthermore, a Ceph server will stop providing services if only ((N-1)/2) nodes or less remain online.
First of all, HA has nothing to do with corosync (there are probably more users running clusters without HA than with HA enabled), and that the HA stack is immature when it comes to CRS is not the fault of corosync. Even if HA is in use (on however many remaining nodes), it is clearly known to the user - from observing the behaviour of HA even in the case of a single node failing - that it may overload the surviving nodes. And there's absolutely no relation of this piece of advice to lms, let alone to the QD as such, completely irrespective of whether it's an odd- or even-node cluster. The Ceph part can of course be accounted for by setting expected to 2, but again, that is out of scope and has nothing to do with a QD setup on an odd number of nodes - it has to do with lms, which was brought in by the PVE script in the first place.

I do note that one is free to decide for themselves per the conclusion of the same paragraph of the docs [1]; however, currently literally everything is done to prevent the default QD setup for an odd number of nodes - you cannot get it without manually overriding what the PVE scripting did, so as to end up with:
Code:
quorum {
  device {
    model: net
    net {
      algorithm: ffsplit
      host: 1.2.3.4
      tls: on
    }
    votes: 1
  }
  provider: corosync_votequorum
}
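For completeness: on PVE such an edit would normally be made to the cluster-wide /etc/pve/corosync.conf (from where it gets propagated to /etc/corosync/corosync.conf on the nodes), and the end result can then be sanity-checked - the exact output varies by version, so take these merely as pointers:

Code:
# check quorum and QDevice vote count as seen by PVE and by corosync itself
pvecm status
corosync-quorumtool -s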
I really do not understand what the purpose of all these concerted efforts was, especially as it's all completely undocumented.
[1] https://pve.proxmox.com/wiki/Cluster_Manager#_corosync_external_vote_support
[2] https://manpages.debian.org/testing/corosync-qdevice/corosync-qdevice.8.en.html
[3] https://github.com/proxmox/pve-clus...4c05a11b0f864f5b9dc/src/PVE/CLI/pvecm.pm#L136
[4] https://pve.proxmox.com/pve-docs/pvecm.1.html