4+ node clusters with qdevices

tempacc375924
Does anyone have experience with scenarios where one starts with, e.g., 4 regular PVE nodes (really anything more than 3 and even) and adds a QDevice (+1) to get an odd number of corosync votes?

If a node goes down in the 4+1 scenario, one is back to 3+1, hence 4 votes, i.e., even again. It would be good to have the QDevice disengage at that point by its own observation, wouldn't it? Does this cause any practical problems when running with one node down like that?

Even worse, say in a 6+1 scenario connectivity issues split the nodes into two groups of 3 (where nodes can only see each other within their group, but not across), with the QDevice accessible from both. How does that play out for cluster integrity?
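
For context, this is the kind of setup I mean; the commands are the ones from the Cluster Manager docs, with a placeholder IP for the external host:

Code:
# On the external host that provides the extra vote (not a cluster member):
apt install corosync-qnetd
# On all cluster nodes:
apt install corosync-qdevice
# From one cluster node, register the QDevice (IP is a placeholder):
pvecm qdevice setup 192.0.2.10
# Check the vote count; 4 nodes + QDevice should show 5 votes:
pvecm status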
 
The text on https://pve.proxmox.com/wiki/Cluster_Manager#_corosync_external_vote_support says that the qdevice will hand out the quorum only to one partition of a split cluster.

Thanks, I thought that might be the issue (or maybe not), which is why I asked about experiences. Say 3 nodes (group A) and 3 nodes (group B) get split by a network disruption; external to them is a QDevice that chooses to give its vote to group A. After that happens, the connection between group A and the QDevice is severed. Will group A self-fence, as it can't get the QDevice vote? What's the latency there (as there's no low-latency requirement on the QDevice)? The QDevice cannot give its vote to group B (or will it?) unless it's sure group A has at least self-fenced...
 
Will group A self-fence, as it can't get the QDevice vote?
Yes, if active HA services are present on a node, and no quorum can be reached for a prolonged time, self-fence is always triggered.
What's the latency there (as there's no low-latency requirement on the QDevice)?
From https://pve.proxmox.com/pve-docs/chapter-ha-manager.html#_cluster_resource_manager
When a cluster member determines that it is no longer in the cluster quorum, the LRM waits for a new quorum to form. As long as there is no quorum the node cannot reset the watchdog. This will trigger a reboot after the watchdog times out (this happens after 60 seconds).
So, roughly 60s give or take: watchdog updates are done every 10s, so if you're unlucky and the last successful update happened 9.999s before quorum was lost, then it's 50.001s.
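
If you want to watch the moving parts on a node, these are the relevant units as shipped with Proxmox VE, plus the HA stack's own status view:

Code:
# Watchdog multiplexer and the local resource manager:
systemctl status watchdog-mux pve-ha-lrm
# Current manager and LRM states as seen by the HA stack:
ha-manager status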
The QDevice cannot give its vote to group B (or will it?) unless it's sure group A has at least self-fenced...
The quorum device has no knowledge about self-fencing or how the HA stack works; that's all managed by HA. Corosync doesn't bother with that at all, it's just for quorum and for providing the closed process group for cluster-wide data communication.

And in the HA stack, the currently active CRM (manager) will only start to recover services from a dead node once it could successfully acquire the node-specific lock of the node that went offline, and that can only happen once self-fencing has definitively happened (the cluster lock timeout is 120s, so worst case it takes that long, but it may be faster).
See https://pve.proxmox.com/pve-docs/chapter-ha-manager.html#ha_manager_fencing

IOW, the QDevice will choose a partition if there's a split, e.g., due to the cluster network being cut between two sets of three nodes each, and that partition will get quorum immediately. IIRC the qdevice re-query max-period is 20s, and it also re-queries on partition change, so that will happen relatively fast.
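
You can observe all of this with the standard tooling, e.g.:

Code:
# Quorum summary including the Qdevice membership and vote:
corosync-quorumtool -s
pvecm status
# QDevice daemon state on a cluster node:
systemctl status corosync-qdevice
# On the external host: list what the qnetd daemon currently sees:
corosync-qnetd-tool -l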
 
Thank you VERY MUCH for the detailed answer (and sorry for missing the timeout piece when browsing the docs myself).

So basically I can think of pve-ha-crm/lrm as an equivalent of pacemaker. Corosync (to which the QDevice is a voter) keeps the common data intact, I get that; I was just wondering whether, when group A in the example above self-fences (as a result of the HA functionality), it will also stop giving its vote towards the quorum. All the rest is clear, I will conduct more tests going forward. Thanks again!
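
For those tests, my rough plan (my own idea, not from the docs) is to provoke the split by blocking corosync traffic (udp/5405 by default) between the groups while leaving the QDevice link (tcp/5403 to qnetd) untouched, so both groups can still reach it:

Code:
# On each node of group B, drop corosync traffic to/from the group A nodes
# (10.0.0.1-10.0.0.3 are placeholders for group A):
for ip in 10.0.0.1 10.0.0.2 10.0.0.3; do
    iptables -A INPUT  -s "$ip" -p udp --dport 5405 -j DROP
    iptables -A OUTPUT -d "$ip" -p udp --dport 5405 -j DROP
done
# Then check on a node of each group which partition kept quorum:
pvecm status
# Undo the partition afterwards (flushes all rules; fine on a test box):
iptables -F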
 
So basically I can think of pve-ha-crm/lrm as an equivalent of pacemaker
Basically, but way simpler (in usage but also flexibility/feature-set, pacemaker is huge).
Before Proxmox VE 4 the rgmanager project was used as the HA manager, and it worked quite well, as in: small feature set and simple implementation, i.e., what one likes from a HA manager. But rgmanager had been deprecated for a while by then and was basically dead, which, on the other hand, isn't the best thing for a HA manager either.
We already favored rgmanager over pacemaker due to the huge complexity of the latter, and when put in zugzwang to choose again, we still felt that pacemaker was the wrong choice for us and our users (it's surely a magnificent project and has many happy users), so we wrote ha-manager, which orients itself on the rgmanager feature set but took a few different design decisions. Depending on your use case there are a few features still missing (mostly affinity and anti-affinity for groups, plus expanding cluster-resource scheduling and some other convenience functionality, though at least the scheduling part isn't really related to HA directly).
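
To give an idea of the "way simpler in usage" part, the complete HA configuration for a resource boils down to roughly this (VMID, group and node names are just placeholders):

Code:
# Create a group restricted to two nodes and make a VM a HA resource in it:
ha-manager groupadd prefer-a-b --nodes "node-a,node-b"
ha-manager add vm:100 --group prefer-a-b --state started
# Check manager and service states:
ha-manager status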
 
Basically, but way simpler (in usage but also flexibility/feature-set, pacemaker is huge).

Before Proxmox VE 4 the rgmanager project was used as the HA manager, and it worked quite well, as in: small feature set and simple implementation, i.e., what one likes from a HA manager. But rgmanager had been deprecated for a while by then and was basically dead, which, on the other hand, isn't the best thing for a HA manager either.

Thanks for the detailed explanation; I had asked before and someone mostly said it was before their time and they were unsure. It felt a bit odd. I then found a bit of the rationale in a README in Git, but it mostly just mentioned that rgmanager went dead, not the rest of the why...

We already favored rgmanager over pacemaker due to the huge complexity of the latter, and when put in zugzwang to choose again, we still felt that

One must be a German speaker or chess player to appreciate this expression. :D

pacemaker was the wrong choice for us and our users (it's surely a magnificent project and has many happy users), so we wrote ha-manager, which orients itself on the rgmanager feature set but took a few different design decisions. Depending on your use case there are a few features still missing (mostly affinity and anti-affinity for groups, plus expanding cluster-resource scheduling and some other convenience functionality, though at least the scheduling part isn't really related to HA directly).

No worries, it's mostly the familiarity factor, and it makes it easier to understand why I asked to begin with. I also wanted to know better how the watchdog interplays with ha-crm and ha-lrm, but I can see more myself; I also re-read some of the HA doc sections more carefully.

Cheers!
 
