Best practice: where to place quorum in multi-node and multi-cluster setups

buchey

New Member
May 2, 2024
Hello folks,

we're planning to roll out three new clusters:
1. 12 nodes
2. 13 nodes
3. 2 nodes

Clusters 1 and 2 should provide HA based on their quorum. I'd like to get your opinions on where best to place their quorum. Our first idea was to place the quorum of cluster 1 in cluster 2 and vice versa.

Both clusters use different networks and stand on their own (no fencing etc.). We just need to know whether cluster 1 is good to go or whether we have to switch to cluster 2.

Cluster 3 stands on its own, but could possibly host the quorum.

Any opinion on this is appreciated.
 
I am a bit confused. The large clusters should be fine regarding quorum: cluster 1 keeps quorum with up to 5 failed nodes, cluster 2 with up to 6.

The only cluster that really needs an external QDevice is cluster 3, to get up to 3 votes and thereby maintain a majority if one of the two nodes is down. And the external part of the QDevice (corosync-qnetd) needs to run on hardware other than the two cluster 3 nodes.
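The vote math behind this can be sketched in a few lines. This is an illustration only (the helper function is hypothetical, not part of any Proxmox or corosync tooling); it assumes the default of one vote per node:

```python
# Corosync quorum: a partition is quorate when it holds a strict
# majority of the total expected votes (one vote per node by default).
def is_quorate(votes_present: int, total_votes: int) -> bool:
    return votes_present > total_votes // 2

# Two-node cluster, one node down: 1 of 2 votes -> no quorum.
print(is_quorate(1, 2))   # False

# Same cluster with a QDevice contributing a third vote: the surviving
# node plus the qnetd vote hold 2 of 3 -> quorate.
print(is_quorate(2, 3))   # True
```

This is exactly why the QDevice helps the two-node cluster: it breaks the tie without adding a third full node.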
 
Cluster 3 stands on its own, but could possibly host the quorum.
Don't run two-node clusters, as they automatically lose quorum when one of the nodes goes down. Lots of threads here on this forum about issues with only two nodes.
Any opinion on this is appreciated.
What do you mean by "where to place quorum"? Maybe it's a language difference, but quorum is not a thing that can be placed. Quorum is achieved by having more than half of the votes of the nodes: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_quorum
 
Don't run two-node clusters, as they automatically lose quorum when one of the nodes goes down. Lots of threads here on this forum about issues with only two nodes.
Noted already, and we came across those threads. This is just for a small workload and our customer accepts the risk.

What do you mean by "where to place quorum"? Maybe it's a language difference, but quorum is not a thing that can be placed. Quorum is achieved by having more than half of the votes of the nodes: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_quorum
Probably a language thing, yes. Regarding "more than half of the votes": that's the reason to have 2x+1 nodes, right?

What we'd like to prevent is a split-brain due to the (partial) loss of quorum. Maybe a short example helps:

Cluster 1 is hosting 5 VMs of an app, cluster 2 is hosting 5 VMs of an app. The clusters are "on their own", but they share the same networks, so these 10 VMs are able to communicate.

If several hosts fail in cluster 1, we assume the VMs are moved to other hosts. But what if the "failed" hosts don't know they have failed and keep running their VMs, so we end up with the same VM running twice?

How can this be prevented? Is a quorum disk an option (in the background there'll be a big SAN)?
 
If several hosts fail in cluster 1, we assume the VMs are moved to other hosts. But what if the "failed" hosts don't know they have failed and keep running their VMs, so we end up with the same VM running twice?
That won't be possible. If the node on which the HA VM is running loses its connection to the rest of the cluster for longer than 1 minute, it will fence itself (hard reset, similar to pushing the reset button) to make sure the VM is powered off before the (hopefully quorate) rest of the cluster recovers it. This will happen after about 2 minutes.
 
we're planning to roll out three new clusters:
1. 12 nodes
2. 13 nodes
3. 2 nodes

Clusters 1 and 2 should provide HA based on their quorum. I'd like to get your opinions on where best to place their quorum. Our first idea was to place the quorum of cluster 1 in cluster 2 and vice versa.

Both clusters use different networks and stand on their own (no fencing etc.). We just need to know whether cluster 1 is good to go or whether we have to switch to cluster 2.

I fail to understand why you're trying to "mix clusters".
Just run each cluster on its own in HA (high availability) - shared storage & you're good to go.

For example:

Cluster 1: Up to 5 nodes can go down & Proxmox will migrate the VMs to the other running nodes of cluster 1.
Cluster 2: Up to 6 nodes can go down & Proxmox will migrate the VMs to the other running nodes of cluster 2.
Cluster 3: is a split-brain risk & you apparently don't care about it.
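Those numbers follow from the majority rule: an n-node cluster tolerates floor((n-1)/2) failures. A quick sanity check (illustrative sketch only, assuming one vote per node and no QDevice):

```python
def tolerable_failures(nodes: int) -> int:
    # Quorum needs a strict majority (nodes // 2 + 1 votes),
    # so everything beyond that majority may fail.
    return nodes - (nodes // 2 + 1)

for n in (12, 13, 2):
    print(f"{n} nodes -> {tolerable_failures(n)} failures tolerated")
# 12 nodes -> 5, 13 nodes -> 6, 2 nodes -> 0
```

The 2-node result (zero tolerated failures) is the split-brain situation mentioned above.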

Maybe I've misunderstood something here about Cluster vs Node (in translation?).
 
That won't be possible. If the node on which the HA VM is running loses its connection to the rest of the cluster for longer than 1 minute, it will fence itself (hard reset, similar to pushing the reset button) to make sure the VM is powered off before the (hopefully quorate) rest of the cluster recovers it. This will happen after about 2 minutes.

This means that, in the best case, the VMs on a broken host are unavailable for 2-3 minutes?


Just run each cluster on its own in HA (high availability) - shared storage & you're good to go.
That's exactly what's planned. We're just curious about a serious split-brain if several hosts fail and the transition to the other host / cluster fails.
 
We're just curious about a serious split-brain if several hosts fail and the transition to the other host / cluster fails.
Any split-brain is serious. To avoid problems, different fencing technologies can be implemented:
- Node self-reboot on loss of quorum. When the node comes back up, it will not be able to achieve quorum on its own, and services will not be started.
- Hardware watchdog reboot for cases where the node is hung and OS/Kernel is not responsive.
- STONITH - shoot the other node in the head. It ensures that the disconnected node goes through a reboot and releases the services.

There is a chicken and an egg dilemma here: the more servers you add to protect against multi-node failure, the bigger the statistical chance of multi-node failure is.
As was mentioned before: in a 13-node cluster where 5 nodes failed, those 5 nodes know that they don't have a majority. Services will not be started, barring any software bug. Given that PVE employs Corosync, the leading clustering package in the world, I'd say chances are low.

Stay away from even-node clusters (6/6, 1/1) and you should be ok.
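The "stay away from even-node clusters" advice can be made concrete: adding one node to an odd-sized cluster raises the majority threshold without raising fault tolerance. A small illustration (sketch only, one vote per node assumed):

```python
def majority(nodes: int) -> int:
    # Smallest strict majority of the votes.
    return nodes // 2 + 1

def tolerable_failures(nodes: int) -> int:
    return nodes - majority(nodes)

# 5 nodes: majority of 3, tolerates 2 failures.
# 6 nodes: majority of 4, still tolerates only 2 failures,
# and a 3/3 network split leaves NEITHER side quorate.
for n in (5, 6):
    print(f"{n} nodes: majority {majority(n)}, "
          f"tolerates {tolerable_failures(n)} failures")
```

So the sixth node adds hardware that can fail without adding any resilience, which is exactly the 2x+1 point raised earlier in the thread.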


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
