Best practice: where to place quorum in multi-node and multi-cluster setups

buchey

New Member
May 2, 2024
Hello folks,

we're planning to roll out three new clusters:
1. 12 nodes
2. 13 nodes
3. 2 nodes

Clusters 1 and 2 should provide HA based on their quorum. I'd like to get your opinions on where best to place their quorum. Our first idea was to place the quorum of cluster 1 in cluster 2 and vice versa.

Both clusters use different networks and stand on their own (no fencing etc.). We just need to know whether cluster 1 is good to go or whether we have to switch to cluster 2.

Cluster 3 stands on its own, but could possibly host the quorum.

Any opinion on this is appreciated.
 
I am a bit confused. The large clusters should be fine regarding quorum: cluster 1 keeps quorum with up to 5 failed nodes, cluster 2 with up to 6.

The only cluster that really needs an external QDevice is cluster 3, to get up to 3 votes and thereby maintain a majority if one of the two nodes is down. And the external part of the QDevice (corosync-qnetd) needs to run on hardware other than the two cluster 3 nodes.
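The vote math behind this can be sketched in a few lines. This is an illustration only (the helper function is hypothetical, not part of any Proxmox or corosync tooling); it assumes the default of one vote per node:

```python
# Corosync quorum: a partition is quorate when it holds a strict
# majority of the total expected votes (one vote per node by default).
def is_quorate(votes_present: int, total_votes: int) -> bool:
    return votes_present > total_votes // 2

# Two-node cluster, one node down: 1 of 2 votes -> no quorum.
print(is_quorate(1, 2))   # False

# Same cluster with a QDevice contributing a third vote: the surviving
# node plus the qnetd vote hold 2 of 3 -> quorate.
print(is_quorate(2, 3))   # True
```

This is exactly why the QDevice helps the two-node cluster: it breaks the tie without adding a third full node.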
 
Cluster 3 stands on its own, but could possibly host the quorum.
Don't run two-node clusters, as they automatically lose quorum when one of the nodes goes down. Lots of threads here on this forum about issues with only two nodes.
Any opinion on this is appreciated.
What do you mean by "where to place quorum"? Maybe it's a language difference, but quorum is not a thing that can be placed. Quorum is achieved by having more than half of the votes of the nodes: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_quorum
 
Don't run two-node clusters, as they automatically lose quorum when one of the nodes goes down. Lots of threads here on this forum about issues with only two nodes.
Noted already, and we came across those threads. This is just for a small workload and our customer accepts the risk.

What do you mean by "where to place quorum"? Maybe it's a language difference, but quorum is not a thing that can be placed. Quorum is achieved by having more than half of the votes of the nodes: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_quorum
Probably a language thing, yes. Regarding "more than half of the votes": that's the reason to have 2x+1 nodes, right?

What we'd like to prevent is a split-brain due to the (partial) loss of quorum. Maybe a short example helps:

Cluster 1 is hosting 5 VMs of an app, cluster 2 is hosting 5 VMs of an app. The clusters are "on their own", but they share the same networks, so these 10 VMs are able to communicate.

If several hosts fail in cluster 1, we assume the VMs are moved to other hosts. But what if the "failed" hosts don't know they have failed and keep running their VMs, so we end up with the same VM running twice?

How can this be prevented? Is a quorum disk an option (in the background there'll be a big SAN)?
 
If several hosts fail in cluster 1, we assume the VMs are moved to other hosts. But what if the "failed" hosts don't know they have failed and keep running their VMs, so we end up with the same VM running twice?
That won't be possible. If the node on which the HA VM is running loses its connection to the rest of the cluster for longer than 1 minute, it will fence itself (hard reset, similar to pushing the reset button) to make sure the VM is powered off before the (hopefully quorate) rest of the cluster recovers it. This will happen after about 2 minutes.
 
we're planning to roll out three new clusters:
1. 12 nodes
2. 13 nodes
3. 2 nodes

Clusters 1 and 2 should provide HA based on their quorum. I'd like to get your opinions on where best to place their quorum. Our first idea was to place the quorum of cluster 1 in cluster 2 and vice versa.

Both clusters use different networks and stand on their own (no fencing etc.). We just need to know whether cluster 1 is good to go or whether we have to switch to cluster 2.

I fail to understand why you're trying to "mix clusters".
Just run each cluster on its own in HA (high availability) - shared storage & you're good to go.

For example:

Cluster 1: Up to 5 nodes can go down & Proxmox will migrate the VMs to the other running nodes of cluster 1.
Cluster 2: Up to 6 nodes can go down & Proxmox will migrate the VMs to the other running nodes of cluster 2.
Cluster 3: is a split-brain risk & you apparently don't care about it.
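Those numbers follow from the majority rule: an n-node cluster tolerates floor((n-1)/2) failures. A quick sanity check (illustrative sketch only, assuming one vote per node and no QDevice):

```python
def tolerable_failures(nodes: int) -> int:
    # Quorum needs a strict majority (nodes // 2 + 1 votes),
    # so everything beyond that majority may fail.
    return nodes - (nodes // 2 + 1)

for n in (12, 13, 2):
    print(f"{n} nodes -> {tolerable_failures(n)} failures tolerated")
# 12 nodes -> 5, 13 nodes -> 6, 2 nodes -> 0
```

The 2-node result (zero tolerated failures) is the split-brain situation mentioned above.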

Maybe I've misunderstood something here about Cluster vs Node (in translation?).
 
That won't be possible. If the node on which the HA VM is running loses its connection to the rest of the cluster for longer than 1 minute, it will fence itself (hard reset, similar to pushing the reset button) to make sure the VM is powered off before the (hopefully quorate) rest of the cluster recovers it. This will happen after about 2 minutes.

This means that, in the best case, the VMs on a broken host are unavailable for 2-3 minutes?


Just run each cluster on its own in HA (high availability) - shared storage & you're good to go.
That's exactly what's planned. We're just curious about a serious split-brain if several hosts fail and the transition to the other host / cluster fails.
 
We're just curious about a serious split-brain if several hosts fail and the transition to the other host / cluster fails.
Any split-brain is serious. To avoid problems, different fencing technologies can be implemented:
- Node self-reboot on loss of quorum. When the node comes back up, it will not be able to achieve quorum on its own, and services will not be started.
- Hardware watchdog reboot for cases where the node is hung and OS/Kernel is not responsive.
- STONITH - shoot the other node in the head. It ensures that the disconnected node goes through a reboot and releases the services.

There is a chicken and an egg dilemma here: the more servers you add to protect against multi-node failure, the bigger the statistical chance of multi-node failure is.
As was mentioned before: in a 13-node cluster where 5 nodes failed, those 5 nodes know that they don't have a majority. Services will not be started, barring any software bug. Given that PVE employs Corosync, the leading clustering package in the world, I'd say chances are low.

Stay away from even-node clusters (6/6, 1/1) and you should be ok.
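The "stay away from even-node clusters" advice can be made concrete: adding one node to an odd-sized cluster raises the majority threshold without raising fault tolerance. A small illustration (sketch only, one vote per node assumed):

```python
def majority(nodes: int) -> int:
    # Smallest strict majority of the votes.
    return nodes // 2 + 1

def tolerable_failures(nodes: int) -> int:
    return nodes - majority(nodes)

# 5 nodes: majority of 3, tolerates 2 failures.
# 6 nodes: majority of 4, still tolerates only 2 failures,
# and a 3/3 network split leaves NEITHER side quorate.
for n in (5, 6):
    print(f"{n} nodes: majority {majority(n)}, "
          f"tolerates {tolerable_failures(n)} failures")
```

So the sixth node adds hardware that can fail without adding any resilience, which is exactly the 2x+1 point raised earlier in the thread.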


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
