Question on the principle of HA operation.

Maksimus

Member
May 16, 2022
A theoretical situation that could happen in practice. There is a cluster of 5 nodes with replication from 4 nodes to 1 node, and the rack with the 4 nodes is accidentally powered off.

1. How does the remaining node behave?
2. We manually start all the virtual machines on the remaining node. Some time passes and the previously powered-off 4 nodes come back up (they remember that VMs were running on them). What happens to the VMs on these 4 nodes, and what happens to the VMs on the 5th node?
 
A theoretical situation that could happen in practice. There is a cluster of 5 nodes with replication from 4 nodes to 1 node

EDIT: Without High Availability on...

, and the rack with the 4 nodes is accidentally powered off. 1. How does the remaining node behave?

It will remain inquorate (losing 3+ nodes out of a cluster of 5 means losing quorum); the remaining node is basically stuck in the state it was in at the moment quorum was lost.
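You can verify this on the surviving node with the standard cluster tool (a sketch; the exact output layout can differ between versions, but the Votequorum information section should report "Quorate: No"):

pvecm status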

2. We manually start all the virtual machines on the remaining node.

You would first have to override the no-quorum situation to be able to start them manually.
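For completeness, a minimal sketch of that override on the surviving node (VMID 100 is just a placeholder; note that forcing the expected votes down to 1 is exactly what creates the split-brain risk described below):

pvecm expected 1
qm start 100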

Some time passes and the previously powered-off 4 nodes come back up (they remember that VMs were running on them). What happens to the VMs on these 4 nodes, and what happens to the VMs on the 5th node?

The only situation in which this could happen is if you artificially set the expected votes on the sole node to 1. Then the remaining nodes, as they start up, will reach quorum among themselves (once the 3rd one is on). I am not sure if the sole node would be able to rejoin the cluster at that moment, given its state (it might not be deterministic). If not, you have a split-brain situation in which you risk the VMs additionally starting up on those 4 nodes while they remain running on the sole 5th.

EDIT: However, with the High Availability feature on, you would end up with 4 nodes down and 1 constantly rebooting (self-fencing, looking for the opportunity to regain quorum upon the next reboot).
 

It turns out that in order to have quorum when I lose 4 nodes, I need to add at least 5 witnesses to the cluster? Then I have 10 quorum members; when I lose 4 nodes, I will have 1 node and 5 witnesses, a total of 6 out of 10 quorum members, that is, there will be quorum and HA will work normally and start all the VMs on the 5th node.

One more question. The RAM of the 4 nodes adds up to 1.5 TB in total, but the 5th node has only 512 GB. Ballooning is enabled on all of them. How will HA decide who gets priority, or will they all start and whoever has enough memory will work, while whoever does not will get an error?
 
It turns out that in order to have quorum when I lose 4 nodes, I need to add at least 5 witnesses to the cluster?

PVE does not have the concept of "witness" nodes. All nodes are equivalent and each is a voter in the quorum; you can, of course, use nodes with very modest hardware. There is a bit of an exception in the form of a quorum device (QDevice) [1], but that is a different concept and you can only have one at a time per cluster, if you want one at all. It would, however, (ordinarily) do nothing for a regular (out-of-the-box) PVE setup - 4 nodes down out of a total of 5 (with or without an additional QDevice) would still leave your 1 node inquorate.
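For reference, a QDevice is typically added with the cluster tool like this (a sketch; 192.0.2.10 is a placeholder for an external host already running corosync-qnetd):

pvecm qdevice setup 192.0.2.10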

Then I have 10 quorum members; when I lose 4 nodes, I will have 1 node and 5 witnesses, a total of 6 out of 10 quorum members, that is, there will be quorum and HA will work normally and start all the VMs on the 5th node.

So suppose you have 10 nodes (see the explanation above); then indeed with 4 nodes down you are left with 6, which is exactly the majority (floor(10/2) + 1 = 6), i.e. one further loss away from losing quorum. HA will work in a quorate segment of the cluster; however, you may want to read more on the HA stack of PVE, in particular the scheduler [2]. You will have to make sure the HA group your VMs are placed in does not include your "dummy" nodes (whose hardware presumably cannot run them) - see the sketch below.
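A minimal sketch of such a group (node names and the VMID are placeholders; the restricted flag ensures the service is never placed outside the listed nodes):

ha-manager groupadd real-nodes --nodes node1,node2,node3,node4,node5 --restricted 1
ha-manager add vm:100 --group real-nodes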

I am not sure how you plan to run those extra 5 nodes, but I have a hunch they would e.g. be VMs hosted somewhere, which is not what you really want to do, at least not in my opinion.

There are other options for handling quorum (without the above-mentioned gymnastics), but I doubt the PVE staff will want to provide support for those. Namely, there are corosync setups where what constitutes a quorum is recalculated dynamically; look for last_man_standing [3]:

The general behaviour of votequorum is to set expected_votes and quorum at startup (unless modified by the user at runtime, see below) and use those values during the whole lifetime of the cluster.

Using for example an 8 node cluster where each node has 1 vote, expected_votes is set to 8 and quorum to 5. This condition allows a total failure of 3 nodes. If a 4th node fails, the cluster becomes inquorate and it will stop providing services.

Enabling LMS allows the cluster to dynamically recalculate expected_votes and quorum under specific circumstances. It is essential to enable WFA when using LMS in High Availability clusters.

Using the above 8 node cluster example, with LMS enabled the cluster can retain quorum and continue operating by losing, in a cascade fashion, up to 6 nodes with only 2 remaining active.

Example chain of events:

1) cluster is fully operational with 8 nodes.
(expected_votes: 8 quorum: 5)
2) 3 nodes die, cluster is quorate with 5 nodes.
3) after last_man_standing_window timer expires,
expected_votes and quorum are recalculated.
(expected_votes: 5 quorum: 3)
4) at this point, 2 more nodes can die and
cluster will still be quorate with 3.
5) once again, after last_man_standing_window
timer expires expected_votes and quorum are
recalculated.
(expected_votes: 3 quorum: 2)
6) at this point, 1 more node can die and
cluster will still be quorate with 2.
7) one more last_man_standing_window timer
expires and expected_votes and quorum are
recalculated.
(expected_votes: 2 quorum: 2)
NOTES: In order for the cluster to downgrade automatically from 2 nodes to a 1 node cluster, the auto_tie_breaker feature must also be enabled (see below). If auto_tie_breaker is not enabled, and one more failure occurs, the remaining node will not be quorate.

It works reasonably well, but for your case you also need to heed the advice on the "wait for all" (WFA) and "auto tie breaker" (ATB) features; a sketch of the relevant corosync.conf section follows.
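A minimal sketch of what the quorum section of corosync.conf could look like with these options (the window value is just an illustrative choice in milliseconds; these are plain corosync options that the PVE tooling does not manage for you, so edit with care):

quorum {
  provider: corosync_votequorum
  last_man_standing: 1
  last_man_standing_window: 20000
  wait_for_all: 1
  auto_tie_breaker: 1
}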


One more question. The RAM of the 4 nodes adds up to 1.5 TB in total, but the 5th node has only 512 GB. Ballooning is enabled on all of them. How will HA decide who gets priority, or will they all start and whoever has enough memory will work, while whoever does not will get an error?

It's pretty primitive; you can read more on it in the linked docs [2] - you may want to exclude such an undersized node from the group.

EDIT: Worse yet, I just noticed the Static-Load Scheduler is still considered a "technology preview".
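For what it's worth, a sketch of how that scheduler gets selected, via the crs option in /etc/pve/datacenter.cfg (assuming a PVE version that ships it; the default remains the basic scheduler):

crs: ha=static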

[1] https://manpages.debian.org/bookworm/corosync-qnetd/corosync-qnetd.8.en.html
[2] https://pve.proxmox.com/wiki/High_Availability#ha_manager_crs
[3] https://manpages.debian.org/bookworm/corosync/votequorum.5.en.html
 