CLUSTER - Rebooting node2 makes node1 reboot too, losing both nodes!

Oct 5, 2023
Good morning guys,

I have a cluster of two Proxmox 8.0.4 nodes, and there is a behavior I don't understand.

When I do maintenance on node2 (updates/reboot), even if I first switch node2 to “maintenance mode” and then reboot it, node1 also reboots about a minute later. We lose all communication and, sadly, all of our online production VMs/CTs.

Any ideas of what's going wrong?

Note: we also have HA enabled.

Best regards,

Stephan
 
You need either a third node or a QDevice, or you can set the expected votes to 1:

pvecm expected 1

This is dangerous because it can lead to a split-brain, but it is OK for maintenance.
 
Hi,

this is a classic "split-brain" situation. With only two nodes, the cluster (i.e. the remaining node) loses quorum and reboots due to fencing.

Please see the documentation on external vote support (QDevice). You can also set the expected votes to 1, as ubu mentioned. This can be dangerous, so the cluster should stay in that state for as short a time as possible, but it is acceptable for maintenance.
 
That's why a third Proxmox node, or at least a QDevice, is recommended, especially when using HA.

A cluster always needs a majority of the votes (more than half) to work; otherwise the remaining nodes cannot tell whether they are the isolated ones or the ones in charge. With only two nodes, once one node is unreachable, the other node no longer has a majority.
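To make the majority arithmetic concrete, here is a small shell sketch. It assumes every node (and a QDevice, if present) contributes one vote, which is the default, and computes the votes needed for quorum as floor(total / 2) + 1. The function names `quorum_needed` and `is_quorate` are hypothetical, for illustration only; they are not Proxmox tools.

```shell
#!/bin/sh
# Hypothetical helpers illustrating majority-based quorum.
# Assumption: one vote per node/QDevice (the default).

# Votes needed for quorum: a strict majority, floor(total / 2) + 1.
quorum_needed() {
    total=$1
    echo $(( total / 2 + 1 ))
}

# Prints "quorate" if the surviving votes reach the majority, else "not quorate".
is_quorate() {
    alive=$1
    total=$2
    if [ "$alive" -ge "$(quorum_needed "$total")" ]; then
        echo "quorate"
    else
        echo "not quorate"
    fi
}

# Two nodes, node2 rebooting: 1 of 2 votes left, 2 needed.
echo "2 nodes, 1 up: $(is_quorate 1 2)"
# Three votes (third node or QDevice), one down: 2 of 3 left, 2 needed.
echo "3 votes, 2 up: $(is_quorate 2 3)"
# After 'pvecm expected 1' on the surviving node: 1 of 1 expected vote.
echo "expected=1, 1 up: $(is_quorate 1 1)"
```

With two votes a single survivor is never enough, while a third vote lets the cluster tolerate one missing member.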

In a normal Proxmox cluster, losing the majority means you can no longer change any configuration. But when HA resources are enabled, the remaining host self-fences, i.e. reboots itself, to prevent a split-brain.
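The two cases can be summarized as a simplified decision function. This is only a model of the rule described above, not Proxmox's implementation: in Proxmox VE the HA stack uses a watchdog that stops being refreshed once a node with active HA resources loses quorum, so the node is reset when the watchdog expires (around 60 seconds by default, which would fit the reboot "after a minute" reported in the first post). The function `node_action` and its arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical, simplified model of a node's reaction to losing quorum.
# Real behaviour is driven by pmxcfs, the HA manager and a watchdog;
# this only mirrors the rules described in the post.

node_action() {
    quorate=$1      # "yes" or "no"
    ha_active=$2    # "yes" if HA resources are active on this node
    if [ "$quorate" = "yes" ]; then
        echo "normal operation"
    elif [ "$ha_active" = "yes" ]; then
        # Watchdog is no longer refreshed, so the node resets itself (self-fence).
        echo "self-fence (reboot)"
    else
        # Without quorum /etc/pve is read-only: guests keep running,
        # but no configuration changes are possible.
        echo "keep running, configuration read-only"
    fi
}

echo "No HA, quorum lost:   $(node_action no no)"
echo "With HA, quorum lost: $(node_action no yes)"
```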
 
Thanks for the detailed answer. I am currently working on building and joining a third member node.
 
Good morning everyone,

After reading all the answers (thanks, everyone), and for stability as suggested by members and staff, I'm trying to add a third “external” node to my existing cluster.

I was able to join the third node, and it appears in the CLUSTER on nodes #1 and #2, but I always get the same "bug": SSL certificate errors, connection timeouts, etc. It is not able to create a certificate at /etc/..., access denied, and so on.

Then I lose web access to the third node until I delete it from the CLUSTER and put it back in standalone mode.

Any ideas?

Best regards,

Stephane