Why does a 3-node HA cluster die when 2 nodes are down?

Mayank006

Member
Dec 6, 2023
I am using Ceph as shared storage on all 3 servers in an HA system. I powered off 2 of the 3 servers (servers 2 & 3).

I expected all VMs to be migrated to the one server still running. Instead, server 1 was unresponsive for 1-2 minutes; after that I could access Proxmox on server 1, but none of the VMs were running. Even the VMs that were already on server 1 were dead.
 
I am using Ceph as shared storage on all 3 servers in an HA system. I powered off 2 of the 3 servers (servers 2 & 3).

I expected all VMs to be migrated to the one server still running.
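That's not how Proxmox works: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_quorum
A cluster needs more than half of its votes to be quorate, which is 2 of 3 here. With only 1 of 3 votes, the single server cannot reach quorum, so it assumes it is in the non-functioning part of the cluster and does not take over the VMs (and it becomes read-only after a reboot, I believe).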

EDIT: Always keep more than half of the nodes running. This also always bites people with two-node clusters on this forum.
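
To make the "more than half" rule concrete, here is a minimal sketch of the majority arithmetic. It assumes one vote per node and no QDevice or other tie-breaker; the function names `votes_needed` and `has_quorum` are made up for illustration and are not part of any Proxmox or corosync API.

```python
def votes_needed(total_votes: int) -> int:
    """Smallest vote count that is a strict majority of total_votes."""
    return total_votes // 2 + 1


def has_quorum(total_votes: int, online_votes: int) -> bool:
    """True if the online nodes hold more than half of all votes."""
    return online_votes >= votes_needed(total_votes)


# Assumption: every node carries exactly one vote.
for total, online in [(3, 3), (3, 2), (3, 1), (2, 1)]:
    print(f"{online} of {total} nodes up -> quorate: {has_quorum(total, online)}")
```

Under these assumptions a 3-node cluster survives the loss of one node but not two, and a 2-node cluster cannot lose any node, since 1 of 2 is not more than half.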
 