Cluster stops running after one of the members dies.

nwongrat

Member
Feb 16, 2023
I just tried to use clustering without HA. I simply added 2 servers to the same basic cluster and did not do any other configuration. Then one of them died. The other server kept running as usual until I restarted it.

After I restarted the remaining server, it could not start. It showed an error about looking for the dead server; I cannot remember the exact error. In brief, I could not bring the other server up anymore and had to reinstall Proxmox on both servers. Since then, I am somewhat afraid of clustering.

Basic question: if any member of the cluster dies, is there a way to prevent the problem I mentioned above?

Thanks for your help.
 
You need a minimum of 3 members to keep quorum when one of them fails.

With 2 it works until it does not.
That was what I was afraid of. Is there a way to remove the remaining server from the cluster without reinstalling Proxmox like I did?

Thank you for your reply.
 

Correct me if I am wrong. In my case I have 4 nodes in total. If 2 of them go down, the other 2 can still function. However, if 3 of them go down and just 1 remains, it will work until I shut it down or restart it. Then it will not wake up again. Am I correct?

If I am correct, can I assume that I should try my best to keep at least 2 nodes alive?
 
You need more than 50% of the nodes remaining (roughly 51%; strictly, anything over 50%), otherwise the cluster stops.
2/3 = 0.67
2/4 = 0.5 X (this is exactly 50% and NOT over 50%, so the same as the 1/2 you had)
2/5 = 0.4 X
3/4 = 0.75
3/5 = 0.6
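
The majority rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming every node carries exactly one vote; the function names `votes_needed` and `has_quorum` are made up for this example and are not part of any Proxmox tool.

```python
def votes_needed(total_nodes: int) -> int:
    """Smallest node count that is strictly more than half of total_nodes."""
    return total_nodes // 2 + 1


def has_quorum(alive_nodes: int, total_nodes: int) -> bool:
    """True if the surviving nodes form a strict majority.

    Assumption: one vote per node, no extra vote sources.
    """
    return alive_nodes >= votes_needed(total_nodes)


if __name__ == "__main__":
    # The same alive/total pairs as in the list above, plus the 1/2 case.
    for alive, total in [(1, 2), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5)]:
        status = "quorum" if has_quorum(alive, total) else "no quorum"
        print(f"{alive}/{total} = {alive / total:.2f} -> {status}")
```

Note that exactly half (1/2 or 2/4) is rejected, because the check requires strictly more than half.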

Then it will not wake up again. Am I correct?
It will wake up again once you have over 50%, regardless of the order in which you reboot the nodes. So yes, if you have 4 nodes and 2 remain, it stops. It will wake up again when a third node (any of them) comes back online.
 
I guess the best thing to do is to wait until I have 6 nodes.

Thanks a lot for your help.
 
try my best to have at least 2 nodes alive.
This only works with 3 nodes -> 2/3
With more nodes in total, 2 alive nodes never get you over 50%, e.g. 2/6 = 0.33 X

Thanks a lot for your help.
;)

Think about it the other way round: how many nodes should be allowed to die?
One dead node needs 3 in total
Two dead nodes need 5 in total
Three dead nodes need 7 in total
... 9, 11, 13
In all these examples the cluster keeps running!
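
As a rough sketch of that "other way round" calculation, again assuming one vote per node (the helper names `nodes_needed` and `max_failures` are hypothetical, chosen only for this example):

```python
def nodes_needed(tolerated_failures: int) -> int:
    """Total nodes required so the survivors are still a strict majority."""
    return 2 * tolerated_failures + 1


def max_failures(total_nodes: int) -> int:
    """How many nodes may die while the rest still hold more than 50%."""
    return (total_nodes - 1) // 2


if __name__ == "__main__":
    for failures in (1, 2, 3):
        print(f"{failures} dead node(s) need {nodes_needed(failures)} in total")
    for total in (2, 3, 4, 5, 6, 7):
        print(f"{total} nodes tolerate {max_failures(total)} failure(s)")
```

Under this rule an even node count tolerates no more failures than the odd count just below it: 4 nodes survive 1 failure like 3 do, and 6 nodes survive 2 like 5 do.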
 
What if I install Proxmox and the VMs on separate disks? If I have to reinstall Proxmox, will the VMs come back without reinstalling them, in case I did not make a backup?
 
With 3 nodes you can set up Ceph on top, and with HA you can then configure that if a node goes down, the VMs will be booted on another node.
 
I understand that; however, I would not dare to touch HA and Ceph with my limited knowledge for now. Basically, I just need one console for all 4 nodes, which is why I tried clustering. If a node dies and I need to reinstall Proxmox, it would be OK for now as long as only Proxmox (not the VMs) has to be reinstalled. Otherwise, I will stay with single nodes until I know more.

Since the day I had to reinstall everything, I am somewhat afraid of trying something new without enough knowledge of it. I had just finished setting everything up after migrating from ESXi. Then.......xxxxxx happened.... :(

Thank you.
 
The shutdown of the remaining nodes when they have no majority is just a safety feature to avoid VMs running twice in case the cluster is broken. If you know that this is not the case, you can just decrease the number of expected votes with "pvecm expected 1", and even a single node will stay alive.
Another option would be to disable the HA services (pve-ha-lrm, pve-ha-crm) to avoid the fencing.
 
Thanks, I will try it.