Node reboot causes other node reboot when in cluster

genesio

New Member
Jul 25, 2018
Hello,
I have a cluster set up for experimenting with Proxmox.

The cluster is composed of two nodes, pve1 and pve2.
Both nodes have joined the cluster, and pve1 is the master.

I have also set up shared NFS storage, and today I was experimenting with live migrations.
I successfully moved a running VM from pve1 to pve2 and then wanted to restart pve1 (after a software upgrade).
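(For reference, I did the migration the standard way; I believe the CLI equivalent would be roughly this, with 101 being my VM's ID:

# live-migrate VM 101 from the current node to pve2 while it is running
qm migrate 101 pve2 --online

but I don't think the migration itself is related to the problem.)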

I noticed that the pve2 node also rebooted moments later.

I could not find any pointers in the docs to understand what I am missing.

Please note that I have tried to set up the HA service, but I am not sure I configured it correctly.
 
I noticed that the pve2 node also rebooted moments later.

Sounds like self-fencing. I assume you configured an HA resource? Please note that HA needs at least 3 nodes to avoid that behavior (HA does self-fencing if a node loses quorum).
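If you want to double-check the quorum state from the shell, something like this shows whether the node is quorate and how many votes are present:

# membership and quorum summary (check the "Quorate" line and the vote counts)
pvecm status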
 
I tried to set up HA (just to understand how it works) but I don't need this functionality at all.
How can I completely remove it?

This is the output of ha-manager status:

root@pve1:~# ha-manager status
quorum OK
master pve2 (idle, Thu Jul 26 12:30:41 2018)
lrm pve1 (idle, Fri Jul 27 15:15:06 2018)
lrm pve2 (idle, Fri Jul 27 15:15:06 2018)
service vm:101 (pve2, ignored)



And this is what happens if I try to disable it:

root@pve1:~# ha-manager set vm:101 --state disabled
update resource failed: error with cfs lock 'domain-ha': no such resource 'vm:101'
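Since the error says there is no such resource, I guess the next thing to try is to check what is actually in the HA resource configuration and then remove the entry completely instead of just disabling it, something along these lines (paths and IDs as I understand them, so I may be off):

# the HA resource definitions should live in this file
cat /etc/pve/ha/resources.cfg
# and this should drop the VM from HA management entirely
ha-manager remove vm:101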
 
I am not sure it was already in "disabled" status when I restarted the node.
(a lot of trial and error on my side)

So I just restarted pve2 and pve1 did not automatically reboot.

How can I totally disable the HA services?
I don't need them, and I think they would be dangerous in my scenario.

Thank you for your help.
 
I have exactly the same issue. Did you find a solution?

I want to disable HA in my cluster. Disabling HA for the VM and deleting it from "Resources" is not enough.
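Is stopping the HA services themselves the right approach? I was thinking of something like this on every node, though I am only guessing that it is safe to do:

# stop and disable the HA daemons on each node
systemctl disable --now pve-ha-crm pve-ha-lrm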
 
I'm afraid I did not find a "solution"; it just didn't happen again.
But in the meantime I added a third node, and I think that had some implications (with three nodes the cluster keeps quorum when one node goes down, so self-fencing should not trigger).
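If it helps, with three nodes you can at least confirm that every node and its vote shows up with something like:

# list cluster members and their votes
pvecm nodes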
 
My cluster has 30 nodes, so maybe my issue is caused by something else.
Thank you for your answer.