Node reboot causes other node reboot when in cluster

genesio

New Member
Jul 25, 2018
Hello,
I have a cluster set up for experimenting with Proxmox.

The cluster is composed of two nodes, pve1 and pve2.
Both nodes have joined the cluster, and pve1 is the master.

I have also set up shared NFS storage, and today I was experimenting with live migrations.
I successfully moved a running VM from pve1 to pve2 and then wanted to restart pve1 (after a software upgrade).
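(For reference, I did the migration the standard way; I believe the CLI equivalent would be roughly this, with 101 being my VM's ID:

# live-migrate VM 101 from the current node to pve2 while it is running
qm migrate 101 pve2 --online

but I don't think the migration itself is related to the problem.)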

I noticed that the pve2 node also rebooted moments later.

I could not find any pointers in the docs to understand what I am missing.

Please note that I have tried to set up the HA service, but I am not sure I configured it correctly.
 
I noticed that the pve2 node also rebooted moments later.

Sounds like self-fencing. I assume you configured an HA resource? Please note that HA needs at least 3 nodes to avoid that behavior (HA does self-fencing if a node loses quorum).
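If you want to double-check the quorum state from the shell, something like this shows whether the node is quorate and how many votes are present:

# membership and quorum summary (check the "Quorate" line and the vote counts)
pvecm status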
 
I tried to set up HA (just to understand how it works) but I don't need this functionality at all.
How can I completely remove it?

This is the output of ha-manager status:

root@pve1:~# ha-manager status
quorum OK
master pve2 (idle, Thu Jul 26 12:30:41 2018)
lrm pve1 (idle, Fri Jul 27 15:15:06 2018)
lrm pve2 (idle, Fri Jul 27 15:15:06 2018)
service vm:101 (pve2, ignored)



And this is what happens if I try to disable it:

root@pve1:~# ha-manager set vm:101 --state disabled
update resource failed: error with cfs lock 'domain-ha': no such resource 'vm:101'
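Since the error says there is no such resource, I guess the next thing to try is to check what is actually in the HA resource configuration and then remove the entry completely instead of just disabling it, something along these lines (paths and IDs as I understand them, so I may be off):

# the HA resource definitions should live in this file
cat /etc/pve/ha/resources.cfg
# and this should drop the VM from HA management entirely
ha-manager remove vm:101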
 
I am not sure it was already in "disabled" status when I restarted the node.
(a lot of trial and error on my side)

So I just restarted pve2 and pve1 did not automatically reboot.

How can I totally disable the HA services?
I don't need them, and I think they would be dangerous in my scenario.

Thank you for your help.
 
I have exactly the same issue. Did you find a solution?

I want to disable HA in my cluster. Disabling HA for the VM and deleting it from "Resources" is not enough.
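Is stopping the HA services themselves the right approach? I was thinking of something like this on every node, though I am only guessing that it is safe to do:

# stop and disable the HA daemons on each node
systemctl disable --now pve-ha-crm pve-ha-lrm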
 
I'm afraid I did not find a "solution"; it just didn't happen again.
But in the meantime I added a third node, and I think that had some implications (with three nodes the cluster keeps quorum when one node goes down, so self-fencing should not trigger).
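If it helps, with three nodes you can at least confirm that every node and its vote shows up with something like:

# list cluster members and their votes
pvecm nodes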
 
My cluster has 30 nodes, so maybe my issue is caused by something else.
Thank you for your answer.