Rebooting one cluster node causes the other node to reboot

genesio

Jul 25, 2018
Hello,
I have a cluster set up for experimenting with Proxmox.

The cluster is composed of two nodes, pve1 and pve2.
I have created a cluster and the nodes are joined in this cluster.
pve1 is the master.

I have also set up shared NFS storage, and today I was experimenting with live migrations.
I successfully moved a running VM from pve1 to pve2, and then I wanted to restart pve1 (after a software upgrade).

I noticed that the pve2 node also rebooted moments later.

I could not find any pointers in the docs to help me understand what I am missing.

Please note that I tried to set up an HA service, but I am not sure I configured it correctly.
 

Sounds like self-fencing. I assume you configured an HA resource? Please note that HA needs at least 3 nodes to avoid that behavior (HA self-fences a node if it loses quorum).
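Why two nodes are not enough can be shown with a little arithmetic. The sketch below assumes the default of one vote per node and no QDevice or special two-node setting: quorum is a strict majority of the expected votes, so in a 2-node cluster the surviving node holds only 1 of the 2 required votes when its peer reboots, while in a 3-node cluster the remaining 2 nodes still reach the majority of 2.

```python
def quorum_votes(total_votes: int) -> int:
    """Votes needed for quorum: a strict majority of all expected votes."""
    return total_votes // 2 + 1


def has_quorum(total_nodes: int, nodes_down: int) -> bool:
    """True if the surviving nodes (assumed one vote each) still form a majority."""
    return total_nodes - nodes_down >= quorum_votes(total_nodes)


for nodes in (2, 3):
    print(f"{nodes} nodes: quorum={quorum_votes(nodes)}, "
          f"one node rebooting -> quorate={has_quorum(nodes, 1)}")
```

With 2 nodes this reports that the cluster is no longer quorate while one node reboots, which is the situation in which a node with active HA services fences itself.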
 
I tried to set up HA (just to understand how it works) but I don't need this functionality at all.
How can I completely remove it?

This is the output of ha-manager status:

root@pve1:~# ha-manager status
quorum OK
master pve2 (idle, Thu Jul 26 12:30:41 2018)
lrm pve1 (idle, Fri Jul 27 15:15:06 2018)
lrm pve2 (idle, Fri Jul 27 15:15:06 2018)
service vm:101 (pve2, ignored)
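To check this before rebooting a node, here is a minimal Python sketch that parses status text like the output above. The line format is inferred from this single output only and may differ between Proxmox VE versions. It collects each LRM's mode and each HA service's node and state. As far as I understand the HA stack, a node can only self-fence while its LRM is active, so an all-idle result is the one you want to see.

```python
import re

# Output copied from the post above.
STATUS = """\
quorum OK
master pve2 (idle, Thu Jul 26 12:30:41 2018)
lrm pve1 (idle, Fri Jul 27 15:15:06 2018)
lrm pve2 (idle, Fri Jul 27 15:15:06 2018)
service vm:101 (pve2, ignored)
"""

# Assumed line shape: "<kind> <name> (<first field>, <rest>)".
LINE = re.compile(r"^(master|lrm|service)\s+(\S+)\s+\(([^,]+),\s*(.*)\)$")


def parse_status(text: str):
    """Return ({node: lrm_mode}, {sid: (node, state)}) from status text."""
    lrm_modes, services = {}, {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # e.g. the "quorum OK" line
        kind, name, first, rest = m.groups()
        if kind == "lrm":
            lrm_modes[name] = first         # e.g. "idle" or "active"
        elif kind == "service":
            services[name] = (first, rest)  # (node, state)
    return lrm_modes, services


lrm_modes, services = parse_status(STATUS)
active = [node for node, mode in lrm_modes.items() if mode == "active"]
print("LRM modes:", lrm_modes)
print("HA services:", services)
print("nodes with an active LRM:", active or "none")
```

For the output in this post, both LRMs are idle and vm:101 is listed in the "ignored" state on pve2.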



And this is what happens if I try to disable it:

root@pve1:~# ha-manager set vm:101 --state disabled
update resource failed: error with cfs lock 'domain-ha': no such resource 'vm:101'
 
I am not sure whether it was already in the "disabled" state when I restarted the node
(there was a lot of trial and error on my side).

So I just restarted pve2, and pve1 did not reboot automatically.

How can I disable the HA services completely?
I don't need them, and I think they could be dangerous in my scenario.

Thank you for your help.
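A minimal sketch of a first step toward removing HA: list every HA resource that is still configured and print an `ha-manager remove <sid>` command for each. It assumes the resources are stored in /etc/pve/ha/resources.cfg in Proxmox's section-config layout (a "<type>: <id>" header followed by indented "key value" lines). The sample file contents are hypothetical. The script only prints commands, so you can review them before running anything.

```python
import re

# Hypothetical contents of /etc/pve/ha/resources.cfg (assumed layout).
RESOURCES_CFG = """\
vm: 101
\tstate disabled

ct: 200
\tstate started
\tmax_restart 1
"""

HEADER = re.compile(r"^(\w+):\s*(\S+)\s*$")


def ha_resource_ids(cfg_text: str) -> list[str]:
    """Return SIDs such as 'vm:101' for every section header in the text."""
    sids = []
    for line in cfg_text.splitlines():
        if not line.strip() or line[:1].isspace():
            continue  # skip blank lines and indented property lines
        m = HEADER.match(line)
        if m:
            sids.append(f"{m.group(1)}:{m.group(2)}")
    return sids


for sid in ha_resource_ids(RESOURCES_CFG):
    # Print instead of executing, so nothing changes until you decide to.
    print(f"ha-manager remove {sid}")
```

Once no resources remain, `ha-manager status` should no longer list any service lines.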
 
I have exactly the same issue. Did you find a solution?

I want to disable HA in my cluster. Disabling HA for the VM and deleting it from "Resources" is not enough.
 
I'm afraid I did not find a "solution"; it just didn't happen again.
But in the meantime I added a third node, and I think that had some effect.
 
There are 30 nodes in my cluster, so maybe my issue is caused by something else.
Thank you for your answer.