How to disable the reboot in an HA cluster?

Nov 19, 2018
Hi,
I had a problem on the network and suddenly one of my 7 hosts rebooted because of HA.
This is the standard behavior, but to understand what is going on I'd like to keep HA running and disable the standard reboot procedure described here:

When a cluster member determines that it is no longer in the cluster quorum, the LRM waits for a new quorum to form. As long as there is no quorum the node cannot reset the watchdog. This will trigger a reboot after the watchdog then times out, this happens after 60 seconds.
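To make the quoted mechanism concrete, here is a simplified Python model. It assumes one vote per node and a strict-majority quorum rule; the function names and the model itself are hypothetical illustrations, not Proxmox's actual LRM or watchdog-mux code. It also shows why the reset matters: under a majority rule, the two sides of a network split can never both be quorate, so at most one side may keep running HA services.

```python
from typing import Optional

# Simplified, hypothetical model of the quoted behaviour; NOT Proxmox code.
# Assumptions: every node has exactly one vote, quorum means a strict majority.
WATCHDOG_TIMEOUT_S = 60  # timeout taken from the quoted text


def has_quorum(partition_votes: int, total_votes: int) -> bool:
    """A partition is quorate only if it holds more than half of all votes."""
    return 2 * partition_votes > total_votes


def seconds_until_reset(partition_votes: int, total_votes: int,
                        seconds_without_quorum: float) -> Optional[float]:
    """None while the node is quorate (the watchdog keeps being fed);
    otherwise the seconds left before the watchdog resets the node."""
    if has_quorum(partition_votes, total_votes):
        return None
    return max(0.0, float(WATCHDOG_TIMEOUT_S - seconds_without_quorum))


def both_sides_quorate(side_a: int, total_votes: int) -> bool:
    """Would both halves of a two-way network split keep quorum?"""
    side_b = total_votes - side_a
    return has_quorum(side_a, total_votes) and has_quorum(side_b, total_votes)


if __name__ == "__main__":
    total = 7  # the 7-node cluster from the question
    print(has_quorum(1, total))               # False: isolated node
    print(seconds_until_reset(1, total, 45))  # 15.0
    print(seconds_until_reset(4, total, 45))  # None: 4 of 7 is a majority
    print(any(both_sides_quorate(k, total) for k in range(total + 1)))  # False
```

The last line is the split-brain argument in miniature: if a node kept running without quorum instead of being reset, the same VM could end up running on both sides of the split.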

Is there a way to do this?
Thank you
 
Have you tried stopping the watchdog-mux service? Then again, once it is stopped the result may be the same: the node restarts because the watchdog no longer receives a trigger.
 
This is bad advice. If you stop the watchdog, you can run into a split-brain situation. The reboot is necessary to avoid that.
 
How do I stop the watchdog-mux service? I found no info about that.
Anyway, what I'd like to achieve is to replace the default reboot with something else, like a log line or an email. Is it possible to do that?
 
If I stop these services I lose HA: all the VMs on that node go into the freeze state, waiting for the reboot.
But what I need for debugging purposes is:
HA running, and
in place of the reboot, a message written to syslog.

Which command does Proxmox use to restart the node? Is it possible to bypass it?

Thank you again
 
As dietmar said, you are shooting yourself in the foot, if not in the head (STONITH reference here :)).
What will happen if two instances of the same VM run at the same time? Total f*****g clusterf**k.

Again, do not disable the internal watchdog when using HA if you value your data at all.

Instead, create redundant cluster communication channels, either at the network level (MC-LAG with LACP) or with channel bonding, or, even better, inside corosync itself: https://pve.proxmox.com/wiki/Separate_Cluster_Network. Just buy a cheap switch, add network cards, connect all nodes to it as well, and then add another corosync communication ring. Do not use that network for anything else. This way both networks have to go down before your nodes start to reboot.
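Why a second ring helps can be illustrated with a toy Python model: it counts how many of the 7 nodes a node can still see when the shared network fails, with and without a dedicated second ring. This is a hypothetical sketch (one vote per node, strict-majority quorum, only direct links counted); it does not model how corosync actually forms membership, and all names are made up for illustration.

```python
from typing import Dict, List

# Hypothetical model of the advice above; NOT corosync's membership protocol.
# Assumptions: 7 nodes, one vote each, strict-majority quorum, and a node only
# counts peers it can reach directly on at least one ring (no relaying).
NODES = range(7)


def visible_votes(node: int, rings: List[Dict[int, bool]]) -> int:
    """Count the node itself plus every peer that shares a working ring.
    Each ring maps node id -> whether that node's link on the ring is up."""
    return sum(
        1
        for other in NODES
        if other == node or any(ring[node] and ring[other] for ring in rings)
    )


def has_quorum(votes: int, total: int = len(NODES)) -> bool:
    """Strict majority of all votes."""
    return 2 * votes > total


if __name__ == "__main__":
    ring0_down = {n: False for n in NODES}  # shared LAN switch has failed
    ring1_up = {n: True for n in NODES}     # dedicated corosync switch is fine
    print(has_quorum(visible_votes(0, [ring0_down])))            # False
    print(has_quorum(visible_votes(0, [ring0_down, ring1_up])))  # True
```

With only one ring, losing the shared switch leaves each node seeing just itself, so every node loses quorum and is reset. With a second, dedicated ring still up, all 7 votes remain visible; only when both rings fail does the node fall back to a single vote.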
 
Got it, I won't debug on the production cluster at all.
So the final answer is: it's not possible to disable the reboot.
Thanks, everybody.
 