Hardware watchdog leads to reboots all the time

st6f9n · Jan 31, 2020

Hello,

I've configured the three nodes (Supermicro servers) of my Proxmox/Ceph cluster (latest version) to use the hardware watchdog
(https://pve.proxmox.com/wiki/High_Availability_Cluster_4.x#Hardware_Watchdogs):

1.) Enable watchdog in BIOS

2.) /etc/modprobe.d/ipmi_watchdog.conf:
options ipmi_watchdog action=power_cycle panic_wdt_timeout=10

3.) /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="quiet nmi_watchdog=0"

4.) /etc/default/pve-ha-manager:
WATCHDOG_MODULE=ipmi_watchdog

After that, all three servers kept rebooting, so I had to undo these configuration steps.

Any ideas?

Thanks

Stefan

Richard · Feb 5, 2020

We recommend using the iTCO watchdog; for more details, see https://pve.proxmox.com/wiki/High_Availability#ha_manager_fencing
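
To see which watchdog driver a node is actually using, one option is to look at the loaded kernel modules. The following is only a rough sketch, not an official Proxmox tool: `loaded_watchdogs` is a hypothetical helper name, and the list of module names it checks (`iTCO_wdt`, `ipmi_watchdog`, `softdog`) is an assumption that covers the drivers mentioned in this thread plus the software fallback.

```shell
#!/bin/sh
# Sketch: print the names of known watchdog kernel modules that are loaded.
# loaded_watchdogs is a hypothetical helper, not part of Proxmox.
# It reads a file in /proc/modules format (module name in the first column).
loaded_watchdogs() {
    modules_file="${1:-/proc/modules}"
    awk '$1 == "iTCO_wdt" || $1 == "ipmi_watchdog" || $1 == "softdog" { print $1 }' "$modules_file"
}

# Example call on a running node (uses /proc/modules by default):
#   loaded_watchdogs
```

If only `softdog` is listed, the node is running on the software watchdog and no hardware watchdog module is active.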

st6f9n · Feb 10, 2020

I'm new to HA and watchdogs, and I don't really understand how to
determine which hardware watchdog my servers support. I thought
that for my Supermicro servers, which have IPMI, the IPMI watchdog
would be the obvious solution.

st6f9n · Feb 10, 2020

Another question: there is a Proxmox book which strongly recommends setting the BIOS feature "restore on ac power loss" to "Power on". In my opinion, "Stay off" is the reasonable default in case of voltage fluctuations, UPS problems and so on. But I can't find the "Power on" recommendation in the Proxmox manual. Why?

st6f9n · Feb 21, 2020

unholy · Jun 29, 2023

I tried different kernels and tried reboot=efi, but none of these helped. I solved the problem by adding the parameter nomodeset to the GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub.
