Hardware watchdog leads to reboots all the time

st6f9n

Feb 15, 2019
Hello,

I've configured the three nodes (Supermicro servers) of my Proxmox/Ceph cluster (newest version) to use the hardware watchdog, following the wiki
(https://pve.proxmox.com/wiki/High_Availability_Cluster_4.x#Hardware_Watchdogs):

1.) Enable watchdog in BIOS

2.) /etc/modprobe.d/ipmi_watchdog.conf:
options ipmi_watchdog action=power_cycle panic_wdt_timeout=10

3.) /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="quiet nmi_watchdog=0"

4.) /etc/default/pve-ha-manager:
WATCHDOG_MODULE=ipmi_watchdog

After that, all three servers kept rebooting, so I had to undo these configuration steps.
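
To double-check steps 2.) and 4.) against each other, here is a minimal Python sketch that parses the `options` line and the `WATCHDOG_MODULE` setting and confirms both name the same module. The function names `parse_modprobe_options` and `read_watchdog_module` are made up for illustration, and the sketch only checks that the two files agree. It does not explain the reboots.

```python
import shlex


def parse_modprobe_options(line):
    """Parse one 'options <module> key=value ...' line from a modprobe.d file."""
    tokens = shlex.split(line, comments=True)
    if len(tokens) < 2 or tokens[0] != "options":
        raise ValueError("not an 'options' line: %r" % line)
    module = tokens[1]
    params = {}
    for token in tokens[2:]:
        key, sep, value = token.partition("=")
        # Flags without '=' are stored with value None.
        params[key] = value if sep else None
    return module, params


def read_watchdog_module(defaults_text):
    """Return the WATCHDOG_MODULE value from /etc/default/pve-ha-manager text, or None."""
    for raw in defaults_text.splitlines():
        line = raw.strip()
        if line.startswith("WATCHDOG_MODULE="):
            return line.split("=", 1)[1].strip().strip('"')
    return None


# The two settings exactly as quoted in the steps above.
module, params = parse_modprobe_options(
    "options ipmi_watchdog action=power_cycle panic_wdt_timeout=10"
)
ha_module = read_watchdog_module("WATCHDOG_MODULE=ipmi_watchdog\n")

print(module, params)
print("modules match:", module == ha_module)
```

On a real node you would pass in the contents of the two files instead of the literal strings.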

Any ideas?

Thanks

Stefan
 
I'm new to HA and watchdogs, and I don't really understand how to
determine which hardware watchdog my servers support. I assumed
that for my Supermicro servers, which have IPMI, the IPMI watchdog
would be the obvious choice.
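
One way to see which watchdog devices the running kernel has registered is to read the `identity` attribute under `/sys/class/watchdog`. The following is a minimal Python sketch, assuming the kernel exposes watchdog devices in sysfs. The function name `list_watchdogs` is made up for illustration. Only watchdogs whose driver is already loaded will show up, so an IPMI watchdog would appear only after `ipmi_watchdog` has been loaded.

```python
from pathlib import Path


def list_watchdogs(sysfs_root="/sys/class/watchdog"):
    """Return {device_name: identity} for every watchdog device found in sysfs."""
    root = Path(sysfs_root)
    found = {}
    if not root.is_dir():
        # No watchdog class directory: no driver loaded or sysfs support missing.
        return found
    for dev in sorted(root.iterdir()):
        try:
            found[dev.name] = (dev / "identity").read_text().strip()
        except OSError:
            # The attribute may be absent or unreadable on some kernels.
            found[dev.name] = "(identity not readable)"
    return found


for name, identity in list_watchdogs().items():
    print(name, identity)
```

If nothing is printed, no watchdog driver is currently registered, or the sysfs attributes are not available on that kernel.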
 
Another question: there is a Proxmox book that strongly recommends setting the BIOS feature "restore on ac power loss" to "Power on". In my opinion, "Stay off" is the more reasonable default in case of voltage fluctuations, UPS problems, and so on. I can't find the "Power on" recommendation in the Proxmox manual. Why?
 
I tried different kernels and tried reboot=efi, and none of this helped. I solved the problem by adding the parameter nomodeset to the GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub.
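
For anyone who wants to script that change, here is a minimal Python sketch that appends a parameter to the quoted `GRUB_CMDLINE_LINUX_DEFAULT` value without duplicating it. The function name `add_kernel_param` is made up for illustration. The example line is the one from the first post, which is an assumption, since the actual line on the affected machine may differ. The edit only takes effect after the boot configuration is regenerated (for example with `update-grub` on GRUB-based installs) and the node is rebooted.

```python
import re


def add_kernel_param(grub_text, param, key="GRUB_CMDLINE_LINUX_DEFAULT"):
    """Append param to the double-quoted value of key unless it is already present."""
    pattern = re.compile(r'^(%s=")([^"]*)(")' % re.escape(key), re.MULTILINE)

    def repl(match):
        params = match.group(2).split()
        if param not in params:
            params.append(param)
        return match.group(1) + " ".join(params) + match.group(3)

    new_text, count = pattern.subn(repl, grub_text)
    if count == 0:
        raise ValueError("%s not found" % key)
    return new_text


# Example input taken from the first post (assumed, not the poster's final file).
example = 'GRUB_CMDLINE_LINUX_DEFAULT="quiet nmi_watchdog=0"\n'
updated = add_kernel_param(example, "nomodeset")
print(updated, end="")
```

Running the function a second time on its own output leaves the line unchanged, so it is safe to apply repeatedly.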
 
