[SOLVED] Problems with watchdog-mux and systemd on some HPE ProLiant DL360 Gen10

Jan 10, 2025
Dear Proxmox Community,

We are new users of Proxmox VE and quite happy with everything at our beginner level of usage. Now we want to experiment with and test HA. During these first steps, on one server the services pve-ha-crm and pve-ha-lrm failed after a couple of minutes because the watchdog-mux service was not running. The reason it failed to start is that systemd already holds the device /dev/watchdog open.

Code:
Nov 21 16:37:25 pve-cit1-hv-2-test systemd[1]: Started watchdog-mux.service - Proxmox VE watchdog multiplexer.
Nov 21 16:37:25 pve-cit1-hv-2-test systemd[1]: Using hardware watchdog 'Software Watchdog', version 0, device /dev/watchdog
Nov 21 16:37:25 pve-cit1-hv-2-test systemd[1]: Watchdog running with a hardware timeout of 30s.
Nov 21 16:37:25 pve-cit1-hv-2-test watchdog-mux[2029]: watchdog open: Device or resource busy

Could you help me identify the cause of this behaviour?

In /etc/systemd/system.conf, this is the only active (uncommented) setting:

Code:
# /etc/systemd/system.conf
[Manager]
RuntimeWatchdogSec=30

I thought it had to be differing BIOS settings, but I found the same configuration when checking with:

Code:
ipmitool mc watchdog get


Watchdog Timer Use:     Reserved (0x00)
Watchdog Timer Is:      Stopped
Watchdog Timer Logging: On
Watchdog Timer Action:  No action (0x00)
Pre-timeout interrupt:  None
Pre-timeout interval:   0 seconds
Timer Expiration Flags: None (0x00)
Initial Countdown:      0.0 sec
Present Countdown:      0.0 sec

On the problematic system, lsof /dev/watchdog shows:

Code:
sudo lsof /dev/watchdog
COMMAND PID USER FD   TYPE DEVICE SIZE/OFF NODE NAME
systemd   1 root 59w   CHR 10,130      0t0  892 /dev/watchdog

On the working systems:

Code:
sudo lsof /dev/watchdog
COMMAND    PID USER FD   TYPE DEVICE SIZE/OFF NODE NAME
watchdog- 2096 root 3w   CHR 10,130      0t0  864 /dev/watchdog
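If lsof is not installed on a node, the same check can be approximated by hand. Below is a minimal Python sketch (my own assumption, not a Proxmox tool) that walks /proc/<pid>/fd and reports which processes have a given path open. It assumes Linux with /proc mounted and needs root to see other users' processes; the function name find_holders is hypothetical.

```python
# Sketch: list processes that hold a path open, similar to `lsof /dev/watchdog`.
# Assumptions: Linux with /proc mounted; run as root to see every process.
from __future__ import annotations

import os
from pathlib import Path


def find_holders(device: str = "/dev/watchdog") -> dict[int, str]:
    """Return {pid: command name} for each process with `device` open."""
    target = os.path.realpath(device)
    holders: dict[int, str] = {}
    for proc in Path("/proc").iterdir():
        if not proc.name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            fds = list((proc / "fd").iterdir())
        except OSError:
            continue  # no permission, or the process already exited
        for fd in fds:
            try:
                # Each /proc/<pid>/fd/<n> is a symlink to the opened path.
                if os.readlink(fd) == target:
                    holders[int(proc.name)] = (proc / "comm").read_text().strip()
                    break
            except OSError:
                continue  # fd closed or process exited while scanning
    return holders


if __name__ == "__main__":
    for pid, comm in sorted(find_holders().items()):
        print(f"{pid:>7}  {comm}")
```

On the problematic node this should list systemd (PID 1), and on the working nodes watchdog-mux, in line with the lsof output above.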

Where can I find further configuration options for watchdog-mux / systemd regarding /dev/watchdog?

Also: I found the information at https://pve.proxmox.com/wiki/High_Availability_Cluster_4.x quite interesting. Is it still relevant for watchdog configuration? The newer docs say much less about the watchdog (https://pve.proxmox.com/wiki/High_Availability).

In any case: thanks for your attention and have a nice day, y'all :)

Regards,
Martin

P.S. I will keep searching for the cause and am happy to provide further information.


Code:
proxmox-ve: 9.0.0 (running kernel: 6.14.11-4-pve)
pve-manager: 9.0.10 (running version: 9.0.10/deb1ca707ec72a89)
proxmox-kernel-helper: 9.0.4
proxmox-kernel-6.14.11-4-pve-signed: 6.14.11-4
proxmox-kernel-6.14: 6.14.11-4
proxmox-kernel-6.14.8-2-pve-signed: 6.14.8-2
proxmox-kernel-6.8.12-13-pve-signed: 6.8.12-13
proxmox-kernel-6.8: 6.8.12-13
proxmox-kernel-6.8.12-4-pve-signed: 6.8.12-4
ceph: 19.2.3-pve2
ceph-fuse: 19.2.3-pve2
corosync: 3.1.9-pve2
criu: 4.1.1-1
ifupdown2: 3.3.0-1+pmx10
intel-microcode: 3.20250812.1~deb13u1
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libproxmox-acme-perl: 1.7.0
libproxmox-backup-qemu0: 2.0.1
libproxmox-rs-perl: 0.4.1
libpve-access-control: 9.0.3
libpve-apiclient-perl: 3.4.0
libpve-cluster-api-perl: 9.0.6
libpve-cluster-perl: 9.0.6
libpve-common-perl: 9.0.11
libpve-guest-common-perl: 6.0.2
libpve-http-server-perl: 6.0.5
libpve-network-perl: 1.1.8
libpve-rs-perl: 0.10.10
libpve-storage-perl: 9.0.13
libspice-server1: 0.15.2-1+b1
lvm2: 2.03.31-2+pmx1
lxc-pve: 6.0.5-1
lxcfs: 6.0.4-pve1
novnc-pve: 1.6.0-3
proxmox-backup-client: 4.0.16-1
proxmox-backup-file-restore: 4.0.16-1
proxmox-backup-restore-image: 1.0.0
proxmox-firewall: 1.2.0
proxmox-kernel-helper: 9.0.4
proxmox-mail-forward: 1.0.2
proxmox-mini-journalreader: 1.6
proxmox-offline-mirror-helper: 0.7.2
proxmox-widget-toolkit: 5.0.5
pve-cluster: 9.0.6
pve-container: 6.0.12
pve-docs: 9.0.8
pve-edk2-firmware: 4.2025.02-4
pve-esxi-import-tools: 1.0.1
pve-firewall: 6.0.3
pve-firmware: 3.17-2
pve-ha-manager: 5.0.4
pve-i18n: 3.6.0
pve-qemu-kvm: 10.0.2-4
pve-xtermjs: 5.5.0-2
qemu-server: 9.0.22
smartmontools: 7.4-pve1
spiceterm: 3.4.1
swtpm: 0.8.0+pve2
vncterm: 1.9.1
zfsutils-linux: 2.3.4-pve1
 
The cause is the config file you referenced: you told systemd to own the watchdog, and that's why PVE can't.
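To compare nodes that are supposed to be configured identically, a minimal Python sketch could read RuntimeWatchdogSec= from the [Manager] section of /etc/systemd/system.conf plus any drop-ins in /etc/systemd/system.conf.d/ (later files win) and report whether systemd's runtime watchdog is enabled. This is an approximation under stated assumptions: the helper names are hypothetical, drop-in directories under /run and /usr/lib are ignored, and only a simplified version of the value syntax described in systemd-system.conf(5) (time spans, "off") is handled.

```python
# Sketch: is systemd's runtime watchdog (RuntimeWatchdogSec=) enabled?
# Assumptions: only /etc/systemd/system.conf and /etc/systemd/system.conf.d/*.conf
# are read (drop-ins under /run and /usr/lib are ignored); later files win.
from __future__ import annotations

import configparser
import re
from pathlib import Path


def runtime_watchdog_setting(paths: list[Path]) -> str | None:
    """Return the last RuntimeWatchdogSec value set in [Manager], or None."""
    value = None
    for path in paths:
        parser = configparser.ConfigParser(
            delimiters=("=",), strict=False, interpolation=None
        )
        parser.optionxform = str  # keep systemd's case-sensitive key names
        try:
            parser.read_string(path.read_text())
        except (OSError, ValueError, configparser.Error):
            continue  # missing, undecodable or unparsable file: skip it
        if parser.has_option("Manager", "RuntimeWatchdogSec"):
            value = parser.get("Manager", "RuntimeWatchdogSec")
    return value


def watchdog_enabled(value: str | None) -> bool:
    """Treat unset, empty, "off" and zero time spans ("0", "0s") as disabled."""
    if value is None:
        return False
    v = value.strip().lower()
    if v in ("", "off"):
        return False
    return re.fullmatch(r"0+(?:\.0*)?\s*[a-z]*", v) is None


if __name__ == "__main__":
    files = [Path("/etc/systemd/system.conf")]
    files += sorted(Path("/etc/systemd/system.conf.d").glob("*.conf"))
    setting = runtime_watchdog_setting(files)
    if watchdog_enabled(setting):
        print(f"systemd runtime watchdog enabled (RuntimeWatchdogSec={setting})")
    else:
        print("systemd runtime watchdog disabled")
```

With the system.conf shown earlier in the thread, this would report RuntimeWatchdogSec=30 as enabled until the line is commented out.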
 
I commented out the line in /etc/systemd/system.conf, and after a reboot watchdog-mux can use /dev/watchdog. Thank you for your input. Funny that the other systems, which should be configured identically, behave differently. Ghost in the machine or something :). I will post an update if I find an explanation.