IPMI watchdog not working

mo_

Member
Oct 27, 2011
399
3
18
Germany
Hey guys, long time no see, but I'm back with a fresh problem.

I'm setting up a new HA cluster and am having issues with the IPMI watchdog on one of the three servers. Once every second, kern.log shows:

Apr 4 17:42:14 prox5 kernel: [ 5105.588943] IPMI Watchdog: response: Error d5 on cmd 22
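For reference, the two hex values in that message can be decoded by hand: the kernel driver prints the IPMI completion code first and the command number second. The sketch below is only an illustration. The lookup tables are a small excerpt of the IPMI v2.0 specification as I understand it, and the function name `decode_watchdog_error` is made up for this example.

```python
import re

# Watchdog commands in the IPMI "App" network function (excerpt, per IPMI v2.0).
WATCHDOG_COMMANDS = {
    0x22: "Reset Watchdog Timer",
    0x24: "Set Watchdog Timer",
    0x25: "Get Watchdog Timer",
}

# A few generic IPMI completion codes; deliberately not an exhaustive table.
COMPLETION_CODES = {
    0x00: "Command completed normally",
    0xC0: "Node busy",
    0xC1: "Invalid command",
    0xD5: "Command or request parameter(s) not supported in present state",
    0xFF: "Unspecified error",
}

# Both numbers are printed as bare hex by the ipmi_watchdog driver.
LINE_RE = re.compile(
    r"IPMI Watchdog: response: Error ([0-9a-fA-F]+) on cmd ([0-9a-fA-F]+)"
)


def decode_watchdog_error(line):
    """Return (completion code text, command text), or None if the line does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1), 16)
    cmd = int(match.group(2), 16)
    return (
        COMPLETION_CODES.get(code, "unknown completion code 0x%02x" % code),
        WATCHDOG_COMMANDS.get(cmd, "unknown command 0x%02x" % cmd),
    )


if __name__ == "__main__":
    sample = ("Apr 4 17:42:14 prox5 kernel: [ 5105.588943] "
              "IPMI Watchdog: response: Error d5 on cmd 22")
    print(decode_watchdog_error(sample))
```

Read this way, the log line would mean that the BMC refuses the "Reset Watchdog Timer" request in its present state, not that the command itself is unknown to it.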

From strace I got:

epoll_wait(5, {}, 10, 1000) = 0
ioctl(3, WDIOC_KEEPALIVE, 0) = -1 EINVAL (Invalid argument)
write(2, "watchdog update failed: Invalid "..., 41watchdog update failed: Invalid argument
) = 41

These are all HP blades, this one being a slightly different version (can only get details about that tomorrow).

By the way, lsmod does not show hpwdt, even though I haven't blacklisted it:


root@prox5:~# lsmod|grep hp
hpilo 20480 0
shpchp 36864 0
hpsa 98304 1
scsi_transport_sas 45056 1 hpsa

Turns out the error spam stops if I stop the watchdog-mux service, but that also stops the IPMI watchdog timers from being set and reset. Since I could not find any documentation for watchdog-mux anywhere: how do I find out which watchdog other than IPMI it is trying to use that isn't actually there? And how do I make it stop doing that?
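One place to look, as far as I know, is /etc/default/pve-ha-manager, where the Proxmox HA documentation says the watchdog module is selected through WATCHDOG_MODULE, with softdog as the fallback. The following is a minimal sketch for reading that setting. The function name `configured_watchdog_module` is made up for this example, and whether the file exists and has this layout on a PVE 4.4 node is an assumption.

```python
import shlex

# Assumption: PVE falls back to softdog when WATCHDOG_MODULE is not set.
DEFAULT_MODULE = "softdog"


def configured_watchdog_module(text):
    """Return the last uncommented WATCHDOG_MODULE value, or the default."""
    module = DEFAULT_MODULE
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and lines that are commented out entirely.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "WATCHDOG_MODULE":
            # shlex strips shell-style quotes and trailing comments.
            parts = shlex.split(value, comments=True)
            if parts:
                module = parts[0]
    return module


if __name__ == "__main__":
    path = "/etc/default/pve-ha-manager"
    try:
        with open(path) as handle:
            print(configured_watchdog_module(handle.read()))
    except FileNotFoundError:
        print("%s not found; default would be %s" % (path, DEFAULT_MODULE))
```

If this prints ipmi_watchdog, then watchdog-mux is most likely talking to the IPMI watchdog through /dev/watchdog and not to some second, missing device.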

Any hints are greatly appreciated.

/edit: forgot pveversion:
Code:
root@prox5:~# pveversion -v
proxmox-ve: 4.4-86 (running kernel: 4.4.49-1-pve)
pve-manager: 4.4-13 (running version: 4.4-13/7ea56165)
pve-kernel-4.4.13-1-pve: 4.4.13-56
pve-kernel-4.2.6-1-pve: 4.2.6-36
pve-kernel-4.4.8-1-pve: 4.4.8-52
pve-kernel-4.4.21-1-pve: 4.4.21-71
pve-kernel-4.4.15-1-pve: 4.4.15-60
pve-kernel-4.4.16-1-pve: 4.4.16-64
pve-kernel-4.4.19-1-pve: 4.4.19-66
pve-kernel-4.4.10-1-pve: 4.4.10-54
pve-kernel-4.4.49-1-pve: 4.4.49-86
pve-kernel-4.4.40-1-pve: 4.4.40-82
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-2~pve4+1
libqb0: 1.0.1-1
pve-cluster: 4.0-49
qemu-server: 4.0-110
pve-firmware: 1.1-11
libpve-common-perl: 4.0-94
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-2
pve-docs: 4.4-4
pve-qemu-kvm: 2.7.1-4
pve-container: 1.0-97
pve-firewall: 2.0-33
pve-ha-manager: 1.0-40
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-4
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.9-pve15~bpo80
 

mo_

>These are all HP blades, this one being a slightly different version (can only get details about that tomorrow).

So... turns out they're all HP BL460c Gen8s. The CPUs are different, though. The problematic prox5 has: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
The other two have: Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
 

mo_

Things just got weirder still. This error spam from the IPMI watchdog *stops* if you ENable HP ASR (their own watchdog thingie) and then reboot:

Code:
hpasmcli -s "enable asr";reboot
The wiki says to DISable all other watchdogs that may be active, but here you have to ENable the HP watchdog, so that ipmi_watchdog then uses it instead of the actual IPMI system? I'm confused.
 

Alexander Barton

New Member
Nov 15, 2017
2
0
1
42
Freiburg, Germany
Were you able to find a solution or workaround? I'm seeing this error as well on one(!) out of 6 HP DL360 Gen9 servers:
Code:
kernel: IPMI Watchdog: response: Error d5 on cmd 24
Versions:
Code:
proxmox-ve: 5.1-38 (running kernel: 4.13.13-5-pve)
pve-manager: 5.1-43 (running version: 5.1-43/bdb08029)
pve-kernel-4.13.13-5-pve: 4.13.13-38
libpve-http-server-perl: 2.0-8
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-19
qemu-server: 5.0-20
pve-firmware: 2.0-3
libpve-common-perl: 5.0-25
libpve-guest-common-perl: 2.0-14
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-17
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-3
pve-docs: 5.1-16
pve-qemu-kvm: 2.9.1-6
pve-container: 2.0-18
pve-firewall: 3.0-5
pve-ha-manager: 2.0-4
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.1-2
lxcfs: 2.0.8-1
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
 
