Hey guys, long time no see, but I'm back with a fresh problem.
I'm setting up a new HA cluster and am having issues with the IPMI watchdog on one of the three servers. Once every second, kern.log says:
Apr 4 17:42:14 prox5 kernel: [ 5105.588943] IPMI Watchdog: response: Error d5 on cmd 22
From strace I got:
epoll_wait(5, {}, 10, 1000) = 0
ioctl(3, WDIOC_KEEPALIVE, 0) = -1 EINVAL (Invalid argument)
write(2, "watchdog update failed: Invalid "..., 41watchdog update failed:
Invalid argument
) = 41
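Since the failing ioctl in the trace is on fd 3, one way to see which device node that descriptor actually points at might be to resolve it through /proc. This is only a minimal sketch: `fd_target` is a hypothetical helper name I made up, and it assumes a Linux /proc filesystem and that the process is still running.

```shell
# Hypothetical helper: print the path that file descriptor $2 of process $1 refers to.
# Relies on the Linux /proc/<pid>/fd/<n> symlinks; needs root for other users' processes.
fd_target() {
    readlink "/proc/$1/fd/$2"
}
```

Called as `fd_target "$(pidof watchdog-mux)" 3`, it should print the device path behind fd 3 of the running watchdog-mux process, assuming `pidof` returns exactly one PID.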
These are all HP blades; this one is a slightly different version (I can only get details about that tomorrow).
lsmod does not show hpwdt, by the way, even though I haven't blacklisted it:
root@prox5:~# lsmod|grep hp
hpilo 20480 0
shpchp 36864 0
hpsa 98304 1
scsi_transport_sas 45056 1 hpsa
Turns out the error spam stops if I stop the watchdog-mux service, but that also stops the IPMI watchdog from setting and resetting timers. watchdog-mux has no documentation that I could find anywhere, so: how do I find out which watchdog other than IPMI it is trying to use that's not actually there? And how do I make it stop doing that?
Any hints are greatly appreciated.
/edit: forgot pveversion:
Code:
root@prox5:~# pveversion -v
proxmox-ve: 4.4-86 (running kernel: 4.4.49-1-pve)
pve-manager: 4.4-13 (running version: 4.4-13/7ea56165)
pve-kernel-4.4.13-1-pve: 4.4.13-56
pve-kernel-4.2.6-1-pve: 4.2.6-36
pve-kernel-4.4.8-1-pve: 4.4.8-52
pve-kernel-4.4.21-1-pve: 4.4.21-71
pve-kernel-4.4.15-1-pve: 4.4.15-60
pve-kernel-4.4.16-1-pve: 4.4.16-64
pve-kernel-4.4.19-1-pve: 4.4.19-66
pve-kernel-4.4.10-1-pve: 4.4.10-54
pve-kernel-4.4.49-1-pve: 4.4.49-86
pve-kernel-4.4.40-1-pve: 4.4.40-82
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-2~pve4+1
libqb0: 1.0.1-1
pve-cluster: 4.0-49
qemu-server: 4.0-110
pve-firmware: 1.1-11
libpve-common-perl: 4.0-94
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-2
pve-docs: 4.4-4
pve-qemu-kvm: 2.7.1-4
pve-container: 1.0-97
pve-firewall: 2.0-33
pve-ha-manager: 1.0-40
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-4
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.9-pve15~bpo80