I've been using Proxmox on a wide range of hardware for some years now. A couple of years ago I got an HPE ProLiant MicroServer Gen10; its BIOS is up to date. Every time I launched a big rsync transfer (from a local encrypted RAIDZ-1 pool to a local USB3 LUKS-encrypted disk), the system would hit a kernel panic / CPU soft lockup and become unavailable.
I couldn't dedicate time to installing or compiling debug kernels to dig deeper, but I occasionally searched for information on the problem.
While researching, I found that this seems to have been a known bug around 2015:
* https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1432837
«[...] We have been seeing random crashs from various HP systems, this has been tracked to loading of the hpwdt watchdog modules. Basically these modules are a loaded gun and unless you know exactly what you are doing you are likely to take off your own head. For this reason we already blacklist "all" of these modules in kmod/module-in-tools blacklists.
Unfortuantly these have not been kept in sync with the kernel leading to the module loading.[...]»
This is consistent with Proxmox apparently blacklisting the HP iLO watchdog module hpwdt. In my recent Proxmox install I only found the hpilo module, which I blacklisted. Since then I've been running rsync without problems, though only for a few hours so far. I'll report back if I see any problems.
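For anyone who wants to double-check which modules are covered, here is a minimal Python sketch that scans modprobe-style configuration text for `blacklist <module>` lines. The file name in the comment and the sample contents are hypothetical, not copied from my system; on a real host you would feed it the contents of the files under /etc/modprobe.d/ instead.

```python
import re


def blacklisted_modules(conf_text):
    """Return the set of module names listed in 'blacklist <module>' lines."""
    modules = set()
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        match = re.fullmatch(r"blacklist\s+(\S+)", line)
        if match:
            modules.add(match.group(1))
    return modules


# Hypothetical contents of a file such as /etc/modprobe.d/hp-blacklist.conf
example_conf = """
# HP iLO related modules
blacklist hpilo
blacklist hpwdt
"""

wanted = {"hpilo", "hpwdt"}
missing = wanted - blacklisted_modules(example_conf)
print("not blacklisted:", sorted(missing))
```

This only checks the configuration text; it says nothing about whether a module is currently loaded.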
Relevant discussions here:
* https://forum.proxmox.com/threads/ve-4-0-kernel-panic-on-hp-proliant-servers.24015/
Also see this HPE advisory (2019):
* https://support.hpe.com/hpesc/public/docDisplay?docLocale=en_US&docId=emr_na-a00088210en_us
EDIT:
Still losing access to the system while doing a big rsync:
Code:
Message from syslogd@pve at May 18 08:50:21 ...
kernel:[ 2680.256627] watchdog: BUG: soft lockup - CPU#1 stuck for 187s! [rsync:186673]
Message from syslogd@pve at May 18 08:50:41 ...
kernel:[ 2700.256701] watchdog: BUG: soft lockup - CPU#2 stuck for 152s! [pvescheduler:271554]
Message from syslogd@pve at May 18 08:50:41 ...
kernel:[ 2700.256701] watchdog: BUG: soft lockup - CPU#0 stuck for 152s! [pvescheduler:271555]
Message from syslogd@pve at May 18 08:50:49 ...
kernel:[ 2708.256731] watchdog: BUG: soft lockup - CPU#1 stuck for 213s! [rsync:186673]
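To make longer logs of this kind easier to read, a small Python sketch like the one below can pull out which tasks were reported as stuck and for how long. The regular expression is my own assumption based on the message format shown above, and the function name is made up for illustration.

```python
import re
from collections import defaultdict

# Assumed format: "soft lockup - CPU#<n> stuck for <s>s! [<task>:<pid>]"
LOCKUP_RE = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<task>.+):(?P<pid>\d+)\]"
)


def summarize_lockups(log_text):
    """Map (task, pid) to the longest stall, in seconds, reported for it."""
    worst = defaultdict(int)
    for m in LOCKUP_RE.finditer(log_text):
        key = (m.group("task"), int(m.group("pid")))
        worst[key] = max(worst[key], int(m.group("secs")))
    return dict(worst)


sample = """\
kernel:[ 2680.256627] watchdog: BUG: soft lockup - CPU#1 stuck for 187s! [rsync:186673]
kernel:[ 2700.256701] watchdog: BUG: soft lockup - CPU#2 stuck for 152s! [pvescheduler:271554]
kernel:[ 2700.256701] watchdog: BUG: soft lockup - CPU#0 stuck for 152s! [pvescheduler:271555]
kernel:[ 2708.256731] watchdog: BUG: soft lockup - CPU#1 stuck for 213s! [rsync:186673]
"""

for (task, pid), secs in sorted(summarize_lockups(sample).items()):
    print(f"{task} (pid {pid}): stuck for up to {secs}s")
```

In the excerpt above, rsync is the task with the longest reported stall, with pvescheduler stuck on the other CPUs at the same time.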