Kernel panic when transferring several TB of data with rsync -avz on HPE ProLiant MicroServer

I've been using Proxmox on a wide range of hardware for some years now. A couple of years ago I got an HPE ProLiant MicroServer Gen10. The BIOS is up to date. Every time I launched a big rsync transfer (from a local encrypted RAIDZ-1 pool to a local USB3 LUKS-encrypted disk), the system would hit a kernel panic / CPU soft lockup and become unavailable.
I couldn't dedicate time to installing or compiling debug kernels to get more details, but I occasionally searched for information on the problem.

While researching, I found that this appears to have been a known bug around 2015:

* https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1432837

«[...] We have been seeing random crashs from various HP systems, this has been tracked to loading of the hpwdt watchdog modules. Basically these modules are a loaded gun and unless you know exactly what you are doing you are likely to take off your own head. For this reason we already blacklist "all" of these modules in kmod/module-in-tools blacklists.
Unfortuantly these have not been kept in sync with the kernel leading to the module loading.[...]»

This is consistent with Proxmox apparently blacklisting the HP iLO watchdog module hpwdt. In my recent Proxmox install I only found the hpilo module, which I blacklisted. Since then rsync has been running without problems for a few hours now; I'll report back if I see any issues.
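
To double-check the state described above, here is a minimal Python sketch that reports whether hpwdt and hpilo are currently loaded and whether a "blacklist" line exists for them. The function names are my own, and it only looks at /proc/modules and /etc/modprobe.d/*.conf (an assumption; other modprobe.d locations are not scanned). Keep in mind that a blacklist entry only stops alias-based autoloading, so a module can still show up as loaded if something requests it explicitly.

```python
from pathlib import Path

# Module names taken from this post; adjust as needed.
WATCHDOG_MODULES = ("hpwdt", "hpilo")


def loaded_modules(proc_modules_text):
    """Return module names found in /proc/modules-style text (first column)."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def blacklisted_modules(conf_text):
    """Return names from 'blacklist <name>' lines in modprobe.d-style text."""
    names = set()
    for line in conf_text.splitlines():
        parts = line.split("#", 1)[0].split()  # drop trailing comments
        if len(parts) == 2 and parts[0] == "blacklist":
            names.add(parts[1])
    return names


def report(proc_modules_text, conf_texts):
    """Map each watched module to its loaded/blacklisted state."""
    loaded = loaded_modules(proc_modules_text)
    blacklisted = set()
    for text in conf_texts:
        blacklisted |= blacklisted_modules(text)
    return {
        name: {"loaded": name in loaded, "blacklisted": name in blacklisted}
        for name in WATCHDOG_MODULES
    }


def check_system():
    """Run the report against the live system (hypothetical helper)."""
    proc = Path("/proc/modules").read_text()
    confs = [p.read_text() for p in sorted(Path("/etc/modprobe.d").glob("*.conf"))]
    return report(proc, confs)
```

Calling check_system() on the host returns one entry per module. A module that is both "loaded" and "blacklisted" would suggest the blacklist is not taking effect, for example because the module was already included in the initramfs.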

Relevant discussions here:

* https://forum.proxmox.com/threads/ve-4-0-kernel-panic-on-hp-proliant-servers.24015/

Also see this HPE advisory (2019):

* https://support.hpe.com/hpesc/public/docDisplay?docLocale=en_US&docId=emr_na-a00088210en_us

EDIT:
I'm still losing access to the system during a big rsync:

Code:
Message from syslogd@pve at May 18 08:50:21 ...
 kernel:[ 2680.256627] watchdog: BUG: soft lockup - CPU#1 stuck for 187s! [rsync:186673]

Message from syslogd@pve at May 18 08:50:41 ...
 kernel:[ 2700.256701] watchdog: BUG: soft lockup - CPU#2 stuck for 152s! [pvescheduler:271554]

Message from syslogd@pve at May 18 08:50:41 ...
 kernel:[ 2700.256701] watchdog: BUG: soft lockup - CPU#0 stuck for 152s! [pvescheduler:271555]

Message from syslogd@pve at May 18 08:50:49 ...
 kernel:[ 2708.256731] watchdog: BUG: soft lockup - CPU#1 stuck for 213s! [rsync:186673]
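
For anyone comparing similar logs, here is a small Python sketch that extracts the soft lockup lines and keeps the longest reported stall per task. The regular expression is only matched against the message format shown above, so treat it as an assumption rather than a general kernel log parser; the function names are hypothetical.

```python
import re

# Matches lines like:
#  kernel:[ 2680.256627] watchdog: BUG: soft lockup - CPU#1 stuck for 187s! [rsync:186673]
LOCKUP_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! \[(?P<comm>.+?):(?P<pid>\d+)\]"
)

# Sample copied from the messages in this post.
SAMPLE = """\
 kernel:[ 2680.256627] watchdog: BUG: soft lockup - CPU#1 stuck for 187s! [rsync:186673]
 kernel:[ 2700.256701] watchdog: BUG: soft lockup - CPU#2 stuck for 152s! [pvescheduler:271554]
 kernel:[ 2700.256701] watchdog: BUG: soft lockup - CPU#0 stuck for 152s! [pvescheduler:271555]
 kernel:[ 2708.256731] watchdog: BUG: soft lockup - CPU#1 stuck for 213s! [rsync:186673]
"""


def parse_lockups(log_text):
    """Return one dict per soft lockup message found in log_text."""
    events = []
    for m in LOCKUP_RE.finditer(log_text):
        events.append({
            "uptime": float(m["uptime"]),  # seconds since boot
            "cpu": int(m["cpu"]),
            "secs": int(m["secs"]),        # how long the CPU was reported stuck
            "comm": m["comm"],             # process name
            "pid": int(m["pid"]),
        })
    return events


def longest_stall_per_task(events):
    """Map (process name, pid) to the longest stall reported for that task."""
    worst = {}
    for e in events:
        key = (e["comm"], e["pid"])
        worst[key] = max(worst.get(key, 0), e["secs"])
    return worst


if __name__ == "__main__":
    for (comm, pid), secs in longest_stall_per_task(parse_lockups(SAMPLE)).items():
        print(f"{comm} (pid {pid}): stuck up to {secs}s")
```

On the sample above this shows the same rsync process (pid 186673) stuck on CPU#1 in both of its messages, with the pvescheduler processes stalling on the other CPUs at the same time.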
 