[SOLVED] Proliant DL360e GEN8 freezes with kernel 5.4.44

rholighaus

After rebooting into the new kernel 5.4.44, PVE stops responding completely within a few seconds, even on the console.
There is no panic and no other output; the machine just freezes, and ping stops replying too.

Booted on kernel 5.4.41, the system runs fine. Can I help debug this? The machine is our backup and synchronization target, so it is needed but not mission-critical.

proxmox-ve: 6.2-1 (running kernel: 5.4.41-1-pve)
pve-manager: 6.2-6 (running version: 6.2-6/ee1d7754)
pve-kernel-5.4: 6.2-3
pve-kernel-helper: 6.2-3
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.44-1-pve: 5.4.44-1
pve-kernel-5.4.41-1-pve: 5.4.41-1
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-4.13.13-5-pve: 4.13.13-38
pve-kernel-4.13.13-2-pve: 4.13.13-33
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libproxmox-acme-perl: 1.0.4
libpve-access-control: 6.1-1
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-3
libpve-guest-common-perl: 3.0-10
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-8
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve2
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-7
pve-cluster: 6.1-8
pve-container: 3.1-8
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-3
pve-qemu-kvm: 5.0.0-4
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-3
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.4-pve1

I have attached the output of lshw as a text file.
 

Does the system finish booting up? Do you get a login prompt?
If not (or in any case), try removing 'quiet' from the kernel command line; the extra boot output may help identify where the problem is.
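
For a GRUB-booted host, removing 'quiet' could look roughly like the sketch below. It assumes the kernel command line is set through GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and that GNU sed is available; the helper name strip_quiet is made up for illustration. Hosts that boot through systemd-boot keep the command line elsewhere, so this would not apply to them as-is.

```shell
#!/bin/sh
# Sketch: drop the "quiet" flag from the GRUB default kernel command line.
# Assumption: settings live in /etc/default/grub (GRUB-booted host).

strip_quiet() {
    # $1: path to a grub defaults file; edited in place (GNU sed).
    # Only the GRUB_CMDLINE_LINUX_DEFAULT line is touched.
    sed -i -E '/^GRUB_CMDLINE_LINUX_DEFAULT=/ s/\bquiet\b ?//' "$1"
}

# On the real host (as root) one would then run something like:
#   cp /etc/default/grub /etc/default/grub.bak
#   strip_quiet /etc/default/grub
#   update-grub
```

The function only edits the file it is given, so it can be tried on a copy first; the change takes effect after update-grub and a reboot.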

Another option would be to configure remote syslogging via UDP; sometimes the relevant syslog message still makes it to the syslog server before the machine freezes.
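
A minimal sketch of UDP forwarding, assuming rsyslog is the syslog daemon on the host. The collector address 192.0.2.10, port 514, the drop-in file name, and the helper name write_forward_rule are all placeholders, not values from this thread.

```shell
#!/bin/sh
# Sketch: forward all syslog messages to a remote collector over UDP.
# LOG_HOST and LOG_PORT are placeholders (documentation address).
LOG_HOST="192.0.2.10"
LOG_PORT="514"

write_forward_rule() {
    # $1: path of the rsyslog drop-in file to create.
    # A single "@" selects UDP; "@@" would select TCP.
    printf '*.* @%s:%s\n' "$LOG_HOST" "$LOG_PORT" > "$1"
}

# On the real host (as root) one would then run something like:
#   write_forward_rule /etc/rsyslog.d/90-remote.conf
#   systemctl restart rsyslog
```

UDP is fire-and-forget, which is why a last message can sometimes still leave the machine just before it locks up; the receiving server has to be configured to accept UDP syslog.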


I hope this helps!
 
The system boots up fine, and I can even log in if I am fast, both on the console and via SSH. But after a while, or maybe when traffic hits it, it just freezes.
I have reverted to kernel 5.4.41 for now, but there must be some change in 5.4.44 that triggers this behaviour.
 
I have an HP ProLiant ML350 G6 that hits a kernel panic after about one day. That was now the second panic, so I have also booted it with 5.4.41-1-pve.
The last three lines of console output:
Code:
Kernel panic - not syncing: Fatal exception in interrupt
Kernel offset [some hex values here]
---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
 
The rest of the panic log would probably indicate where the error originated.

Apart from that, make sure the latest available BIOS/firmware for the machine is installed; this sometimes helps with issues like these.
 
Hi Stoiko,

Unfortunately, HPE only offers BIOS/firmware upgrades if you have a support contract for the hardware.
Kernel 5.4.41 works well, so some change in 5.4.44 must be triggering the freeze. I haven't yet tried 5.4.44.2.

Is there a way to keep the machine from booting 5.4.44 until this problem is solved? So far I have to select the older kernel manually at every boot.
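
One approach that is commonly used on GRUB-booted hosts is to point GRUB_DEFAULT in /etc/default/grub at the menu entry of the known-good kernel. The sketch below only looks up the matching entry titles; it assumes the generated menu is at /boot/grub/grub.cfg, and the helper name kernel_entries is made up for illustration. The actual titles differ per installation, so they are read from the file rather than guessed.

```shell
#!/bin/sh
# Sketch: list GRUB menu entry titles that mention a given kernel version.
# Assumption: the host boots via GRUB with its menu in /boot/grub/grub.cfg.

kernel_entries() {
    # $1: path to grub.cfg, $2: kernel version string (e.g. 5.4.41-1-pve)
    # Prints the quoted title of every menuentry line mentioning $2.
    grep -E "^[[:space:]]*menuentry '" "$1" | cut -d "'" -f 2 | grep -F -- "$2"
}

# On the real host one would then do something like:
#   kernel_entries /boot/grub/grub.cfg 5.4.41-1-pve
#   # set in /etc/default/grub, using the titles found in grub.cfg:
#   #   GRUB_DEFAULT="<submenu title>><entry title>"
#   update-grub
```

GRUB accepts a "submenu title>entry title" path in GRUB_DEFAULT, so the older kernel can stay the default until the setting is reverted after a fixed kernel arrives.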