Proxmox crashes: kernels > 5.4.44-2-pve panic on starting a VM or LXC

Glowsome

Dear all,
I am experiencing kernel panics on every kernel release above version 5.4.44-2-pve.

Behaviour:

- update all packages (including the kernel)
- restart the node
- node comes up fine and communicates with the cluster (4-node)
- node kernel-panics as soon as it starts either an LXC or a VM guest
- the issue appears to revolve around the gfs2 kernel module (see the check below)
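
For anyone wanting to verify the gfs2 angle, a minimal sketch of how the module and the panic trace from the previous boot can be inspected. These are standard Debian/PVE commands (the journal lookup assumes persistent journalling is enabled); output will of course differ per box.

Code:
# show the gfs2 module the currently running kernel would load
modinfo gfs2
# kernel messages from the boot that panicked (the previous boot)
journalctl -k -b -1 | grep -i -A 5 gfs2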

Further detail:

- running a 4-node cluster (4x DL360 Gen7 boxes)
- SAS shared storage (21 TB)
- storage is a GFS2 filesystem made available to Proxmox (see my setup, as it was built: https://forum.proxmox.com/threads/p...-lvm-lv-with-msa2040-sas-partial-howto.57536/ )
- kernel 5.4.44-2-pve works fine; every higher kernel leads to a kernel panic once you start a VM or LXC (see the listing below)
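
To see exactly which kernels are installed and which one is running (package names match the pveversion output further down):

Code:
uname -r
dpkg -l 'pve-kernel-*' | grep '^ii'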

So I'm kinda stuck. Up until the mentioned kernel version everything just works fine; after that it's a big mess. So where did this go wrong?
 
@matrix As written above, the 'latest' kernel is installed on the system, but if I boot from it we get the kernel panic explained above.
If I instead select the previous kernel (5.4.44-2-pve) on the boot screen, then I'm running just fine.
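
To avoid picking the kernel by hand at every boot, the choice can be pinned in GRUB. A sketch, assuming a stock GRUB setup on PVE 6.x; the exact menu-entry title below is an assumption and should be copied from your own grub.cfg:

Code:
# list the menu titles; the known-good entry sits under the 'Advanced options' submenu
grep -E "(menuentry|submenu) '" /boot/grub/grub.cfg
# then in /etc/default/grub point GRUB_DEFAULT at it, e.g. (title is an assumption):
# GRUB_DEFAULT="Advanced options for Proxmox VE GNU/Linux>Proxmox VE GNU/Linux, with Linux 5.4.44-2-pve"
update-grub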

Code:
pveversion -v
proxmox-ve: 6.2-2 (running kernel: 5.4.44-2-pve)
pve-manager: 6.2-12 (running version: 6.2-12/b287dd27)
pve-kernel-5.4: 6.2-7
pve-kernel-helper: 6.2-7
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.65-1-pve: 5.4.65-1
pve-kernel-5.4.44-2-pve: 5.4.44-2
pve-kernel-5.4.41-1-pve: 5.4.41-1
pve-kernel-5.3.18-3-pve: 5.3.18-3
 
@matrix Can you motivate that? Booting into a selected kernel, even though old kernels are present, won't solve the issues I am having.
Besides that, I am not removing the older kernels that I am sure make my setup work.
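
For reference, keeping the known-good kernel from being removed by an upgrade or autoremove is a one-liner (package name taken from the pveversion output above):

Code:
apt-mark hold pve-kernel-5.4.44-2-pve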

I have enabled the following in /etc/sysctl.conf:

Code:
kernel.core_pattern = /var/crash/core.%t.%p
kernel.panic = 10
kernel.unknown_nmi_panic = 1

Still, if I boot from the latest kernel, there is _nothing_ recorded ....
The node is free (as in: no VM or LXC is started); as soon as I start either a VM or an LXC, the node coredumps.
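
One note for anyone trying to capture such a crash: kernel.core_pattern only governs userspace core dumps; a kernel panic is only written out if a crash kernel is loaded. A minimal sketch using Debian's kdump-tools; the crashkernel size is an assumption and should be tuned per box:

Code:
apt install kdump-tools
# add to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub (256M is an assumption):
#   crashkernel=256M
update-grub && reboot
# after the reboot, verify the crash kernel is loaded
kdump-config show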

So you will need to convince me really 100% before I remove old kernels, as that could leave me in a non-working situation.

So, to run through it again for you:

- the update was done (meaning the latest kernel is on the system)
- a reboot was done to initialize the new kernel
- the system boots up without issues (running the latest kernel)
- the cluster config initializes (DLM connects successfully to the other 3 nodes; see the check below)
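
The DLM state can be confirmed with dlm-controld's tooling (standard commands; output varies per cluster):

Code:
dlm_tool status
dlm_tool ls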

=> As long as I do not start either a VM or an LXC, the node keeps running just fine.
However, that would take away the whole purpose of having a Proxmox host :p

=> As soon as I start either a VM or an LXC, the box kernel-dumps, or in simple terms: it goes B00m!

=> If, on boot (after the reset), I select the lower kernel as mentioned above, I end up with a working box OS-wise, and am able to launch/start VMs or LXC containers.

So again: please motivate your answer as to how removing 'old kernels' would make a difference in the described behaviour, because it makes no sense to me at all.
 
Update on the issue: since I updated to 6.3.x, the issue is no longer present :)
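
For completeness, the upgrade itself was nothing special; the usual apt path for moving between PVE point releases looks like this (a sketch):

Code:
apt update
apt dist-upgrade
reboot
pveversion -v | head -n 2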

Good job, PM developers!

@matrix As you have not responded to the reply I posted about 1.5 months ago, I will assume you were unable to motivate your suggested action, which, had I followed it, would probably have left me in a non-functional situation as described in my previous posts.

I would urge you to reconsider posting such suggestions, as they can have a huge impact on a poster's setup. In my case your suggestion would have left me with an inoperable 4-node cluster with 30+ guests on it.
 