Proxmox Kernel 6.8.12-2 Freezes (again)

Decco1337

Member
Apr 12, 2023
Hello there,

I’ve updated to Proxmox Kernel 6.8.12-2 and am seeing freezes again on my servers. With 6.8.12-1 it was nearly stable (just one server had issues).

Unfortunately there is nothing in the logs; the server is simply dead and needs to be rebooted manually.

Anyone with the same experience? Any further ideas on how to fix this issue completely?

This solution https://www.thomas-krenn.com/de/wiki/Known_Issues_Proxmox_VE_8.2 fixed the issue 99% of the way; as I said, one host still had an issue. I replaced it and after a week the issue came back.

Now with the newest kernel the issue has occurred on multiple updated hosts.
 
I applied the same tweaks from Thomas Krenn and hope they fix it. Perhaps now that kernel 6.11 is out, there is a chance that 6.10 comes to Proxmox.
Hopefully! Honestly, I am very careful with kernel updates right now. Just update the kernel and the hosts start freezing or crashing again...
 
I observed the same behavior at one of my customers. Is there already a tracking ticket at https://bugzilla.proxmox.com/ (I couldn't find one) where one could provide further information that might be useful for the Proxmox team/developers?
Thanks for sharing your experience! As far as I know there is no ticket yet. Unfortunately it is also a bit tricky to report, as nothing shows up in the logs...
 
Quite unbelievable, as we haven't observed any freezes (or other bugs) with any 6.8 kernel version on our 4 PVE hosts (Dell T620/T430) so far; they are currently running 6.8.12-2.
 
Which CPU type do you use for the VMs, and are you running Windows VMs? I found out that it also depends on which CPU type you use and whether you have Windows VMs. Nested virt also caused issues with this kernel. But I had these issues with the 6.5 kernel as well.

I am also sure that it cannot be a hardware issue, as I use multiple servers and have already replaced a lot :D I also have no strange configs on the servers: just a fresh Proxmox install, the changes from Thomas Krenn (https://www.thomas-krenn.com/de/wiki/Known_Issues_Proxmox_VE_8.2) and VMs.

I'm also running quite a few more than 4 servers, and a large number of them don't have any issues.
 
Phew, >50 VMs/LXCs, mostly with the default CPU type unchanged and some with host selected, and yes, about 7 Windows guests (Server, 10, 11) are included. There were some nested virt tests; these are switched off now, but they didn't produce errors/problems while they were running.
 
There have been many threads about Proxmox VE crashes with kernel 6.8 in recent months.

We operate several thousand Proxmox VE systems and had an outage rate of approx. 10% of the systems with kernel 6.8.
The crashes are not reliably reproducible: they occur regardless of the load, the hardware platform (several HPE, Lenovo and Supermicro servers in use) or specific VMs (it always takes a group of VMs, but it doesn't matter which ones).
Some systems crashed several times a day, others only once a month. Some hosts had no crashes at all.
The only commonality: AMD Epyc and Genoa CPUs

We tried various configurations for ZFS, memory management, process management, power efficiency modes, KVM CPU models, with/without hyperthreading, etc. Nothing helped.

In the end, we took the 6.11 kernel from the Ubuntu repo (the Proxmox kernel is based on the Ubuntu kernel anyway) and since then the systems have been stable.
The 6.8 kernel definitely has a problem.

@proxmox Team
You should seriously consider retiring the 6.8 kernel and switching to 6.11.
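
To check quickly whether a host matches the pattern described above (AMD CPU on a 6.8 kernel), something like the following could be used. It is only a sketch: the function names are made up for illustration, and it assumes the usual `vendor_id` line in /proc/cpuinfo.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Proxmox.

# Print the CPU vendor string (e.g. AuthenticAMD or GenuineIntel).
# $1: optional cpuinfo-style file, defaults to /proc/cpuinfo
cpu_vendor() {
    awk -F': ' '/^vendor_id/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}

# Print major.minor of a kernel release string.
# $1: optional release string, defaults to the running kernel (uname -r)
kernel_series() {
    printf '%s\n' "${1:-$(uname -r)}" | cut -d. -f1,2
}
```

Running `cpu_vendor` and `kernel_series` without arguments on each host gives the two values to compare across a fleet.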
 
That makes sense, as we only have Intel CPUs ... so we wouldn't see crashes that hit AMD CPUs on 6.8.*.
 
Thanks for sharing your experience! I would like to try it out. Do you have any good instructions on how to do it?
 
Just download the kernel packages from an Ubuntu mirror (I picked mirror.plusserver.com at random).
That's OK for testing. For production hosts you should think about running your own apt repo to host these packages. Or wait and hope that Proxmox provides 6.11 kernels soon.

There is a package dependency on wireless-regdb. It also exists in the Debian repos, so just install it with apt first.
Then install the deb packages directly.

You should load the br_netfilter module explicitly:
Just add a line "br_netfilter" to "/etc/modules-load.d/br_netfilter.conf"

This is needed because pve-firewall tries to read /proc/sys/net/bridge/bridge-nf-call-iptables, which doesn't exist if the module isn't loaded.
It seems Proxmox patched this in their own version of the Ubuntu kernel, but the workaround is simple enough. :)

update-grub or the proxmox-boot-tool should set the 6.11 kernel as default because it has the highest version number. So just reboot and hope :)

Of course you could also add the whole Ubuntu repo to your system. But that also needs some apt pinning configuration. Otherwise you will probably fuck up your system if you start mixing packages from Debian, Ubuntu and Proxmox. :P
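
The br_netfilter step from above could be sketched roughly like this. The function names and the directory argument are assumptions for illustration (they make it possible to try the snippet in a scratch directory first); only the file name and the /proc path come from the post.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Proxmox.

# Write a modules-load.d entry so br_netfilter gets loaded at boot.
# $1: target directory (on a real host: /etc/modules-load.d, as root)
write_br_netfilter_conf() {
    printf '%s\n' 'br_netfilter' > "$1/br_netfilter.conf"
}

# Succeeds if the file pve-firewall reads is present,
# which is only the case once the module is loaded.
# $1: optional path override, defaults to the real /proc entry
bridge_nf_present() {
    [ -e "${1:-/proc/sys/net/bridge/bridge-nf-call-iptables}" ]
}
```

On a real host you would call `write_br_netfilter_conf /etc/modules-load.d` as root, and `modprobe br_netfilter` loads the module right away without waiting for the reboot; `bridge_nf_present` can then confirm that the /proc entry showed up.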
 
Thanks for that! I'll test it; I have enough empty servers that I can afford to fuck up haha :D I'll need to test this for a few days, but I think it could be faster than waiting for Proxmox. I've had issues with the 6.8 kernel for months and I have no patience left :D
 
But ZFS is not included, is it?
 
Downgrading to 6.5 doesn't help. We also tried this on several systems.
If you want to prevent freezes with a downgrade, you could try a 5.x kernel. We had no issues with those on the same hosts. But upgrading to 6.11 was our preferred solution rather than downgrading to 5.x ;)
 
Hi, I have the same crash on this kernel.


Code:
general protection fault, probably for non-canonical address 0x155cc038a1c126e2: 0000 [#81] PREEMPT SMP PTI
CPU: 1 PID: 15631 Comm: cron Tainted: P      D W  O       6.8.12-2-pve #1
Hardware name: MSI MS-7788/H61M-P31/W8 (MS-7788), BIOS V3.6 09/29/2013
RIP: 0010:kmem_cache_alloc+0xce/0x370
Code: 83 78 10 00 48 8b 38 0f 84 48 02 00 00 48 85 ff 0f 84 3f 02 00 00 41 8b 44 24 28 49 8b 9c 24 b8 00 00 00 49 8b 34 24 48 01 f8 <48> 33 18 48 89 c1 48 89 f8 48 0f c9 48 31 cb 48 8d 8a 00 20 00 00
0000:ffffacf949493cf0 EFLAGS: 00010206
RAX: 155cc038a1c126e2 RBX: f9dbc0bf50aa865b RCX: 0000000000000000
RDX: 00000000d20a6001 RSI: 000000000003cef0 RDI: 155cc038a1c12682
RBP: ffffacf949493d40 R08: 0000000000000000 R09: 0000000000000000
R10: ffff9ae114a249c0 R11: 0000000000000000 R12: ffff9ae1001e1700
R13: 0000000000000cc0 R14: 0000000000000060 R15: ffffffff9c1f3fa1
FS:  00007691ee81b840(0000) GS:ffff9ae217280000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007691ee32008e CR3: 000000010b426002 CR4: 00000000000606f0
Call Trace:
<TASK>
? show_regs+0x6d/0x80
? die_addr+0x37/0xa0
? exc_general_protection+0x1db/0x480
? asm_exc_general_protection+0x27/0x30
? __anon_vma_prepare+0xf1/0x180
? kmem_cache_alloc+0xce/0x370
__anon_vma_prepare+0xf1/0x180
do_fault+0x73/0x4c0
__handle_mm_fault+0x895/0xed0
handle_mm_fault+0x18d/0x380
do_user_addr_fault+0x1f8/0x660
exc_page_fault+0x83/0x1b0
asm_exc_page_fault+0x27/0x30
RIP: 0033:0x7691eeb3e772
Code: 0f 60 c0 66 0f 61 c0 66 0f 70 c0 00 48 83 fa 10 72 76 48 83 fa 20 77 12 0f 11 44 17 f0 0f 11 07 c3 0f 11 47 e0 0f 11 47 f0 c3 <0f> 11 07 0f 11 47 10 48 01 d7 48 83 fa 40 76 e7 0f 11 40 20 0f 11
RSP: 002b:00007fff5218ad88 EFLAGS: 00010202
RAX: 00007691ee32008e RBX: 0000000000000004 RCX: 00007691ee320a40
RDX: 00000000000009b2 RSI: 0000000000000000 RDI: 00007691ee32008e
RBP: 00007fff5218b110 R08: 00007691ee32008e R09: 000000000004b000
R10: 0000000000000003 R11: 0000000000000246 R12: 00007fff5218ae38
R13: 000059ecd224fc70 R14: 00007fff5218b1b0 R15: 00007691ee320a40
</TASK>

I can reproduce this error at any time: just click 'Refresh' and wait for 'TASK OK' several times.

After that I am forced to restart manually.
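
For comparing traces like the one above across hosts, a small helper could pull the faulting kernel function out of a saved log. This is just a sketch under assumptions: the function name is made up, and it expects the trace to be saved to a text file with kernel-mode RIP lines in the `RIP: 0010:symbol+0xoffset/0xsize` form shown in the trace.

```shell
#!/bin/sh
# Hypothetical helper, not part of any Proxmox tooling.

# Print the kernel symbol from each kernel-mode RIP line in a saved log.
# User-space RIP lines (segment 0033) carry no symbol and are skipped.
# $1: path to a text file containing the trace
faulting_symbols() {
    sed -n 's/^.*RIP: 0010:\([A-Za-z0-9_.]*\)+0x.*$/\1/p' "$1"
}
```

Fed with the trace above, it should print kmem_cache_alloc; if several hosts report the same symbol, that is useful detail for a bug report.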
 
I followed your guide. Running 6.11.0-8-generic now. Let's see. Thank you!