Opt-in Linux 6.14 Kernel for Proxmox VE 8 available on test & no-subscription

VM still crashing with the newer 6.14 kernel


Apr 11 17:28:32 pve QEMU[92966]: error: kvm run failed Bad address
Apr 11 17:28:32 pve QEMU[92966]: RAX=000018cc0e382840 RBX=0000000000000780 RCX=0000000000000780 RDX=0000000000000780
Apr 11 17:28:32 pve QEMU[92966]: RSI=000018cc0e382840 RDI=00007a71e5d1f000 RBP=00007a71eb854960 RSP=00007a71eb854960
Apr 11 17:28:32 pve QEMU[92966]: R8 =0000000000000780 R9 =0000000000000090 R10=00000000000003c0 R11=0000000000000800
Apr 11 17:28:32 pve QEMU[92966]: R12=0000000000000090 R13=00005597198ef358 R14=00007a71e5d1f000 R15=000018cc0e382840
Apr 11 17:28:32 pve QEMU[92966]: RIP=000055971ff041d0 RFL=00010206 [-----P-] CPL=3 II=0 A20=1 SMM=0 HLT=0
Apr 11 17:28:32 pve QEMU[92966]: ES =0000 0000000000000000 ffffffff 00c00000
Apr 11 17:28:32 pve QEMU[92966]: CS =0033 0000000000000000 ffffffff 00a0fb00 DPL=3 CS64 [-RA]
Apr 11 17:28:32 pve QEMU[92966]: SS =002b 0000000000000000 ffffffff 00c0f300 DPL=3 DS [-WA]
Apr 11 17:28:32 pve QEMU[92966]: DS =0000 0000000000000000 ffffffff 00c00000
Apr 11 17:28:32 pve QEMU[92966]: FS =0000 00007a71eb8576c0 ffffffff 00c00000
Apr 11 17:28:32 pve QEMU[92966]: GS =0000 0000000000000000 ffffffff 00c00000
Apr 11 17:28:32 pve QEMU[92966]: LDT=0000 0000000000000000 ffffffff 00c00000
Apr 11 17:28:32 pve QEMU[92966]: TR =0040 fffffe7be1176000 00004087 00008b00 DPL=0 TSS64-busy
Apr 11 17:28:32 pve QEMU[92966]: GDT= fffffe7be1174000 0000007f
Apr 11 17:28:32 pve QEMU[92966]: IDT= fffffe0000000000 00000fff
Apr 11 17:28:32 pve QEMU[92966]: CR0=80050033 CR2=00007a71e5d1f000 CR3=000000011ec9a006 CR4=00772ef0
Apr 11 17:28:32 pve QEMU[92966]: DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
Apr 11 17:28:32 pve QEMU[92966]: DR6=00000000ffff0ff0 DR7=0000000000000400
Apr 11 17:28:32 pve QEMU[92966]: EFER=0000000000000d01
Apr 11 17:28:32 pve QEMU[92966]: Code=cc cc cc cc 55 48 89 e5 48 89 f8 48 63 ca 48 89 f7 48 89 c6 <f3> a4 5d c3 cc cc cc cc cc cc cc cc cc cc cc cc 55 48 89 e5 89 55 fc f3 0f 6f 07 f3 0f 6f
Apr 11 17:28:32 pve QEMU[92966]: RAX=000018cc0e303040 RBX=0000000000000780 RCX=0000000000000780 RDX=0000000000000780
Apr 11 17:28:32 pve QEMU[92966]: RSI=000018cc0e303040 RDI=00007a71e5c97000 RBP=00007a72793fa960 RSP=00007a72793fa960
Apr 11 17:28:32 pve QEMU[92966]: R8 =0000000000000780 R9 =0000000000000110 R10=00000000000003c0 R11=0000000000000800
Apr 11 17:28:32 pve QEMU[92966]: R12=0000000000000110 R13=00005597198ef358 R14=00007a71e5c97000 R15=000018cc0e303040
Apr 11 17:28:32 pve QEMU[92966]: RIP=000055971ff041d0 RFL=00010202 [-------] CPL=3 II=0 A20=1 SMM=0 HLT=0
Apr 11 17:28:32 pve QEMU[92966]: ES =0000 0000000000000000 ffffffff 00c00000
Apr 11 17:28:32 pve QEMU[92966]: CS =0033 0000000000000000 ffffffff 00a0fb00 DPL=3 CS64 [-RA]
Apr 11 17:28:32 pve QEMU[92966]: SS =002b 0000000000000000 ffffffff 00c0f300 DPL=3 DS [-WA]
Apr 11 17:28:32 pve QEMU[92966]: DS =0000 0000000000000000 ffffffff 00c00000
Apr 11 17:28:32 pve QEMU[92966]: FS =0000 00007a72793fd6c0 ffffffff 00c00000
Apr 11 17:28:32 pve QEMU[92966]: GS =0000 0000000000000000 ffffffff 00c00000
Apr 11 17:28:32 pve QEMU[92966]: LDT=0000 0000000000000000 ffffffff 00c00000
Apr 11 17:28:32 pve QEMU[92966]: TR =0040 fffffe698ad4f000 00004087 00008b00 DPL=0 TSS64-busy
Apr 11 17:28:32 pve QEMU[92966]: GDT= fffffe698ad4d000 0000007f
Apr 11 17:28:32 pve QEMU[92966]: IDT= fffffe0000000000 00000fff
Apr 11 17:28:32 pve QEMU[92966]: CR0=80050033 CR2=00007a71e5c97000 CR3=000000011ec9a003 CR4=00772ef0
Apr 11 17:28:32 pve QEMU[92966]: DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
Apr 11 17:28:32 pve QEMU[92966]: DR6=00000000ffff0ff0 DR7=0000000000000400
Apr 11 17:28:32 pve QEMU[92966]: EFER=0000000000000d01
Apr 11 17:28:32 pve QEMU[92966]: Code=cc cc cc cc 55 48 89 e5 48 89 f8 48 63 ca 48 89 f7 48 89 c6 <f3> a4 5d c3 cc cc cc cc cc cc cc cc cc cc cc cc 55 48 89 e5 89 55 fc f3 0f 6f 07 f3 0f 6f
Apr 11 17:28:32 pve QEMU[92966]: RAX=000018cc0e283840 RBX=0000000000000780 RCX=0000000000000780 RDX=0000000000000780
Apr 11 17:28:32 pve QEMU[92966]: RSI=000018cc0e283840 RDI=00007a71e5c0f000 RBP=00007a71ed884960 RSP=00007a71ed884960
Apr 11 17:28:32 pve QEMU[92966]: R8 =0000000000000780 R9 =0000000000000110 R10=00000000000003c0 R11=0000000000000800
Apr 11 17:28:32 pve QEMU[92966]: R12=0000000000000110 R13=00005597198ef358 R14=00007a71e5c0f000 R15=000018cc0e283840
Apr 11 17:28:32 pve QEMU[92966]: RIP=000055971ff041d0 RFL=00010202 [-------] CPL=3 II=0 A20=1 SMM=0 HLT=0
Apr 11 17:28:32 pve QEMU[92966]: ES =0000 0000000000000000 ffffffff 00c00000
Apr 11 17:28:32 pve QEMU[92966]: CS =0033 0000000000000000 ffffffff 00a0fb00 DPL=3 CS64 [-RA]
Apr 11 17:28:32 pve QEMU[92966]: SS =002b 0000000000000000 ffffffff 00c0f300 DPL=3 DS [-WA]
Apr 11 17:28:32 pve QEMU[92966]: DS =0000 0000000000000000 ffffffff 00c00000
Apr 11 17:28:32 pve QEMU[92966]: FS =0000 00007a71ed8876c0 ffffffff 00c00000
Apr 11 17:28:32 pve QEMU[92966]: GS =0000 0000000000000000 ffffffff 00c00000
Apr 11 17:28:32 pve QEMU[92966]: LDT=0000 0000000000000000 ffffffff 00c00000
Apr 11 17:28:32 pve QEMU[92966]: TR =0040 fffffe1d5342d000 00004087 00008b00 DPL=0 TSS64-busy
Apr 11 17:28:32 pve QEMU[92966]: GDT= fffffe1d5342b000 0000007f
Apr 11 17:28:32 pve QEMU[92966]: IDT= fffffe0000000000 00000fff
Apr 11 17:28:32 pve QEMU[92966]: CR0=80050033 CR2=00007a71e5c0f000 CR3=000000011ec9a004 CR4=00772ef0
Apr 11 17:28:32 pve QEMU[92966]: DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
Apr 11 17:28:32 pve QEMU[92966]: DR6=00000000ffff0ff0 DR7=0000000000000400
Apr 11 17:28:32 pve QEMU[92966]: EFER=0000000000000d01
Apr 11 17:28:32 pve QEMU[92966]: Code=cc cc cc cc 55 48 89 e5 48 89 f8 48 63 ca 48 89 f7 48 89 c6 <f3> a4 5d c3 cc cc cc cc cc cc cc cc cc cc cc cc 55 48 89 e5 89 55 fc f3 0f 6f 07 f3 0f 6f
 
https://bugzilla.proxmox.com/show_bug.cgi?id=6273

The E1000 (Intel I219-LM) hang is still present on 6.14.0-2 - a reminder for everyone who hoped the new kernel would fix it. The host now collapses within hours. :D
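Until there is a real fix, the workaround usually suggested for e1000e unit hangs (not something confirmed by the posters in this thread) is to disable hardware offloading on the affected NIC; the interface name below is only an example:

Code:
# replace eno1 with the affected e1000e interface
ethtool -K eno1 tso off gso off gro off
# to make it persistent, add the same command as a post-up line in /etc/network/interfaces

This trades some throughput and CPU for stability; the underlying driver issue remains.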
 
Why is the virtualization performance loss on the AMD 9950X so much larger on 6.14, when the 6.11 kernel does not have this problem?
I'm having the same performance problem with my AMD 9950X3D on 6.14. Going back to 6.11, the performance is greatly increased again: in a Windows 11 VM with 8 cores/16 threads I get 61 FPS in X-Plane, while I only get 50 FPS on 6.14. In Cinebench R23 I get around 31,100 pts on 6.11 and only 28,900 pts on 6.14. I've checked that there is no thermal throttling. Going back and forth between the two kernels, the results are reliably repeatable.

The weird thing is that the clocks in my Windows VM on 6.14 are even higher under load (and at idle) than on 6.11, so theoretically the performance should be higher, not lower. Maybe this is a timing problem related to all the changes to the AMD P-State driver? Sadly, I don't know how to get to the bottom of it.

Edit: Manually pinning the CPU affinity of the VM's 8 cores/16 threads to CCD0 (0-7,16-23) on 6.14 brings back the performance of 6.11 - at least in X-Plane (61 FPS). However, in this scenario I only get 22,000 pts in Cinebench R23 instead of 31,100 pts. No CPU pinning was needed for either scenario (gaming/benchmarking) on 6.11 in order to get the best results. I wonder what changed under the hood.
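For anyone who wants to try the same pinning, Proxmox VE has a per-VM affinity option; the VM ID below is a placeholder and the core list matches the CCD0 layout (0-7,16-23) mentioned above:

Code:
# pin the VM's vCPU threads to CCD0 cores and their SMT siblings (placeholder VM ID 100)
qm set 100 --affinity 0-7,16-23
# remove the pinning again
qm set 100 --delete affinity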
 
FYI: There's a newer proxmox-kernel in version 6.14.0-2-pve available on the pvetest repository. It should address the issues with AMD EPYC Genoa and the ae4dma module, and also reduce the log spam on some Intel N-CPU based systems.
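For anyone who wants to test it, a rough sketch of pulling the opt-in kernel from pvetest on a PVE 8 (Debian bookworm) host - adjust to your own repository setup:

Code:
# temporarily enable the pvetest repository
echo "deb http://download.proxmox.com/debian/pve bookworm pvetest" > /etc/apt/sources.list.d/pvetest.list
apt update
# install the opt-in 6.14 kernel meta-package, then reboot into it
apt install proxmox-kernel-6.14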
Nice - yeah, it looks like I don't have ECC errors after all ;)

Code:
# dmesg | egrep -i 'edac|igen'
[    0.957784] EDAC MC: Ver: 3.0.0
[    3.629302] caller igen6_probe+0x1bc/0x8e0 [igen6_edac] mapping multiple BARs
[    3.629592] EDAC MC0: Giving out device to module igen6_edac controller Intel_client_SoC MC#0: DEV 0000:00:00.0 (POLLED)
[    3.629636] EDAC igen6: v2.5.1

Still higher idle than the latest 6.8.12, though.
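To put numbers on the idle difference between kernels, one option (my suggestion, not something used in this thread) is powertop from the Debian repositories:

Code:
apt install powertop
# measure for 60 seconds while the host is idle and write a CSV report per kernel
powertop --time=60 --csv=powertop-6.14.csv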
 
Same here. I'm still unable to boot my Epyc 9554 system on 6.14.0-2-pve without blacklisting the ae4dma driver.
I can confirm: still not booting without blacklisting ae4dma on an EPYC 9124.
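For reference, the usual way to blacklist the module until a fixed kernel lands looks roughly like this (the file name is arbitrary):

Code:
echo "blacklist ae4dma" > /etc/modprobe.d/blacklist-ae4dma.conf
# make sure the blacklist also applies during early boot
update-initramfs -u -k all
reboot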
Which boards/BIOS versions do both of you have?


I'll see that we include patch 1/3 (https://lore.kernel.org/all/20250203162511.911946-1-Basavaraj.Natikar@amd.com/) in our kernel as well - but narrowing this down further might also help upstream to include it.

Thanks
 
This was posted in the following thread; although it is a duplicate, I believe it is relevant to this thread, so I am posting it here as well.

https://forum.proxmox.com/threads/working-amd-rx-9070xt-support.163370/page-2#post-761510

Launching applications that use the GPU results in errors and crashes the virtual machine with:

internal-error

Intel Core Ultra 265k
Asrock Z890 Pro RS WiFi White
Hellhound Spectral White AMD Radeon RX 9070 XT 16GB GDDR6

This occurs in an environment where the RX 9070 series is passed through.

This issue does not occur with kernel 6.11, but occurs when switching to kernel 6.14.

Pinning to kernel 6.11 fixes it.

I don't know whether it is related, but the symptoms look similar to those attributed to the KASLR fix: crashes, slow performance, increased loading times...
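For anyone else hitting this, pinning the host to a 6.11 kernel can be done with proxmox-boot-tool; the version string below is only an example - check what is actually installed first:

Code:
# list installed kernels to find the exact 6.11 version string
proxmox-boot-tool kernel list
# pin that kernel so the host keeps booting it (example version)
proxmox-boot-tool kernel pin 6.11.11-2-pve
# later, to return to the default (newest) kernel:
# proxmox-boot-tool kernel unpin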
 

Attachments: IMG_3067.jpeg, IMG_3068.jpeg, IMG_3065.jpeg
Sorry, in the meantime I've found a workaround.

My controller was only passed through to the VM as a "Raw Device" - every option works 1:1 on 6.8 and 6.11. Now I've created a mapped device and pass that mapped device through to the VM instead, and the VM starts normally.
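For comparison, the relevant difference in the VM config looks roughly like this (PCI address, mapping name and VM ID are placeholders):

Code:
# before: raw passthrough in /etc/pve/qemu-server/<vmid>.conf
hostpci0: 0000:03:00.0,pcie=1
# after: create a resource mapping (Datacenter -> Resource Mappings) and reference it
hostpci0: mapping=mycontroller,pcie=1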
Running into a similar issue here (although I'm running Ubuntu, not TrueNAS, in my VM). I too created a mapped device, but it does not seem to make a difference.

The complete journal of a failing boot (`journalctl -b`, captured after you tried to start the VM and it failed) would help to start debugging this.
Added that here, although the error message is rather generic in nature. Let me know if there's anything else that is needed.


Aside from that, I was hoping to gain some insight from others regarding WOL: it seems Wake-on-LAN for the Proxmox host is no longer working since the upgrade. At first I thought it was due to the 6.14 kernel, but the issue persists after reverting to 6.11. WOL is still enabled via the post-up commands in /etc/network/interfaces, and I can see that the server keeps the NIC active after shutdown. It's an RTL8125 NIC, so it might have something to do with the Realtek driver update?
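For context, the kind of post-up WOL setup mentioned above typically looks something like this (interface name is an example, not the poster's actual config):

Code:
# /etc/network/interfaces (excerpt)
auto eno1
iface eno1 inet manual
    post-up /usr/sbin/ethtool -s eno1 wol g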
 

Maybe check it with ethtool as shown here.

Disclaimer: I don't use WOL myself.
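For reference, a typical ethtool WOL check looks something like this (interface name is an example):

Code:
# show supported ("Supports Wake-on") and currently enabled ("Wake-on") modes
ethtool eno1 | grep -i wake
# enable wake on magic packet
ethtool -s eno1 wol g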
Hi, thanks for the suggestion. I should've mentioned that I validated it via ethtool too. It shows as working, and it did work prior to the Proxmox upgrade. I haven't made any changes to my network, but I did verify that the magic packets are received by the system.
 
(Quoting the RX 9070 XT passthrough report above - GPU-accelerated applications crash the VM with an internal-error on kernel 6.14 but not on 6.11: https://forum.proxmox.com/threads/working-amd-rx-9070xt-support.163370/page-2#post-761510)
I also reported something similar. In my case, simply opening YouTube with video acceleration results in an instant crash.
VM instant crash
 