Opt-in Linux 6.14 Kernel for Proxmox VE 8 available on test & no-subscription

I have no idea why, but I have retested this several times now: my CachyOS VM with Nvidia 4060 Ti passthrough crashes on YouTube in Brave when the host runs the Proxmox 6.14 kernel. I'm scratching my head over this one. The Nvidia driver is correctly installed on the Proxmox host, the VM hook unloads the Nvidia driver, and vfio takes over correctly, but as soon as I open Brave and YouTube I get an instant crash. With the Proxmox 6.11 kernel, the VM works perfectly. How is this possible/related?
Anyone?
My CachyOS VM itself is running kernel 6.14. I need some suggestions!

Edit: I found this with journalctl

This shows up when starting my VMs, also on 6.11; in this case the OPNsense and CachyOS VMs.

Apr 09 21:12:19 pve QEMU[5740]: kvm: VFIO_MAP_DMA failed: Invalid argument
Apr 09 21:12:19 pve QEMU[5740]: kvm: vfio_container_dma_map(0x61bb50aef680, 0x380800000000, 0x200000000, 0x774780000000) = -22 (Invalid argument)
Apr 09 21:12:19 pve QEMU[5740]: kvm: VFIO_MAP_DMA failed: Invalid argument
Apr 09 21:12:19 pve QEMU[5740]: kvm: vfio_container_dma_map(0x61bb50aef680, 0x380a00000000, 0x2000000, 0x7751c4000000) = -22 (Invalid argument)
Apr 09 21:12:19 pve QEMU[5740]: kvm: VFIO_MAP_DMA failed: Invalid argument
Apr 09 21:12:19 pve QEMU[5740]: kvm: vfio_container_dma_map(0x61bb50aef680, 0x380000000000, 0x10000, 0x7751efc30000) = -22 (Invalid argument)
Apr 09 21:12:19 pve QEMU[5740]: kvm: VFIO_MAP_DMA failed: Invalid argument
Apr 09 21:12:19 pve QEMU[5740]: kvm: vfio_container_dma_map(0x61bb50aef680, 0x380000011000, 0x3000, 0x7751efc49000) = -22 (Invalid argument)
Apr 09 21:12:19 pve QEMU[5740]: kvm: VFIO_MAP_DMA failed: Invalid argument
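
To see which mappings fail without scrolling through the journal, a small filter along these lines might help. It is only a sketch: `failed_dma_maps` is a made-up helper name, and it assumes the four values QEMU prints in `vfio_container_dma_map(...)` are the container pointer, the IOVA, the size and the host address, in that order.

```shell
#!/bin/sh
# Sketch: reduce QEMU's vfio_container_dma_map error lines to the interesting
# fields. failed_dma_maps is a hypothetical helper; it reads log text on stdin.
# Assumption: the printed arguments are (container, iova, size, host address).
failed_dma_maps() {
    sed -n 's/.*vfio_container_dma_map(\(0x[0-9a-f]*\), \(0x[0-9a-f]*\), \(0x[0-9a-f]*\), \(0x[0-9a-f]*\)) = \(-[0-9]*\).*/iova=\2 size=\3 ret=\5/p'
}

# Example on the host, for the current boot:
#   journalctl -b | failed_dma_maps | sort | uniq -c
```

Running the same filter under 6.11 and 6.14 would show whether the set of failing mappings actually differs between the two kernels.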

--------------------------------------------------------------------------------------------
This is my CachyOS VM crash:

Apr 09 21:17:54 pve QEMU[5740]: LDT=0000 0000000000000000 ffffffff 00c00000
Apr 09 21:17:54 pve QEMU[5740]: TR =0040 fffffe70d5a4a000 00004087 00008b00 DPL=0 TSS64-busy
Apr 09 21:17:54 pve QEMU[5740]: GDT= fffffe70d5a48000 0000007f
Apr 09 21:17:54 pve QEMU[5740]: IDT= fffffe0000000000 00000fff
Apr 09 21:17:54 pve QEMU[5740]: CR0=80050033 CR2=00007d3164758000 CR3=0000000169210001 CR4=00772ef0
Apr 09 21:17:54 pve QEMU[5740]: DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
Apr 09 21:17:54 pve QEMU[5740]: DR6=00000000ffff0ff0 DR7=0000000000000400
Apr 09 21:17:54 pve QEMU[5740]: EFER=0000000000000d01
Apr 09 21:17:54 pve QEMU[5740]: Code=cc cc cc cc 55 48 89 e5 48 89 f8 48 63 ca 48 89 f7 48 89 c6 <f3> a4 5d c3 cc cc cc cc cc cc cc cc cc cc cc cc 55 48 89 e5 89 55 fc f3 0f 6f 07 f3 0f 6f
Apr 09 21:17:54 pve QEMU[5740]: RAX=00002d0c0bb03040 RBX=0000000000000780 RCX=0000000000000780 RDX=0000000000000780
Apr 09 21:17:54 pve QEMU[5740]: RSI=00002d0c0bb03040 RDI=00007d31647e0000 RBP=00007d31f93fa960 RSP=00007d31f93fa960
Apr 09 21:17:54 pve QEMU[5740]: R8 =0000000000000780 R9 =0000000000000110 R10=00000000000003c0 R11=0000000000000800
Apr 09 21:17:54 pve QEMU[5740]: R12=0000000000000110 R13=00005fd38b7cf358 R14=00007d31647e0000 R15=00002d0c0bb03040
Apr 09 21:17:54 pve QEMU[5740]: RIP=00005fd391de41d0 RFL=00010202 [-------] CPL=3 II=0 A20=1 SMM=0 HLT=0
Apr 09 21:17:54 pve QEMU[5740]: ES =0000 0000000000000000 ffffffff 00c00000
Apr 09 21:17:54 pve QEMU[5740]: CS =0033 0000000000000000 ffffffff 00a0fb00 DPL=3 CS64 [-RA]
Apr 09 21:17:54 pve QEMU[5740]: SS =002b 0000000000000000 ffffffff 00c0f300 DPL=3 DS [-WA]
Apr 09 21:17:54 pve QEMU[5740]: DS =0000 0000000000000000 ffffffff 00c00000
Apr 09 21:17:54 pve QEMU[5740]: FS =0000 00007d31f93fd6c0 ffffffff 00c00000
Apr 09 21:17:54 pve QEMU[5740]: GS =0000 0000000000000000 ffffffff 00c00000
Apr 09 21:17:54 pve QEMU[5740]: LDT=0000 0000000000000000 ffffffff 00c00000
Apr 09 21:17:54 pve QEMU[5740]: TR =0040 fffffe592b35d000 00004087 00008b00 DPL=0 TSS64-busy
Apr 09 21:17:54 pve QEMU[5740]: GDT= fffffe592b35b000 0000007f
Apr 09 21:17:54 pve QEMU[5740]: IDT= fffffe0000000000 00000fff
Apr 09 21:17:54 pve QEMU[5740]: CR0=80050033 CR2=00007d31647e0000 CR3=0000000169210006 CR4=00772ef0
Apr 09 21:17:54 pve QEMU[5740]: DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
Apr 09 21:17:54 pve QEMU[5740]: DR6=00000000ffff0ff0 DR7=0000000000000400
Apr 09 21:17:54 pve QEMU[5740]: EFER=0000000000000d01
Apr 09 21:17:54 pve QEMU[5740]: Code=cc cc cc cc 55 48 89 e5 48 89 f8 48 63 ca 48 89 f7 48 89 c6 <f3> a4 5d c3 cc cc cc cc cc cc cc cc cc cc cc cc 55 48 89 e5 89 55 fc f3 0f 6f 07 f3 0f 6f

---------------------------------
This is the CachyOS VM crash once more, just in case; it seems similar:

Apr 09 21:36:55 pve QEMU[2387]: error: kvm run failed Bad address
Apr 09 21:36:55 pve QEMU[2387]: RAX=000025cc0e697420 RBX=0000000000000780 RCX=0000000000000780 RDX=0000000000000780
Apr 09 21:36:55 pve QEMU[2387]: RSI=000025cc0e697420 RDI=000077b7b9b29000 RBP=000077b8b55fa960 RSP=000077b8b55fa960
Apr 09 21:36:55 pve QEMU[2387]: R8 =0000000000000780 R9 =0000000000000110 R10=00000000000003e0 R11=0000000000000800
Apr 09 21:36:55 pve QEMU[2387]: R12=0000000000000110 R13=000061e0f6cdd358 R14=000077b7b9b29000 R15=000025cc0e697420
Apr 09 21:36:55 pve QEMU[2387]: RIP=000061e0fd2f21d0 RFL=00010202 [-------] CPL=3 II=0 A20=1 SMM=0 HLT=0
Apr 09 21:36:55 pve QEMU[2387]: ES =0000 0000000000000000 ffffffff 00c00000
Apr 09 21:36:55 pve QEMU[2387]: CS =0033 0000000000000000 ffffffff 00a0fb00 DPL=3 CS64 [-RA]
Apr 09 21:36:55 pve QEMU[2387]: SS =002b 0000000000000000 ffffffff 00c0f300 DPL=3 DS [-WA]
Apr 09 21:36:55 pve QEMU[2387]: DS =0000 0000000000000000 ffffffff 00c00000
Apr 09 21:36:55 pve QEMU[2387]: FS =0000 000077b8b55fd6c0 ffffffff 00c00000
Apr 09 21:36:55 pve QEMU[2387]: GS =0000 0000000000000000 ffffffff 00c00000
Apr 09 21:36:55 pve QEMU[2387]: LDT=0000 0000000000000000 ffffffff 00c00000
Apr 09 21:36:55 pve QEMU[2387]: TR =0040 fffffe66ff6d2000 00004087 00008b00 DPL=0 TSS64-busy
Apr 09 21:36:55 pve QEMU[2387]: GDT= fffffe66ff6d0000 0000007f
Apr 09 21:36:55 pve QEMU[2387]: IDT= fffffe0000000000 00000fff
Apr 09 21:36:55 pve QEMU[2387]: CR0=80050033 CR2=000077b7b9b29000 CR3=0000000196a58004 CR4=00772ef0
Apr 09 21:36:55 pve QEMU[2387]: DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
Apr 09 21:36:55 pve QEMU[2387]: DR6=00000000ffff0ff0 DR7=0000000000000400
Apr 09 21:36:55 pve QEMU[2387]: EFER=0000000000000d01
Apr 09 21:36:55 pve QEMU[2387]: Code=cc cc cc cc 55 48 89 e5 48 89 f8 48 63 ca 48 89 f7 48 89 c6 <f3> a4 5d c3 cc cc cc cc cc cc cc cc cc cc cc cc 55 48 89 e5 89 55 fc f3 0f 6f 07 f3 0f 6f


Back to 6.11 for me.
 
Some good news, updating to kernel 6.14 dropped the power usage in my rack by ~15 watts.

Two Proxmox systems, one with a 3700X and one with a 5700G CPU.

You can see when I rebooted both of them into the 6.14 kernel.

[Attached image: 1744256600474.png]
 
6.14.0-1-pve is reporting some kind of ECC errors (I'm not running ECC memory with N100 lol). My dmesg is completely filled with these errors:

Code:
# dmesg | grep igen6
[    3.775370] caller igen6_probe+0x1bc/0x8e0 [igen6_edac] mapping multiple BARs
[    3.775412] EDAC MC0: Giving out device to module igen6_edac controller Intel_client_SoC MC#0: DEV 0000:00:00.0 (POLLED)
[    3.775430] EDAC igen6 MC0: HANDLING IBECC MEMORY ERROR
[    3.775431] EDAC igen6 MC0: ADDR 0x7fffffffe0
[    3.775469] EDAC igen6: v2.5.1
[    4.824379] EDAC igen6 MC0: HANDLING IBECC MEMORY ERROR
[    4.824382] EDAC igen6 MC0: ADDR 0x7fffffffe0
[    5.852327] EDAC igen6 MC0: HANDLING IBECC MEMORY ERROR
[    5.852335] EDAC igen6 MC0: ADDR 0x7fffffffe0
[    6.872532] EDAC igen6 MC0: HANDLING IBECC MEMORY ERROR
[    6.872536] EDAC igen6 MC0: ADDR 0x7fffffffe0
[    7.896351] EDAC igen6 MC0: HANDLING IBECC MEMORY ERROR
[    7.896355] EDAC igen6 MC0: ADDR 0x7fffffffe0
[    8.920317] EDAC igen6 MC0: HANDLING IBECC MEMORY ERROR
[    8.920320] EDAC igen6 MC0: ADDR 0x7fffffffe0

I've blacklisted the igen6_edac module, but with this 6.14 kernel my idle CPU frequency is still higher than with 6.8, so I've reverted.

Not sure about anyone else, but when running 6.8.12-9-pve, the igen6_edac module also loads, but the only dmesg entries are these:
Code:
# dmesg | grep igen6
[    3.455293] caller igen6_probe+0x193/0x8b0 [igen6_edac] mapping multiple BARs
[    3.455337] EDAC MC0: Giving out device to module igen6_edac controller Intel_client_SoC MC#0: DEV 0000:00:00.0 (INTERRUPT)
[    3.455389] EDAC igen6 MC0: HANDLING IBECC MEMORY ERROR
[    3.455391] EDAC igen6 MC0: ADDR 0x7fffffffe0
[    3.455431] EDAC igen6: v2.5.1
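
For comparing kernels, counting the reported addresses may be easier than eyeballing dmesg. A rough sketch, where `summarize_ibecc` is just an illustrative name and the pattern is taken from the log lines above:

```shell
#!/bin/sh
# Sketch: count igen6 IBECC error reports per reported address.
# summarize_ibecc is a hypothetical helper; it reads dmesg text on stdin and
# prints "<address> <count>" for every address seen in an "ADDR" line.
summarize_ibecc() {
    awk '/EDAC igen6 MC[0-9]+: ADDR/ { count[$NF]++ }
         END { for (a in count) print a, count[a] }'
}

# Example:
#   dmesg | summarize_ibecc
```

If the count keeps growing on 6.14 while it stays at one on 6.8, that would match the (POLLED) versus (INTERRUPT) difference visible in the two outputs.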

I tried to report a bug to the `linux-edac` mailing list, but I'm not sure the post made it through.

I've got an UP7000 embedded board, and kernel 6.13.x doesn't show this behaviour, but 6.14.1 does. This is using Fedora 41 as the distro - so this certainly isn't a proxmox issue.

EDIT: Link to email: https://lore.kernel.org/linux-edac/de007cb2-e64d-46b8-89d0-a064a7ab369b@crc.id.au/T/#u
 
I have one strange error:
Kernel 6.14, LXC container with MySQL in it.
/var/log/kern.log:3651:2025-04-10T10:35:40.646702+02:00 sp19 kernel: [236703.525309] audit: type=1400 audit(1744274140.640:2244): apparmor="DENIED" operation="create" class="net" namespace="root//lxc-1193_<-var-lib-lxc>" profile="/usr/sbin/mysqld" pid=2674395 comm="mysqld" family="unix" sock_type="stream" protocol=0 requested="create" denied="create" addr=none
This doesn't happen with the 6.11 kernel.
 
We have the same issue with the AMD EPYC 9754...
I had the same problem: an AMD EPYC system rebooted while booting the 6.14 kernel.
After bisecting, I traced the problem to the new ae4dma driver introduced in the 6.14 kernel.

The easiest workaround is to disable the module with
Code:
blacklist ae4dma
e.g. in /etc/modprobe.d/pve-blacklist.conf. Then the 6.14 kernel will boot on EPYC 9754 systems.

There is also a patch around that disables the driver on these systems by removing some PCI IDs (14dc for the EPYC 9754), so the driver would not be loaded.
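
For anyone who wants to script the blacklist workaround, here is a minimal sketch. `blacklist_module` is a hypothetical helper, not something shipped with Proxmox, and whether the initramfs needs to be rebuilt afterwards is an assumption that depends on whether the module gets loaded from there.

```shell
#!/bin/sh
# Sketch: append "blacklist <module>" to a modprobe.d file, but only if that
# exact line is not already present. blacklist_module is a hypothetical helper.
blacklist_module() {
    module="$1"
    conf="$2"
    # -q: quiet, -x: match the whole line, -F: fixed string instead of a regex
    if ! grep -qxF "blacklist $module" "$conf" 2>/dev/null; then
        printf 'blacklist %s\n' "$module" >> "$conf"
    fi
}

# Example (as root on the host):
#   blacklist_module ae4dma /etc/modprobe.d/pve-blacklist.conf
#   update-initramfs -u -k all   # assumption: needed if the initramfs loads the module
#   reboot
```

Running it twice leaves a single entry in the file, so it should be safe to put into a provisioning script.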
 
Thanks for the effort and work!

We found a system where we could potentially reproduce the issue - and will try to backport the patches you mentioned:

https://lore.kernel.org/all/20250203162511.911946-1-Basavaraj.Natikar@amd.com/

Will post an update if there's any progress
 
Awesome. Thank you!
 
This one is working for me: AMD EPYC 9554 on a Supermicro H13SSL-NT board.
 
With 2 patches from the series at https://lore.kernel.org/all/20250203162511.911946-1-Basavaraj.Natikar@amd.com/ applied (those that should also be applied in kernel.org 6.14.2), the system where we ran into the issue booted up OK:
https://lore.proxmox.com/all/20250410130834.1745644-1-s.ivanov@proxmox.com/T/#u

Once this is applied (or our kernel is updated to >= 6.14.2), the issue should be resolved.

Thanks again @benesch for the analysis!
 
FYI: There's a newer proxmox-kernel, version 6.14.0-2-pve, available on the pvetest repo. It should address the issues with AMD EPYC Genoa and the ae4dma module and also reduce the log spam on some Intel N CPU based systems.
 
Regarding "also reduce the log spam for some Intel N CPU based system": do you have details on what this is?

As I posted, I saw it on an N100 CPU embedded board - which in my case has no proxmox or ubuntu on it - so would like to investigate.

EDIT: Ah - just updating to kernel 6.14.2 did the trick.
 
I've tested the 6.14.0-2-pve kernel on my AMD EPYC 9754 system. The problem still exists on my system, when the ae4dma module is loaded.

I think the first patch "dmaengine: ae4dma: Remove deprecated PCI IDs" from
https://lore.kernel.org/all/20250203162511.911946-1-Basavaraj.Natikar@amd.com/
should also be applied to make the kernel bootable without the ae4dma module blacklisted.

With that patch applied on a vanilla 6.14.2 kernel, the system booted without module blacklisting; without the patch, the system crashed as before.

Some system information:
AMD EPYC 9754 system
Code:
lspci -n | grep 14dc

02:00.1 0880: 1022:14dc
21:00.1 0880: 1022:14dc
43:00.1 0880: 1022:14dc
64:00.1 0880: 1022:14dc
83:00.1 0880: 1022:14dc
a1:00.1 0880: 1022:14dc
c1:00.1 0880: 1022:14dc
e1:00.1 0880: 1022:14dc

lspci -d 1022:14dc
02:00.1 System peripheral: Advanced Micro Devices, Inc. [AMD] SDXI
21:00.1 System peripheral: Advanced Micro Devices, Inc. [AMD] SDXI
43:00.1 System peripheral: Advanced Micro Devices, Inc. [AMD] SDXI
64:00.1 System peripheral: Advanced Micro Devices, Inc. [AMD] SDXI
83:00.1 System peripheral: Advanced Micro Devices, Inc. [AMD] SDXI
a1:00.1 System peripheral: Advanced Micro Devices, Inc. [AMD] SDXI
c1:00.1 System peripheral: Advanced Micro Devices, Inc. [AMD] SDXI
e1:00.1 System peripheral: Advanced Micro Devices, Inc. [AMD] SDXI
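
To check quickly whether another host is a candidate for this problem, something like the following might do. This is a sketch under assumptions: `count_pci_id` is a made-up helper, and having 1022:14dc devices only means the ae4dma driver may bind to them, not that the host will crash.

```shell
#!/bin/sh
# Sketch: count devices with a given vendor:device ID in `lspci -n` output.
# count_pci_id is a hypothetical helper; it reads the lspci text on stdin and
# compares the third column (vendor:device) against the requested ID.
count_pci_id() {
    awk -v id="$1" '$3 == id { n++ } END { print n + 0 }'
}

# Example on the host:
#   lspci -n | count_pci_id 1022:14dc   # number of SDXI functions found
#   lsmod | grep -w ae4dma              # is the driver currently loaded?
```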
 
Same here on an EPYC 9474F/Supermicro H13SSL. I end up in a boot loop with 6.14.0-2.
EDIT: I'm running the latest BIOS from Feb 2025.
 
Hm, the updated kernel in our repo does boot successfully on our test system (ASUS RS500 based; I just re-verified this with the kernel from our repos):

Code:
root@pve-test:~# uname -r
6.14.0-2-pve
root@pve-test:~# lspci -d 1022:14dc
04:00.1 System peripheral: Advanced Micro Devices, Inc. [AMD] SDXI
42:00.1 System peripheral: Advanced Micro Devices, Inc. [AMD] SDXI
83:00.1 System peripheral: Advanced Micro Devices, Inc. [AMD] SDXI
ca:00.1 System peripheral: Advanced Micro Devices, Inc. [AMD] SDXI
root@pve-test:~# lsmod|grep ae4dma
ae4dma                 12288  0
ptdma                  28672  1 ae4dma

Just to not miss anything: does it work with `proxmox-kernel-6.14.0-2-pve-signed` for you?
By the 6.14.2 vanilla kernel, do you mean the one from kernel.org released yesterday? (That one should have the 2 patches.)

Apart from that, is there maybe a BIOS update available?
 