Opt-in Linux 6.14 Kernel for Proxmox VE 8 available on test & no-subscription

Hey,
thanks for your reply.

1. Yes, I just checked, and in fact the ASM1166 is always at 04:00.0 on kernels 6.8, 6.11 and 6.14
2. Yes, the ASM1166 is always in its own IOMMU group on all of the aforementioned kernel versions

Changing the kernel version seems to have no effect on points 1 and 2 in my setup.

Despite this, I still have the problem that on kernel 6.14 my TrueNAS VM, which has PCIe passthrough configured for the ASM1166 SATA controller, cannot be started.

I can see this log message being repeatedly "spammed" to journalctl:

Code:
VM 101 qmp command failed - VM 101 qmp command 'query-proxmox-support' failed - unable to connect to VM 101 qmp socket - timeout after 51 retries


As mentioned before, on both kernel 6.8 and 6.11 the setup works just fine.

It appears that kernel 6.14 has issues with PCIe passthrough.
I have this same issue, however with a twist. Due to how the IOMMU groups are set up and how limited the motherboard is (I'm using a Gigabyte A520I), I need to override ACS: `pcie_acs_override=downstream,multifunction`
I am passing 4 devices to the VM: 2 SATA controllers, 1 NVMe drive, and a VF (SR-IOV) network card.

LE: Turns out I needed to disable rombar on the SATA controllers.
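For anyone wanting to replicate this, a rough sketch of the two pieces (the PCI addresses, GRUB setup and <vmid> are examples, not my exact config):

Code:
# /etc/default/grub - ACS override on the kernel command line, then run update-grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=pt pcie_acs_override=downstream,multifunction"

# /etc/pve/qemu-server/<vmid>.conf - rombar=0 hides the option ROM from the guest
hostpci0: 0000:04:00.0,pcie=1,rombar=0
hostpci1: 0000:05:00.0,pcie=1,rombar=0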
 
I have now upgraded my SIENAD8-2L2T/EPYC 8434P from Proxmox 8 to 9. With that, I also moved to the latest 6.14.11-3-pve kernel, and the system got a kernel panic twice in 24 hours. So I had to revert to 6.11 again; it has been stable for a few days now. I will try the new opt-in 6.17 later.

Has anyone experienced a similar issue when upgrading on EPYC?
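If anyone else needs to revert like this, a sketch of pinning an already-installed older kernel with proxmox-boot-tool (the version string is just an example; use whatever `proxmox-boot-tool kernel list` shows on your system):

Code:
proxmox-boot-tool kernel list
proxmox-boot-tool kernel pin 6.11.11-2-pve
# later, to boot the newest kernel again:
proxmox-boot-tool kernel unpin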

I still think this is PCIe-passthrough related. I pass through the SATA controller of PCI7 to a VM. In case it's IOMMU related, here is IOMMU group 6 with the SATA controllers that are passed through (all 16 lanes, shown as 2 SATA controllers). The devices being passed through are the two "SATA Controller" entries for the 8+8 SATA ports; see the listing below and the attached image.

Does anyone here know if these devices are an issue or not?

Code:
root@kosmos:~# pvesh get /nodes/kosmos/hardware/pci --pci-class-blacklist "" | grep " 6 "
│ 0x010601 │ 0x7901 │ 0000:c6:00.0 │          6 │ 0x1022 │ FCH SATA Controller [AHCI mode]                     │      │ 0x7901           │                       │ 0x1022           │ Advanced Micro Devices, Inc. [AMD] │ Advanced Micro Devices, Inc. [AMD] │
│ 0x010601 │ 0x7901 │ 0000:c6:00.1 │          6 │ 0x1022 │ FCH SATA Controller [AHCI mode]                     │      │ 0x7901           │                       │ 0x1022           │ Advanced Micro Devices, Inc. [AMD] │ Advanced Micro Devices, Inc. [AMD] │
│ 0x060000 │ 0x149f │ 0000:c0:07.0 │          6 │ 0x1022 │ Genoa/Bergamo Dummy Host Bridge                     │      │ 0x0000           │                       │ 0x0000           │                                    │ Advanced Micro Devices, Inc. [AMD] │
│ 0x060400 │ 0x14a7 │ 0000:c0:07.1 │          6 │ 0x1022 │ Genoa/Bergamo Internal PCIe GPP Bridge to Bus [D:B] │      │ 0x14a4           │                       │ 0x1022           │ Advanced Micro Devices, Inc. [AMD] │ Advanced Micro Devices, Inc. [AMD] │
│ 0x060400 │ 0x14a7 │ 0000:c0:07.2 │          6 │ 0x1022 │ Genoa/Bergamo Internal PCIe GPP Bridge to Bus [D:B] │      │ 0x14a4           │                       │ 0x1022           │ Advanced Micro Devices, Inc. [AMD] │ Advanced Micro Devices, Inc. [AMD] │
│ 0x088000 │ 0x14dc │ 0000:c5:00.1 │          6 │ 0x1022 │ SDXI                                                │      │ 0x14dc           │                       │ 0x1022           │ Advanced Micro Devices, Inc. [AMD] │ Advanced Micro Devices, Inc. [AMD] │
│ 0x0c0330 │ 0x14c9 │ 0000:c5:00.4 │          6 │ 0x1022 │                                                     │      │ 0x14c9           │                       │ 0x1022           │ Advanced Micro Devices, Inc. [AMD] │ Advanced Micro Devices, Inc. [AMD] │
│ 0x130000 │ 0x14ac │ 0000:c5:00.0 │          6 │ 0x1022 │ Genoa/Bergamo Dummy Function                        │      │ 0x14ac           │                       │ 0x1022           │ Advanced Micro Devices, Inc. [AMD] │ Advanced Micro Devices, Inc. [AMD] │

(attached screenshot: IOMMU group 6)
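To cross-check the grouping independently of pvesh, the kernel's own view can be dumped from sysfs (plain shell, nothing Proxmox-specific assumed):

Code:
for g in /sys/kernel/iommu_groups/*; do
    echo "IOMMU group ${g##*/}:"
    for d in "$g"/devices/*; do
        lspci -nns "${d##*/}"
    done
done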
 
Prepare for the upcoming kernel:

If you are using an Nvidia card with passthrough, blacklist the new Nvidia driver (NOVA).
Info:
https://wiki.archlinux.org/title/NVIDIA
https://docs.kernel.org/gpu/nova/index.html

Code:
# /etc/modprobe.d/nvidia.conf

# framebuffer / open-source drivers
blacklist nvidiafb
blacklist nouveau
#
# proprietary driver modules (only when passing the GPU through to a VM)
blacklist nvidia
blacklist nvidia_drm
blacklist nvidia_modeset
#
# new upstream NOVA driver
blacklist nova_core
blacklist nova_drm
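Keep in mind the blacklist only takes effect at early boot after the initramfs has been regenerated, so something like this is needed afterwards (standard Debian tooling assumed):

Code:
update-initramfs -u -k all
reboot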

Can you talk more about this? I lost hardware transcoding...

I blacklisted the new nova drivers, updated to kernel 6.17, updated the NVIDIA drivers from the 570.x series to the latest 580.x, and I don't have any hardware transcoding, even though nvidia-smi shows GPU information both on the host and in the Plex container.


Code:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3050        On  |   00000000:08:00.0 Off |                  N/A |
| 31%   33C    P8              8W /   70W |       1MiB /   6144MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Edited: It was a change in the dynamically assigned device major number (the one used in the container's cgroup device rules) shown below. Prior to updating it was 510; now it's 509. I had to edit the container's config file ([vmid].conf under /etc/pve/lxc/) to match, and transcoding immediately started working again.

Code:
root@thesystem:~# ls -l /dev/nvidia*
crw-rw-rw- 1 root root 195,   0 Oct 14 16:04 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Oct 14 16:04 /dev/nvidiactl
crw-rw-rw- 1 root root 195, 254 Oct 14 16:04 /dev/nvidia-modeset
crw-rw-rw- 1 root root 509,   0 Oct 14 16:05 /dev/nvidia-uvm
crw-rw-rw- 1 root root 509,   1 Oct 14 16:05 /dev/nvidia-uvm-tools

/dev/nvidia-caps:
total 0
cr-------- 1 root root 234, 1 Oct 14 16:05 nvidia-cap1
cr--r--r-- 1 root root 234, 2 Oct 14 16:05 nvidia-cap2

I added 509 and left the (old/wrong) 510 entry in place in case it changes back:

Code:
lxc.cgroup2.devices.allow: c 509:* rwm
lxc.cgroup2.devices.allow: c 510:* rwm
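In case it helps anyone else: the major number currently assigned to nvidia-uvm can be read straight from /proc/devices (quick sketch; <vmid> is a placeholder for the container ID):

Code:
# dynamically assigned char major for nvidia-uvm (510 before the update here, 509 after)
grep nvidia /proc/devices
# make the allow rule in /etc/pve/lxc/<vmid>.conf match that number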
 
After upgrading to PVE 9.0.6 (kernel 6.14.11-1-pve), I encountered amdgpu-related issues. My GPU is an AMD RX 6600 XT.

First:
Code:
Sep 07 09:36:46 pve kernel: Linux version 6.14.11-1-pve (tom@alp1) (gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC PMX 6.14.11-1 (2025-08-26T16:06Z) ()
Sep 07 09:36:46 pve kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.14.11-1-pve root=/dev/mapper/pve-root ro quiet iommu=pt
...
Sep 07 09:37:01 pve kernel: amdgpu 0000:03:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
Sep 07 09:37:12 pve kernel: amdgpu 0000:03:00.0: [drm] *ERROR* [CRTC:85:crtc-0] flip_done timed out
According to the help here, I solved it by adding `amdgpu.dcdebugmask=0x10` to the kernel command line.
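For reference, that parameter goes on the kernel command line, e.g. for a GRUB-booted host (on systemd-boot it goes into /etc/kernel/cmdline instead, followed by proxmox-boot-tool refresh):

Code:
# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=pt amdgpu.dcdebugmask=0x10"
# apply and reboot
update-grub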

Second:
Code:
Sep 07 12:21:51 pve kernel: Linux version 6.14.11-1-pve (tom@alp1) (gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC PMX 6.14.11-1 (2025-08-26T16:06Z) ()
Sep 07 12:21:51 pve kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.14.11-1-pve root=/dev/mapper/pve-root ro quiet iommu=pt
...
Sep 07 12:22:07 pve kernel: [drm:amdgpu_discovery_set_ip_blocks [amdgpu]] *ERROR* amdgpu_discovery_init failed
Sep 07 12:22:07 pve kernel: amdgpu 0000:03:00.0: amdgpu: Fatal error during GPU init
Sep 07 12:22:07 pve kernel: amdgpu 0000:03:00.0: amdgpu: amdgpu: finishing device.
Sep 07 12:22:07 pve kernel: amdgpu 0000:03:00.0: probe with driver amdgpu failed with error -22
This seems to be a kernel bug; there are related issue reports.

Unfortunately, the above issues cause a green-screen freeze, whereas 6.8.12-14-pve does not exhibit these problems.
Both have been fixed in the 6.17 kernel.
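For anyone wanting to try that, the opt-in kernel should be a single package install (the package name below follows the usual proxmox-kernel-X.Y naming; double-check the announcement thread for your release):

Code:
apt update
apt install proxmox-kernel-6.17
reboot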
 
Can you talk more about this? I lost hardware transcoding...

I blacklisted the new nova drivers, updated to kernel 6.17, updated the NVIDIA drivers from the 570.x series to the latest 580.x, and I don't have any hardware transcoding, even though nvidia-smi shows GPU information both on the host and in the Plex container.


If you intend to use the GPU on the host or in LXC containers, as opposed to passing it through to a VM, don't blacklist the nvidia* modules.