Regression. Cannot start VMs with kernel 6.8.12-12-pve and 6.14.8-1-bpo12-pve and pci-passthrough sr-iov X710 VF

adolfotregosa

With the Proxmox kernels 6.14.4-1-pve, 6.14.5-1-bpo12-pve, and my self-compiled 6.14.8 kernel, I can successfully start any VM that uses PCI passthrough for a VF from my Intel X710 NIC.


However, with the 6.14.8-1-bpo12-pve kernel, any VM that attempts to use PCI passthrough for a VF from the X710 NIC fails to start.
(I just checked: the same happens with 6.8.12-12-pve, so presumably a change was backported to both kernels?)

Example:

➜ ~ qm start 100
GUEST HOOK: 100 pre-start
100 is starting, doing preparations.
kvm: -device vfio-pci,host=0000:03:02.0,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0,rombar=0: vfio 0000:03:02.0: error getting device from group 19: Permission denied
Verify all devices in group 19 are bound to vfio-<bus> or pci-stub and not already in use
start failed: QEMU exited with code 1


The IOMMU groups are correct and isolated, and VFIO appears to be working fine. VMs start fine as long as they don’t use any VF from the X710. File permissions are set correctly (owned by root) across both working and non-working kernels.
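For anyone who wants to repeat the group check mentioned above, here is a minimal sketch that prints each device in an IOMMU group together with the driver it is currently bound to, reading the standard sysfs layout. The function name `list_iommu_group` and the `SYSFS_ROOT` override are my own additions for illustration (the override only exists so the function can be tried against a fake tree); they are not part of any Proxmox tooling.

```shell
#!/bin/sh
# Sketch: list the devices in one IOMMU group and the driver bound to each.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
list_iommu_group() {
    group="$1"
    root="${SYSFS_ROOT:-/sys}"
    for dev in "$root/kernel/iommu_groups/$group/devices"/*; do
        # Skip the unexpanded glob when the group does not exist or is empty.
        [ -e "$dev" ] || continue
        if [ -e "$dev/driver" ]; then
            # The driver entry is a symlink; its target's name is the driver.
            driver=$(basename "$(readlink -f "$dev/driver")")
        else
            driver="(none)"
        fi
        printf '%s %s\n' "$(basename "$dev")" "$driver"
    done
    return 0
}
```

Running `list_iommu_group 19` on the host should show whether every device in group 19 is bound to a vfio driver, which is what the QEMU error message asks you to verify.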


Again, with my self-compiled 6.14.8 kernel and the 6.14.4-1-pve and 6.14.5-1-bpo12-pve kernels, VMs start without issues. The problem only occurs with the Proxmox 6.14.8-1-bpo12-pve and 6.8.12-12-pve kernels when attempting to pass through a VF from the X710.

Hardware: Intel 13900, Z790 chipset, Intel X710 NIC.


What am I missing or what might I have misconfigured?


Thank you.
 
Ok.
Proxmox 6.8.12-11 works fine but 6.8.12-12 fails. There's definitely some regression for me on 6.8.12-12 and 6.14.8-1-bpo12-pve.
 
Just got the same error when upgrading from 6.14.5-1-bpo12-pve to 6.14.8-1-bpo12-pve on my two MS-01 nodes.
SR-IOV VFs stopped working, with the same permission error on the IOMMU group of the first VF I tried to assign to my VM.
For the moment I pinned kernel 6.14.5-1-bpo12-pve to have my VM up & running again.
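The post above does not say how the kernel was pinned. One common way on Proxmox is `proxmox-boot-tool kernel pin`, so here is a small sketch under that assumption. The `pin_kernel` wrapper and its `DRY_RUN` switch are hypothetical helpers of this sketch, not options of `proxmox-boot-tool` itself.

```shell
#!/bin/sh
# Sketch: pin a known-good kernel so it stays the boot default.
# Assumes the host uses proxmox-boot-tool; DRY_RUN is a made-up switch
# that only prints the command instead of running it.
pin_kernel() {
    version="$1"
    if [ -z "$version" ]; then
        echo "usage: pin_kernel <kernel-version>" >&2
        return 2
    fi
    if [ -n "${DRY_RUN:-}" ]; then
        echo "proxmox-boot-tool kernel pin $version"
        return 0
    fi
    proxmox-boot-tool kernel pin "$version"
}
```

For example, `pin_kernel 6.14.5-1-bpo12-pve` followed by a reboot. `proxmox-boot-tool kernel list` shows the installed and pinned kernels, and `proxmox-boot-tool kernel unpin` removes the pin once a fixed kernel is available.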
 
Not an Intel X710, but a virtual machine using i915-sriov-dkms with an Arrow Lake VF fails to start with the same error.

As far as I know, the VM does not start on 6.14.6-1 or 6.14.8-1, while 6.14.5-1 works fine.

It also did not work with a 6.14.8-1 kernel (vfio patch) built from the git repository:

git://git.proxmox.com/git/pve-kernel.git

If a self-compiled kernel without the PVE patches works, then something in the PVE patch set is probably at fault, but I don't know how to revert it.

https://github.com/proxmox/pve-kernel/tree/master/patches/kernel
 

Same with X710 SR-IOV on 6.14.8-1-bpo12-pve.
Pinning to 6.14.5-1-bpo12-pve solved it.
 
Delete the pve-kernel/patches/kernel directory and try make.

My guess is that patches 0009 (sriov) and 0012 (iommu), which were updated a month ago, are responsible, since I have been seeing this problem since 6.14.6-1, which also came out a month ago.

I haven't finished building the kernel yet, so I'll try it as soon as the build is done.
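To look at only the suspected patches rather than dropping the whole directory, a sketch like the following lists the patch files by their numeric prefix. It assumes the files in patches/kernel are named `NNNN-<description>.patch`; the `find_patches` helper is hypothetical.

```shell
#!/bin/sh
# Sketch: print patch files in a directory whose numeric prefix matches.
# Assumes file names of the form NNNN-<description>.patch.
find_patches() {
    dir="$1"
    shift
    for num in "$@"; do
        for f in "$dir"/"$num"-*.patch; do
            # Skip the unexpanded glob when no file has this prefix.
            if [ -e "$f" ]; then
                basename "$f"
            fi
        done
    done
    return 0
}
```

In a checkout of the pve-kernel repository, `find_patches patches/kernel 0009 0012` would print the two candidates, and `git log --oneline -- patches/kernel` shows when files in that directory last changed.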

 
It didn't work.