RTX 5080 (Blackwell GB203) VFIO Passthrough Failing on PVE 9.1 / Kernel 6.17 — Has Anyone Solved This?

liferollson91

New Member
Feb 27, 2026
Hi everyone,


I've spent a significant amount of time trying to get GPU passthrough working for an NVIDIA RTX 5080 on a fresh PVE 9.1 install and have hit a wall that I believe is a kernel-level incompatibility specific to PVE 9.1 / kernel 6.17. I'm posting here to share my findings, ask if anyone has solved this, and point to a bug report I've filed.




My Hardware


  • CPU: Intel Core Ultra 9 285K (Arrow Lake, LGA 1851)
  • Motherboard: GIGABYTE Z890 AORUS MASTER AI TOP (latest BIOS)
  • GPU: GIGABYTE WINDFORCE GeForce RTX 5080 16GB (Blackwell GB203)
  • PVE Version: 9.1.6
  • Kernel: 6.17.9-1-pve
  • QEMU: 10.1.2



The Error


When attempting to start a Windows 11 VM with the RTX 5080 passed through, QEMU exits immediately with:

Code:
error writing '1' to '/sys/bus/pci/devices/0000:02:00.0/reset': Inappropriate ioctl for device
failed to reset PCI device '0000:02:00.0', but trying to continue as not all devices need a reset
kvm: -device vfio-pci,host=0000:02:00.0,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0: vfio 0000:02:00.0: error getting device from group 14: No such device
Verify all devices in group 14 are bound to vfio-<bus> or pci-stub and not already in use
TASK ERROR: start failed: QEMU exited with code 1




What I've Already Verified (Everything Looks Correct)


  • intel_iommu=on is set in kernel cmdline, IOMMU confirmed enabled in dmesg
  • Both 0000:02:00.0 (GPU) and 0000:02:00.1 (HDMI audio) are bound to vfio-pci
  • /dev/vfio/14 exists and nothing holds it open (lsof /dev/vfio/14 returns nothing)
  • IOMMU group 14 contains only the two GPU devices — no contamination
  • The IOMMU domain type for the group is DMA-FQ (when booting without iommu=pt)
  • All other hostpci devices are in separate groups with correct bindings
  • Tried with and without iommu=pt, rombar on/off, vga: none, reduced memory — no change
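To make these checks repeatable, here's a small script that re-verifies the driver binding and group isolation in one go. This is just a sketch; the device address and group number default to the values from my system, so pass your own as arguments:

```shell
#!/bin/sh
# Re-check the basic VFIO preconditions for one device. Defaults match
# the GPU above; pass a different PCI address/group number as arguments.
DEV="${1:-0000:02:00.0}"
GROUP="${2:-14}"

# 1. Driver binding: the device should be bound to vfio-pci.
if [ -e "/sys/bus/pci/devices/$DEV/driver" ]; then
    echo "driver: $(basename "$(readlink "/sys/bus/pci/devices/$DEV/driver")")"
else
    echo "driver: none (device absent or unbound)"
fi

# 2. Group isolation: every member of the IOMMU group must go to the guest.
if [ -d "/sys/kernel/iommu_groups/$GROUP/devices" ]; then
    echo "group $GROUP members:"
    ls "/sys/kernel/iommu_groups/$GROUP/devices"
else
    echo "group $GROUP: not found (IOMMU off or wrong group number)"
fi
```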



Root Cause I've Identified


Kernel 6.17.9-1-pve is built with CONFIG_VFIO_DEVICE_CDEV=y, which causes the VFIO subsystem to use iommufd as its primary backend instead of the legacy vfio_iommu_type1 group backend. You can confirm this yourself:

Bash:
grep -i vfio_device_cdev /boot/config-$(uname -r)
# Returns: CONFIG_VFIO_DEVICE_CDEV=y

lsmod | grep vfio
# vfio_iommu_type1   49152  0     ← use count 0, not active
# iommufd           126976  1 vfio ← iommufd is the active backend

Proxmox's qemu-server generates QEMU arguments using the legacy group API (host=0000:02:00.0), but in kernel 6.17 the device is registered to the cdev/iommufd interface. This causes VFIO_GROUP_GET_DEVICE_FD to return ENODEV even though everything else is correctly configured.




What I've Tried to Fix It


Attempt 1 — Patch PCI.pm to use the iommufd API: I patched /usr/share/perl5/PVE/QemuServer/PCI.pm to inject -object iommufd,id=iommufd0 and to replace host= with sysfsdev= plus iommufd=iommufd0. QEMU then gets further and reaches the iommufd bind step, but fails with:

Code:
vfio 0000:02:00.0: error bind device fd=65 to iommufd=64: No such device
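For comparison, the two argument styles look roughly like this (the iommufd form follows QEMU's documented vfio-pci/iommufd syntax; the ids and sysfsdev path are just the values from this report):

```
# Legacy group API (what qemu-server generates today):
-device vfio-pci,host=0000:02:00.0,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0

# iommufd/cdev API (what the patched PCI.pm emits):
-object iommufd,id=iommufd0
-device vfio-pci,iommufd=iommufd0,sysfsdev=/sys/bus/pci/devices/0000:02:00.0,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0
```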

Attempt 2 — NVIDIA driver pre-initialization: Based on community reports that Blackwell GPUs require NVIDIA firmware initialization before VFIO can claim them, I installed nvidia-kernel-open-dkms 590.48.01, let nvidia load at boot to initialize the GPU, then unloaded it so vfio-pci could claim it. The rebind works correctly, but the iommufd bind still fails with the same ENODEV.
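For anyone who wants to reproduce Attempt 2, the sequence was roughly the following. This is a sketch only: it guards on the device being present, and driver_override/drivers_probe are the standard sysfs rebind mechanism, run as root on the host:

```shell
#!/bin/sh
# Sketch of Attempt 2: initialize the GPU with the nvidia driver,
# then hand it back to vfio-pci. Adjust DEV for your system.
DEV="0000:02:00.0"

if [ -e "/sys/bus/pci/devices/$DEV" ]; then
    modprobe nvidia                 # firmware/GSP initialization happens here
    nvidia-smi >/dev/null           # touch the device so init actually runs
    rmmod nvidia

    # Unbind from whatever driver holds it, then rebind to vfio-pci
    # via the standard driver_override mechanism.
    [ -e "/sys/bus/pci/devices/$DEV/driver" ] \
        && echo "$DEV" > "/sys/bus/pci/devices/$DEV/driver/unbind"
    echo vfio-pci > "/sys/bus/pci/devices/$DEV/driver_override"
    echo "$DEV"   > /sys/bus/pci/drivers_probe
else
    echo "device $DEV not present on this machine"
fi
```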


Attempt 3 — Removing iommu=pt: The group's domain type changes from identity to DMA-FQ without iommu=pt, which should be compatible with iommufd. Still fails.


In all cases, the VFIO_DEVICE_BIND_IOMMUFD kernel ioctl returns ENODEV for this specific Blackwell GB203 device.




What Works (For Reference)


Multiple community members have confirmed that RTX 5000-series passthrough works on PVE 8.x with kernel 6.8, so the regression appears specific to the newer kernel.





Bug Report Filed


I've filed a bug report with Proxmox: https://bugzilla.proxmox.com/show_bug.cgi?id=7374


The report proposes that qemu-server needs to generate iommufd-aware QEMU arguments on kernels with CONFIG_VFIO_DEVICE_CDEV=y, and that there may also be a kernel-level issue with VFIO_DEVICE_BIND_IOMMUFD and Blackwell PCIe Legacy Endpoint devices specifically.




Questions for the Community


  1. Has anyone successfully passed through an RTX 5080 (or any RTX 5000 series card) on PVE 9.1 specifically? If so, what kernel and configuration did you use?
  2. Has anyone compiled a custom PVE kernel with CONFIG_VFIO_DEVICE_CDEV=n and tested whether that resolves it?
  3. Is there a supported way to install an older PVE kernel (e.g. 6.8) alongside 6.17 on PVE 9.1 to test?
  4. Has anyone found a working iommufd-based QEMU argument syntax that gets past the VFIO_DEVICE_BIND_IOMMUFD ENODEV failure for Blackwell?
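On question 3: Proxmox keeps older kernel series available as opt-in packages, and proxmox-boot-tool can pin one as the default. A sketch of how I'd expect that to work; the package name and version string below are examples only, so check what your repository actually ships:

```shell
#!/bin/sh
# Sketch for testing an older Proxmox kernel alongside the default one.
# Package/version names are examples; verify with: apt list 'proxmox-kernel-*'
if command -v proxmox-boot-tool >/dev/null 2>&1; then
    ONHOST=yes
    apt install proxmox-kernel-6.14            # pull in an older opt-in series
    proxmox-boot-tool kernel list              # show installed kernels
    proxmox-boot-tool kernel pin 6.14.11-1-pve # boot this version by default
else
    ONHOST=no
    echo "run this on the PVE host (proxmox-boot-tool not found here)"
fi
```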

Any help, pointers, or working configurations would be greatly appreciated. Happy to provide any additional diagnostic output needed.


Thanks!
 
Kernel 6.17 changed ... something!
Chatbots suggest this was "Unity Map", for whatever that's worth.
The hint for me was "Firmware has requested this device have a 1:1 IOMMU mapping" in journalctl -k.

We have similar Gigabyte boards (mine is AMD, though), and the correct setting was hidden:
BIOS >> Settings / AMD CBS / NBIO Common Options
I had to change IOMMU from Automatic to Enabled,
which then revealed:
- "Kernel DMA Protection Indicator", which I changed to Disabled
and
- "Pre-boot DMA Protection", which I left Enabled

I'm worried that I just overrode an important kernel-level 1:1 IOMMU mapping security feature, but at least in my mental model of what the IOMMU needs to do, a forced 1:1 mapping is probably incompatible with VFIO as a concept...
 
I'm facing an issue as well on kernel 6.17, with an NVIDIA A100, so I really believe there are significant GPU-virtualization changes in kernel 6.17.
My plan is to revert to kernel 6.8 or 6.14 for now,
since I have no real need for 6.17.
Even on my home PC I noticed GPU problems in Wine that decreased performance on Linux Mint. So I would step back from kernel 6.17; it's not looking good to me right now.