RTX 5080 (Blackwell GB203) VFIO Passthrough Failing on PVE 9.1 / Kernel 6.17 — Has Anyone Solved This?

liferollson91

New Member
Feb 27, 2026
Hi everyone,


I've spent a significant amount of time trying to get GPU passthrough working for an NVIDIA RTX 5080 on a fresh PVE 9.1 install and have hit a wall that I believe is a kernel-level incompatibility specific to PVE 9.1 / kernel 6.17. I'm posting here to share my findings, ask if anyone has solved this, and point to a bug report I've filed.




My Hardware


  • CPU: Intel Core Ultra 9 285K (Arrow Lake, LGA 1851)
  • Motherboard: GIGABYTE Z890 AORUS MASTER AI TOP (latest BIOS)
  • GPU: GIGABYTE WINDFORCE GeForce RTX 5080 16GB (Blackwell GB203)
  • PVE Version: 9.1.6
  • Kernel: 6.17.9-1-pve
  • QEMU: 10.1.2



The Error


When attempting to start a Windows 11 VM with the RTX 5080 passed through, QEMU exits immediately with:

Code:
error writing '1' to '/sys/bus/pci/devices/0000:02:00.0/reset': Inappropriate ioctl for device
failed to reset PCI device '0000:02:00.0', but trying to continue as not all devices need a reset
kvm: -device vfio-pci,host=0000:02:00.0,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0: vfio 0000:02:00.0: error getting device from group 14: No such device
Verify all devices in group 14 are bound to vfio-<bus> or pci-stub and not already in use
TASK ERROR: start failed: QEMU exited with code 1




What I've Already Verified (Everything Looks Correct)


  • intel_iommu=on is set in kernel cmdline, IOMMU confirmed enabled in dmesg
  • Both 0000:02:00.0 (GPU) and 0000:02:00.1 (HDMI audio) are bound to vfio-pci
  • /dev/vfio/14 exists and nothing holds it open (lsof /dev/vfio/14 returns nothing)
  • IOMMU group 14 contains only the two GPU devices — no contamination
  • Group type is DMA-FQ (without iommu=pt)
  • All other hostpci devices are in separate groups with correct bindings
  • Tried with and without iommu=pt, rombar on/off, vga: none, reduced memory — no change
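
For anyone reproducing these checks, here is a minimal sketch of the group-membership check as a small script. The sysfs root is a parameter purely so the loop can be exercised against a fake directory tree; on a real host just call it with no argument:

```shell
#!/bin/sh
# List every device in every IOMMU group. The root is overridable
# only as a testing convenience; the real path is the default.
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for grp_dir in "$root"/*/devices; do
        [ -d "$grp_dir" ] || continue
        grp=$(basename "$(dirname "$grp_dir")")
        for dev in "$grp_dir"/*; do
            [ -e "$dev" ] && echo "group $grp: $(basename "$dev")"
        done
    done
}
# On a real host: list_iommu_groups
```

A clean passthrough group for this card should list exactly 0000:02:00.0 and 0000:02:00.1 and nothing else.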



Root Cause I've Identified


Kernel 6.17.9-1-pve is built with CONFIG_VFIO_DEVICE_CDEV=y, which causes the VFIO subsystem to use iommufd as its primary backend instead of the legacy vfio_iommu_type1 group backend. You can confirm this yourself:

Bash:
grep -i vfio_device_cdev /boot/config-$(uname -r)
# Returns: CONFIG_VFIO_DEVICE_CDEV=y

lsmod | grep vfio
# vfio_iommu_type1   49152  0     ← use count 0, not active
# iommufd           126976  1 vfio ← iommufd is the active backend

Proxmox's qemu-server generates QEMU arguments using the legacy group API (host=0000:02:00.0), but in kernel 6.17 the device is registered to the cdev/iommufd interface. This causes VFIO_GROUP_GET_DEVICE_FD to return ENODEV even though everything else is correctly configured.
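
To make the mismatch concrete, here is a sketch of the two argument styles side by side (printed rather than executed; the iommufd object and the vfio-pci iommufd property have existed upstream since QEMU 8.2, so QEMU 10.1.2 should accept the second form):

```shell
#!/bin/sh
# Print the two QEMU/VFIO argument styles for comparison. Sketch only;
# option spellings follow upstream QEMU documentation.
show_vfio_arg_styles() {
    dev="${1:-0000:02:00.0}"
    # Legacy group path (what qemu-server generates today): QEMU opens
    # /dev/vfio/<group> and calls VFIO_GROUP_GET_DEVICE_FD.
    echo "-device vfio-pci,host=$dev"
    # iommufd/cdev path: QEMU opens /dev/iommu plus the per-device
    # /dev/vfio/devices/vfioN node and uses VFIO_DEVICE_BIND_IOMMUFD.
    echo "-object iommufd,id=iommufd0"
    echo "-device vfio-pci,host=$dev,iommufd=iommufd0"
}
show_vfio_arg_styles
```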




What I've Tried to Fix It


Attempt 1 — Patch PCI.pm to use the iommufd API: I patched /usr/share/perl5/PVE/QemuServer/PCI.pm to inject -object iommufd,id=iommufd0 and replace host= with sysfsdev= + iommufd=iommufd0. QEMU then gets further and reaches the iommufd bind step, but fails with:

Code:
vfio 0000:02:00.0: error bind device fd=65 to iommufd=64: No such device

Attempt 2 — NVIDIA driver pre-initialization: Based on community reports that Blackwell GPUs require NVIDIA firmware initialization before VFIO can claim them, I installed nvidia-kernel-open-dkms 590.48.01, let nvidia load at boot to initialize the GPU, then unloaded it so vfio-pci could claim it. The rebind works correctly, but the iommufd bind still fails with the same ENODEV.
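
For reference, the rebind step in this attempt boils down to the standard driver_override sequence. A sketch, with the sysfs root as a parameter purely so it can be dry-run against a fake directory tree (on a real host it is just /sys):

```shell
#!/bin/sh
# Rebind a PCI device from its current driver to vfio-pci via the
# driver_override mechanism. The sysfs root is a parameter purely
# for dry-run testing; on a real host: rebind_to_vfio /sys 0000:02:00.0
rebind_to_vfio() {
    sysfs="$1"; dev="$2"
    # Make vfio-pci the only driver allowed to claim the device:
    echo vfio-pci > "$sysfs/bus/pci/devices/$dev/driver_override"
    # Unbind from the current driver (e.g. nvidia), if any is bound:
    if [ -e "$sysfs/bus/pci/devices/$dev/driver/unbind" ]; then
        echo "$dev" > "$sysfs/bus/pci/devices/$dev/driver/unbind"
    fi
    # Ask vfio-pci to probe and claim the device:
    echo "$dev" > "$sysfs/bus/pci/drivers/vfio-pci/bind"
}
```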


Attempt 3 — Removing iommu=pt: The group type changes from identity to DMA-FQ without iommu=pt, which should be compatible with iommufd. Still fails.


In all cases, the VFIO_DEVICE_BIND_IOMMUFD kernel ioctl returns ENODEV for this specific Blackwell GB203 device.




What Works (For Reference)


Multiple community members confirm RTX 5000 series passthrough works on PVE 8.x with kernel 6.8. See these threads for reference:





Bug Report Filed


I've filed a bug report with Proxmox: https://bugzilla.proxmox.com/show_bug.cgi?id=7374


The report proposes that qemu-server needs to generate iommufd-aware QEMU arguments on kernels with CONFIG_VFIO_DEVICE_CDEV=y, and that there may also be a kernel-level issue with VFIO_DEVICE_BIND_IOMMUFD and Blackwell PCIe Legacy Endpoint devices specifically.




Questions for the Community


  1. Has anyone successfully passed through an RTX 5080 (or any RTX 5000 series card) on PVE 9.1 specifically? If so, what kernel and configuration did you use?
  2. Has anyone compiled a custom PVE kernel with CONFIG_VFIO_DEVICE_CDEV=n and tested whether that resolves it?
  3. Is there a supported way to install an older PVE kernel (e.g. 6.8) alongside 6.17 on PVE 9.1 to test?
  4. Has anyone found a working iommufd-based QEMU argument syntax that gets past the VFIO_DEVICE_BIND_IOMMUFD ENODEV failure for Blackwell?
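
On question 3: kernel 6.8 belongs to the PVE 8 series, so on PVE 9 the previous opt-in series is 6.14, and proxmox-boot-tool can pin the boot default. A hedged sketch of the usual steps (printed rather than executed; the ABI string in the pin line is a made-up example, so use whatever the kernel list command actually reports on your system):

```shell
#!/bin/sh
# Print (not run) the usual steps for installing and pinning an older
# opt-in kernel series on PVE 9. The ABI string in the pin line is a
# made-up example; use a version that `proxmox-boot-tool kernel list`
# actually shows on your host.
show_kernel_pin_steps() {
    cat <<'EOF'
apt update
apt install proxmox-kernel-6.14
proxmox-boot-tool kernel list
proxmox-boot-tool kernel pin 6.14.8-2-pve
reboot
EOF
}
show_kernel_pin_steps
```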

Any help, pointers, or working configurations would be greatly appreciated. Happy to provide any additional diagnostic output needed.


Thanks!
 
kernel 6.17 changed ... something!
Chatbots suggest this is related to "unity mapping", for whatever that's worth.
The hint for me was "Firmware has requested this device have a 1:1 IOMMU mapping" in journalctl -k

We have similar Gigabyte boards (mine is AMD though) and the correct setting was hidden.
BIOS >> Settings / AMD CBS / NBIO Common Options /
I had to change IOMMU from Automatic to Enabled
which then revealed
- "Kernel DMA Protection Indicator" which I changed to Disabled
and
- "Pre-boot DMA Protection" which I left Enabled

I'm worried that I just overrode an important kernel-level 1:1 IOMMU mapping security feature, but at least in my mental map of what the IOMMU needs to do, I think a 1:1 IOMMU mapping is probably incompatible with VFIO as a concept...
 
I ran into an issue on kernel 6.17 with an NVIDIA A100 as well, so I really believe there are a lot of significant changes in kernel 6.17 around GPU virtualization.
My plan is to revert to kernel 6.8 or 6.14 for now, as long as there's no real need for 6.17.
Even on my home PC I noticed GPU problems under Wine that decreased performance on Linux Mint. So I would step back from kernel 6.17; it's not looking good to me right now.
 
As of now, I'm going to try booting into a Windows 11 To Go environment from a USB drive, then see if I can update the firmware on the NVIDIA card. If I do get a firmware update installed, I'll then try to get the Windows 11 VM in Proxmox to boot. If that fails, I'll try poking around in the BIOS for something like you described, aubreybailey.
 
I worked through this issue with PVE 9 and the RTX 50 series. Initially passthrough was working out of the box, then I applied updates to 9.0.3 and it broke. I struggled for a couple of days trying to go back to the old kernel and couldn't get it working, so I ended up compiling a new kernel from sources on GitHub. I am planning to build a new PVE 9.1 cluster to scale out GPU passthrough (I have an RTX 5090, a 5080, and the rest of the series). I finally used Claude Code to document the kernel mods and posted it to GitHub https://github.com/szoran53/proxmox-kernel-6.17-gpu-passthrough to help others in the community. This kernel has been rock solid (there are still some quirks with passthrough: intermittent black console screens, PVE snapshots work but without memory state, and the NVIDIA drivers are touchy on the VMs). Hope this helps; I will test this kernel patch with PVE 9.1, post the results, and also improve the patch.
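
For anyone else going the custom-kernel route, the Proxmox kernel sources are public. A rough, hedged sketch of the build shape (printed rather than executed; the in-repo patch location and the build prerequisites should be taken from the pve-kernel README and szoran's notes rather than from this sketch):

```shell
#!/bin/sh
# Print (not run) the rough shape of a custom Proxmox kernel build.
# Sketch only: the patch directory and build prerequisites should be
# confirmed against the repo README before relying on this.
show_kernel_build_steps() {
    cat <<'EOF'
git clone git://git.proxmox.com/git/pve-kernel.git
cd pve-kernel
# place your .patch under the kernel patches directory, then:
make
dpkg -i proxmox-kernel-*.deb
reboot
EOF
}
show_kernel_build_steps
```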
 
I tried the latest firmware and it didn't help with passthrough on the Gigabyte board.
 
Yeah the nvidia firmware update didn't help me with that either. That's awesome you ended up getting it mostly working by just figuring out with Claude what you needed to change in the kernel and documenting it! I'll link that in the bug ticket I have open. I had to get some work done so I ended up just abandoning proxmox for now and dual booting Windows and Linux, since I don't really need to have a windows VM running at the same time as linux. I'll probably return to this at some point.
 
I will keep you updated; hopefully passthrough will work out of the box again soon so we don't need to modify the kernel. Keep in mind that with GPU passthrough in PVE only one VM can use the GPU at a time; you can have other VMs running that don't use that GPU. I will test GPU sharing later on. Performance for neural network training is much faster in Linux/Proxmox than bare-metal Windows (2-3x faster) with the same hardware.
 
I actually struggled through the kernel build working on my own with some assistance from Grok 4; it has its good days and bad days. For the next build I will definitely use Claude Code, well worth the $200 a year for the pro plan.
 
I'm having the same issue, sadly. I don't have a "Kernel DMA Protection Indicator" setting in my BIOS, so that doesn't fix it either. I've tried everything to get this to work. I want to use my 5080 for streaming games with Sunshine and Moonlight.
my setup:
GPU: Gigabyte GeForce RTX 5080, 16GB, Windforce OC
MOTHERBOARD: Gigabyte Z890 AERO G

DRIVES: 2x Lexar NM1090 Pro 2TB M.2 SSD (proxmox mirror ZFS)
CPU: Intel Core Ultra 7 265K

Kernel: Linux 6.17.13-2-pve
Proxmox: 9.1.6

Error:

Code:
error writing '1' to '/sys/bus/pci/devices/0000:02:00.0/reset': Inappropriate ioctl for device
failed to reset PCI device '0000:02:00.0', but trying to continue as not all devices need a reset
kvm: -device vfio-pci,host=0000:02:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,x-vga=on,multifunction=on: vfio 0000:02:00.0: error getting device from group 15: No such device
Verify all devices in group 15 are bound to vfio-<bus> or pci-stub and not already in use
TASK ERROR: start failed: QEMU exited with code 1

Sadly, @liferollson91's bug report has been closed for some strange reason, pointing to a "solution" that is not a solution.
 
Hey @berryblast. Yeah, I re-commented on the bug pointing to szoran's comments and his GitHub page documenting all the patches and changes he made to get his working. I too did not have anything like "Kernel DMA Protection" in my BIOS.

Seriously solid work and documentation, @szoran! It seems like if people are hitting this same bug that I reported and are itching to make it work ASAP, they should try following szoran's documentation to compile their own kernel with those changes. Otherwise, hopefully these changes will get folded into stock Proxmox VE in a future release.
 
Hey @liferollson91,

Thank you for commenting on the bug report; I hope they will do something with it. I know @szoran made it work, but sadly I'm not comfortable enough doing kernel stuff myself. I don't understand the code, and there is some "critical" stuff on my server I don't want to break. So preferably I do as little as possible on the host and do the tinkering in the VMs and LXCs :).

Hopefully it's something that can be fixed and will not be a "consumer card" issue as stated by Dominik.
 
Hey @liferollson91, I will reply to Dominik on the Bugzilla, but I think his issue is happening in the kernel driver loading process, even before the issues we were running into. He is having IOMMU mapping/grouping conflicts; the patch I wrote does what is described in this markup file. Dominik has issues he needs to resolve before he even gets to the ones we are having.
 


Here is a summary of the patch; the PDF is verbose:
The document you linked, titled "VFIO 1:1 IOMMU Mapping Bypass Patch," describes a specific kernel patch for Linux-based virtualization environments, such as Proxmox VE, to enable GPU passthrough for newer NVIDIA hardware.





Key Points from the Document:


  • Purpose: The patch is designed to bypass a new firmware-requested 1:1 IOMMU mapping enforcement in the VFIO (Virtual Function I/O) kernel driver.
  • Symptom: When attempting to pass through an NVIDIA Blackwell-generation GPU (specifically confirmed for the GeForce RTX 5060 and RTX 5060 Ti) to a virtual machine (VM), the VM fails to start and the system log shows an error related to the required 1:1 IOMMU mapping.
  • Root Cause: Blackwell-generation NVIDIA GPUs (GB206 and related silicon) set a firmware flag in their PCIe configuration space that requests the IOMMU to use a 1:1 (identity) mapping. The current kernel (specifically kernel 6.17 on Proxmox VE 9.x) rejects configuring the device when the iommu=pt (passthrough) kernel command-line option is used, because passthrough mode does not satisfy the kernel's check for a "unity mapping".
  • The Patch: The solution is a small code change in the drivers/vfio/pci/vfio_pci_core.c file that comments out the specific line (return -EINVAL;) responsible for rejecting the device. This allows VFIO to proceed with passthrough even without the 1:1 mapping.
  • Alternative: The document suggests first trying to remove iommu=pt from the GRUB configuration. This may satisfy the firmware requirement without needing to build a custom kernel, but it comes with a minor performance trade-off.
  • Affected Hardware: The issue is confirmed on the RTX 5060 and RTX 5060 Ti, and is expected to affect the broader Blackwell family (e.g., RTX 5070, 5080, 5090). It does not affect previous generations like the RTX 30-series or 40-series.
  • Security Warning: The document strictly advises that this patch is for test lab/homelab use only and should not be applied to production systems or those handling untrusted VMs, as it removes a kernel safety check that exists to ensure DMA coherence and memory access integrity.
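
For what it's worth, the "Alternative" bullet above (dropping iommu=pt) amounts to a one-token edit of the kernel command line. A sketch, with the file path as a parameter purely so it can be dry-run against a copy (on a real host it is /etc/default/grub, followed by update-grub, or proxmox-boot-tool refresh on ZFS/UEFI installs, and a reboot):

```shell
#!/bin/sh
# Strip the iommu=pt token from a GRUB defaults file, leaving other
# options (e.g. intel_iommu=on) untouched. The path is a parameter
# purely for dry-run testing; the real file is /etc/default/grub.
strip_iommu_pt() {
    sed -i 's/ *\biommu=pt\b//' "$1"
}
# On a real host: strip_iommu_pt /etc/default/grub && update-grub
```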
 
I tightened up the documentation on the patch. I have a lot going on at work and elsewhere, but give me a month and I will do a fresh build of the kernel on a fresh PVE 9.1 install and post the updates and the prompts I used with Claude Code. Hang in there, guys; this is not rocket science.