GPU Passthrough fails on AMD Ryzen 9 9950X (Zen 5) / RTX 5070 Ti - Kernel 6.14.8 - IOMMU issues

jay_

New Member
Sep 8, 2025
Hello Proxmox Community and Team,

I am experiencing a complete failure of GPU passthrough with my new hardware setup and have exhausted all standard troubleshooting steps. The issue is clearly isolated to the PCIe passthrough itself, and an additional critical observation (section 5 below) makes me suspect a kernel-level bug.

1. Proxmox Version:

plaintext
proxmox-ve: 9.0.0 (running kernel: 6.14.8-2-pve)
pve-manager: 9.0.3 (running version: 9.0.3/025864202ebb6109)
pve-qemu-kvm: 10.0.2-4
qemu-server: 9.0.16

2. Host Hardware:

  • CPU: AMD Ryzen 9 9950X (Granite Ridge, Zen 5, AM5)
  • GPU: NVIDIA GeForce RTX 5070 Ti (ASUS TUF model)
  • Motherboard: ASUS TUF GAMING B650-PLUS WIFI
  • RAM: 32 GB DDR5
3. Problem Description:
When the GPU (01:00.0 and its audio function 01:00.1) is added to the VM, the VM starts but becomes completely unresponsive. It does not boot far enough to bring up its network interface, so SSH fails with ssh: connect to host [...] port 22: Connection timed out. The noVNC console also fails to initialize, throwing TASK ERROR: Failed to run vncproxy / qmp command 'set_password' failed.

Crucially: The same VM works flawlessly via SSH when the GPU is not passed through. The problem is 100% tied to the PCI passthrough configuration.
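To narrow down how far the guest actually gets before hanging, I can attach a virtual serial console and watch the boot from the host. A minimal sketch (VMID 100 is a placeholder, and the guest kernel must log to ttyS0, e.g. via console=ttyS0 on its cmdline):

```bash
# Attach a virtual serial port to the VM (VMID 100 is a placeholder)
qm set 100 -serial0 socket

# Start the VM, then follow the guest's boot output from the host
qm start 100
qm terminal 100
```

If anyone wants specific output from this, I can capture it on the next boot attempt.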

4. Steps Already Taken (All unsuccessful):

  • ✅ IOMMU enabled in GRUB: amd_iommu=on iommu=pt
  • ✅ vfio-pci drivers bound to the GPU IDs via /etc/modprobe.d/vfio.conf
  • ✅ Added video=efifb:off to kernel parameters
  • ✅ VM configured with machine: q35, BIOS: OVMF (UEFI)
  • ✅ PCI device flags: pcie=1,rombar=0,x-vga=1
  • ✅ Used a known-good VBIOS dump via romfile=/path/to/rom.rom
  • ✅ Configured a permanent EFI Disk
  • ✅ Hypervisor concealment: args: -cpu 'host,hv_vendor_id=null,-hypervisor' in VM config
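For completeness, here is roughly what the pieces above look like on my host. The vendor:device IDs below are placeholders (the real ones come from lspci -nn -s 01:00), and as far as I know the romfile path is resolved relative to /usr/share/kvm/:

```plaintext
# /etc/modprobe.d/vfio.conf -- IDs are placeholders,
# substitute the real ones from `lspci -nn -s 01:00`
options vfio-pci ids=10de:xxxx,10de:yyyy disable_vga=1

# /etc/pve/qemu-server/<VMID>.conf (relevant excerpt)
machine: q35
bios: ovmf
hostpci0: 0000:01:00,pcie=1,rombar=0,x-vga=1,romfile=rom.rom
```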
5. Critical Symptom / Potential Bug:
The standard command to check IOMMU group isolation returns corrupted and nonsensical output on my system. This suggests a potential underlying issue with the IOMMU subsystem in this specific kernel/hardware combination.

Command executed:

bash
for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU Group %s ' "$n"; lspci -nns "${d##*/}"; done

Corrupted output example: the output is long and repetitive, showing impossibly high group numbers (e.g., Group 40+) and repeating bridge devices instead of listing actual hardware such as the 01:00.0 GPU.
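As a cross-check, a sorted variant of the same listing makes the output easier to interpret: if the GPU's two functions still never appear, that supports the corruption theory. This is just a sketch against the standard /sys layout; group_of only extracts the group number from a device path:

```bash
#!/bin/sh
# Extract the IOMMU group number from a /sys device path.
group_of() {
    g=${1#*/iommu_groups/}   # strip everything up to the group number
    printf '%s\n' "${g%%/*}" # drop the trailing /devices/... part
}

# On a live host, list devices sorted numerically by group:
# for d in /sys/kernel/iommu_groups/*/devices/*; do
#     printf 'IOMMU Group %s %s\n' "$(group_of "$d")" "$(lspci -nns "${d##*/}")"
# done | sort -k3,3n

group_of /sys/kernel/iommu_groups/14/devices/0000:01:00.0   # → 14
```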

6. Request for Help:
Has anyone successfully achieved passthrough on a similarly new AM5/Granite Ridge (Zen 5) and RTX 50-series setup? Is this failure mode with the corrupted lspci output known to the developers? Could this be a kernel bug introduced in the 6.14.x series?

Any guidance or insight would be greatly appreciated. Thank you for your time and help.

Best regards,
Julian
 