I have been trying to pass through my RTX Pro 6000 for the better part of two weeks now, and for the life of me I can't get it to work. Can anyone give me some guidance? I think I am ready to throw in the towel. Here is what I have:
Hardware:
- WRX90E motherboard
- Threadripper Pro 9775
- RTX A6000
- RTX T1000
- RTX Pro 6000
- proxmox-ve: 8.4.0
- pve-manager: 8.4.9 (649acf70aab54798)
- Kernel: 6.14.8-2-bpo12-pve
- Root FS: ZFS
- IOMMU: Enabled
- SR-IOV: Enabled
- Above 4G Decoding: Not exposed as a BIOS option; apparently enabled automatically when booting in UEFI mode
- Resizable BAR (ReBAR): Disabled
- CSM: Disabled
Note: I first removed the T1000 and the A6000 to try a straightforward passthrough of the Pro 6000 on its own, but I got the same errors, so I put everything back in.
I have been binding both the A6000 and the Pro 6000 to vfio-pci. I have tried every combination of the PCIe device settings for the VM, but none of them works. I don't know what these errors mean, so please let me know if there is anything else that would help me understand what is going wrong.
In the Windows VM, Device Manager shows a Code 43 error on the display adapter.
Latest dmesg output, filtered with grep for NVRM|Xid|vfio:
Code:
[ 6.558390] VFIO - User Level meta-driver version: 0.3
[ 6.564948] vfio-pci 0000:e1:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
[ 6.612641] vfio_pci: add [10de:2230[ffffffff:ffffffff]] class 0x000000/00000000
[ 6.612652] vfio_pci: add [10de:1aef[ffffffff:ffffffff]] class 0x000000/00000000
[ 6.612661] vfio_pci: add [10de:2bb1[ffffffff:ffffffff]] class 0x000000/00000000
[ 6.634647] vfio-pci 0000:e1:00.0: vgaarb: VGA decodes changed: olddecodes=none,decodes=io+mem:owns=none
[ 6.634760] vfio-pci 0000:e1:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
[ 6.661402] vfio-pci 0000:01:00.0: Enabling HDA controller
[ 6.685402] vfio-pci 0000:01:00.0: Enabling HDA controller
[ 1117.955385] NVRM: GPU 0000:e1:00.0 is already bound to vfio-pci.
[ 1117.958700] NVRM: GPU 0000:01:00.0 is already bound to vfio-pci.
[ 1118.010226] NVRM: The NVIDIA probe routine was not called for 2 device(s).
[ 1118.010230] NVRM: This can occur when another driver was loaded and
NVRM: obtained ownership of the NVIDIA device(s).
[ 1118.010232] NVRM: Try unloading the conflicting kernel module (and/or
NVRM: reconfigure your kernel without the conflicting
NVRM: driver(s)), then try loading the NVIDIA kernel module
NVRM: again.
[ 1118.010235] NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64 580.65.06 Release Build (dvs-builder@U22-I3-AF03-09-1) Sun Jul 27 06:54:38 UTC 2025
[ 1165.453340] NVRM: GPU 0000:e1:00.0 is already bound to vfio-pci.
[ 1165.456489] NVRM: GPU 0000:01:00.0 is already bound to vfio-pci.
[ 1165.500600] NVRM: The NVIDIA probe routine was not called for 2 device(s).
[ 1165.500603] NVRM: This can occur when another driver was loaded and
NVRM: obtained ownership of the NVIDIA device(s).
[ 1165.500605] NVRM: Try unloading the conflicting kernel module (and/or
NVRM: reconfigure your kernel without the conflicting
NVRM: driver(s)), then try loading the NVIDIA kernel module
NVRM: again.
[ 1165.500607] NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64 580.65.06 Release Build (dvs-builder@U22-I3-AF03-09-1) Sun Jul 27 06:54:38 UTC 2025
[ 1207.692856] vfio-pci 0000:01:00.0: Enabling HDA controller
[ 1207.692866] vfio-pci 0000:01:00.0: resetting
[ 1207.835494] vfio-pci 0000:01:00.0: reset done
[ 1210.589086] vfio-pci 0000:01:00.0: Enabling HDA controller
[ 1210.589121] vfio-pci 0000:01:00.0: enabling device (0000 -> 0002)
[ 1210.589254] vfio-pci 0000:01:00.0: resetting
[ 1210.691446] vfio-pci 0000:01:00.0: reset done
[ 1210.730654] vfio-pci 0000:01:00.0: resetting
[ 1211.107427] vfio-pci 0000:01:00.0: reset done
[ 1233.550302] vfio-pci 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Transaction Layer, (Receiver ID)
[ 1233.550305] vfio-pci 0000:01:00.0: device [10de:2bb1] error status/mask=00002000/00000000
[ 1233.550308] vfio-pci 0000:01:00.0: [13] NonFatalErr
[ 1233.550335] vfio-pci 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Transaction Layer, (Receiver ID)
[ 1233.550337] vfio-pci 0000:01:00.0: device [10de:2bb1] error status/mask=00002000/00000000
[ 1233.550339] vfio-pci 0000:01:00.0: [13] NonFatalErr
[ 1717.595693] vfio-pci 0000:e1:00.0: resetting
[ 1717.702484] vfio-pci 0000:e1:00.0: reset done
[ 1720.490788] vfio-pci 0000:e1:00.0: resetting
[ 1720.598455] vfio-pci 0000:e1:00.0: reset done
[ 1720.622178] vfio-pci 0000:e1:00.1: enabling device (0000 -> 0002)
[ 1720.654198] vfio-pci 0000:e1:00.0: resetting
[ 1720.654237] vfio-pci 0000:e1:00.1: resetting
[ 1720.838252] vfio-pci 0000:e1:00.0: reset done
[ 1720.838294] vfio-pci 0000:e1:00.1: reset done
[ 1720.839616] vfio-pci 0000:e1:00.0: resetting
[ 1720.942444] vfio-pci 0000:e1:00.0: reset done
[ 1734.729179] vfio-pci 0000:01:00.0: Enabling HDA controller
[ 1734.729188] vfio-pci 0000:01:00.0: resetting
[ 1734.830040] vfio-pci 0000:01:00.0: reset done
[ 1737.703003] vfio-pci 0000:01:00.0: Enabling HDA controller
[ 1737.703112] vfio-pci 0000:01:00.0: resetting
[ 1737.807005] vfio-pci 0000:01:00.0: reset done
[ 1737.819546] vfio-pci 0000:01:00.0: resetting
[ 1738.198009] vfio-pci 0000:01:00.0: reset done
[ 1802.839874] vfio-pci 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Transaction Layer, (Receiver ID)
[ 1802.839877] vfio-pci 0000:01:00.0: device [10de:2bb1] error status/mask=0000a000/00000000
[ 1802.839880] vfio-pci 0000:01:00.0: [13] NonFatalErr
[ 1802.839885] vfio-pci 0000:01:00.0: [15] HeaderOF
[ 5059.746580] vfio-pci 0000:01:00.0: Enabling HDA controller
[ 5059.746602] vfio-pci 0000:01:00.0: resetting
[ 5059.850568] vfio-pci 0000:01:00.0: reset done
[ 5062.833462] vfio-pci 0000:01:00.0: Enabling HDA controller
[ 5062.833592] vfio-pci 0000:01:00.0: resetting
[ 5062.939532] vfio-pci 0000:01:00.0: reset done
[ 5062.952332] vfio-pci 0000:01:00.0: resetting
[ 5063.338533] vfio-pci 0000:01:00.0: reset done
[ 5083.733478] vfio-pci 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Transaction Layer, (Receiver ID)
[ 5083.733481] vfio-pci 0000:01:00.0: device [10de:2bb1] error status/mask=0000a000/00000000
[ 5083.733487] vfio-pci 0000:01:00.0: [13] NonFatalErr
[ 5083.733489] vfio-pci 0000:01:00.0: [15] HeaderOF
[73806.075382] vfio-pci 0000:01:00.0: Enabling HDA controller
[73806.075397] vfio-pci 0000:01:00.0: resetting
[73806.178887] vfio-pci 0000:01:00.0: reset done
[73809.066041] vfio-pci 0000:01:00.0: Enabling HDA controller
[73809.066153] vfio-pci 0000:01:00.0: resetting
[73809.169855] vfio-pci 0000:01:00.0: reset done
[73809.182181] vfio-pci 0000:01:00.0: resetting
[73809.561860] vfio-pci 0000:01:00.0: reset done
[73826.493651] vfio-pci 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Transaction Layer, (Receiver ID)
[73826.493653] vfio-pci 0000:01:00.0: device [10de:2bb1] error status/mask=0000a000/00000000
[73826.493656] vfio-pci 0000:01:00.0: [13] NonFatalErr
[73826.493661] vfio-pci 0000:01:00.0: [15] HeaderOF
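For reference, the `error status/mask` pairs in the AER lines above are raw values of the PCIe Correctable Error Status and Mask registers, and the `[13]`/`[15]` lines are the kernel's decoding of the set bits. A minimal Python sketch that does the same decoding, assuming the bit names the Linux AER driver prints (bit 13 is Advisory Non-Fatal Error, which lspci shows as `AdvNonFatalErr`; bit 15 is Header Log Overflow). `decode_correctable` and `summarize` are hypothetical helper names:

```python
import re

# Bit positions in the PCIe AER Correctable Error Status register, with the
# short names the Linux AER driver prints in dmesg (e.g. "[13] NonFatalErr").
AER_CORRECTABLE_BITS = {
    0: "RxErr",
    6: "BadTLP",
    7: "BadDLLP",
    8: "Rollover",
    12: "Timeout",
    13: "NonFatalErr",  # Advisory Non-Fatal Error
    14: "CorrIntErr",   # Corrected Internal Error
    15: "HeaderOF",     # Header Log Overflow
}

# Matches lines such as:
#   vfio-pci 0000:01:00.0: device [10de:2bb1] error status/mask=0000a000/00000000
STATUS_RE = re.compile(
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"device \[(?P<id>[0-9a-f]{4}:[0-9a-f]{4})\] "
    r"error status/mask=(?P<status>[0-9a-f]{8})/(?P<mask>[0-9a-f]{8})"
)


def decode_correctable(status: int, mask: int = 0) -> list[str]:
    """Return the names of the unmasked bits set in a correctable-error status word."""
    active = status & ~mask
    return [name for bit, name in sorted(AER_CORRECTABLE_BITS.items()) if active & (1 << bit)]


def summarize(log_text: str) -> list[tuple[str, str, list[str]]]:
    """Find AER status lines in dmesg text and decode each one."""
    results = []
    for m in STATUS_RE.finditer(log_text):
        status = int(m.group("status"), 16)
        mask = int(m.group("mask"), 16)
        results.append((m.group("bdf"), m.group("id"), decode_correctable(status, mask)))
    return results


if __name__ == "__main__":
    sample = (
        "[ 1233.550305] vfio-pci 0000:01:00.0: device [10de:2bb1] error status/mask=00002000/00000000\n"
        "[ 1802.839877] vfio-pci 0000:01:00.0: device [10de:2bb1] error status/mask=0000a000/00000000\n"
    )
    for bdf, dev_id, names in summarize(sample):
        print(bdf, dev_id, names)
```

Read this way, `00002000` is only the Advisory Non-Fatal bit, and `0000a000` adds Header Log Overflow; every one of these entries is logged with severity=Correctable.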
lspci -vvv -s 01:00.0 (GB202GL), highlights:
Code:
01:00.0 3D controller: NVIDIA Corporation GB202GL [RTX PRO 6000 Blackwell Workstation Edition] (rev a1)
LnkSta: Speed 32GT/s, Width x16
AER: UESta: ... UnsupReq+ ...
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
Kernel driver in use: vfio-pci
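lspci -vvv prints each status bit as `Name+` (set) or `Name-` (clear). Read that way, the highlights show `UnsupReq` (Unsupported Request) latched in the uncorrectable status register, while none of the correctable bits listed under `CESta` were set when the snapshot was taken. A small Python sketch for pulling the set flags out of such a line; the regex is my assumption about lspci's output format, not taken from its source:

```python
import re

# lspci -vvv shows boolean fields as Name+ (bit set) or Name- (bit clear).
# Assumption: a flag token is a word immediately followed by '+' or '-',
# then whitespace or the end of the line.
FLAG_RE = re.compile(r"\b([A-Za-z][A-Za-z0-9]*)([+-])(?=\s|$)")


def parse_flags(line: str) -> dict[str, bool]:
    """Map every Name+/Name- token in an lspci status line to True/False."""
    return {name: sign == "+" for name, sign in FLAG_RE.findall(line)}


def set_flags(line: str) -> list[str]:
    """Return only the names of the flags that are set ('+')."""
    return [name for name, is_set in parse_flags(line).items() if is_set]


if __name__ == "__main__":
    print(set_flags("AER: UESta: ... UnsupReq+ ..."))
    print(set_flags("CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-"))
```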
PCIe / IOMMU
Slots → BDF map (from dmidecode -t slot):
Code:
PCIEx16(G5)_3 → 0000:01:00.0 (RTX PRO 6000 target)
PCIEx16(G5)_1 → 0000:e1:00.0 / .1 (RTX A6000 + HDA)
PCIEx16(G5)_7 → 0000:02:00.0 / .1 (RTX T1000 + HDA; host GPU)
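The slot map above can be rebuilt from raw `dmidecode -t slot` output. A minimal Python sketch, assuming each type 9 (System Slot) record starts with a `Handle ...` line and carries `Designation:` and `Bus Address:` fields, as dmidecode normally prints them; `slot_map` is a hypothetical name:

```python
def slot_map(dmidecode_text: str) -> dict[str, str]:
    """Map each slot's Designation to its Bus Address from `dmidecode -t slot` text."""
    mapping: dict[str, str] = {}
    designation = None
    for raw in dmidecode_text.splitlines():
        line = raw.strip()
        if line.startswith("Handle "):
            designation = None  # a new DMI record begins
        elif line.startswith("Designation:"):
            designation = line.split(":", 1)[1].strip()
        elif line.startswith("Bus Address:") and designation is not None:
            # Only the first ':' separates the label; the address keeps its own colons.
            mapping[designation] = line.split(":", 1)[1].strip()
    return mapping
```

Feeding it the output of `dmidecode -t slot` (run as root) should give a dictionary along the lines of the map above; slots without a `Bus Address` field are simply skipped.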
IOMMU groups (relevant):
Code:
01:00.0 -> group 30
02:00.0, 02:00.1 -> group 31
e1:00.0, e1:00.1 -> group 9
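The group listing can be regenerated from sysfs, where each IOMMU group is a directory `/sys/kernel/iommu_groups/<n>/devices/` holding one entry per PCI address. With VFIO, every device that shares a group with the GPU generally has to be bound to vfio-pci as well (or left without a driver), so the peers are worth listing. A minimal Python sketch; `iommu_groups` and `group_peers` are hypothetical names:

```python
from pathlib import Path


def iommu_groups(root: str = "/sys/kernel/iommu_groups") -> dict[int, list[str]]:
    """Return {group number: sorted PCI addresses} read from the sysfs IOMMU tree."""
    groups: dict[int, list[str]] = {}
    base = Path(root)
    if not base.is_dir():
        return groups  # no IOMMU groups exposed (e.g. IOMMU off)
    for group_dir in base.iterdir():
        devices = group_dir / "devices"
        if group_dir.name.isdigit() and devices.is_dir():
            groups[int(group_dir.name)] = sorted(entry.name for entry in devices.iterdir())
    return groups


def group_peers(bdf: str, groups: dict[int, list[str]]) -> list[str]:
    """Return the other devices that share an IOMMU group with bdf."""
    for members in groups.values():
        if bdf in members:
            return [m for m in members if m != bdf]
    return []


if __name__ == "__main__":
    for number, members in sorted(iommu_groups().items()):
        print(f"group {number}: {' '.join(members)}")
```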
NVRM messages (expected) when loading the NVIDIA open kernel module:
Code:
NVRM: GPU 0000:01:00.0 is already bound to vfio-pci.
NVRM: No NVIDIA devices probed.
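Those NVRM lines just mean the nvidia module found the cards already claimed by vfio-pci, which is what passthrough needs on the host side. The same ownership can be read from sysfs, where `/sys/bus/pci/devices/<address>/driver` is a symlink to the bound driver (`lspci -k` reports it as "Kernel driver in use"). A minimal Python sketch, with `bound_driver` as a hypothetical helper:

```python
import os
from typing import Optional


def bound_driver(bdf: str, root: str = "/sys/bus/pci/devices") -> Optional[str]:
    """Return the kernel driver bound to a PCI device, or None if none is bound.

    <root>/<bdf>/driver is a symlink into the driver's sysfs directory
    (e.g. .../drivers/vfio-pci), so its last path component is the driver name.
    """
    link = os.path.join(root, bdf, "driver")
    if not os.path.islink(link):
        return None  # device missing, or no driver bound
    return os.path.basename(os.readlink(link))


if __name__ == "__main__":
    for bdf in ("0000:01:00.0", "0000:e1:00.0", "0000:02:00.0"):
        print(bdf, bound_driver(bdf))
```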
VM config (/etc/pve/qemu-server/101.conf):
Code:
#cpu%3A host,hidden=1,hv-vendor-id=proxmox
bios: ovmf
boot: order=ide0;ide2
cores: 8
cpu: host
efidisk0: local-zfs:vm-101-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
hostpci0: 0000:01:00,pcie=1
ide0: local-zfs:vm-101-disk-1,size=250G
ide2: local:iso/virtio-win.iso,media=cdrom,size=709474K
machine: q35
memory: 65536
meta: creation-qemu=9.0.2,ctime=1734027844
name: VM02
net0: e1000=BC:24:11:ED:C5:16,bridge=vmbr0
numa: 0
ostype: win11
scsihw: virtio-scsi-single
smbios1: uuid=816bc79b-f944-49a8-9641-f1a46c321704
sockets: 1
tpmstate0: local-zfs:vm-101-disk-2,size=4M,version=v2.0
vga: none
vmgenid: 6505d925-2057-4886-8532-ecab796421ee
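Reading the config mechanically makes two details easier to see. First, the `#cpu%3A host,hidden=1,...` line is a comment (it looks like a percent-encoded line in the VM's notes), so the active CPU setting is just `cpu: host`. Second, `hostpci0` gives the device ID without a function number. A minimal Python sketch, assuming my understanding that Proxmox treats an ID with no `.function` suffix as passing all functions (the "All Functions" option); `parse_vm_config` and `parse_hostpci` are hypothetical names:

```python
from urllib.parse import unquote


def parse_vm_config(text: str) -> tuple[dict[str, str], list[str]]:
    """Split a Proxmox VM config into active 'key: value' settings and '#' note lines."""
    settings: dict[str, str] = {}
    notes: list[str] = []
    for line in text.splitlines():
        if line.startswith("#"):
            # Note/description lines; they appear to be percent-encoded (%3A = ':').
            notes.append(unquote(line[1:]))
        elif line.startswith("["):
            break  # snapshot or pending sections follow; stop at the current config
        elif ":" in line:
            key, value = line.split(":", 1)
            settings[key.strip()] = value.strip()
    return settings, notes


def parse_hostpci(value: str) -> dict:
    """Split a hostpciN value such as '0000:01:00,pcie=1' into host ID and options."""
    first, *rest = value.split(",")
    options = dict(opt.split("=", 1) for opt in rest if "=" in opt)
    host = first[len("host="):] if first.startswith("host=") else first
    return {
        "host": host,
        # Assumption: no '.function' suffix means Proxmox passes all functions.
        "all_functions": "." not in host,
        "pcie": options.get("pcie") == "1",
        "options": options,
    }
```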
Any ideas what could be causing these PCIe bus errors? Any guidance would be very helpful. Thank you.