RTX Pro 6000 Passthrough

NickM8

New Member
Aug 16, 2025
I have been attempting to pass through my RTX Pro 6000 for two weeks now, and for the life of me I can't seem to get it to work. Is there anyone who can give me some guidance? At this point I think I am ready to throw in the towel. I have:

Hardware:

  • WRX90E motherboard
  • Threadripper Pro 9775
  • RTX A6000
  • RTX T1000
  • RTX PRO 6000
Proxmox:

  • proxmox-ve: 8.4.0
  • pve-manager: 8.4.9 (649acf70aab54798)
  • Kernel: 6.14.8-2-bpo12-pve
  • Root FS: ZFS
BIOS/UEFI Settings:

  • IOMMU: Enabled
  • SR-IOV: Enabled
  • Above 4G Decoding: Not an option in this BIOS; apparently set to Enabled automatically when booted in UEFI
  • Resizable BAR (ReBAR): Disabled
  • CSM: Disabled

Note: I first removed the T1000 and the A6000 just to try a straightforward passthrough, but I got the same errors, so I put everything back in.

I have been binding the A6000 and the Pro 6000 to vfio-pci. I have tried all the PCIe options for the VM, but no combination works. I don't know what these errors mean. Please let me know if there is anything else that would help me understand what is going wrong.
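
For reference, the binding on the host is done roughly like this (a sketch; the device IDs are the same ones that show up in the vfio_pci: add lines in the dmesg below, i.e. the A6000, its HDA function, and the Pro 6000):
Code:
# /etc/modprobe.d/vfio.conf -- device IDs from lspci -nn
options vfio-pci ids=10de:2230,10de:1aef,10de:2bb1
# make sure vfio-pci claims the cards before the nvidia module does
softdep nvidia pre: vfio-pci

# applied with:
update-initramfs -u -k all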

In the Windows VM, the display adapter in Device Manager shows a Code 43 error.


Latest dmesg grep (NVRM|Xid|vfio):
Code:
[    6.558390] VFIO - User Level meta-driver version: 0.3
[    6.564948] vfio-pci 0000:e1:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
[    6.612641] vfio_pci: add [10de:2230[ffffffff:ffffffff]] class 0x000000/00000000
[    6.612652] vfio_pci: add [10de:1aef[ffffffff:ffffffff]] class 0x000000/00000000
[    6.612661] vfio_pci: add [10de:2bb1[ffffffff:ffffffff]] class 0x000000/00000000
[    6.634647] vfio-pci 0000:e1:00.0: vgaarb: VGA decodes changed: olddecodes=none,decodes=io+mem:owns=none
[    6.634760] vfio-pci 0000:e1:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
[    6.661402] vfio-pci 0000:01:00.0: Enabling HDA controller
[    6.685402] vfio-pci 0000:01:00.0: Enabling HDA controller
[ 1117.955385] NVRM: GPU 0000:e1:00.0 is already bound to vfio-pci.
[ 1117.958700] NVRM: GPU 0000:01:00.0 is already bound to vfio-pci.
[ 1118.010226] NVRM: The NVIDIA probe routine was not called for 2 device(s).
[ 1118.010230] NVRM: This can occur when another driver was loaded and
               NVRM: obtained ownership of the NVIDIA device(s).
[ 1118.010232] NVRM: Try unloading the conflicting kernel module (and/or
               NVRM: reconfigure your kernel without the conflicting
               NVRM: driver(s)), then try loading the NVIDIA kernel module
               NVRM: again.
[ 1118.010235] NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64  580.65.06  Release Build  (dvs-builder@U22-I3-AF03-09-1)  Sun Jul 27 06:54:38 UTC 2025
[ 1165.453340] NVRM: GPU 0000:e1:00.0 is already bound to vfio-pci.
[ 1165.456489] NVRM: GPU 0000:01:00.0 is already bound to vfio-pci.
[ 1165.500600] NVRM: The NVIDIA probe routine was not called for 2 device(s).
[ 1165.500603] NVRM: This can occur when another driver was loaded and
               NVRM: obtained ownership of the NVIDIA device(s).
[ 1165.500605] NVRM: Try unloading the conflicting kernel module (and/or
               NVRM: reconfigure your kernel without the conflicting
               NVRM: driver(s)), then try loading the NVIDIA kernel module
               NVRM: again.
[ 1165.500607] NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64  580.65.06  Release Build  (dvs-builder@U22-I3-AF03-09-1)  Sun Jul 27 06:54:38 UTC 2025
[ 1207.692856] vfio-pci 0000:01:00.0: Enabling HDA controller
[ 1207.692866] vfio-pci 0000:01:00.0: resetting
[ 1207.835494] vfio-pci 0000:01:00.0: reset done
[ 1210.589086] vfio-pci 0000:01:00.0: Enabling HDA controller
[ 1210.589121] vfio-pci 0000:01:00.0: enabling device (0000 -> 0002)
[ 1210.589254] vfio-pci 0000:01:00.0: resetting
[ 1210.691446] vfio-pci 0000:01:00.0: reset done
[ 1210.730654] vfio-pci 0000:01:00.0: resetting
[ 1211.107427] vfio-pci 0000:01:00.0: reset done
[ 1233.550302] vfio-pci 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Transaction Layer, (Receiver ID)
[ 1233.550305] vfio-pci 0000:01:00.0:   device [10de:2bb1] error status/mask=00002000/00000000
[ 1233.550308] vfio-pci 0000:01:00.0:    [13] NonFatalErr        
[ 1233.550335] vfio-pci 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Transaction Layer, (Receiver ID)
[ 1233.550337] vfio-pci 0000:01:00.0:   device [10de:2bb1] error status/mask=00002000/00000000
[ 1233.550339] vfio-pci 0000:01:00.0:    [13] NonFatalErr        
[ 1717.595693] vfio-pci 0000:e1:00.0: resetting
[ 1717.702484] vfio-pci 0000:e1:00.0: reset done
[ 1720.490788] vfio-pci 0000:e1:00.0: resetting
[ 1720.598455] vfio-pci 0000:e1:00.0: reset done
[ 1720.622178] vfio-pci 0000:e1:00.1: enabling device (0000 -> 0002)
[ 1720.654198] vfio-pci 0000:e1:00.0: resetting
[ 1720.654237] vfio-pci 0000:e1:00.1: resetting
[ 1720.838252] vfio-pci 0000:e1:00.0: reset done
[ 1720.838294] vfio-pci 0000:e1:00.1: reset done
[ 1720.839616] vfio-pci 0000:e1:00.0: resetting
[ 1720.942444] vfio-pci 0000:e1:00.0: reset done
[ 1734.729179] vfio-pci 0000:01:00.0: Enabling HDA controller
[ 1734.729188] vfio-pci 0000:01:00.0: resetting
[ 1734.830040] vfio-pci 0000:01:00.0: reset done
[ 1737.703003] vfio-pci 0000:01:00.0: Enabling HDA controller
[ 1737.703112] vfio-pci 0000:01:00.0: resetting
[ 1737.807005] vfio-pci 0000:01:00.0: reset done
[ 1737.819546] vfio-pci 0000:01:00.0: resetting
[ 1738.198009] vfio-pci 0000:01:00.0: reset done
[ 1802.839874] vfio-pci 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Transaction Layer, (Receiver ID)
[ 1802.839877] vfio-pci 0000:01:00.0:   device [10de:2bb1] error status/mask=0000a000/00000000
[ 1802.839880] vfio-pci 0000:01:00.0:    [13] NonFatalErr        
[ 1802.839885] vfio-pci 0000:01:00.0:    [15] HeaderOF           
[ 5059.746580] vfio-pci 0000:01:00.0: Enabling HDA controller
[ 5059.746602] vfio-pci 0000:01:00.0: resetting
[ 5059.850568] vfio-pci 0000:01:00.0: reset done
[ 5062.833462] vfio-pci 0000:01:00.0: Enabling HDA controller
[ 5062.833592] vfio-pci 0000:01:00.0: resetting
[ 5062.939532] vfio-pci 0000:01:00.0: reset done
[ 5062.952332] vfio-pci 0000:01:00.0: resetting
[ 5063.338533] vfio-pci 0000:01:00.0: reset done
[ 5083.733478] vfio-pci 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Transaction Layer, (Receiver ID)
[ 5083.733481] vfio-pci 0000:01:00.0:   device [10de:2bb1] error status/mask=0000a000/00000000
[ 5083.733487] vfio-pci 0000:01:00.0:    [13] NonFatalErr        
[ 5083.733489] vfio-pci 0000:01:00.0:    [15] HeaderOF           
[73806.075382] vfio-pci 0000:01:00.0: Enabling HDA controller
[73806.075397] vfio-pci 0000:01:00.0: resetting
[73806.178887] vfio-pci 0000:01:00.0: reset done
[73809.066041] vfio-pci 0000:01:00.0: Enabling HDA controller
[73809.066153] vfio-pci 0000:01:00.0: resetting
[73809.169855] vfio-pci 0000:01:00.0: reset done
[73809.182181] vfio-pci 0000:01:00.0: resetting
[73809.561860] vfio-pci 0000:01:00.0: reset done
[73826.493651] vfio-pci 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Transaction Layer, (Receiver ID)
[73826.493653] vfio-pci 0000:01:00.0:   device [10de:2bb1] error status/mask=0000a000/00000000
[73826.493656] vfio-pci 0000:01:00.0:    [13] NonFatalErr        
[73826.493661] vfio-pci 0000:01:00.0:    [15] HeaderOF

lspci -vvv -s 01:00.0 (GB202GL), highlights:
Code:
01:00.0 3D controller: NVIDIA Corporation GB202GL [RTX PRO 6000 Blackwell Workstation Edition] (rev a1)
    LnkSta: Speed 32GT/s, Width x16
    AER: UESta: ... UnsupReq+ ...
         CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
    Kernel driver in use: vfio-pci

PCIe slots → BDF map (from dmidecode -t slot):
Code:
PCIEx16(G5)_3 → 0000:01:00.0 (RTX PRO 6000 target)
PCIEx16(G5)_1 → 0000:e1:00.0 / .1 (RTX A6000 + HDA)
PCIEx16(G5)_7 → 0000:02:00.0 / .1 (RTX T1000 + HDA; host GPU)

IOMMU groups (relevant):
Code:
01:00.0 -> group 30
02:00.0, 02:00.1 -> group 31
e1:00.0, e1:00.1 -> group 9
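
(The group numbers above come from sysfs; something like this prints the full map, in case it helps:)
Code:
# list every IOMMU group and the devices in it
for d in /sys/kernel/iommu_groups/*/devices/*; do
    g=${d#*/iommu_groups/}; g=${g%%/*}
    printf 'IOMMU group %s: ' "$g"
    lspci -nns "${d##*/}"
done | sort -V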

NVRM messages (expected) when loading the NVIDIA open driver on the host:
Code:
NVRM: GPU 0000:01:00.0 is already bound to vfio-pci.
NVRM: No NVIDIA devices probed.

VM Config (/etc/pve/qemu-server/101.conf)
Code:
#cpu%3A host,hidden=1,hv-vendor-id=proxmox
bios: ovmf
boot: order=ide0;ide2
cores: 8
cpu: host
efidisk0: local-zfs:vm-101-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
hostpci0: 0000:01:00,pcie=1
ide0: local-zfs:vm-101-disk-1,size=250G
ide2: local:iso/virtio-win.iso,media=cdrom,size=709474K
machine: q35
memory: 65536
meta: creation-qemu=9.0.2,ctime=1734027844
name: VM02
net0: e1000=BC:24:11:ED:C5:16,bridge=vmbr0
numa: 0
ostype: win11
scsihw: virtio-scsi-single
smbios1: uuid=816bc79b-f944-49a8-9641-f1a46c321704
sockets: 1
tpmstate0: local-zfs:vm-101-disk-2,size=4M,version=v2.0
vga: none
vmgenid: 6505d925-2057-4886-8532-ecab796421ee


Any ideas what could be causing these PCIe bus errors? Any guidance would be very helpful. Thank you
 
Hi,

First, to clarify: do you want to use any of the GPUs with vGPU? I ask because using pre-Ada-Lovelace cards with Ada+ (including Blackwell) drivers only works in one specific scenario: if you want to use the pre-Ada cards with vGPU and the Ada+ card as full passthrough. If you want that, you'd have to install the 'proprietary' NVIDIA kernel module instead of the open-source kernel modules (which are the default if you have an Ada+ card). Also note that only the RTX PRO 6000 Blackwell Server Edition is supported by vGPU (not the Workstation Edition).

If you don't want to use vGPU, which driver did you try to install in the guest?

Could you maybe also try once with a Linux VM, just to rule out any issues that Windows might introduce?

What would also be interesting: the full output of 'dmesg' from the host and the guest, maybe the journal from the host, and the exact NVIDIA driver versions.

Any ideas what could be causing these PCIe bus errors? Any guidance would be very helpful. Thank you
Did you check the mainboard's BIOS for things like ARI (Alternative Routing-ID Interpretation) or AER (Advanced Error Reporting)?
The errors seem to be transient and not fatal (NonFatalErr), so maybe they're just a red herring.
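
You can also check ARI from the running system; something like this should show which ports have ARI forwarding enabled (ARIFwd+):
Code:
# run as root; prints each device header followed by its ARIFwd capability/control bits
lspci -vvv 2>/dev/null | awk '/^[0-9a-f]/ { dev=$0 } /ARIFwd/ { print dev; print "    " $0 }'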
 
Resizable BAR should be enabled, not disabled. Also, memory ballooning should be disabled. If you're doing passthrough, then you don't want or need the NVIDIA drivers on the host system at all. But that doesn't seem to be the problem, as the Windows VM discovered the GPU.

Code 43 just means the Windows driver did not recognize this as a valid GPU or had issues restarting the PCIe bus. Common issues on AMD are that you may need to patch in the VBIOS, that your IOMMU group contains other devices you haven't passed through, or that you're using the GeForce driver instead of the Quadro/Pro/Enterprise drivers. If I'm not mistaken, the RTX PRO 6000 has 96GB of memory, so you need at least as much memory in your VM (preferably more, like 128G).
 

Thank you Dominik.

I don't want to use any of the GPUs with vGPU; I specifically want to use the card for passthrough, although I do have the drivers installed on the host. Ideally I would also like to have a GPU for the host, but I can always just pass that one through too. I have passed through the A6000 to another VM to test, and that has been working fine via vfio.

I installed the open source kernel module. In the guest I have tried both the driver from their website and the driver from their portal. Neither has worked. Currently it's 580.88.
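
(For reference, the flavour that is actually loaded on the host can be double-checked with something like the following; in my dmesg it reports the open kernel module:)
Code:
# "NVIDIA UNIX Open Kernel Module" = open flavour
cat /proc/driver/nvidia/version
# license is "Dual MIT/GPL" for the open module, "NVIDIA" for the proprietary one
modinfo nvidia | grep -E '^(version|license)'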

Guruevi: I tried turning ballooning off and allocating more memory too. I have also tried with ReBAR both on and off.


I am not great with Linux or Proxmox, but I managed to create a Linux VM and assign the vfio device to it.

Code:
[ 1161.872573] vfio-pci 0000:01:00.0: Enabling HDA controller
[ 1161.872584] vfio-pci 0000:01:00.0: resetting
[ 1161.973546] vfio-pci 0000:01:00.0: reset done
[ 1162.385724] tap102i0: entered promiscuous mode
[ 1162.448047] vmbr0: port 2(fwpr102p0) entered blocking state
[ 1162.448052] vmbr0: port 2(fwpr102p0) entered disabled state
[ 1162.448066] fwpr102p0: entered allmulticast mode
[ 1162.448107] fwpr102p0: entered promiscuous mode
[ 1162.448149] vmbr0: port 2(fwpr102p0) entered blocking state
[ 1162.448151] vmbr0: port 2(fwpr102p0) entered forwarding state
[ 1162.457440] fwbr102i0: port 1(fwln102i0) entered blocking state
[ 1162.457445] fwbr102i0: port 1(fwln102i0) entered disabled state
[ 1162.457465] fwln102i0: entered allmulticast mode
[ 1162.459254] fwln102i0: entered promiscuous mode
[ 1162.459316] fwbr102i0: port 1(fwln102i0) entered blocking state
[ 1162.459318] fwbr102i0: port 1(fwln102i0) entered forwarding state
[ 1162.469005] fwbr102i0: port 2(tap102i0) entered blocking state
[ 1162.469009] fwbr102i0: port 2(tap102i0) entered disabled state
[ 1162.469025] tap102i0: entered allmulticast mode
[ 1162.469101] fwbr102i0: port 2(tap102i0) entered blocking state
[ 1162.469102] fwbr102i0: port 2(tap102i0) entered forwarding state
[ 1162.610672] vfio-pci 0000:01:00.0: Enabling HDA controller
[ 1162.610810] vfio-pci 0000:01:00.0: resetting
[ 1162.717550] vfio-pci 0000:01:00.0: reset done
[ 1162.735581] vfio-pci 0000:01:00.0: resetting
[ 1163.109539] vfio-pci 0000:01:00.0: reset done
[ 1412.310417]  zd32: p1 p2 p3
[ 1412.398320] tap102i0: left allmulticast mode
[ 1412.398342] fwbr102i0: port 2(tap102i0) entered disabled state
[ 1412.401018] fwbr102i0: port 1(fwln102i0) entered disabled state
[ 1412.401090] vmbr0: port 2(fwpr102p0) entered disabled state
[ 1412.401216] fwln102i0 (unregistering): left allmulticast mode
[ 1412.401220] fwln102i0 (unregistering): left promiscuous mode
[ 1412.401222] fwbr102i0: port 1(fwln102i0) entered disabled state
[ 1412.401441] fwpr102p0 (unregistering): left allmulticast mode
[ 1412.401443] fwpr102p0 (unregistering): left promiscuous mode
 
Ballooning needs to be off and ReBAR needs to be on; you need enough memory to properly address the GPU, and you cannot have the host's NVIDIA drivers activate the card during boot (at least not on AMD). The devices don't necessarily need to be bound to the vfio drivers manually; that happens automatically. Also, all of your devices need to be in isolated IOMMU groups, like on Intel boards, not on shared PCIe buses like on many AMD boards.

Once that is all true, you can continue troubleshooting any guest issues.

Try with a Linux guest first and see if you can get it to activate the driver; that will be much more reliable and give you more information than a nondescript error code. Use the closed-source drivers; the open-source drivers aren't nearly as mature, especially when it comes to CUDA and virtualization. Likewise, don't use the Microsoft/GeForce drivers; they do not work with pro cards.
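
Inside the Linux guest, something along these lines will tell you whether the driver actually bound to and initialized the card:
Code:
# which kernel driver claimed the GPU in the guest
lspci -nnk | grep -iA3 nvidia
# driver init messages / Xid errors
dmesg | grep -iE 'nvrm|nvidia'
# if the driver is up, this should list the RTX PRO 6000
nvidia-smi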
 
I think it would be better to post the following settings:

cat /etc/kernel/cmdline
cat /etc/default/grub
cat /etc/modules
cat /etc/modprobe.d/*
cat /etc/pve/qemu-server/<vmid>.conf
cat /var/lib/vz/snippets/*
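
(Something like this dumps them all in one go; replace <vmid> with the VM id:)
Code:
for f in /etc/kernel/cmdline /etc/default/grub /etc/modules \
         /etc/modprobe.d/* /etc/pve/qemu-server/<vmid>.conf /var/lib/vz/snippets/*; do
    echo "===== $f ====="
    cat "$f" 2>/dev/null
done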
 
you need at least as much memory in your VM (preferably more, like 128G)
I don't think this is correct. I have regularly passed through GPUs with 24+ GiB of memory to VMs with less than 8 GiB; I'm not sure why you would think that? Is that documented anywhere?

Ballooning needs to be off and ReBAR needs to be on
While both might influence it, in general it's not necessary to set these parameters. Ballooning will not work with PCI passthrough (since all the guest memory needs to be mapped for the VM), but the ballooning driver can be installed nonetheless...

I don't want to use any of the GPUs with vGPU [...] I installed the open source kernel module. [...] Currently it's 580.88.

In that case, I'd remove the vGPU driver completely and first try without any NVIDIA driver installed. Make sure nouveau is blacklisted so it does not get loaded for those cards.
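
A minimal sketch of the blacklist step (the file name under /etc/modprobe.d/ doesn't matter):
Code:
echo 'blacklist nouveau'         >  /etc/modprobe.d/blacklist-nouveau.conf
echo 'options nouveau modeset=0' >> /etc/modprobe.d/blacklist-nouveau.conf
update-initramfs -u -k all   # then reboot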

Then please try passing the RTX PRO 6000 through to a Linux guest and install the regular NVIDIA Linux driver there, then attach the complete dmesg output from the host and the guest here. You can of course also try a fresh Windows guest again and install the regular NVIDIA driver (non-vGPU) there.
 
Any updates on this? I have an RTX PRO 6000 Max-Q which kind of works in the VM, but there is no display output. It works under Ubuntu (580.82.07 .run driver from the website, selecting the MIT/open version; the proprietary one does not work) and under Windows (normal 580 driver from the website, no error codes, but also no display output). The NVIDIA Control Panel shows the correct display name, but as said, there is no output; the display is on but shows a black image (I can see the backlight).
The system is Proxmox 9.0.6 with a ZFS install.


Code:
root@titan:~# cat /etc/kernel/cmdline
root=ZFS=rpool/ROOT/pve-1 boot=zfs intel_iommu=on initcall_blacklist=sysfb_init vfio_iommu_type1.allow_unsafe_interrupts=1 vfio-pci.ids=10de:2bb4,10de:22e8

Code:
root@titan:~# cat /etc/modules
# /etc/modules is obsolete and has been replaced by /etc/modules-load.d/.
# Please see modules-load.d(5) and modprobe.d(5) for details.
#
# Updating this file still works, but it is undocumented and unsupported.
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

Code:
root@titan:~# ls /etc/modprobe.d/
blacklist.conf  intel-microcode-blacklist.conf  pve-blacklist.conf  vfio.conf  zfs.conf
root@titan:~# cat /etc/modprobe.d/*
blacklist radeon
blacklist nouveau
blacklist nvidia
blacklist snd_hda_intel
blacklist amd76x_edac
blacklist vga16fb
blacklist rivafb
blacklist nvidiafb
blacklist rivatv
# The microcode module attempts to apply a microcode update when
# it autoloads.  This is not always safe, so we block it by default.
blacklist microcode
# This file contains a list of modules which are not supported by Proxmox VE

# nvidiafb see bugreport https://bugzilla.proxmox.com/show_bug.cgi?id=701
blacklist nvidiafb
options vfio_iommu_type1 allow_unsafe_interrupts=1
options kvm ignore_msrs=1 report_ignored_msrs=0
options vfio-pci ids=10de:2bb4,10de:22e8 disable_vga=1 disable_idle_d3=1
options zfs zfs_arc_max=13477347328


Code:
root@titan:~# cat /etc/pve/qemu-server/1000.conf #WINDOWS
agent: 1
balloon: 0
bios: ovmf
boot: order=scsi0;usb0
cores: 16
cpu: host
efidisk0: local-zfs:vm-1000-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
hostpci0: 0000:01:00,pcie=1
machine: pc-q35-10.0
memory: 65536
meta: creation-qemu=10.0.2,ctime=1756918785
name: WIN-TEMPLATE
net0: virtio=xx:xx:11:xx:21:F9,bridge=vmbr0,firewall=1
numa: 0
ostype: win11
parent: IDLE
scsi0: local-zfs:vm-1000-disk-1,cache=writeback,discard=on,iothread=1,size=64G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=ab850xxb-1b2d-4xx65-axx7-316f27xxxxxx
sockets: 1
tpmstate0: local-zfs:vm-1000-disk-2,size=4M,version=v2.0
usb0: host=8564:1000,usb3=1
vga: none
vmgenid: efce54da-2664-xx8e-xx4b-afxxxxxxxxx
I guess the other files are not important, as I use systemd-boot, and the snippets directory is empty.
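
(Since this is a systemd-boot/ZFS install, note that changes to /etc/kernel/cmdline only take effect after something like the following, and the binding can then be verified after the reboot:)
Code:
proxmox-boot-tool refresh        # re-writes the kernel command line for systemd-boot
reboot
# afterwards, confirm the card is really bound to vfio-pci:
lspci -nnk -s 01:00.0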
 
Can you post the output of
Code:
lspci
dmesg
from the host, and
Code:
nvidia-smi -q
from the guest?

Since you can install the driver and nvidia-smi (in the guest) shows the card, it's a different issue than the OP's, so it might be better to open a new thread.