DGX B300 GPU passthrough on Proxmox: PCI device visible but NVIDIA driver cannot initialize

ugurbas

Jan 19, 2026
Hello everyone,

I’m trying to understand the correct expectations and support boundaries for GPU passthrough on DGX / HGX B300 platforms when using Proxmox (KVM/QEMU).

Environment:
- Proxmox VE 9.1.4
- Machine type: q35
- BIOS: OVMF (UEFI)
- IOMMU enabled, vfio-pci used (host-side binding sketched below)
- Host platform: NVIDIA DGX / HGX B300 (Blackwell)
- Guest OS: Ubuntu 24.04
- NVIDIA proprietary driver installed in guest
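
For context, a minimal sketch of how a GPU is typically pinned to vfio-pci on a Proxmox host (the 10de:3182 ID matches the lspci output further down; the exact binding mechanism is not the problem here, since the host already reports "Kernel driver in use: vfio-pci"):

Code:
# /etc/modprobe.d/vfio.conf -- example host-side binding, IDs taken from lspci -nn
options vfio-pci ids=10de:3182
# make sure host GPU framebuffer drivers only load after vfio-pci has claimed the device
softdep nouveau pre: vfio-pci
softdep nvidiafb pre: vfio-pci

followed by update-initramfs -u and a host reboot.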

Test scenario:
- Assign a single B300 GPU to a VM using vfio-pci
- VM boots normally
- GPU is visible inside the VM via lspci

Inside the VM:
Code:
lspci -vv -s 01:00.0

01:00.0 3D controller: NVIDIA Corporation Device 3182 (rev a1)
Control: I/O- Mem+ BusMaster-
Region 0: Memory at 2000000000 (64-bit, prefetchable) [size=64M]
Region 4: Memory at 2004000000 (64-bit, prefetchable) [size=32M]
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

Code:
dmesg | grep -i nvidia

NVRM: This PCI I/O region assigned to your NVIDIA device is invalid
nvidia: probe of 0000:01:00.0 failed with error -1
NVRM: None of the NVIDIA devices were initialized


Code:
nvidia-smi
→ NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver

Observations:
  • The GPU is enumerated correctly as a PCI device
  • BARs are present (not 0M @ 0x0), so this is NOT a simple MMIO sizing issue
  • However, BusMaster is disabled in the guest (see the register check right after this list)
  • NVIDIA driver refuses to bind and initialize the device
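
For completeness, here is the quick register check referred to above (a generic PCI diagnostic, not NVIDIA guidance; BusMaster- before a driver has bound is normal, since the driver enables bus mastering itself during probe):

Code:
# PCI Command register: bit 1 = Memory Space, bit 2 = Bus Master (values are hex)
setpci -s 01:00.0 COMMAND            # read the current value
setpci -s 01:00.0 COMMAND=0006       # force Mem+ and BusMaster+ as a test
lspci -vv -s 01:00.0 | grep Control  # confirm the new state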

For comparison:

  • The same passthrough approach works on Proxmox with an NVIDIA L40S once PCI MMIO (OVMF X-PciMmio64Mb) is increased
  • On B300, increasing MMIO (even to 256 GB) does not change the behavior

Questions:

  1. Is single-GPU passthrough on DGX / HGX B300 platforms expected to work with Proxmox/KVM?
  2. Is full-board passthrough (entire HGX) the only supported model on these platforms?
  3. Is the disabled BusMaster state a known limitation for partial passthrough on fabric-based GPUs (NVSwitch / NVLink)?
  4. Has anyone successfully initialized an NVIDIA B300 GPU inside a KVM VM using vfio-pci?

At this point I’m not looking for a workaround, but for clarification:

  • Is this a Proxmox/QEMU limitation?
  • Or an NVIDIA platform/driver limitation by design?
Any insight from Proxmox developers or users with HGX/DGX experience would be greatly appreciated.

Thanks in advance.
 
Hi, you say increasing the mmio size did not do anything:
  • On B300, increasing MMIO (even to 256 GB) does not change the behavior

how did you increase the size?

did you see the different solutions for increasing the mmio size here: https://pve.proxmox.com/wiki/PCI_Passthrough#"BAR0_is_0M"_error_or_Windows_Code_12_Error ?
might still be worth a try, since the output of lspci that you posted only shows bar 0 and 4, but not bar 2 (which would normally be the 'vram' region)

can you also post

Code:
lspci -vvv

from the host? (limiting to the b300)
and the full config of the vm?

sadly we haven't had a chance yet to take a look at this platform, so this:
  1. Is full-board passthrough (entire HGX) the only supported model on these platforms?
might be a possibility

also i can try to help, but as we don't have access to that platform, our help might be limited
 
Hi Dominik,

thanks for the follow-up — happy to provide more details.

Below is exactly how we increased MMIO, plus the requested host lspci output and full VM config.

---

1) How MMIO was increased

We tested multiple MMIO sizes using the OVMF fw_cfg method, always with the VM fully powered off (cold boot).

Example with 512 GB MMIO:

Code:
qm stop <VMID>
qm set <VMID> -args '-fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=524288'
qm start <VMID>

We tested:
- 128 GB
- 256 GB
- 512 GB

Result:
- Guest BAR layout did not change at all
- BAR2 (VRAM aperture) never appeared
- BusMaster remained disabled

This same method successfully resolved BAR issues on an NVIDIA L40S system, so we are confident the mechanism itself works correctly on Proxmox.
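
As a side note, one way to double-check that the override actually reaches QEMU (qm showcmd prints the full invocation without starting the VM):

Code:
qm showcmd <VMID> | tr ' ' '\n' | grep -A1 fw_cfg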

---

2) Host-side lspci output (B300)


Code:
root@proxmox:~# lspci -vvv -s 3c:00.0
3c:00.0 3D controller: NVIDIA Corporation Device 3182 (rev a1)
        DeviceName: GPU1
        Subsystem: NVIDIA Corporation Device 20e6
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 255
        NUMA node: 0
        IOMMU group: 78
        Region 0: Memory at 298000000000 (64-bit, prefetchable) [disabled] [size=64M]
        Region 2: Memory at 288000000000 (64-bit, prefetchable) [disabled] [size=512G]
        Region 4: Memory at 298044000000 (64-bit, prefetchable) [disabled] [size=32M]
        Capabilities: [40] Express (v2) Endpoint, IntMsgNum 0
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W TEE-IO-
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 512 bytes, MaxReadReq 4096 bytes
                DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 64GT/s, Width x16, ASPM not supported
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes, LnkDisable- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 64GT/s, Width x16
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR-
                         10BitTagComp+ 10BitTagReq+ OBFF Via message, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 1
                         EmergencyPowerReduction Form Factor Dev Specific, EmergencyPowerReductionInit-
                         FRS- TPHComp- ExtTPHComp-
                         AtomicOpsCap: 32bit+ 64bit+ 128bitCAS-
                DevCtl2: Completion Timeout: 260ms to 900ms, TimeoutDis-
                         AtomicOpsCtl: ReqEn+
                         IDOReq- IDOCompl- LTR- EmergencyPowerReductionReq-
                         10BitTagReq+ OBFF Disabled, EETLPPrefixBlk-
                LnkCap2: Supported Link Speeds: 2.5-64GT/s, Crosslink- Retimer+ 2Retimers+ DRS+
                LnkCtl2: Target Link Speed: 64GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
                         EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [7c] MSI-X: Enable- Count=12 Masked-
                Vector table: BAR=0 offset=00b90000
                PBA: BAR=0 offset=00ba0000
        Capabilities: [88] Power Management version 3
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D3 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [90] Vendor Specific Information: Len=14 <?>
        Capabilities: [100 v1] Device Serial Number 48-b0-2d-98-92-83-26-6c
        Capabilities: [148 v3] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP-
                        ECRC- UnsupReq+ ACSViol- UncorrIntErr- BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
                        PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP-
                        ECRC+ UnsupReq+ ACSViol- UncorrIntErr- BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
                        PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
                UESvrt: DLP+ SDES- TLP+ FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+
                        ECRC- UnsupReq- ACSViol- UncorrIntErr+ BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
                        PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ CorrIntErr- HeaderOF-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- CorrIntErr- HeaderOF-
                AERCap: First Error Pointer: 08, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap+
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [198 v1] Physical Resizable BAR
                BAR 2: current size: 512GB, supported: 64MB 128MB 256MB 512MB 1GB 2GB 4GB 8GB 16GB 32GB 64GB 128GB 256GB 512GB
        Capabilities: [1a4 v1] Virtual Resizable BAR
                BAR 2: current size: 16GB, supported: 4GB 8GB 16GB 32GB 64GB 128GB 256GB 512GB 1TB 2TB 4TB 8TB
        Capabilities: [1d8 v1] Data Link Feature <?>
        Capabilities: [1e4 v1] Physical Layer 16.0 GT/s <?>
        Capabilities: [214 v1] Physical Layer 32.0 GT/s <?>
        Capabilities: [244 v1] Secondary PCI Express
                LnkCtl3: LnkEquIntrruptEn- PerformEqu-
                LaneErrStat: LaneErr at lane: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
        Capabilities: [270 v1] Lane Margining at the Receiver
                PortCap: Uses Driver+
                PortSta: MargReady- MargSoftReady-
        Capabilities: [2f8 v1] Alternative Routing-ID Interpretation (ARI)
                ARICap: MFVC- ACS-, Next Function: 0
                ARICtl: MFVC- ACS-, Function Group: 0
        Capabilities: [300 v1] Single Root I/O Virtualization (SR-IOV)
                IOVCap: Migration- 10BitTagReq+ IntMsgNum 0
                IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+ 10BitTagReq-
                IOVSta: Migration-
                Initial VFs: 32, Total VFs: 32, Number of VFs: 0, Function Dependency Link: 00
                VF offset: 2, stride: 1, Device ID: 3182
                Supported Page Size: 00000553, System Page Size: 00000001
                Region 0: Memory at 0000298046000000 (64-bit, prefetchable)
                Region 2: Memory at 0000290000000000 (64-bit, prefetchable)
                Region 4: Memory at 0000298004000000 (64-bit, prefetchable)
                VF Migration: offset: 00000000, BIR: 0
        Capabilities: [368 v1] Power Budgeting <?>
        Capabilities: [378 v2] Data Object Exchange
                DOECap: IntSup+
                        IntMsgNum 9
                DOECtl: IntEn-
                DOESta: Busy- IntSta- Error- ObjectReady-
        Capabilities: [ac0 v1] Extended Capability ID 0x31
        Capabilities: [ae0 v1] Extended Capability ID 0x2f
        Capabilities: [af0 v1] Designated Vendor-Specific: Vendor=10de ID=0000 Rev=0 Len=28 <?>
        Capabilities: [b0c v1] Designated Vendor-Specific: Vendor=10de ID=0001 Rev=0 Len=16 <?>
        Capabilities: [b1c v1] Extended Capability ID 0x32
        Capabilities: [ba4 v1] Designated Vendor-Specific: Vendor=10de ID=0003 Rev=0 Len=20 <?>
        Kernel driver in use: vfio-pci
        Kernel modules: nvidiafb, nouveau

On the host, BARs and capabilities appear normal and complete.
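
For reference, one more host-side data point that may matter on a fabric-based board: whether the GPU shares its IOMMU group (78, per the output above) with other devices, since vfio needs the whole group to be viable:

Code:
ls /sys/kernel/iommu_groups/78/devices/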

---

3) Guest-side lspci (for reference)

Inside the VM, after MMIO was increased to 512 GB:

Code:
lspci -vv -s 01:00.0

01:00.0 3D controller: NVIDIA Corporation Device 3182 (rev a1)
Control: I/O- Mem+ BusMaster-
Region 0: Memory at 8000000000 (64-bit, prefetchable) [size=64M]
Region 4: Memory at 8004000000 (64-bit, prefetchable) [size=32M]

BAR2 (large VRAM region) is never exposed, regardless of MMIO size.
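
In case it helps narrow things down, the guest kernel normally logs why a BAR could not be assigned during enumeration (a generic check, assuming the device stays at 01:00.0):

Code:
dmesg | grep -i '01:00.0'
dmesg | grep -iE 'no space|failed to assign'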

---

4) Full VM configuration

Code:
root@proxmox:~# qm config 100
args: -fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=524288
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 16
cpu: x86-64-v2-AES
efidisk0: local-lvm:vm-100-disk-0,efitype=4m,size=4M
hostpci0: 0000:1a:00.0,pcie=1
ide2: local:iso/ubuntu-24.04.3-live-server-amd64.iso,media=cdrom,size=3226020K
machine: q35
memory: 20480
meta: creation-qemu=10.1.2,ctime=1768811596
name: passthru
net0: virtio=bc:24:11:1b:d3:fd,bridge=vmbr0,firewall=1
numa: 1
ostype: l26
scsi0: nvme0:100/vm-100-disk-0.qcow2,iothread=1,size=320G
scsihw: virtio-scsi-single
smbios1: uuid=ca140451-6f19-47e2-bd47-a3144515d58d
sockets: 1
vmgenid: cfdc70be-f749-4765-be58-ac098e2b967d
root@proxmox:~#

Key points:
- machine: q35
- bios: ovmf
- hostpci configured with pcie=1
- rombar=0 tested (command sketched below)
- Above 4G decoding enabled in host BIOS
- vfio-pci used
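
For the record, the rombar=0 variant referenced above (the listing shows the plain pcie=1 config) was along these lines:

Code:
qm set 100 -hostpci0 0000:1a:00.0,pcie=1,rombar=0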

---

5) Current conclusion

Given that:
- MMIO space up to 512 GB does not change guest BAR layout
- BAR2 never appears
- BusMaster remains disabled
- the same setup works on L40S

we currently suspect this is related to how B300 (Blackwell / HGX-class) GPUs expose PCIe resources when assigned as a single GPU to a VM.

This brings us back to the question whether full-board passthrough (entire HGX) is the only expected working model for this platform.

If there are any additional angles to test (QEMU options, PCIe topology changes, etc.), we are very open to trying them.

Thanks again for taking a look — fully understand the limitations without direct access to the platform.

Best regards,
 
ok, so i also ran into a similar issue with an RTX Pro 6000 Blackwell card:
if the host has one region of X GiB plus several smaller ones, the MMIO size must be at least the next power of two above X

so in your case, the host reports:
Region 0: Memory at 298000000000 (64-bit, prefetchable) [disabled] [size=64M]
Region 2: Memory at 288000000000 (64-bit, prefetchable) [disabled] [size=512G]
Region 4: Memory at 298044000000 (64-bit, prefetchable) [disabled] [size=32M]

which would need an MMIO window of at least 1024G, which you didn't try IIUC.

before trying to mess with the args + MMIO64 size though, could you try to boot the guest without that args line with either
'host' type cpu
or
any other cpu type with 'phys-bits' set to 'host' and the 'pdpe1gb' flag enabled (like it's written in our troubleshooting section i linked above)
or
instead of using OVMF try using seabios
or
setting the X-PciMmio64Mb value to 1048576

if none of that works, i'm convinced that it's some other limitation, like you mentioned about the busmaster.
Though i guess that NVIDIA would have more info about that. Do they have some documentation on how to pass that through? Maybe for another linux flavor? (e.g. RHEL or Ubuntu)?
 
Hi Dominik,

thanks a lot for the detailed suggestions — we followed up on all of them and here is a complete update.

---

1) MMIO size (next power of two above BAR2)

You were absolutely right about the power-of-two requirement.

On the host, the B300 reports:
- BAR2 = 512 GB
- plus additional smaller regions

So we increased the MMIO64 space to the next power of two above that, i.e. 1 TB.

Command used (VM fully powered off, cold boot):

Code:
qm stop 100
qm set 100 -args '-fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=1048576'
qm start 100

Result inside the guest:
- BAR2 (VRAM aperture) still not exposed
- BusMaster remained disabled
- NVIDIA driver still failed to initialize
- nvidia-smi could not communicate with the driver

So even with 1 TB MMIO, BAR exposure and BusMaster state did not change.

---

2) CPU topology tests (phys-bits / pdpe1gb / host CPU)

We tested:
- cpu: host
- cpu models with phys-bits=host and +pdpe1gb (see the sketch below)
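
A sketch of how these variants were applied (we cycled through a few CPU models; exact flag syntax per the qm cpu options):

Code:
qm set 100 -cpu host
qm set 100 -cpu 'x86-64-v2-AES,phys-bits=host,flags=+pdpe1gb'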

Results:
- Some combinations had no effect
- Others caused the GPU to disappear entirely from the guest (“No devices were found”)

None of the CPU topology variants resulted in BAR2 being exposed or BusMaster being enabled.

---

3) SeaBIOS instead of OVMF

We attempted to switch the VM to SeaBIOS as suggested.

However, the guest OS was installed in UEFI mode (OVMF, GPT + EFI system partition), so SeaBIOS cannot boot the existing disk and stops at “Booting from hard disk”.

Given that this was only meant as a control test and would require reinstalling a legacy (MBR) guest just for enumeration, we did not pursue this further.

---

4) NVIDIA’s official position on HGX passthrough

We also checked NVIDIA’s official documentation to see whether this behavior is expected.

From the NVIDIA AI Enterprise Support Matrix:
https://docs.nvidia.com/ai-enterprise/release-7/latest/support/support-matrix.html

Quote:

“On NVIDIA HGX platforms, only VMs configured in full PCIe passthrough are supported, meaning the entire HGX board can be assigned to a single VM on supported hypervisors. For information about NVIDIA Fabric Manager integration or support for deploying 1-, 2-, 4- or 8-GPU VMs on your hypervisor, consult the documentation from your hypervisor vendor.”

This aligns very well with what we are observing:
- Single-GPU vfio-pci passthrough on HGX/B300 does not behave like a standard PCIe GPU
- BAR exposure and BusMaster never transition into a usable state inside the guest


Best regards,
 
Hi,
I have a very similar problem. I switched to a new mainboard and CPU, and now GPU passthrough does not work, whereas passthrough of a network card and a TV card works without any problems. GPU passthrough also worked without any problems on the old hardware.

As I am very impatient, waiting is not really a solution. Do you have any other ideas as to what the problem could be?