Oculink device issues

frenzybiscuit

Jul 8, 2025
Hi,

I have a server running PVE 9.0.3. It has two GPUs in the main server's PCIe slots and a third GPU connected over OCuLink.

The host detects all three GPUs. But when I start the VM after restarting the physical server, only two of them show up inside the VM under nvidia-smi.

I have to manually remove the PCI devices and re-add them (sometimes several times, with several reboots of the VM) before they are all detected in the VM.
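To check the symptom quickly after each VM start, the GPUs visible in the guest can be counted from nvidia-smi -L output. This is a minimal sketch, assuming Python 3.8+ in the guest and that nvidia-smi -L prints one line per GPU starting with "GPU <index>:". EXPECTED_GPUS, count_gpus and visible_gpu_report are hypothetical names.

```python
import subprocess

EXPECTED_GPUS = 3  # hypothetical: number of GPUs passed through to this VM


def count_gpus(listing: str) -> int:
    """Count the 'GPU <n>: ...' lines in `nvidia-smi -L` output."""
    return sum(1 for line in listing.splitlines() if line.startswith("GPU "))


def visible_gpu_report(expected: int = EXPECTED_GPUS) -> str:
    """Run `nvidia-smi -L` in the guest and compare the GPU count with `expected`."""
    # check=True raises CalledProcessError if nvidia-smi exits with an error.
    out = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
    ).stdout
    found = count_gpus(out)
    status = "all present" if found == expected else "missing GPU(s)"
    return f"{found}/{expected} GPUs visible: {status}"
```

Inside the guest, calling print(visible_gpu_report()) after each boot makes it easy to log which boots come up with only two of the three GPUs.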

Is there a fix for this?
 
Hello,

Check the VM configuration first. The VM config (/etc/pve/qemu-server/<vmid>.conf) may list all three GPUs, but one of them could still be failing silently because of driver or IOMMU group conflicts.
Also check dmesg and journalctl -xe on the host after starting the VM for errors.
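The host-side check can be narrowed to vfio-pci messages only. This is a minimal sketch, assuming the kernel log text is captured with journalctl -k -b (kernel messages from the current boot) or with dmesg. The regex and the function names are hypothetical and only match the "vfio-pci <address>: <message>" form that appears later in this thread.

```python
import re
import subprocess

# Matches lines like: "[ 50.635523] vfio-pci 0000:02:00.0: <message>"
VFIO_LINE = re.compile(
    r"vfio-pci (?P<addr>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): (?P<msg>.*)"
)


def kernel_log() -> str:
    """Return kernel messages from the current boot (run on the host)."""
    return subprocess.run(
        ["journalctl", "-k", "-b", "--no-pager"],
        capture_output=True, text=True, check=True,
    ).stdout


def vfio_messages(log_text: str) -> list:
    """Return (PCI address, message) pairs for every vfio-pci log line."""
    results = []
    for line in log_text.splitlines():
        match = VFIO_LINE.search(line)
        if match:
            results.append((match.group("addr"), match.group("msg")))
    return results
```

For example, for addr, msg in vfio_messages(kernel_log()): print(addr, msg) on the host right after starting the VM lists every vfio-pci complaint together with the device it refers to.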
 
qemu-server/100.conf correctly lists the PCI devices:

hostpci0: 0000:01:00,pcie=1,x-vga=1
hostpci1: 0000:02:00,pcie=1,x-vga=1
hostpci2: 0000:03:00,pcie=1,x-vga=1
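To double-check that each hostpci line carries the intended address and options, the config can be parsed directly. This is a minimal sketch, assuming the simple "key: value" layout shown above. hostpci_entries is a hypothetical helper. It stops at the first line starting with "[", since Proxmox keeps snapshot and pending sections below such headers in the same file.

```python
import re

HOSTPCI_KEY = re.compile(r"hostpci\d+")


def hostpci_entries(conf_text: str) -> dict:
    """Map each 'hostpciN' key to (PCI address, {option: value})."""
    entries = {}
    for line in conf_text.splitlines():
        if line.startswith("["):
            break  # section header: only read the current config
        key, sep, value = line.partition(":")
        if not sep or not HOSTPCI_KEY.fullmatch(key.strip()):
            continue
        address, *opts = value.strip().split(",")
        options = {}
        for opt in opts:
            name, _, setting = opt.partition("=")
            options[name] = setting
        entries[key.strip()] = (address, options)
    return entries
```

For example, hostpci_entries(open("/etc/pve/qemu-server/100.conf").read()) on the host should return three entries for the config above.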

However, dmesg on the host does show the following:

[ 50.635523] vfio-pci 0000:02:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0x564e
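The kernel reads the first two bytes of the device's option ROM as a little-endian 16-bit word and expects 0xaa55, that is, the bytes 0x55 0xAA. Decoding the reported value the same way shows what was actually read: 0x564e corresponds to the bytes 0x4E 0x56, the ASCII characters "NV". This is a minimal sketch with hypothetical helper names. It only decodes the number from the log line and says nothing about why the ROM read differs from boot to boot.

```python
PCI_ROM_SIGNATURE = 0xAA55  # expected value of the first two ROM bytes read as a 16-bit word


def rom_signature_bytes(word: int) -> bytes:
    """Return the two leading ROM bytes that produce this little-endian 16-bit word."""
    return word.to_bytes(2, "little")


def describe(word: int) -> str:
    """Show the raw bytes behind a signature value and whether it is valid."""
    raw = rom_signature_bytes(word)
    verdict = "valid" if word == PCI_ROM_SIGNATURE else "invalid"
    return f"{word:#06x} -> bytes {raw.hex(' ')} ({verdict} signature)"
```

Here describe(0x564E) gives "0x564e -> bytes 4e 56 (invalid signature)", and rom_signature_bytes(0x564E) is b"NV".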

Turning off the VM and turning it back on fixed the issue.

Is there a way to fix this permanently? The error occurs on some resets but not on others.