Hi all - wondering if someone can help.
I have Proxmox 8.3.2 running on a Cisco UCS X-Series blade with 2 x NVIDIA T4 cards installed, and I am trying to get GPU passthrough working to a Linux 24.04 VM running on it. I have found a great deal of information in the threads on this forum and have tried multiple permutations. I have the NVIDIA drivers blacklisted, and I can see the two cards (which, interestingly, share the same vendor and device ID). The blacklisting appears to be working, as no "Kernel driver in use" line shows up in the output below:
root@ai:~# lspci -k | grep -E "vfio-pci | NVIDIA"
63:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
Subsystem: NVIDIA Corporation TU104GL [Tesla T4]
64:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
Subsystem: NVIDIA Corporation TU104GL [Tesla T4]
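Since both cards report the same vendor:device ID, a single ids= entry binds them both. For reference, my blacklist/bind config looks roughly like this (10de:1eb8 is what lspci -nn reports for a Tesla T4 on my host; verify with lspci -nn -s 63:00.0 before copying, as the ID here is from my system, not a general claim):

```
# /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:1eb8
softdep nvidia pre: vfio-pci

# /etc/modprobe.d/blacklist-nvidia.conf
blacklist nouveau
blacklist nvidia
```

Followed by update-initramfs -u and a reboot so vfio-pci claims the cards at boot.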
I have created a PCI device mapping for the first card above (0000:63:00.0), which is in IOMMU group 5.
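One caveat I noticed: the grep pattern above has spaces around the |, so it would miss a "Kernel driver in use: vfio-pci" line (which ends without a trailing space). To be sure, I also listed every device in IOMMU group 5 and its bound driver straight from sysfs (these are standard sysfs paths; group 5 is taken from the error below):

```shell
#!/bin/sh
# List every device in IOMMU group 5 and the driver bound to it.
GROUP=5
DIR="/sys/kernel/iommu_groups/$GROUP/devices"
if [ -d "$DIR" ]; then
    for dev in "$DIR"/*; do
        bdf=$(basename "$dev")
        if [ -e "$dev/driver" ]; then
            # The driver symlink points at the bound driver's sysfs entry
            drv=$(basename "$(readlink -f "$dev/driver")")
        else
            drv="(no driver bound)"
        fi
        printf '%s -> %s\n' "$bdf" "$drv"
    done
else
    echo "IOMMU group $GROUP not found (IOMMU disabled or different host)"
fi
```

My understanding is that every device this prints needs to show vfio-pci (or pci-stub) before a VM can claim the group.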
When I try to start the Linux VM I get the following error:
kvm: -device vfio-pci,host=0000:63:00.0,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0: vfio: error disconnecting group 5 from container
kvm: -device vfio-pci,host=0000:63:00.0,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0: vfio 0000:63:00.0: error getting device from group 5: No such device
Verify all devices in group 5 are bound to vfio-<bus> or pci-stub and not already in use
TASK ERROR: start failed: QEMU exited with code 1
It looks like the host, or something else, is still using the device, but I can't see how or where. Does anyone know how to fix this? Any assistance greatly appreciated!
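For what it's worth, here is roughly how I have been trying to find whatever is holding the group; nothing obvious shows up. (fuser may need the psmisc package, and the /dev/vfio/<group> node only exists while the group is set up, so an empty result is not conclusive.)

```shell
#!/bin/sh
# Look for whoever is holding the vfio group or the device itself.
GROUP=5

# Any process with the group's vfio node open (e.g. another VM)?
[ -e "/dev/vfio/$GROUP" ] && fuser -v "/dev/vfio/$GROUP"

# What driver is actually bound to the card right now?
lspci -nnk -s 63:00.0

# Recent kernel messages mentioning vfio or the device
dmesg | grep -iE 'vfio|63:00' | tail -n 20
```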