[SOLVED] PCI passthrough of multiple NVIDIA GPUs

testa

Nov 29, 2023
My current configuration is an AMD EPYC server equipped with two NVIDIA L40 GPUs. If I install Ubuntu or Debian (latest release) directly on the hardware, everything works: both GPUs are correctly identified and usable.

For the past 14 days I have been trying to set up Proxmox (v8.1.3) and provide the GPUs via PCI passthrough to a VM running Ubuntu or Debian.

What I have done to enable PCI passthrough:

1. configure grub
2. VFIO modules
3. IOMMU interrupt remapping
4. driver blacklisting
5. adding GPUs to VFIO
6. VM: BIOS UEFI (OVMF), machine q35, memory ballooning disabled
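
For anyone retracing the steps above, it helps to confirm which IOMMU group each GPU ended up in before touching the VM. The following is only a minimal sketch, assuming the standard sysfs layout under /sys/kernel/iommu_groups; the function name and its optional root argument are hypothetical and exist only for illustration and testing.

```shell
# List every PCI device together with its IOMMU group number.
# The optional first argument overrides the sysfs root (hypothetical, for testing).
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue      # skip if the glob matched nothing
        group="${dev#"$root"/}"        # strip the root prefix: N/devices/ADDR
        group="${group%%/*}"           # keep only the group number N
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done
}
```

Running `list_iommu_groups` on the host and looking for 0000:01:00.0 and 0000:81:00.0 shows whether each GPU is listed and which other devices, if any, share its group.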

I can use the first GPU in the VM without any trouble. It is identified as "nVidia Corporation 3D controller", and I can install the NVIDIA drivers and use the GPU in my computations. If I shut down the VM and replace the first GPU with the second one (same vendor, same model), it is identified as "nVidia Corporation VGA compatible controller" and the driver can't communicate with the card. The same happens if I pass through both NVIDIA cards: the first one works, the second one does not.
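
The two labels correspond to different PCI class codes: 0300 is a VGA compatible controller and 0302 is a 3D controller. As a rough sketch (the function name is hypothetical), the class value that the kernel exposes in sysfs can be decoded like this to compare the two cards on the host:

```shell
# Map a PCI class value, as read from /sys/bus/pci/devices/<addr>/class
# (for example 0x030200), to the display-controller name that lspci prints.
pci_display_class() {
    case "$1" in
        0x0300*) echo "VGA compatible controller" ;;
        0x0302*) echo "3D controller" ;;
        *)       echo "other ($1)" ;;
    esac
}
```

For example, `pci_display_class "$(cat /sys/bus/pci/devices/0000:81:00.0/class)"` reports the class of the second card; two cards of the same model reporting different classes points at a configuration difference between them.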

I suspected a problem with the second card, so I set up ESXi, created the same VM (same versions), and used PCI passthrough there. Both NVIDIA GPUs worked without any trouble.

But I love Proxmox and want to stick with it. Any ideas what to try?

Thank you.
 
Can you post the VM config and the host journal?

The dmesg output from the host and the exact mainboard/GPU models would also be interesting.
 
Sure - thank you for your help!
Code:
agent: 1
balloon: 0
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 64
cpu: EPYC-Milan
efidisk0: local-lvm:vm-100-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
hostpci0: 0000:01:00,pcie=1
hostpci1: 0000:81:00,pcie=1
ide2: none,media=cdrom
machine: q35
memory: 65546
meta: creation-qemu=8.1.2,ctime=1701716924
name: a10
net0: virtio=BC:24:11:FE:6D:A0,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: local-lvm:vm-100-disk-1,iothread=1,size=150G
scsihw: virtio-scsi-single
smbios1: uuid=0b674f62-3ecc-4909-a65e-c1db732bb03b
sockets: 1
vmgenid: 1f1a8f2e-2e5f-4af5-8642-7d3da528abe8
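
To double-check which host devices a config like the one above hands to the guest, the hostpci entries can be pulled out with a small filter. This is a sketch only; the function name is hypothetical, and it simply reads a config in the format shown above from standard input.

```shell
# Print the PCI address of every hostpciN entry in a Proxmox VM config
# read from standard input (options after the first comma are dropped).
hostpci_devices() {
    sed -n 's/^hostpci[0-9]*: *\([^,]*\).*/\1/p'
}
```

Something like `qm config 100 | hostpci_devices` should then print 0000:01:00 and 0000:81:00 for this VM, assuming its ID is 100 as the disk names suggest.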

The journal is full of errors (0000:81:00.0 is the non-working L40 GPU):
kvm: vfio 0000:81:00.0: Failed to set up TRIGGER eventfd signaling for interrupt MSIX-1: VFIO_DEVICE_SET_IRQS failure: Invalid argument
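
When the journal is flooded with lines like this, it can be useful to reduce them to the set of affected devices. A minimal sketch, assuming the message format shown above; the function name is hypothetical.

```shell
# Read journal lines from standard input and print each PCI address that
# appears in a "VFIO_DEVICE_SET_IRQS failure" message, once per device.
vfio_irq_failures() {
    grep 'VFIO_DEVICE_SET_IRQS failure' |
        sed -n 's/.*vfio \([0-9a-fA-F:.]*\): .*/\1/p' |
        sort -u
}
```

Piping the current boot's journal through it, for example `journalctl -b | vfio_irq_failures`, shows whether only 0000:81:00.0 is affected or other passed-through devices as well.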

The server is a Supermicro A+ Server 4124GS-TNR.

GPUs are: NVIDIA L40
 
Can you post the output of
Code:
lspci -vvv
?

The error
kvm: vfio 0000:81:00.0: Failed to set up TRIGGER eventfd signaling for interrupt MSIX-1: VFIO_DEVICE_SET_IRQS failure: Invalid argument
suggests an issue with the interrupts; maybe the lspci output gives more insight.
 
[SOLVED]
Thank you very much!
The GPU's "display mode" was the cause. I downloaded NVIDIA's display mode selector tool and changed the mode to
Code:
physical_display_disabled
and passthrough is working now.
 