Can't boot VM when it has 2 GPUs assigned (PCIe passthrough)

Komuri

New Member
Aug 20, 2023
I'm trying to pass through 2 GPUs to the same VM, without luck. If I boot the VM with only one of them configured, it boots and everything works as expected. But when I configure both at the same time, the VM produces no output and writes no logs (I checked multiple logs inside `/var/log` and found no messages), even though PVE reports it as booted.

To set everything up I followed this guide, and I've been able to create both Windows and Linux VMs that work when only 1 GPU is assigned per VM.

Server specs:

- CPU: Intel i9-10980XE
- MB: X299X Designare 10G
- RAM: 256 GB
- GPU1: 3090 MSI Suprim X
- GPU2: 3090 MSI Suprim X

Things tried:

- Verified that both GPUs are using the `vfio-pci` driver with `lspci -nnk -s <BDF>`
- Verified that both GPUs aren't assigned to the same IOMMU group. GPU 1 has group 3, GPU 2 has group 5.
- Checked the PVE logs for the VM; what stood out to me was: "kernel: vfio-pci 0000:c1:00.0: No more image in the PCI ROM"
- (Update 1) Enabled the "Above 4G Decoding" BIOS option (as recommended here)
- (Update 2) Tried connecting the GPUs to different PCIe slots, updating the configuration after each change. Still the same errors.
- (Update 2) Confirmed that 2 different VMs can be booted at the same time with 1 GPU each.
- (Update 2) Disconnected all HDMI/DisplayPort cables from the GPUs and restarted
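
In case anyone wants to repeat the first two checks on their own host, here is a minimal sketch of how they could be scripted. It assumes the usual Linux sysfs layout, where `/sys/bus/pci/devices/<BDF>/driver` and `.../iommu_group` are symlinks; the function names `pci_binding` and `find_problems` are made up for this post, and the addresses at the bottom are the ones from my logs, so adjust them to your own.

```python
import os


def pci_binding(bdf, sysfs_root="/sys"):
    """Return (driver, iommu_group) for a PCI address; None for a missing entry."""
    dev = os.path.join(sysfs_root, "bus", "pci", "devices", bdf)

    def link_name(name):
        path = os.path.join(dev, name)
        # Both entries are symlinks; the last path component is the value.
        return os.path.basename(os.readlink(path)) if os.path.islink(path) else None

    return link_name("driver"), link_name("iommu_group")


def find_problems(bdfs, sysfs_root="/sys"):
    """List devices not bound to vfio-pci and IOMMU groups shared by different slots."""
    problems = []
    slots_by_group = {}
    for bdf in bdfs:
        driver, group = pci_binding(bdf, sysfs_root)
        if driver != "vfio-pci":
            problems.append(f"{bdf}: bound to {driver}, expected vfio-pci")
        if group is None:
            problems.append(f"{bdf}: no IOMMU group found")
            continue
        # Functions .0 and .1 of one card share a slot, so compare slots only.
        slot = bdf.rsplit(".", 1)[0]
        slots_by_group.setdefault(group, set()).add(slot)
    for group, slots in sorted(slots_by_group.items()):
        if len(slots) > 1:
            problems.append(f"IOMMU group {group} is shared by {sorted(slots)}")
    return problems


if __name__ == "__main__":
    # Addresses taken from the kernel log in this post.
    gpus = ["0000:a1:00.0", "0000:a1:00.1", "0000:c1:00.0", "0000:c1:00.1"]
    for line in find_problems(gpus) or ["no problems found"]:
        print(line)
```

An empty result only says that the driver binding and grouping look sane; it doesn't explain why the VM stays silent with both cards attached.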

Is anyone running 2 or more GPUs on the same VM? Any idea what could be causing this? Any help is appreciated.


Update 1:


I enabled the "Above 4G Decoding" option in the motherboard's BIOS, as suggested in this thread, but still no luck.

Here are the logs I'm getting:

Code:
Aug 20 22:42:06 pve01 kernel: vfio-pci 0000:a1:00.0: enabling device (0100 -> 0103)
Aug 20 22:42:07 pve01 kernel: vfio-pci 0000:a1:00.0: vfio_ecap_init: hiding ecap 0x1e@0x258
Aug 20 22:42:07 pve01 kernel: vfio-pci 0000:a1:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
Aug 20 22:42:07 pve01 kernel: vfio-pci 0000:a1:00.0: vfio_ecap_init: hiding ecap 0x26@0xc1c
Aug 20 22:42:07 pve01 kernel: vfio-pci 0000:a1:00.0: vfio_ecap_init: hiding ecap 0x27@0xd00
Aug 20 22:42:07 pve01 kernel: vfio-pci 0000:a1:00.0: vfio_ecap_init: hiding ecap 0x25@0xe00
Aug 20 22:42:07 pve01 kernel: vfio-pci 0000:a1:00.1: vfio_ecap_init: hiding ecap 0x25@0x160
Aug 20 22:42:07 pve01 kernel: vfio-pci 0000:c1:00.0: vfio_ecap_init: hiding ecap 0x1e@0x258
Aug 20 22:42:07 pve01 kernel: vfio-pci 0000:c1:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
Aug 20 22:42:07 pve01 kernel: vfio-pci 0000:c1:00.0: vfio_ecap_init: hiding ecap 0x26@0xc1c
Aug 20 22:42:07 pve01 kernel: vfio-pci 0000:c1:00.0: vfio_ecap_init: hiding ecap 0x27@0xd00
Aug 20 22:42:07 pve01 kernel: vfio-pci 0000:c1:00.0: vfio_ecap_init: hiding ecap 0x25@0xe00
Aug 20 22:42:07 pve01 kernel: resource: resource sanity check: requesting [mem 0x00000000000c0000-0x00000000000dffff], which spans more than PCI Bus 0000:00 [mem 0x000c4000-0x000c7fff window]
Aug 20 22:42:07 pve01 kernel: caller pci_map_rom+0x6c/0x1d0 mapping multiple BARs
Aug 20 22:42:07 pve01 kernel: vfio-pci 0000:c1:00.0: No more image in the PCI ROM
Aug 20 22:42:07 pve01 kernel: vfio-pci 0000:c1:00.1: vfio_ecap_init: hiding ecap 0x25@0x160
Aug 20 22:42:08 pve01 pvedaemon[1383]: <root@pam> end task UPID:pve01:000005C4:00000D4E:64E2181F:qmstart:101:root@pam: OK
Aug 20 22:42:08 pve01 pvestatd[1353]: status update time (6.513 seconds)
Aug 20 22:42:08 pve01 kernel: usb 1-8.1.2.1.3: USB disconnect, device number 17
Aug 20 22:42:12 pve01 kernel: resource: resource sanity check: requesting [mem 0x00000000000c0000-0x00000000000dffff], which spans more than PCI Bus 0000:00 [mem 0x000c4000-0x000c7fff window]
Aug 20 22:42:12 pve01 kernel: caller pci_map_rom+0x6c/0x1d0 mapping multiple BARs
Aug 20 22:42:12 pve01 kernel: vfio-pci 0000:c1:00.0: No more image in the PCI ROM
Aug 20 22:42:12 pve01 kernel: resource: resource sanity check: requesting [mem 0x00000000000c0000-0x00000000000dffff], which spans more than PCI Bus 0000:00 [mem 0x000c4000-0x000c7fff window]
Aug 20 22:42:12 pve01 kernel: caller pci_map_rom+0x6c/0x1d0 mapping multiple BARs
Aug 20 22:42:12 pve01 kernel: vfio-pci 0000:c1:00.0: No more image in the PCI ROM
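
To make the log above easier to read, here is a rough sketch that groups the `vfio-pci` kernel messages by PCI address and drops the `vfio_ecap_init: hiding ecap` lines, which I'm assuming are informational only. The name `vfio_messages` is made up for this post, and the regex only targets the line format shown above.

```python
import re
from collections import defaultdict

# Matches kernel lines such as "... kernel: vfio-pci 0000:c1:00.0: <message>".
VFIO_LINE = re.compile(
    r"kernel: vfio-pci ([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): (.*)$"
)


def vfio_messages(lines):
    """Group vfio-pci kernel messages by PCI address, skipping ecap lines."""
    by_device = defaultdict(list)
    for line in lines:
        match = VFIO_LINE.search(line.strip())
        if match is None:
            continue  # not a vfio-pci line (pvedaemon, usb, resource checks, ...)
        bdf, message = match.groups()
        if message.startswith("vfio_ecap_init: hiding ecap"):
            continue  # assumed informational, see note above
        by_device[bdf].append(message)
    return dict(by_device)


if __name__ == "__main__":
    sample = [
        "Aug 20 22:42:06 pve01 kernel: vfio-pci 0000:a1:00.0: enabling device (0100 -> 0103)",
        "Aug 20 22:42:07 pve01 kernel: vfio-pci 0000:c1:00.0: vfio_ecap_init: hiding ecap 0x1e@0x258",
        "Aug 20 22:42:07 pve01 kernel: vfio-pci 0000:c1:00.0: No more image in the PCI ROM",
    ]
    for bdf, messages in vfio_messages(sample).items():
        print(bdf, messages)
```

Run over the full log, the only messages left are "enabling device" for `0000:a1:00.0` and the repeated "No more image in the PCI ROM" for `0000:c1:00.0`, which is why that line stood out to me.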

