I'm trying to pass through two GPUs to the same VM, without luck. If I boot the VM with only one of them configured, it boots and everything works as expected. But when I configure both at the same time, the VM produces no output and logs nothing (I checked multiple logs under `/var/log` and found no messages), even though PVE reports it as booted.
To set everything up I followed this guide, and I've been able to create both Windows and Linux VMs (when only 1 GPU is assigned per VM).
Server specs:
- CPU: Intel i9-10980XE
- MB: X299X Designare 10G
- RAM: 256 GB
- GPU1: 3090 MSI Suprim X
- GPU2: 3090 MSI Suprim X
Things tried:
- Verified that both GPUs are using the `vfio-pci` driver with `lspci -nnk -s <BDF>`
- Verified that both GPUs aren't assigned to the same IOMMU group. GPU 1 has group 3, GPU 2 has group 5.
- Checked PVE logs for the VM and what stood out to me was: "kernel: vfio-pci 0000:c1:00.0: No more image in the PCI ROM"
- (Update 1) Enabled the "Above 4G Decoding" BIOS option (as recommended here)
- (Update 2) Tried connecting the GPUs to different PCIe ports, properly changing the configuration with each change. Still the same errors.
- (Update 2) Confirmed 2 different VMs can be booted with 1 GPU each at the same time.
- (Update 2) Disconnected all HDMI/DisplayPort cables from the GPUs and restarted
Is anyone running 2 or more GPUs on the same VM? Any ideas about what could be causing this? Any help is appreciated.
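For anyone who wants to repeat the driver and IOMMU group checks from the list above in one go, here is a minimal sketch. It assumes the usual Linux sysfs layout, where each device directory under `/sys/bus/pci/devices/` has `driver` and `iommu_group` symlinks. The function names `pci_binding` and `group_devices` are made up for illustration and are not part of any PVE tooling.

```python
import os


def pci_binding(bdf, sysfs_root="/sys"):
    """Return (driver, iommu_group) for one PCI device.

    Each value is the last path component of the matching sysfs symlink,
    or None if the symlink does not exist (e.g. no driver bound).
    """
    dev = os.path.join(sysfs_root, "bus", "pci", "devices", bdf)

    def link_name(name):
        path = os.path.join(dev, name)
        if not os.path.islink(path):
            return None
        return os.path.basename(os.readlink(path))

    return link_name("driver"), link_name("iommu_group")


def group_devices(bdfs, sysfs_root="/sys"):
    """Map each IOMMU group to the given devices that belong to it."""
    groups = {}
    for bdf in bdfs:
        _, group = pci_binding(bdf, sysfs_root)
        groups.setdefault(group, []).append(bdf)
    return groups
```

On the host this could be called as `pci_binding("0000:c1:00.0")`, which should come back as `("vfio-pci", "<group>")` if the binding is what `lspci -nnk` reports. Passing all four functions (`.0` and `.1` of each card) to `group_devices` shows at a glance which devices share a group.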
Update 1:
I enabled the "Above 4G Decoding" option in the MB's BIOS, as suggested in this thread, but still no luck.
Here are the logs I'm getting:
Code:
Aug 20 22:42:06 pve01 kernel: vfio-pci 0000:a1:00.0: enabling device (0100 -> 0103)
Aug 20 22:42:07 pve01 kernel: vfio-pci 0000:a1:00.0: vfio_ecap_init: hiding ecap 0x1e@0x258
Aug 20 22:42:07 pve01 kernel: vfio-pci 0000:a1:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
Aug 20 22:42:07 pve01 kernel: vfio-pci 0000:a1:00.0: vfio_ecap_init: hiding ecap 0x26@0xc1c
Aug 20 22:42:07 pve01 kernel: vfio-pci 0000:a1:00.0: vfio_ecap_init: hiding ecap 0x27@0xd00
Aug 20 22:42:07 pve01 kernel: vfio-pci 0000:a1:00.0: vfio_ecap_init: hiding ecap 0x25@0xe00
Aug 20 22:42:07 pve01 kernel: vfio-pci 0000:a1:00.1: vfio_ecap_init: hiding ecap 0x25@0x160
Aug 20 22:42:07 pve01 kernel: vfio-pci 0000:c1:00.0: vfio_ecap_init: hiding ecap 0x1e@0x258
Aug 20 22:42:07 pve01 kernel: vfio-pci 0000:c1:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
Aug 20 22:42:07 pve01 kernel: vfio-pci 0000:c1:00.0: vfio_ecap_init: hiding ecap 0x26@0xc1c
Aug 20 22:42:07 pve01 kernel: vfio-pci 0000:c1:00.0: vfio_ecap_init: hiding ecap 0x27@0xd00
Aug 20 22:42:07 pve01 kernel: vfio-pci 0000:c1:00.0: vfio_ecap_init: hiding ecap 0x25@0xe00
Aug 20 22:42:07 pve01 kernel: resource: resource sanity check: requesting [mem 0x00000000000c0000-0x00000000000dffff], which spans more than PCI Bus 0000:00 [mem 0x000c4000-0x000c7fff window]
Aug 20 22:42:07 pve01 kernel: caller pci_map_rom+0x6c/0x1d0 mapping multiple BARs
Aug 20 22:42:07 pve01 kernel: vfio-pci 0000:c1:00.0: No more image in the PCI ROM
Aug 20 22:42:07 pve01 kernel: vfio-pci 0000:c1:00.1: vfio_ecap_init: hiding ecap 0x25@0x160
Aug 20 22:42:08 pve01 pvedaemon[1383]: <root@pam> end task UPID:pve01:000005C4:00000D4E:64E2181F:qmstart:101:root@pam: OK
Aug 20 22:42:08 pve01 pvestatd[1353]: status update time (6.513 seconds)
Aug 20 22:42:08 pve01 kernel: usb 1-8.1.2.1.3: USB disconnect, device number 17
Aug 20 22:42:12 pve01 kernel: resource: resource sanity check: requesting [mem 0x00000000000c0000-0x00000000000dffff], which spans more than PCI Bus 0000:00 [mem 0x000c4000-0x000c7fff window]
Aug 20 22:42:12 pve01 kernel: caller pci_map_rom+0x6c/0x1d0 mapping multiple BARs
Aug 20 22:42:12 pve01 kernel: vfio-pci 0000:c1:00.0: No more image in the PCI ROM
Aug 20 22:42:12 pve01 kernel: resource: resource sanity check: requesting [mem 0x00000000000c0000-0x00000000000dffff], which spans more than PCI Bus 0000:00 [mem 0x000c4000-0x000c7fff window]
Aug 20 22:42:12 pve01 kernel: caller pci_map_rom+0x6c/0x1d0 mapping multiple BARs
Aug 20 22:42:12 pve01 kernel: vfio-pci 0000:c1:00.0: No more image in the PCI ROM
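In the log above, the "No more image in the PCI ROM" message only ever shows up for `0000:c1:00.0`, never for `0000:a1:00.0`. A small sketch for counting a given vfio-pci message per device, assuming the syslog line format shown above (the helper name `count_vfio_messages` is hypothetical):

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "... kernel: vfio-pci 0000:c1:00.0: No more image in the PCI ROM"
VFIO_LINE = re.compile(
    r"kernel: vfio-pci "
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"(?P<msg>.*)$"
)


def count_vfio_messages(lines, needle):
    """Count, per PCI address, the vfio-pci kernel lines whose message contains needle."""
    counts = Counter()
    for line in lines:
        match = VFIO_LINE.search(line.rstrip("\n"))
        if match and needle in match.group("msg"):
            counts[match.group("bdf")] += 1
    return counts
```

The lines could come from a saved log file or from the kernel journal (for example the output of `journalctl -k`), passed in as an iterable of strings.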