X570 dual GPU passthrough

ZachH83

New Member
Feb 26, 2021
MOBO: MSI X570 Gaming Edge WiFi (with the latest BIOS update)
CPU: Ryzen 5 5600G
____

I am attempting to pass through two NVIDIA Quadro M2000 GPUs. The intention is to pass each one through to its own VM (running Ubuntu 22), with a monitor plugged into that GPU's DisplayPort for direct access.
____
I have gone through the GPU passthrough steps as I did before and tried a few different settings I found online, but mostly I stick to the PCI_Passthrough guide provided by Proxmox.
____
IOMMU and SVM are enabled in the BIOS.
____
The current result: if both VMs have a GPU passed through, I can run only one at a time. Either one works fine as long as it's the only one running. If I start the other one, the whole Proxmox server crashes. Both VMs run fine with no GPU passed through.
____
--- I have checked the IOMMU groups: each card is in its own group (group 10 and group 18), and nothing else is in either group.
--- I tested passing both cards through to one VM, to see whether something specific to a card or slot was causing the issue, but the VM ran fine with both cards.
--- I tried the ACS override, but that seemed like overkill since the groups were already fine without it.
____
I have another system where I did this, and it is working fine. It is an MSI X570 Gaming Plus (latest BIOS update) with a Ryzen 7 3800XT and two Quadro P400 GPUs. In fact, in its current setup that system works without any of the PCI passthrough configuration steps: I just installed the cards and used the Proxmox web panel to add them as hardware to the VMs. I'm actually typing this from one of them right now. So I don't understand that, but it's working.
____
Is there any information I should be looking at beyond the IOMMU groups? Could it be the BIOS version I am running? Could the CPU be the problem, since the CPU in my other system has no integrated graphics? I know the MSI X570 boards are good for this, but maybe the Gaming Edge has something different that holds it back?



______
find /sys/kernel/iommu_groups/ -type l

/sys/kernel/iommu_groups/17/devices/0000:21:0a.0
/sys/kernel/iommu_groups/17/devices/0000:2c:00.0
/sys/kernel/iommu_groups/7/devices/0000:00:08.2
/sys/kernel/iommu_groups/25/devices/0000:30:00.3
/sys/kernel/iommu_groups/15/devices/0000:2a:00.3
/sys/kernel/iommu_groups/15/devices/0000:2a:00.1
/sys/kernel/iommu_groups/15/devices/0000:21:08.0
/sys/kernel/iommu_groups/15/devices/0000:2a:00.0
/sys/kernel/iommu_groups/5/devices/0000:00:08.0
/sys/kernel/iommu_groups/23/devices/0000:30:00.1
/sys/kernel/iommu_groups/13/devices/0000:21:04.0
/sys/kernel/iommu_groups/3/devices/0000:00:02.0
/sys/kernel/iommu_groups/21/devices/0000:2d:00.0
/sys/kernel/iommu_groups/11/devices/0000:20:00.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.1
/sys/kernel/iommu_groups/28/devices/0000:31:00.1
/sys/kernel/iommu_groups/18/devices/0000:23:00.1
/sys/kernel/iommu_groups/18/devices/0000:23:00.0
/sys/kernel/iommu_groups/8/devices/0000:00:14.3
/sys/kernel/iommu_groups/8/devices/0000:00:14.0
/sys/kernel/iommu_groups/26/devices/0000:30:00.4
/sys/kernel/iommu_groups/16/devices/0000:21:09.0
/sys/kernel/iommu_groups/16/devices/0000:2b:00.0
/sys/kernel/iommu_groups/6/devices/0000:00:08.1
/sys/kernel/iommu_groups/24/devices/0000:30:00.2
/sys/kernel/iommu_groups/14/devices/0000:21:05.0
/sys/kernel/iommu_groups/4/devices/0000:00:02.1
/sys/kernel/iommu_groups/22/devices/0000:30:00.0
/sys/kernel/iommu_groups/12/devices/0000:21:01.0
/sys/kernel/iommu_groups/2/devices/0000:00:01.2
/sys/kernel/iommu_groups/20/devices/0000:27:00.0
/sys/kernel/iommu_groups/10/devices/0000:10:00.1
/sys/kernel/iommu_groups/10/devices/0000:10:00.0
/sys/kernel/iommu_groups/0/devices/0000:00:01.0
/sys/kernel/iommu_groups/19/devices/0000:26:00.0
/sys/kernel/iommu_groups/9/devices/0000:00:18.3
/sys/kernel/iommu_groups/9/devices/0000:00:18.1
/sys/kernel/iommu_groups/9/devices/0000:00:18.6
/sys/kernel/iommu_groups/9/devices/0000:00:18.4
/sys/kernel/iommu_groups/9/devices/0000:00:18.2
/sys/kernel/iommu_groups/9/devices/0000:00:18.0
/sys/kernel/iommu_groups/9/devices/0000:00:18.7
/sys/kernel/iommu_groups/9/devices/0000:00:18.5
/sys/kernel/iommu_groups/27/devices/0000:31:00.0
____
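As a cross-check of the IOMMU claim in the post, here is a minimal Python sketch (not from the thread; the function names are hypothetical) that parses the `find` output and tests whether the group holding a given card contains only that card's own functions. It works on pasted text, so the sample is an excerpt of the listing above (groups 10, 15 and 18).

```python
from collections import defaultdict


def parse_iommu_groups(find_output: str) -> dict[int, list[str]]:
    """Map IOMMU group number -> sorted PCI addresses.

    Expects lines shaped like the output of
    `find /sys/kernel/iommu_groups/ -type l`:
    /sys/kernel/iommu_groups/<group>/devices/<domain:bus:slot.func>
    """
    groups: dict[int, list[str]] = defaultdict(list)
    for line in find_output.splitlines():
        parts = line.strip().split("/")
        # ['', 'sys', 'kernel', 'iommu_groups', '<n>', 'devices', '<addr>']
        if len(parts) != 7 or parts[3] != "iommu_groups" or parts[5] != "devices":
            continue  # skip blank lines and anything that is not a device link
        groups[int(parts[4])].append(parts[6])
    return {g: sorted(devs) for g, devs in groups.items()}


def group_is_isolated(groups: dict[int, list[str]], slot: str) -> bool:
    """True if the group containing `slot` (e.g. '0000:10:00') holds only
    functions of that same slot, such as the VGA (.0) and audio (.1) parts.
    Strict check: any other device, including a PCI bridge, counts as sharing."""
    prefix = slot + "."
    for devs in groups.values():
        if any(d.startswith(prefix) for d in devs):
            return all(d.startswith(prefix) for d in devs)
    raise KeyError(f"{slot} not found in any IOMMU group")


# Excerpt of the listing posted above.
SAMPLE = """\
/sys/kernel/iommu_groups/15/devices/0000:2a:00.3
/sys/kernel/iommu_groups/15/devices/0000:2a:00.1
/sys/kernel/iommu_groups/15/devices/0000:21:08.0
/sys/kernel/iommu_groups/15/devices/0000:2a:00.0
/sys/kernel/iommu_groups/18/devices/0000:23:00.1
/sys/kernel/iommu_groups/18/devices/0000:23:00.0
/sys/kernel/iommu_groups/10/devices/0000:10:00.1
/sys/kernel/iommu_groups/10/devices/0000:10:00.0
"""

if __name__ == "__main__":
    groups = parse_iommu_groups(SAMPLE)
    for slot in ("0000:10:00", "0000:23:00", "0000:2a:00"):
        print(slot, "isolated" if group_is_isolated(groups, slot) else "shared")
```

Both Quadro slots (10:00 and 23:00) come back isolated, matching what the post says, while group 15 (the Matisse USB controllers and instrumentation function plus the 21:08.0 bridge) shows what a shared group looks like. Clean groups are a prerequisite, but they do not rule out other causes.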

lspci

00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Renoir IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge
00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:02.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir Internal PCIe GPP Bridge to Bus
00:08.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir Internal PCIe GPP Bridge to Bus
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 51)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 166a
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 166b
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 166c
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 166d
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 166e
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 166f
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1670
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1671
10:00.0 VGA compatible controller: NVIDIA Corporation GM206GL [Quadro M2000] (rev a1)
10:00.1 Audio device: NVIDIA Corporation GM206 High Definition Audio Controller (rev a1)
20:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse Switch Upstream
21:01.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
21:04.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
21:05.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
21:08.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
21:09.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
21:0a.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
23:00.0 VGA compatible controller: NVIDIA Corporation GM206GL [Quadro M2000] (rev a1)
23:00.1 Audio device: NVIDIA Corporation GM206 High Definition Audio Controller (rev a1)
26:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 02)
27:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
2a:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP
2a:00.1 USB controller: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
2a:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
2b:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
2c:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
2d:00.0 Non-Volatile memory controller: Realtek Semiconductor Co., Ltd. Device 5765 (rev 01)
30:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne (rev c9)
30:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device 1637
30:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor
30:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir USB 3.1
30:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir USB 3.1
31:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 81)
31:00.1 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 81)
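The `lspci` listing contains three display controllers: the two Quadro M2000s (10:00 and 23:00) and the 5600G's integrated Cezanne GPU (30:00). A small Python sketch, with hypothetical helper names and assuming the plain `lspci` format shown above (no domain prefix), can pull out each VGA slot together with all of its sibling functions, so a companion function such as the HDMI/DP audio device is not overlooked.

```python
import re

# "<bus>:<slot>.<func> <class>: <description>" as printed by plain `lspci`
LSPCI_LINE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2})\.([0-7]) (.+?): (.+)$")


def find_gpus(lspci_output: str) -> dict[str, dict]:
    """Return {slot: {"name": ..., "functions": [...]}} for each slot that
    has a 'VGA compatible controller' function, listing all its functions."""
    slots: dict[str, list[tuple[str, str, str]]] = {}
    for line in lspci_output.splitlines():
        m = LSPCI_LINE.match(line.strip())
        if m is None:
            continue
        slot, func, dev_class, desc = m.groups()
        slots.setdefault(slot, []).append((func, dev_class, desc))

    gpus: dict[str, dict] = {}
    for slot, funcs in slots.items():
        for func, dev_class, desc in funcs:
            if dev_class == "VGA compatible controller":
                gpus[slot] = {
                    "name": desc,
                    # every function on the same slot (GPU, audio, ...)
                    "functions": sorted(f"{slot}.{f}" for f, _, _ in funcs),
                }
    return gpus


# Excerpt of the lspci output posted above.
SAMPLE_LSPCI = """\
10:00.0 VGA compatible controller: NVIDIA Corporation GM206GL [Quadro M2000] (rev a1)
10:00.1 Audio device: NVIDIA Corporation GM206 High Definition Audio Controller (rev a1)
23:00.0 VGA compatible controller: NVIDIA Corporation GM206GL [Quadro M2000] (rev a1)
23:00.1 Audio device: NVIDIA Corporation GM206 High Definition Audio Controller (rev a1)
26:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 02)
30:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne (rev c9)
30:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device 1637
30:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor
30:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir USB 3.1
30:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir USB 3.1
"""

if __name__ == "__main__":
    for slot, info in find_gpus(SAMPLE_LSPCI).items():
        print(slot, info["name"], info["functions"])
```

Prefixing the PCI domain (`0000:`) gives the address form used in the `vfio-pci,host=...` arguments of the QEMU command line in the error below.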
 
TASK ERROR: start failed: command '/usr/bin/kvm -id 200 -name 'VM,debug-threads=on' -no-shutdown -chardev 'socket,id=qmp,path=/var/run/qemu-server/200.qmp,server=on,wait=off' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/200.pid -daemonize -smbios 'type=1,uuid=02ee0076-7f6e-45a6-bb8f-2c9060cfe3c9' -smp '6,sockets=1,cores=6,maxcpus=6' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga none -nographic -cpu 'host,kvm=off,+kvm_pv_eoi,+kvm_pv_unhalt' -m 16384 -object 'iothread,id=iothread-virtioscsi0' -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg -device 'vmgenid,guid=0b9bd098-216e-42b5-b510-283053c7442a' -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' -device 'vfio-pci,host=0000:23:00.0,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0,x-vga=on' -device 'vfio-pci,host=0000:23:00.1,id=hostpci1,bus=ich9-pcie-port-2,addr=0x0' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:42f32262a76c' -drive 'file=/var/lib/vz/template/iso/ubuntu-22.04.1-desktop-amd64.iso,if=none,id=drive-ide2,media=cdrom,aio=io_uring' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=101' -device 'virtio-scsi-pci,id=virtioscsi0,bus=pci.3,addr=0x1,iothread=iothread-virtioscsi0' -drive 'file=/dev/pve/vm-200-disk-0,if=none,id=drive-scsi0,format=raw,cache=none,aio=io_uring,detect-zeroes=on' -device 'scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap200i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=A2:E8:40:71:47:BD,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=1024,bootindex=102' -machine 'smm=off,type=q35+pve0'' failed: got timeout

That is the error it gives, but it just ends in a timeout, so I'm not certain what is causing it.
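The timeout message names no cause, but the command line it echoes does record what the VM asked for. Below is a minimal Python sketch (hypothetical helper, standard-library `shlex` only) that extracts the guest RAM size, the `vfio-pci` host addresses and whether a balloon device is attached. It should be fed only the command itself, here an excerpt, not the surrounding `TASK ERROR: start failed: command '...'` wrapper, whose extra quotes nest around the per-option quotes.

```python
import shlex


def summarize_kvm_cmd(cmd: str) -> dict:
    """Extract guest RAM (-m, in MiB), vfio-pci host addresses and the
    presence of a virtio balloon device from a QEMU/KVM command line."""
    args = shlex.split(cmd)  # honours the single quotes around each option
    summary = {"memory_mib": None, "vfio_hosts": [], "balloon": False}
    # Walk the arguments pairwise: (option, value that follows it).
    for opt, value in zip(args, args[1:]):
        if opt == "-m":
            # Proxmox passes -m as a plain MiB number, as in the log above.
            summary["memory_mib"] = int(value)
        elif opt == "-device" and value.startswith("vfio-pci,"):
            props = dict(p.split("=", 1) for p in value.split(",")[1:] if "=" in p)
            summary["vfio_hosts"].append(props["host"])
        elif opt == "-device" and value.startswith("virtio-balloon-pci"):
            summary["balloon"] = True
    return summary


# Excerpt of the command from the TASK ERROR above.
CMD = (
    "/usr/bin/kvm -id 200 -m 16384 "
    "-device 'vfio-pci,host=0000:23:00.0,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0,x-vga=on' "
    "-device 'vfio-pci,host=0000:23:00.1,id=hostpci1,bus=ich9-pcie-port-2,addr=0x0' "
    "-device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on'"
)

if __name__ == "__main__":
    print(summarize_kvm_cmd(CMD))
```

For VM 200 this shows 16384 MiB (16 GiB) of guest RAM with the GPU at 23:00 passed through. With passthrough, all of that memory has to be pinned in host RAM at start-up, and the balloon device cannot give any of it back.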
 
I just noticed that my other system is running Proxmox 7.2, while the one giving me issues is on 7.3. I'm going to see whether switching to 7.2 fixes the issue.
 
After much playing around with it, I adjusted my RAM allocation, and suddenly it all works. I guess I was not leaving enough for the environment.
 
If you get a timeout without any errors in the system logs (journalctl), then it's almost always because too little continuous memory is available. With PCI(e) passthrough, all VM memory must be pinned into actual host memory because of DMA, so ballooning will not work either. I guess 16 GiB (plus some overhead) was simply too much at that time. (There are lots of threads about Proxmox memory usage and, for example, how to limit ZFS.)
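The arithmetic behind that explanation is simple: every passthrough VM needs its full RAM allocation, plus some QEMU overhead, resident on the host at start-up. A rough Python sketch follows. The 512 MiB per-VM overhead and the host figures in the example are assumptions for illustration, not numbers from this thread, and `MemAvailable` from /proc/meminfo is only an approximate measure of what can actually be pinned.

```python
# Assumed per-VM QEMU overhead; the real figure varies by configuration.
DEFAULT_OVERHEAD_MIB = 512


def mem_available_mib(meminfo_text: str) -> int:
    """Read MemAvailable from /proc/meminfo text (reported in kB, i.e. KiB)
    and return it in MiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) // 1024
    raise ValueError("MemAvailable not found")


def pinned_vms_fit(vm_mem_mib: list[int], available_mib: int,
                   overhead_mib: int = DEFAULT_OVERHEAD_MIB) -> bool:
    """With PCI(e) passthrough all guest RAM is pinned (no ballooning),
    so the full allocation of every running VM must fit at the same time."""
    needed = sum(mem + overhead_mib for mem in vm_mem_mib)
    return needed <= available_mib


if __name__ == "__main__":
    # On a real host you would use: open("/proc/meminfo").read()
    meminfo = "MemTotal:       33554432 kB\nMemAvailable:   32768000 kB\n"
    avail = mem_available_mib(meminfo)  # hypothetical host: 32000 MiB available
    print(pinned_vms_fit([16384, 16384], avail))  # two 16 GiB VMs -> False
    print(pinned_vms_fit([12288, 12288], avail))  # two 12 GiB VMs -> True
```

Lowering a VM's allocation, as was done here, shrinks the left side of that comparison; limiting other host consumers such as the ZFS ARC raises the right side.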
 
