[SOLVED] Dual GPU Passthrough to Multiple VMs

cmunroe

Member
Apr 19, 2022
Goals:

I am trying to get IOMMU-based GPU passthrough working with two GPUs, each assigned to a separate Windows 11 VM.

Hardware:
ASUS ROG Maximus X Hero :: https://rog.asus.com/motherboards/rog-maximus/rog-maximus-x-hero-model/spec
Intel 8700K

NVIDIA GeForce GTX 1080
NVIDIA GeForce RTX 3060

Problem:

If I boot one VM at a time and shut it down before starting the other, everything works great. However, if both VMs are online, CPU usage quickly spikes to 100% across all cores and the system becomes essentially unusable.

Here is a snippet of my syslog: https://paste.ee/p/26DeI

However, this setup appears to be possible with this motherboard and CPU combination:

https://forum.proxmox.com/threads/iommu-device-separation-in-to-groups.68662/
https://forum.level1techs.com/t/asus-maximus-x-hero-both-gpus-in-the-same-iommu-group/174721


Code:
dmesg | grep vfio
[    3.815087] vfio-pci 0000:02:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[    3.834707] vfio_pci: add [10de:1b80[ffffffff:ffffffff]] class 0x000000/00000000
[    3.854625] vfio_pci: add [10de:10f0[ffffffff:ffffffff]] class 0x000000/00000000
[    3.854638] vfio-pci 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[    3.878586] vfio_pci: add [10de:2504[ffffffff:ffffffff]] class 0x000000/00000000
[    3.898644] vfio_pci: add [10de:228e[ffffffff:ffffffff]] class 0x000000/00000000
[    4.200926] vfio-pci 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=none,decodes=none:owns=none
[    4.200928] vfio-pci 0000:02:00.0: vgaarb: changed VGA decodes: olddecodes=none,decodes=none:owns=none
[   87.361223] vfio-pci 0000:02:00.0: enabling device (0000 -> 0003)
[   87.361417] vfio-pci 0000:02:00.0: vfio_ecap_init: hiding ecap 0x19@0x900


From many Google searches, I found that I needed to add the ACS override with the multifunction option to the kernel command line in GRUB to split up the IOMMU groups.
https://forum.proxmox.com/threads/iommu-device-separation-in-to-groups.68662/


Code:
for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done
IOMMU group 0 00:00.0 Host bridge [0600]: Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers [8086:3ec2] (rev 07)
IOMMU group 10 00:1c.4 PCI bridge [0604]: Intel Corporation 200 Series PCH PCI Express Root Port #5 [8086:a294] (rev f0)
IOMMU group 11 00:1c.6 PCI bridge [0604]: Intel Corporation 200 Series PCH PCI Express Root Port #7 [8086:a296] (rev f0)
IOMMU group 12 00:1d.0 PCI bridge [0604]: Intel Corporation 200 Series PCH PCI Express Root Port #9 [8086:a298] (rev f0)
IOMMU group 13 00:1f.0 ISA bridge [0601]: Intel Corporation Z370 Chipset LPC/eSPI Controller [8086:a2c9]
IOMMU group 13 00:1f.2 Memory controller [0580]: Intel Corporation 200 Series/Z370 Chipset Family Power Management Controller [8086:a2a1]
IOMMU group 13 00:1f.3 Audio device [0403]: Intel Corporation 200 Series PCH HD Audio [8086:a2f0]
IOMMU group 13 00:1f.4 SMBus [0c05]: Intel Corporation 200 Series/Z370 Chipset Family SMBus Controller [8086:a2a3]
IOMMU group 14 00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (2) I219-V [8086:15b8]
IOMMU group 15 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:2504] (rev a1)
IOMMU group 16 01:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:228e] (rev a1)
IOMMU group 17 02:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 [GeForce GTX 1080] [10de:1b80] (rev a1)
IOMMU group 18 02:00.1 Audio device [0403]: NVIDIA Corporation GP104 High Definition Audio Controller [10de:10f0] (rev a1)
IOMMU group 19 05:00.0 Network controller [0280]: Realtek Semiconductor Co., Ltd. RTL8822BE 802.11a/b/g/n/ac WiFi adapter [10ec:b822]
IOMMU group 1 00:01.0 PCI bridge [0604]: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 07)
IOMMU group 20 06:00.0 USB controller [0c03]: ASMedia Technology Inc. ASM2142 USB 3.1 Host Controller [1b21:2142]
IOMMU group 21 07:00.0 USB controller [0c03]: ASMedia Technology Inc. ASM2142 USB 3.1 Host Controller [1b21:2142]
IOMMU group 22 08:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961/SM963 [144d:a804]
IOMMU group 2 00:01.1 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x8) [8086:1905] (rev 07)
IOMMU group 3 00:02.0 VGA compatible controller [0300]: Intel Corporation CometLake-S GT2 [UHD Graphics 630] [8086:3e92]
IOMMU group 4 00:14.0 USB controller [0c03]: Intel Corporation 200 Series/Z370 Chipset Family USB 3.0 xHCI Controller [8086:a2af]
IOMMU group 5 00:16.0 Communication controller [0780]: Intel Corporation 200 Series PCH CSME HECI #1 [8086:a2ba]
IOMMU group 6 00:17.0 RAID bus controller [0104]: Intel Corporation SATA Controller [RAID mode] [8086:2822]
IOMMU group 7 00:1b.0 PCI bridge [0604]: Intel Corporation 200 Series PCH PCI Express Root Port #17 [8086:a2e7] (rev f0)
IOMMU group 8 00:1c.0 PCI bridge [0604]: Intel Corporation 200 Series PCH PCI Express Root Port #1 [8086:a290] (rev f0)
IOMMU group 9 00:1c.2 PCI bridge [0604]: Intel Corporation 200 Series PCH PCI Express Root Port #3 [8086:a292] (rev f0)

Grub:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction"
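For reference, applying this means regenerating the GRUB config and rebooting; roughly the following, assuming a GRUB-booted host (systemd-boot installs use a different mechanism):

Code:
# After editing GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub:
update-grub
reboot
# After the reboot, confirm the IOMMU is actually enabled:
dmesg | grep -e DMAR -e IOMMU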

vfio:
Code:
options vfio-pci ids=10de:1b80,10de:10f0,10de:2504,10de:228e disable_vga=1
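In my case that options line lives in a file under /etc/modprobe.d/ (the exact filename is my own choice, not mandated); after changing it, the initramfs has to be rebuilt and the host rebooted, and the driver binding can then be checked:

Code:
# Rebuild the initramfs so the vfio-pci options are picked up at boot
update-initramfs -u -k all
reboot
# Verify vfio-pci (not nouveau/nvidia) is bound to all NVIDIA functions
lspci -nnk -d 10de:

Each device should report "Kernel driver in use: vfio-pci".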

Any help would be appreciated.
 
What are the symptoms of being "unusable"? Is it unreachable via the web GUI and SSH, and/or does it appear frozen? Does it hard-reset? Can you reach the web GUI or SSH from one of those VMs?
Maybe your system is overloaded by the two VMs running at the same time? Please attach the VM configuration files (from the /etc/pve/qemu-server/ directory) and tell us how much memory your host has (because ballooning does not work with VMs that use passthrough).
Breaking up the IOMMU groups with pcie_acs_override does not guarantee that other devices from the same real IOMMU group are unaffected (the host may lose essential devices such as the network or SATA controllers). Please show us the IOMMU groups without using pcie_acs_override. Note that it also introduces a security risk because PCI(e) devices are no longer isolated from each other.
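For example, something along these lines would show what we need (the VM IDs 100 and 101 are placeholders for your actual IDs):

Code:
# Show each VM's configuration (memory, balloon, hostpci entries)
qm config 100
qm config 101
# Show how much memory the host actually has
free -h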
 
What are the symptoms of being "unusable"?
* CPU usage goes to 100% across all cores.
* The web GUI stops responding.
* SSH is slow and nearly unresponsive.
* The VMs stop functioning.
* After a while, the host hard-resets.

{...}

@leesteken you are amazing. I wanted to test your theory on memory, so I lowered one VM's allocation by half. Everything boots up fine now.

Thank you, Thank you!
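For anyone finding this later, the fix boiled down to something like the following (the VM ID and size here are illustrative, not my exact values); ballooning is ignored for passthrough VMs anyway, so it may as well be off:

Code:
# Halve the VM's memory allocation (MiB) and disable ballooning
qm set 101 --memory 16384 --balloon 0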
 
PCI(e) devices can read and write any part of VM (or host, when not passed through) memory at any time using DMA. Therefore, all VM memory must be pinned into actual RAM and cannot be swapped, ballooned, or shared (KSM). Remember that Proxmox itself, the filesystem cache, and other components also need some memory to function.
This is also why breaking the IOMMU groups with pcie_acs_override is a security issue: PCIe devices in the same real group can communicate with each other without the CPU knowing. Combined with DMA, they can leak VM or host memory (when some devices of the group are still allocated to the Proxmox host) to other parties. Make sure not to run untrusted software or allow untrusted users when overriding the PCIe ACS.
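As a rough budget check: the configured memory of all passthrough VMs, plus headroom for Proxmox itself, must fit in physical RAM. A quick sketch (the VM IDs are examples):

Code:
# Sum the configured memory (in MiB) of the passthrough VMs...
for id in 100 101; do qm config "$id" | grep '^memory'; done
# ...and compare it against the host's physical RAM
free -m

If the sum leaves only a GiB or two for the host, starting the second VM will starve it, which matches the 100% CPU and hard reset described above.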
 
I was not aware of the memory implications of passthrough; in all my reading it was never stated that it would behave so destructively.

Both systems will be used only by me, and nothing critical runs on either.
 
I was not aware of the memory implications of passthrough; in all my reading it was never stated that it would behave so destructively.
Reminds me of a simple PCIe hack back in the day with the ExpressCard slots on laptops: you could build hardware that copies the entire memory to storage inside the malicious card, directly via DMA, with no protection from the host or the host operating system.

We all hope that the fully virtualized GPU cores of the NVIDIA flagships will become available on lower-end cards like the xx80 Ti for virtualized environments.
 
