Proxmox Multiseat: GPU Passthrough works great, but only 1 VM at a time allowed

FuriousGeorge

I'm experimenting with PCI passthrough and multiseat on Proxmox.

The former works well, and impressively so. I can even pass two GPUs to one VM with no problems.

However, if I start a second VM and pass a GPU to it, then within minutes of starting anything 3D-intensive on one, such as a game or a benchmark, the GPU driver crashes on that VM, and then it always crashes on the other as well.

I can expedite this by also doing something 3D-intensive on the second VM: one VM's GPU driver will crash within minutes at most, and the other's seconds after that.

Sometimes Windows reports that the GPU driver has crashed and recovered, but it never really recovers, and the VM must be hard powered off to restore stability.

/etc/default/grub
Code:
# sed -e 's/#.*$//' -e '/^$/d' /etc/default/grub
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="Proxmox Virtual Environment"
GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on vfio_iommu_type1.allow_unsafe_interrupts rd.driver.pre=vfio-pci"
GRUB_CMDLINE_LINUX="root=ZFS=rpool/ROOT/pve-1 boot=zfs"
GRUB_DISABLE_OS_PROBER=true
GRUB_DISABLE_RECOVERY="true"

/etc/modprobe.d/vfio_pci.conf
Code:
options vfio_pci disable_vga=1
options vfio-pci ids=10de:13c2,10de:0fbb,10de:11c0,10de:0e0b

/etc/modprobe.d/kvm.conf
Code:
options kvm ignore_msrs=1

/etc/modprobe.d/blacklist.conf
Code:
blacklist nouveau
blacklist nvidia

Adding a hostpci option to <vmid>.conf does not generate a working invocation of KVM for me, so I manually start the VMs with something like this:
Code:
/usr/bin/kvm \
-id 110 \
-chardev socket,id=qmp,path=/var/run/qemu-server/110.qmp,server,nowait \
-mon chardev=qmp,mode=control \
-pidfile /var/run/qemu-server/110.pid \
-daemonize \
-smbios type=1,uuid=aecb408f-89ef-44ef-9a7a-a7fa9d6f75f8 \
-drive if=pflash,format=raw,readonly,file=/usr/share/kvm/OVMF-pure-efi.fd \
-drive if=pflash,format=raw,file=/tmp/110-OVMF_VARS-pure-efi.fd \
-name Test-PC-1 \
-smp 8,sockets=1,cores=8,maxcpus=8 \
-nodefaults \
-boot menu=on,strict=on,reboot-timeout=1000 \
-vga none \
-nographic \
-no-hpet \
-cpu host,+kvm_pv_unhalt,+kvm_pv_eoi,kvm=off \
-m 8196 \
-k en-us \
-readconfig /usr/share/qemu-server/pve-q35.cfg \
-device usb-tablet,id=tablet,bus=ehci.0,port=1 \
-device vfio-pci,host=04:00.0,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0 \
-device vfio-pci,host=04:00.1,id=hostpci1,bus=ich9-pcie-port-2,addr=0x0 \
-device usb-host,hostbus=1,hostport=6.1,id=usb0 \
-device usb-host,hostbus=1,hostport=6.2,id=usb1 \
-device usb-host,hostbus=1,hostport=6.3,id=usb2 \
-device usb-host,hostbus=1,hostport=6.4,id=usb3 \
-device usb-host,hostbus=1,hostport=6.5,id=usb4 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 \
-iscsi initiator-name=iqn.1993-08.org.debian:01:3f1e9afe6fdb \
-drive file=/dev/zvol/rpool/data/vm-110-disk-1,if=none,id=drive-virtio0,cache=writeback,format=raw,aio=threads,detect-zeroes=on \
-device virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100 \
-drive file=/dev/zvol/tank0/vm-110-disk-1,if=none,id=drive-virtio1,cache=writeback,format=raw,aio=threads,detect-zeroes=on \
-device virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb \
-netdev type=tap,id=net0,ifname=tap110i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on \
-device virtio-net-pci,mac=62:63:65:65:32:31,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300 \
-rtc driftfix=slew,base=localtime \
-machine type=q35 \
-global kvm-pit.lost_tick_policy=discard

I'm trying to load the vfio modules in the initrd to avoid a potential conflict with the host OS gpu by adding the following to /etc/initramfs-tools/modules:
Code:
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
I've also added "rd.driver.pre=vfio-pci" to the kernel command line in /etc/default/grub.
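
After changing /etc/initramfs-tools/modules and /etc/default/grub, I regenerate the initramfs and the grub config so the changes take effect on the next boot (standard Debian commands, nothing Proxmox-specific):
Code:
update-initramfs -u -k all
update-grub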

I'm not sure whether I'm successfully avoiding the conflict, however. Even though the boot GPU reports that it's bound to vfio-pci...

Code:
lspci -v -s 03:00
03:00.0 VGA compatible controller: NVIDIA Corporation GK106 [GeForce GTX 660] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: eVga.com. Corp. Device 2662
        Flags: bus master, fast devsel, latency 0, IRQ 10
        Memory at f6000000 (32-bit, non-prefetchable) [size=16M]
        Memory at b8000000 (64-bit, prefetchable) [size=128M]
        Memory at b6000000 (64-bit, prefetchable) [size=32M]
        I/O ports at ac00
        Expansion ROM at f7f00000 [disabled] [size=512K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [b4] Vendor Specific Information: Len=14 <?>
        Capabilities: [100] Virtual Channel
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900] #19
        Kernel driver in use: vfio-pci

03:00.1 Audio device: NVIDIA Corporation GK106 HDMI Audio Controller (rev a1)
        Subsystem: eVga.com. Corp. Device 2662
        Flags: bus master, fast devsel, latency 0, IRQ 5
        Memory at f7ffc000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Kernel driver in use: vfio-pci

... it is the only GPU that I cannot pass through to a VM at all. The screen attached to that device freezes at the initrd when the vfio-pci module loads, which I would more or less expect, but when I attempt to start a VM with that GPU, the screen just goes idle/powersave.

Unsure if that is related to my stability problem.
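
In case it's related, one thing worth checking is how the cards are split across IOMMU groups, since devices in the same group can only be assigned together. A quick sketch for listing them (plain sysfs, nothing Proxmox-specific):
Code:
# Print every PCI device per IOMMU group; each passed-through GPU (and its
# HDMI audio function) should sit in a group that isn't shared with host devices.
for g in /sys/kernel/iommu_groups/*; do
    echo "IOMMU group ${g##*/}:"
    for d in "$g"/devices/*; do
        lspci -nns "${d##*/}"
    done
done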

I have a second motherboard to test, but due to a kernel bug and some difficulties I'm having patching it (as described in this thread), it will likely take a while longer before I can report if the problem follows the board or not.

While I work on that, if anyone has any pointers, I'd really appreciate it.
 
I may have solved this. I (1) reinstalled the node, (2) without ZFS this time, and (3) created the VMs with NUMA enabled. One or a combination of those three changes likely solved my issue.

The problem did follow me to the other node, so I'm fairly certain this is a reproducible problem if I set up a node the same way again.

I notice the invocation of KVM now starts with:

Code:
/usr/bin/systemd-run \
--scope \
--slice qemu \
--unit 110 \
-p KillMode=none \
-p CPUShares=1000 /usr/bin/kvm \

... as opposed to starting with "/usr/bin/kvm". Not sure if that helps.

Over the coming weeks I'll experiment some more and update this thread for posterity if I get any more info.

This may be the end of a months-long struggle with vfio/iommu for me. :D
 
(With the caveat that I realize this is clearly not supported.)

That was not the end of my months-long struggle with vfio/iommu. :D

It turns out that ZFS was not the problem. It seems the only way I can get multiseat to work is by having the OS and the VMs on a single disk.

I've tried mdadm with and without LVM, and I've tried mdadm RAID 1 and even RAID 0.

I thought for sure that, in the worst-case scenario, I would at least be able to assign one VM per disk, with XFS. Not so.

Depending on the configuration, it's almost usable. For example, with a ZFS root and the VMs on separate ZFS RAID disks, the VMs won't crash unless both are doing fairly intensive 3D acceleration.

OTOH, with an XFS root and the VM disk images on XFS partitions on their respective disks (e.g. sdb1, sdc1), the running VM crashes as soon as the second VM is started, and then the second VM crashes at the Windows logon screen.

I'm pretty much back at the drawing board.

Beyond that, I really don't know. I currently have the system set up in almost the most basic way I can while still having something acceptable:
- OS on a single 120 GB SSD
- VM root pool on 3 × 240 GB SSDs, RAID-Z1

One thing I forgot to mention that may be indicative of something: soft-rebooting a VM always causes that VM's display to come up garbled at POST. I don't even have to get into Windows; if that happens, I know the VM is beyond salvation, and the second one is going down too.

I'm beginning to think this is somehow tied to my X58-chipset motherboards (it happens identically on both a Gigabyte and an Asus board with that chipset), or to the qemu/kvm that comes with Proxmox. A third possibility is some server-oriented tuning cooked into Proxmox. (Maybe I'll try a single-disk setup on plain Debian this time and see if anything changes.)
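
If it is the chipset, I'd expect DMAR/IOMMU faults to show up in the host kernel log when the crashes happen. Something like this (just standard kernel logging, no assumptions about the setup) should catch them while both VMs are under load:
Code:
# Follow the host kernel log and watch for DMAR faults or vfio errors
# around the time a guest's GPU driver crashes.
dmesg -w | grep -iE 'dmar|vfio|iommu'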

Proxmox has a bug where it sets hv_vendor_id to 'proxmox' rather than something the NVIDIA driver will tolerate (I use 'Nvidia43FIX'), which causes a Code 43 in the NVIDIA driver (Device Manager reports the device "has reported a problem and has been stopped", or some such). As a result, I launch the VMs from the console using the tweaked output of 'qm showcmd <vmid>':

VM1:
Code:
# sed -e 's/#.*$//' -e '/^$/d' /root/src/brian.1
/usr/bin/systemd-run \
--scope \
--slice qemu \
--unit 110 \
-p KillMode=none \
-p CPUShares=250000 \
/usr/bin/kvm -id 110 \
-chardev socket,id=qmp,path=/var/run/qemu-server/110.qmp,server,nowait \
-mon chardev=qmp,mode=control \
-pidfile /var/run/qemu-server/110.pid \
-daemonize \
-smbios type=1,uuid=6a9ea4a2-48bd-415e-95fb-adf8c9db44f7 \
-drive if=pflash,format=raw,readonly,file=/usr/share/kvm/OVMF-pure-efi.fd \
-drive if=pflash,format=raw,file=/root/sbin/110-OVMF_VARS-pure-efi.fd \
-name Brian-PC \
-smp 12,sockets=1,cores=12,maxcpus=12 \
-nodefaults \
-boot menu=on,strict=on,reboot-timeout=1000 \
-vga none \
-nographic \
-no-hpet \
-cpu host,hv_vendor_id=Nvidia43FIX,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_relaxed,+kvm_pv_unhalt,+kvm_pv_eoi,kvm=off \
-m 8192 \
-object memory-backend-ram,size=8192M,id=ram-node0 \
-numa node,nodeid=0,cpus=0-11,memdev=ram-node0 \
-k en-us \
-readconfig /usr/share/qemu-server/pve-q35.cfg \
-device usb-tablet,id=tablet,bus=ehci.0,port=1 \
-device vfio-pci,host=04:00.0,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0 \
-device vfio-pci,host=04:00.1,id=hostpci1,bus=ich9-pcie-port-2,addr=0x0 \
-device usb-host,hostbus=1,hostport=6.1 \
-device usb-host,hostbus=1,hostport=6.2.1 \
-device usb-host,hostbus=1,hostport=6.2.2 \
-device usb-host,hostbus=1,hostport=6.2.3 \
-device usb-host,hostbus=1,hostport=6.2 \
-device usb-host,hostbus=1,hostport=6.3 \
-device usb-host,hostbus=1,hostport=6.4 \
-device usb-host,hostbus=1,hostport=6.5 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 \
-drive file=/dev/zvol/SSD-pool/vm-110-disk-1,if=none,id=drive-virtio0,cache=writeback,format=raw,aio=threads,detect-zeroes=on \
-device virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100 \
-netdev type=tap,id=net0,ifname=tap110i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on \
-device virtio-net-pci,mac=32:61:36:63:37:64,netdev=net0,bus=pci.0,addr=0x12,id=net0 \
-rtc driftfix=slew,base=localtime \
-machine type=q35 \
-global kvm-pit.lost_tick_policy=discard

VM2:
Code:
# sed -e 's/#.*$//' -e '/^$/d' /root/src/madzia.2
/usr/bin/systemd-run \
--scope \
--slice qemu \
--unit 111 \
-p KillMode=none \
-p CPUShares=250000 \
/usr/bin/kvm \
-id 111 \
-chardev socket,id=qmp,path=/var/run/qemu-server/111.qmp,server,nowait \
-mon chardev=qmp,mode=control \
-pidfile /var/run/qemu-server/111.pid \
-daemonize \
-smbios type=1,uuid=55d862f4-d9b9-40ab-9b0a-e1eadf874750 \
-drive if=pflash,format=raw,readonly,file=/usr/share/kvm/OVMF-pure-efi.fd \
-drive if=pflash,format=raw,file=/root/sbin/111-OVMF_VARS-pure-efi.fd \
-name Madzia-PC \
-smp 12,sockets=1,cores=12,maxcpus=12 \
-nodefaults \
-boot menu=on,strict=on,reboot-timeout=1000 \
-vga none \
-nographic \
-no-hpet \
-cpu host,hv_vendor_id=Nvidia43FIX,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_relaxed,+kvm_pv_unhalt,+kvm_pv_eoi,kvm=off \
-m 8192 \
-object memory-backend-ram,size=8192M,id=ram-node0 \
-numa node,nodeid=0,cpus=0-11,memdev=ram-node0 \
-k en-us \
-readconfig /usr/share/qemu-server/pve-q35.cfg \
-device usb-tablet,id=tablet,bus=ehci.0,port=1 \
-device vfio-pci,host=05:00.0,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0 \
-device vfio-pci,host=05:00.1,id=hostpci1,bus=ich9-pcie-port-2,addr=0x0 \
-device usb-host,hostbus=2,hostport=2.1 \
-device usb-host,hostbus=2,hostport=2.2 \
-device usb-host,hostbus=2,hostport=2.3 \
-device usb-host,hostbus=2,hostport=2.4 \
-device usb-host,hostbus=2,hostport=2.5 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 \
-iscsi initiator-name=iqn.1993-08.org.debian:01:1530d013b944 \
-drive file=/dev/zvol/SSD-pool/vm-111-disk-1,if=none,id=drive-virtio0,cache=writeback,format=raw,aio=threads,detect-zeroes=on \
-device virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100 \
-netdev type=tap,id=net0,ifname=tap111i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on \
-device virtio-net-pci,mac=4E:F0:DD:90:DB:2D,netdev=net0,bus=pci.0,addr=0x12,id=net0 \
-rtc driftfix=slew,base=localtime \
-machine type=q35 \
-global kvm-pit.lost_tick_policy=discard

However, I've tried many invocations of KVM without success.
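
As a side note, it might be possible to avoid hand-launching entirely by putting the CPU flags into the raw 'args:' option of the VM config, since QEMU honours the last -cpu it is given. I haven't verified this on my setup, so treat it as a sketch:
Code:
# /etc/pve/qemu-server/110.conf (excerpt) -- untested sketch:
# append a -cpu with a usable hv_vendor_id; this only helps if the extra
# args end up after the -cpu that qemu-server generates.
args: -cpu host,hv_vendor_id=Nvidia43FIX,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_relaxed,+kvm_pv_unhalt,+kvm_pv_eoi,kvm=off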

Here is how I load my modules:

Code:
# cat /etc/modprobe.d/iommu_unsafe_interrupts.conf
options vfio_iommu_type1 allow_unsafe_interrupts=1

Code:
# cat /etc/modprobe.d/vfio_pci.conf
options vfio_pci disable_vga=1
#install vfio_pci /root/sbin/vfio-pci-override-vga.sh
options vfio-pci ids=10de:13c2,10de:0fbb,10de:11c0,10de:0e0b

Code:
# cat /etc/modprobe.d/zfs.conf
options zfs zfs_arc_max=4299967296

Code:
# cat /etc/modprobe.d/kvm.conf
options kvm ignore_msrs=1

... I believe grub is set up correctly ...

Code:
# sed -e 's/#.*$//' -e '/^$/d' /etc/default/grub
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="Proxmox Virtual Environment"
GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on vfio_iommu_type1.allow_unsafe_interrupts=1 quiet"
GRUB_CMDLINE_LINUX=""
GRUB_DISABLE_OS_PROBER=true
GRUB_DISABLE_RECOVERY="true"

... I believe I have all the correct modules loaded on boot ...

Code:
# sed -e 's/#.*$//' -e '/^$/d' /etc/modules
coretemp
it87
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
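
A quick way to confirm those actually end up loaded after a reboot:
Code:
# All four vfio modules should be listed here.
lsmod | grep vfio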

... I should have plenty of RAM, right?

Code:
# cat /proc/meminfo
MemTotal:       24745644 kB
MemFree:        19677712 kB
MemAvailable:   19597404 kB
Buffers:               0 kB
Cached:           118192 kB
SwapCached:         9384 kB
Active:           559848 kB
Inactive:         561096 kB
Active(anon):     510512 kB
Inactive(anon):   547588 kB
Active(file):      49336 kB
Inactive(file):    13508 kB
Unevictable:       18328 kB
Mlocked:           18328 kB
SwapTotal:       8388604 kB
SwapFree:        8353328 kB
Dirty:               116 kB
Writeback:             0 kB
AnonPages:       1016796 kB
Mapped:            81500 kB
Shmem:             50492 kB
Slab:            3592040 kB
SReclaimable:      27680 kB
SUnreclaim:      3564360 kB
KernelStack:       12688 kB
PageTables:        14552 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    20761424 kB
Committed_AS:    3014924 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      153088 kB
DirectMap2M:     9275392 kB
DirectMap1G:    15728640 kB

... and plenty of CPU.

Code:
# cat /proc/cpuinfo | grep -A 4 'processor.*: 11'
processor       : 11
vendor_id       : GenuineIntel
cpu family      : 6
model           : 44
model name      : Intel(R) Xeon(R) CPU           X 000  @ 2.93GHz

The motherboard is an Asus Rampage III. A Gigabyte GA-EX58 does the exact same thing. I have two more LGA1366 boards I could test, but I'm sure it would be futile.

If anyone has any suggestions, I would greatly appreciate it.
 
Having received no response here or on a couple of relevant mailing lists, I've started from scratch and am trying to find the minimum setup needed to trigger the problem.

I've confirmed that it still works when the host and the VMs are all on the same disk, even if that disk is just a spindle disk.

The quick version of the steps to get it working are:

  • Install Proxmox using any FS
  • Set up 2 Windows VMs on the root disk
  • Pass the GPUs through to the VMs and set up the host as per the wiki
  • Use qm showcmd <vmid> and/or ps ax|grep kvm to get a 'nearly working' invocation of kvm
  • Modify that invocation so that hv_vendor_id=Nvidia43FIX (not 'proxmox', or the GPU will fail with a Code 43 in Device Manager); see the sketch after this list
  • Run Both VMs
  • It works
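
For the hv_vendor_id step, the tweak is just a string substitution on the generated command. Roughly (a sketch; the exact string Proxmox emits may differ):
Code:
# Dump the generated command, swap in a usable vendor id, and launch from the result.
qm showcmd 110 | sed 's/hv_vendor_id=[Pp]roxmox/hv_vendor_id=Nvidia43FIX/' > /root/src/brian.1
bash /root/src/brian.1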

In my case, I cloned one of the VMs from the prior installation, to control for the VM itself being the cause.

Tonight, I'll add a second virtual disk on a second physical disk, first for one VM and then for the other, to see if that alone is enough to cause the problem.

If it doesn't, I'll move one VM, and then the second, entirely onto the next disk to see if that does it.

I'm sure one of those things will trigger it. (It's almost worse if it doesn't.) If anyone has any better ideas, please let me know.

Thanks.
 