I'm experimenting with PCI passthrough and multiseat on Proxmox.
The former works well, and impressively so: I can even pass two GPUs through to a single VM with no problems.
However, if I start a second VM and pass a GPU through to it, then within minutes of starting anything 3D-intensive on either VM, such as a game or a benchmark, the GPU driver crashes in that VM, and then somehow it always crashes in the other one as well.
I can expedite this by running something 3D-intensive on both VMs at once: the driver crashes in one or the other within minutes at most, and in the remaining VM seconds after that.
Sometimes Windows reports that the GPU driver has crashed and recovered, but it never really recovers, and the VM has to be hard powered off to restore stability.
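If host-side kernel messages would help diagnose this, these are the commands I can run right after the next crash to pull anything VFIO/IOMMU-related (standard tools, nothing Proxmox-specific):
Code:
# look for VFIO/IOMMU/DMAR messages on the host after a guest driver crash
dmesg -T | grep -iE 'vfio|iommu|dmar'
# the same, but from the kernel journal for the current boot
journalctl -b -k | grep -iE 'vfio|iommu|dmar'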
/etc/default/grub
Code:
# sed -e 's/#.*$//' -e '/^$/d' /etc/default/grub
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="Proxmox Virtual Environment"
GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on vfio_iommu_type1.allow_unsafe_interrupts=1 rd.driver.pre=vfio-pci"
GRUB_CMDLINE_LINUX="root=ZFS=rpool/ROOT/pve-1 boot=zfs"
GRUB_DISABLE_OS_PROBER=true
GRUB_DISABLE_RECOVERY="true"
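Since allow_unsafe_interrupts is in play, the IOMMU grouping is probably relevant; this is the quick check I run after booting with new options (update-grub has to run first so the changes apply; the device paths will obviously differ per machine):
Code:
# apply the GRUB changes before rebooting
update-grub
# after reboot: list every device per IOMMU group; the two GPUs going
# to different VMs should land in different groups
find /sys/kernel/iommu_groups/ -type l | sort -V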
/etc/modprobe.d/vfio_pci.conf
Code:
options vfio-pci disable_vga=1
options vfio-pci ids=10de:13c2,10de:0fbb,10de:11c0,10de:0e0b
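For reference, the ids above are the vendor:device pairs for the GPUs' VGA and HDMI-audio functions; lspci shows them in [xxxx:xxxx] form:
Code:
# print the vendor:device IDs for the NVIDIA functions
lspci -nn | grep -i nvidia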
/etc/modprobe.d/kvm.conf
Code:
options kvm ignore_msrs=1
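A quick way to confirm the option took effect, since the kvm module exposes it via sysfs:
Code:
# should print Y once kvm is loaded with ignore_msrs=1
cat /sys/module/kvm/parameters/ignore_msrs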
/etc/modprobe.d/blacklist.conf
Code:
blacklist nouveau
blacklist nvidia
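The blacklist only applies inside the initrd once it's regenerated, so after editing I rebuild it and, after a reboot, confirm neither driver loaded:
Code:
# rebuild the initrd so the blacklist applies there too
update-initramfs -u
# after reboot: no output means neither module is loaded
lsmod | grep -E 'nouveau|nvidia'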
Adding a hostpci option to the VM's vmid.conf does not generate a working KVM invocation for me, so I start my VMs manually with something like this:
Code:
/usr/bin/kvm \
-id 110 \
-chardev socket,id=qmp,path=/var/run/qemu-server/110.qmp,server,nowait \
-mon chardev=qmp,mode=control \
-pidfile /var/run/qemu-server/110.pid \
-daemonize \
-smbios type=1,uuid=aecb408f-89ef-44ef-9a7a-a7fa9d6f75f8 \
-drive if=pflash,format=raw,readonly,file=/usr/share/kvm/OVMF-pure-efi.fd \
-drive if=pflash,format=raw,file=/tmp/110-OVMF_VARS-pure-efi.fd \
-name Test-PC-1 \
-smp 8,sockets=1,cores=8,maxcpus=8 \
-nodefaults \
-boot menu=on,strict=on,reboot-timeout=1000 \
-vga none \
-nographic \
-no-hpet \
-cpu host,+kvm_pv_unhalt,+kvm_pv_eoi,kvm=off \
-m 8192 \
-k en-us \
-readconfig /usr/share/qemu-server/pve-q35.cfg \
-device usb-tablet,id=tablet,bus=ehci.0,port=1 \
-device vfio-pci,host=04:00.0,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0 \
-device vfio-pci,host=04:00.1,id=hostpci1,bus=ich9-pcie-port-2,addr=0x0 \
-device usb-host,hostbus=1,hostport=6.1,id=usb0 \
-device usb-host,hostbus=1,hostport=6.2,id=usb1 \
-device usb-host,hostbus=1,hostport=6.3,id=usb2 \
-device usb-host,hostbus=1,hostport=6.4,id=usb3 \
-device usb-host,hostbus=1,hostport=6.5,id=usb4 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 \
-iscsi initiator-name=iqn.1993-08.org.debian:01:3f1e9afe6fdb \
-drive file=/dev/zvol/rpool/data/vm-110-disk-1,if=none,id=drive-virtio0,cache=writeback,format=raw,aio=threads,detect-zeroes=on \
-device virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100 \
-drive file=/dev/zvol/tank0/vm-110-disk-1,if=none,id=drive-virtio1,cache=writeback,format=raw,aio=threads,detect-zeroes=on \
-device virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb \
-netdev type=tap,id=net0,ifname=tap110i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on \
-device virtio-net-pci,mac=62:63:65:65:32:31,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300 \
-rtc driftfix=slew,base=localtime \
-machine type=q35 \
-global kvm-pit.lost_tick_policy=discard
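Before launching, a quick sanity check that both functions of the target GPU are actually bound to vfio-pci; a small sketch matching the 04:00.x addresses in the -device lines above:
Code:
# confirm the GPU and its HDMI audio function are on vfio-pci
for dev in 0000:04:00.0 0000:04:00.1; do
    printf '%s -> %s\n' "$dev" \
        "$(basename "$(readlink "/sys/bus/pci/devices/$dev/driver")")"
done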
I'm trying to load the VFIO modules in the initrd, to avoid a potential conflict with the host OS over the boot GPU, by adding the following to /etc/initramfs-tools/modules:
Code:
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
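After editing that file the initrd has to be rebuilt, and I verify the modules actually made it in:
Code:
# rebuild the initrd for all installed kernels
update-initramfs -u -k all
# confirm the vfio modules are inside the current initrd
lsinitramfs /boot/initrd.img-$(uname -r) | grep vfio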
I've also added rd.driver.pre=vfio-pci to the kernel command line in /etc/default/grub, as shown above. I'm not sure whether I'm actually avoiding the conflict, however. Even though the boot GPU reports that it's bound to vfio-pci...
Code:
# lspci -v -s 03:00
03:00.0 VGA compatible controller: NVIDIA Corporation GK106 [GeForce GTX 660] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: eVga.com. Corp. Device 2662
        Flags: bus master, fast devsel, latency 0, IRQ 10
        Memory at f6000000 (32-bit, non-prefetchable) [size=16M]
        Memory at b8000000 (64-bit, prefetchable) [size=128M]
        Memory at b6000000 (64-bit, prefetchable) [size=32M]
        I/O ports at ac00
        Expansion ROM at f7f00000 [disabled] [size=512K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [b4] Vendor Specific Information: Len=14 <?>
        Capabilities: [100] Virtual Channel
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900] #19
        Kernel driver in use: vfio-pci

03:00.1 Audio device: NVIDIA Corporation GK106 HDMI Audio Controller (rev a1)
        Subsystem: eVga.com. Corp. Device 2662
        Flags: bus master, fast devsel, latency 0, IRQ 5
        Memory at f7ffc000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Kernel driver in use: vfio-pci
... it is the only GPU that I cannot pass through to a VM at all. The screen attached to it freezes during the initrd, when the vfio-pci module loads, which I would more or less expect; but when I then start a VM with that GPU, the screen just goes into idle/power-save.
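For what it's worth, sysfs confirms this is the boot VGA device (a 1 here means the firmware initialized this GPU at POST):
Code:
# 1 = the GPU the firmware used during boot
cat /sys/bus/pci/devices/0000:03:00.0/boot_vga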
I'm unsure whether that passthrough failure is related to my stability problem.
I have a second motherboard to test, but due to a kernel bug and some difficulties I'm having patching it (as described in this thread), it will likely be a while longer before I can report whether or not the problem follows the board.
While I work on that, if anyone has any pointers, I'd really appreciate it.