Win 10 VDI w/GPU Passthru - Desktop Lag/Jitter Troubleshooting

garlicknots · Sep 20, 2023

Hello,

I'm building out a VDI model for specialized workstations for a host of reasons, but the primary being malware resilience. I believe if I build a Windows desktop platform underpinned by Proxmox, I'll be able to leverage the PVE for rapid restore of the guest in the case of a Windows credentials-based or exploit-based outbreak... as said there are many other benefits but this is my primary reason for building stations with this model.

I am experiencing intermittent jitter that presents as a brief (<1s) reduction in responsiveness over the entire Guest. A visual indicator of the behavior is that the mouse stops moving then jumps to a new location on screen. Running Unigine Superposition Benchmark I get a measure of GPU utilization. When a jitter event is experienced, the GPU utilization drops below 99/100% down to sometimes as low as ~50%.

The architecture of the platform is pretty simple, that said -one way this may differ from other VDI implementations is that the operator is physically in front of the PVE host and is not connecting via a remote access tool. GPU passthrough to the guest is enabled with Primary GPU checked as is passthrough for the USB controller.

PVE Specs:
13900k (in performance mode and boosting per proc output)
64gb RAM
Quadro A4000
vmbr0=prodnet
vmbr1=privnet
1tb nvme (lvm PVE & guest)
1tb spinning (xfs, vm backups only)
hugepages enabled
Kernel Version Linux 6.2.16-12-pve #1 SMP PREEMPT_DYNAMIC PMX 6.2.16-12 (2023-09-04T13:21Z)
PVE Manager Version pve-manager/8.0.4/d258a813cfa6b390
CPU C-States Disabled
HT Enabled
Turboboost Enabled
Speedstep Disabled

Guest Specs:
q35, 8.0
32 cores, 24 vCPUs @ Host Type (no defined affinity, have attempted several affinity definitions with no improvement)
---Originally 24 cores / 24 vCPUs but was playing with these allocations extensively yesterday and ended up leaving it here.
---Limiting the guest threads through CPU limit can result in this behavior
---Utilizing all P cores and P Hyperthreads through affinity definition compounds the effect
32gb RAM no balloon
300gb disk on primary nvme - virtio scsi single, iothread, threads
virtio nic- 8 queues
PCIe GPU passthrough- pcie, all functions, primary gpu
PCIe USB passthrough- pcie, all functions
---Logitech G502 reduced polling to 125
MSI is enabled for all supporting hardware (used MSI Util V3)

Emulated ICH9 USB devices generate latency, I've attempted to disable them on the PVE level but failed. These are disabled within the guest to reduce latency. This latency is observed when pinging the Guest over vmbr1 from the PVE. Average latency with these devices left enabled was ~0.500ms, while disabled ~0.130ms. I'd really like to be able to just get rid of that ICH9 emulated controller completely if anyone has a method that works with Proxmox 8. The posts on this forum where people have removed that controller were with previous PVE versions and no longer work.

Here's my vm.conf, Please help me get to the root of this!

Code:

agent: 1
balloon: 0
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 32
cpu: host,flags=+spec-ctrl;+pdpe1gb;+aes
cpuunits: 2048
efidisk0: local-lvm:vm-1009-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
hostpci0: 0000:00:14.0
hostpci1: 0000:01:00.0,pcie=1,x-vga=1
hotplug: disk,network,usb
ide2: cdrom,media=cdrom
machine: pc-q35-8.0
memory: 32768
meta: creation-qemu=8.0.2,ctime=1693933670
name: FLUBBER
net0: virtio=C2:51:6D:DC:5F:F6,bridge=vmbr1,firewall=1,queues=8
numa: 0
onboot: 1
ostype: win10
scsi0: local-lvm:vm-1009-disk-1,aio=threads,discard=on,iothread=1,size=300G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=280b2ac6-cbf5-41d6-b90f-893212834b21
sockets: 1
tablet: 0
tpmstate0: local-lvm:vm-1009-disk-2,size=4M,version=v2.0
vcpus: 24
vmgenid: a42f51de-8cf8-4b75-b4b7-39797ef2f0f5

/edit:
When running LatencyMon within the guest, the effect is highly exaggerated
tcpip.sys, afd.sys, wfd01000.sys, nvlddmkm.sys have the highest latencies.\

/edit2:

Highest latency for the nvidia driver is 174ms. That's a slowwww frame for such a GPU.

Code:

nvlddmkm.sys
NVIDIA Windows Kernel Mode Driver, Version 537.13
0
46839
174.346490
556.589485
0xFFFFF801'55210000
60256256
NVIDIA Corporation
NVIDIA Windows Kernel Mode Driver, Version 537.13
31.0.15.3713
C:\WINDOWS\system32\driverstore\filerepository\nv_dispwi.inf_amd64_91711286cccb35d0\nvlddmkm.sys

garlicknots · Sep 21, 2023

May have a clue - the PID running this guest is utilizing ~50% CPU per top while the guest itself is idle.

Removed the PCIe passthroughs to see if they may have been influencing the usage. No apparent change booting the guest without them.

/edit:
Changing CPU type from Host has reduced usage, but my GPU PCIe passthrough don't seem to work with anything other than Host. If I boot the guest with any other CPU type (that actually starts) the screen never draws.

mac.linux.free · Sep 21, 2023

pls try not to use queues on an win vm

garlicknots · Sep 21, 2023

@mac.linux.free - queues removed at your instruction.

mac.linux.free · Sep 21, 2023

garlicknots said:
@mac.linux.free - queues removed at your instruction.

and cpu=host (flags are useless)

mac.linux.free · Sep 21, 2023

mac.linux.free said:
and cpu=host (flags are useless)

+ scsi discard off

garlicknots · Sep 21, 2023

Done and done

mac.linux.free · Sep 21, 2023

Turboboost also useless in a VM.

garlicknots · Sep 21, 2023

There has been no tangible impact to performance with these adjustments.

leesteken · Sep 21, 2023

As an experiment, could you try running the VM with 8 cores (8 vcpus) and 16GB? Can you try without Primary GPU (x-vga=1), which is only intended for consumer NVidia GPUs?

garlicknots · Sep 21, 2023

@leesteken done, still experiencing the jitter

mac.linux.free · Sep 21, 2023

my only good experience is with server boards

garlicknots · Sep 21, 2023

After reverting the changes between my OP and now I am seeing more exaggerated and frequent stutter than previously.

garlicknots · Sep 22, 2023

Finally got those pesky ich9 controllers gone. I'm really not making any tangible progress with the stuttering though.

garlicknots · Sep 25, 2023

By changing the machine type to i440 and then back to q35, the machine performance improved. Noticed the qemu string sent hpet=off as a parameter. That appears to possibly be the biggest impacting piece of the config. I'm still getting slight stutter/mouse jitter though.

I need this passthrough VDI to perform like bare metal. Or at least well enough to convince a non-savvy user as such. I'm getting close...

leesteken · Sep 25, 2023

WIndows 10 with GPU and USB controller passthrough works fine here. I don't have Hyper-Threading and E-cores., so maybe it's Intel or 13th-gen specific. Maybe jumping work between P- and E-cores or trying to use a HT as a full core causes the stuttering? Try disabling both in the BIOS for a test (and don't give the VM all remaining cores to Proxmox can do work in the background).

garlicknots · Sep 25, 2023

I was thinking something similar RE: 13th gen core scheduling. I did try affinity for the vm threads, placing them on the 8 real cores and 16 e cores. Then I stopped specifically defining the affinity and that's exactly what Proxmox did by default so apparently they are already doing a good job of finding the best cores to execute on. I haven't tried to just disable those features in BIOS, but that's now on the list. Thanks.

I'm about to build another one of these and instead of fiddling with the machine type in Proxmox, I'm just going to disable the HPET in Windows and see how it performs. Thanks again for the assists @leesteken I really appreciate it.

garlicknots · Sep 28, 2023

Somehow I managed to get rid of ich9 PCIe host ports as well on one of my machines. Can't successfully recreate and don't know if it's worth doing but thought it was kind of interesting

I've almost completely eliminated the stutter, but it's not gone. If you listen to a youtube video you will very periodically hear the audio glitch very slightly. I'm also seeing correlating drops in GPU usage when benchmarking. The whole Guest lurches briefly, GPU usage drops from 99/100% to as low as 50% and then climbs back until the next instance.

Code:

/usr/bin/kvm -id 1009 -name FLUBBER2,debug-threads=on
-no-shutdown -chardev socket,id=qmp,path=/var/run/qemu-server/1009.qmp,server=on,wait=off
-mon chardev=qmp,mode=control
-chardev socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5
-mon chardev=qmp-event,mode=control
-pidfile /var/run/qemu-server/1009.pid
-daemonize
-smbios type=1,uuid=280b2ac6-cbf5-41d6-b90f-893212834b21
-drive if=pflash,unit=0,format=raw,readonly=on,file=/usr/share/pve-edk2-firmware//OVMF_CODE_4M.secboot.fd
-drive if=pflash,unit=1,id=drive-efidisk0,format=raw,file=/dev/pve/vm-1009-disk-0,size=540672
-smp 24,sockets=1,cores=24,maxcpus=24 -nodefaults
-boot menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg
-vga none
-nographic
-cpu host,hv_ipi,hv_relaxed,hv_reset,hv_runtime,hv_spinlocks=0x1fff,hv_stimer,hv_synic,hv_time,hv_vapic,hv_vendor_id=proxmox,hv_vpindex,kvm=off,+kvm_pv_eoi,+kvm_pv_unhalt,+pdpe1gb
-m 32768 -object iothread,id=iothread-virtioscsi0
-readconfig /usr/share/qemu-server/pve-q35-4.0.cfg
-device vmgenid,guid=a42f51de-8cf8-4b75-b4b7-39797ef2f0f5
-device vfio-pci,host=0000:01:00.0,id=hostpci0
-device vfio-pci,host=0000:00:14.0,id=hostpci1.0.0,multifunction=on
-device vfio-pci,host=0000:00:14.2,id=hostpci1.1.1
-chardev socket,id=tpmchar,path=/var/run/qemu-server/1009.swtpm
-tpmdev emulator,id=tpmdev,chardev=tpmchar -device tpm-tis,tpmdev=tpmdev
-chardev socket,path=/var/run/qemu-server/1009.qga,server=on,wait=off,id=qga0
-device virtio-serial,id=qga0,bus=pci.0,addr=0x8
-device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0
-iscsi initiator-name=iqn.1993-08.org.debian:01:82746687d257
-drive file=/dev/cdrom,if=none,id=drive-ide2,media=cdrom,aio=io_uring
-device ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=101
-device virtio-scsi-pci,id=virtioscsi0,bus=pci.3,addr=0x1,iothread=iothread-virtioscsi0
-drive file=/dev/pve/vm-1009-disk-1,if=none,id=drive-scsi0,cache=writeback,aio=threads,discard=on,format=raw,detect-zeroes=unmap
-device scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,rotation_rate=1,bootindex=100
-netdev type=tap,id=net0,ifname=tap1009i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on,queues=8
-device virtio-net-pci,mac=C2:51:6D:DC:5F:F6,netdev=net0,bus=pci.0,addr=0x12,id=net0,vectors=18,mq=on,packed=on,rx_queue_size=1024,tx_queue_size=256,bootindex=102
-rtc driftfix=slew,base=localtime -machine hpet=off,type=pc-q35-8.0+pve0
-global kvm-pit.lost_tick_policy=discard

Code:

/usr/bin/kvm -id 1004 -name FLUBBER,debug-threads=on
-no-shutdown -chardev socket,id=qmp,path=/var/run/qemu-server/1004.qmp,server=on,wait=off
-mon chardev=qmp,mode=control
-chardev socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5
-mon chardev=qmp-event,mode=control
-pidfile /var/run/qemu-server/1004.pid
-daemonize
-smbios type=1,uuid=b42d5ca8-89a6-4c71-ae16-559742b91a20
-drive if=pflash,unit=0,format=raw,readonly=on,file=/usr/share/pve-edk2-firmware//OVMF_CODE_4M.secboot.fd
-drive if=pflash,unit=1,id=drive-efidisk0,format=raw,file=/dev/pve/vm-1004-disk-0,size=540672
-smp 24,sockets=1,cores=24,maxcpus=24 -nodefaults
-boot menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg
-vga none
-nographic
-cpu host,hv_ipi,hv_relaxed,hv_reset,hv_runtime,hv_spinlocks=0x1fff,hv_stimer,hv_synic,hv_time,hv_vapic,hv_vendor_id=proxmox,hv_vpindex,kvm=off,+kvm_pv_eoi,+kvm_pv_unhalt
-m 32768 -object iothread,id=iothread-virtio0
-readconfig /usr/share/qemu-server/pve-q35-4.0.cfg
-device vmgenid,guid=3be7ef44-00a6-40db-9aeb-ab917b7175b0
-device vfio-pci,host=0000:01:00.0,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0
-device vfio-pci,host=0000:00:14.0,id=hostpci1.0,bus=ich9-pcie-port-2,addr=0x0.0,multifunction=on
-device vfio-pci,host=0000:00:14.2,id=hostpci1.1,bus=ich9-pcie-port-2,addr=0x0.1
-chardev socket,id=tpmchar,path=/var/run/qemu-server/1004.swtpm
-tpmdev emulator,id=tpmdev,chardev=tpmchar
-device tpm-tis,tpmdev=tpmdev
-chardev socket,path=/var/run/qemu-server/1004.qga,server=on,wait=off,id=qga0
-device virtio-serial,id=qga0,bus=pci.0,addr=0x8
-device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0
-iscsi initiator-name=iqn.1993-08.org.debian:01:d84acb8f6f9f
-drive file=/dev/cdrom,if=none,id=drive-ide2,media=cdrom,aio=io_uring
-device ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=101
-drive file=/dev/pve/vm-1004-disk-1,if=none,id=drive-virtio0,cache=writeback,aio=io_uring,discard=on,format=raw,detect-zeroes=unmap
-device virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,iothread=iothread-virtio0,bootindex=100
-netdev type=tap,id=net0,ifname=tap1004i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on
-device virtio-net-pci,mac=9A:3A:9B:B8:15:B1,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256,bootindex=102
-rtc driftfix=slew,base=localtime -machine hpet=off,type=pc-q35-8.0+pve0 -global kvm-pit.lost_tick_policy=discard

Search

Search

Win 10 VDI w/GPU Passthru - Desktop Lag/Jitter Troubleshooting

garlicknots

New Member

garlicknots

New Member

mac.linux.free

Renowned Member

garlicknots

New Member

mac.linux.free

Renowned Member

mac.linux.free

Renowned Member

garlicknots

New Member

mac.linux.free

Renowned Member

garlicknots

New Member

leesteken

Distinguished Member

garlicknots

New Member

mac.linux.free

Renowned Member

garlicknots

New Member

garlicknots

New Member

garlicknots

New Member

leesteken

Distinguished Member

garlicknots

New Member

garlicknots

New Member