Hello,
I'm building out a VDI model for specialized workstations for a host of reasons, the primary one being malware resilience. If I build a Windows desktop platform underpinned by Proxmox, I can leverage PVE for rapid restore of the guest in the case of a Windows credentials-based or exploit-based outbreak. As said, there are many other benefits, but this is my primary reason for building stations with this model.
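For context, "rapid restore" here is just the stock PVE snapshot/rollback flow; a minimal sketch (the snapshot name is illustrative):
Code:
# take a known-clean snapshot of the guest (VMID 1009)
qm snapshot 1009 clean-baseline

# after a suspected compromise, roll the guest back in seconds
qm rollback 1009 clean-baseline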
I am experiencing intermittent jitter that presents as a brief (<1 s) reduction in responsiveness across the entire guest. A visual indicator of the behavior is that the mouse stops moving, then jumps to a new location on screen. Running the Unigine Superposition benchmark gives me a measure of GPU utilization: when a jitter event occurs, GPU utilization drops from 99-100% to sometimes as low as ~50%.
The architecture of the platform is pretty simple. That said, one way this may differ from other VDI implementations is that the operator is physically in front of the PVE host and is not connecting via a remote access tool. GPU passthrough to the guest is enabled with Primary GPU checked, as is passthrough for the USB controller.
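For reference, the host side of the passthrough follows the standard Proxmox pattern; a minimal sketch, assuming GRUB boot and placeholder vendor:device IDs (pull the real ones from lspci -nn):
Code:
# /etc/default/grub -- enable the IOMMU in passthrough mode (Intel)
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

# /etc/modprobe.d/vfio.conf -- bind the GPU functions to vfio-pci early
# (10de:xxxx are placeholders; list your IDs with `lspci -nn`)
options vfio-pci ids=10de:xxxx,10de:xxxx disable_vga=1

# apply and reboot
update-grub
update-initramfs -u -k all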
PVE Specs:
i9-13900K (performance mode, boosting confirmed in /proc output)
64 GB RAM
Quadro A4000
vmbr0 = prodnet
vmbr1 = privnet
1 TB NVMe (LVM: PVE & guest)
1 TB spinning disk (XFS, VM backups only)
Hugepages enabled (see the sketch after this list)
Kernel: Linux 6.2.16-12-pve #1 SMP PREEMPT_DYNAMIC PMX 6.2.16-12 (2023-09-04T13:21Z)
PVE Manager: pve-manager/8.0.4/d258a813cfa6b390
CPU C-states disabled
HT enabled
Turbo Boost enabled
SpeedStep disabled
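Since hugepages are enabled on the host, here's roughly what that setup looks like; a minimal sketch assuming 1 GiB pages reserved via GRUB (the page count is illustrative, sized for the 32 GB guest plus slack, and the line combines with the IOMMU flags above):
Code:
# /etc/default/grub -- reserve 1 GiB hugepages at boot (count is illustrative)
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt default_hugepagesz=1G hugepagesz=1G hugepages=34"

# apply and reboot
update-grub

# per-VM: back guest RAM with 1 GiB hugepages (VMID 1009, per the config below)
qm set 1009 --hugepages 1024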
Guest Specs:
q35, machine version 8.0
32 cores, 24 vCPUs @ host CPU type (no defined affinity; I've attempted several affinity definitions with no improvement -- see the sketch after this list)
---Originally 24 cores / 24 vCPUs, but I was playing with these allocations extensively yesterday and ended up leaving it here.
---Limiting the guest threads through a CPU limit can result in this behavior.
---Utilizing all P-cores and P hyperthreads through an affinity definition compounds the effect.
32 GB RAM, no ballooning
300 GB disk on the primary NVMe - VirtIO SCSI single, iothread, aio=threads
VirtIO NIC - 8 queues
PCIe GPU passthrough - pcie, all functions, primary GPU
PCIe USB passthrough - pcie, all functions
---Logitech G502 polling reduced to 125 Hz
MSI enabled for all supporting hardware (used MSI Util V3)
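For the affinity experiments, I've been using the qm affinity option; a minimal sketch, assuming the usual 13900K numbering where logical CPUs 0-15 are the P-core threads and 16-31 are the E-cores (verify with lscpu -e on your host):
Code:
# pin all vCPUs of VM 1009 to the P-core threads only (CPUs 0-15 assumed)
qm set 1009 --affinity 0-15

# same thing as a line in /etc/pve/qemu-server/1009.conf:
#   affinity: 0-15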
The emulated ICH9 USB devices generate latency. I've attempted to disable them at the PVE level but failed, so they are disabled within the guest instead. The latency is observable when pinging the guest over vmbr1 from the PVE host (see the sketch below): average latency with these devices enabled was ~0.500 ms, versus ~0.130 ms with them disabled. I'd really like to get rid of that emulated ICH9 controller completely if anyone has a method that works with Proxmox 8; the posts on this forum where people removed that controller targeted previous PVE versions and no longer work.
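For reference, the numbers above are from a plain ping run from the host over the private bridge; the guest IP below is a placeholder:
Code:
# from the PVE host, over vmbr1 (privnet); 192.168.x.x is a placeholder
ping -c 100 -i 0.2 192.168.x.x

# compare the avg field of the rtt min/avg/max/mdev summary with the
# emulated ICH9 USB devices enabled vs. disabled inside the guest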
Here's my vm.conf. Please help me get to the root of this!
Code:
agent: 1
balloon: 0
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 32
cpu: host,flags=+spec-ctrl;+pdpe1gb;+aes
cpuunits: 2048
efidisk0: local-lvm:vm-1009-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
hostpci0: 0000:00:14.0
hostpci1: 0000:01:00.0,pcie=1,x-vga=1
hotplug: disk,network,usb
ide2: cdrom,media=cdrom
machine: pc-q35-8.0
memory: 32768
meta: creation-qemu=8.0.2,ctime=1693933670
name: FLUBBER
net0: virtio=C2:51:6D:DC:5F:F6,bridge=vmbr1,firewall=1,queues=8
numa: 0
onboot: 1
ostype: win10
scsi0: local-lvm:vm-1009-disk-1,aio=threads,discard=on,iothread=1,size=300G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=280b2ac6-cbf5-41d6-b90f-893212834b21
sockets: 1
tablet: 0
tpmstate0: local-lvm:vm-1009-disk-2,size=4M,version=v2.0
vcpus: 24
vmgenid: a42f51de-8cf8-4b75-b4b7-39797ef2f0f5
/edit:
When running LatencyMon within the guest, the effect is highly exaggerated.
tcpip.sys, afd.sys, wdf01000.sys, and nvlddmkm.sys show the highest latencies.
/edit2:
Highest latency for the NVIDIA driver is 174 ms. That's a slowwww frame for such a GPU.
Code:
Driver file:             nvlddmkm.sys
Description:             NVIDIA Windows Kernel Mode Driver, Version 537.13
ISR count:               0
DPC count:               46839
Highest execution (ms):  174.346490
Total execution (ms):    556.589485
Base address:            0xFFFFF801'55210000
Size (bytes):            60256256
Company:                 NVIDIA Corporation
Product:                 NVIDIA Windows Kernel Mode Driver, Version 537.13
File version:            31.0.15.3713
Path:                    C:\WINDOWS\system32\driverstore\filerepository\nv_dispwi.inf_amd64_91711286cccb35d0\nvlddmkm.sys