[SOLVED] Hyper-V kills GPU performance on Windows VM

mio19

Jan 27, 2024
I found that KVM nested virtualization is enabled by default, and I can enable Hyper-V in a Windows 11 VM without any configuration. However, GPU passthrough performance is very bad: I get a stable 60 fps without Hyper-V, but with Hyper-V enabled the framerate is unstable and below 30 fps. How can I troubleshoot this problem?

Host CPU: Intel Xeon E5-2666 v3
GPU: NVIDIA RTX 3080 Ti
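
As a quick sanity check (assuming an Intel host like this Xeon): whether KVM nested virtualization is enabled can be read from the kvm_intel module parameter, which should print Y (or 1 on older kernels); for AMD hosts the equivalent parameter lives under kvm_amd:
Code:
# cat /sys/module/kvm_intel/parameters/nested
Y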

Code:
# cat /etc/pve/qemu-server/111.conf
agent: 1
balloon: 0
bios: ovmf
boot: order=scsi0;ide0;net0;ide2
cores: 20
cpu: host
cpuunits: 10000
efidisk0: local-zfs:vm-111-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
hostpci0: 0000:03:00,pcie=1,x-vga=1
ide2: local:iso/virtio-win.iso,media=cdrom,size=612812K
machine: pc-q35-8.1
memory: 16384
meta: creation-qemu=8.1.2,ctime=1706379477
name: wincomp
net0: virtio=BC:24:11:FD:95:CF,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: win11
parent: s10-running
scsi0: local-zfs:vm-111-disk-1,discard=on,iothread=1,size=2T,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=cb10f63d-def4-4be8-b18d-28bb6393ed2c
sockets: 1
tpmstate0: local-zfs:vm-111-disk-2,size=1M,version=v2.0
usb0: host=1ea7:0002
usb1: host=1ea7:0064
vga: none
vmgenid: 2655db95-eadb-4ebf-ab20-a286a80384d2

[s10-running]
agent: 1
balloon: 0
bios: ovmf
boot: order=scsi0;ide0;net0;ide2
cores: 20
cpu: host
cpuunits: 10000
efidisk0: local-zfs:vm-111-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
hostpci0: 0000:03:00,pcie=1,x-vga=1
ide2: local:iso/virtio-win.iso,media=cdrom,size=612812K
machine: pc-q35-8.1
memory: 16384
meta: creation-qemu=8.1.2,ctime=1706379477
name: wincomp
net0: virtio=BC:24:11:FD:95:CF,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: win11
scsi0: local-zfs:vm-111-disk-1,discard=on,iothread=1,size=2T,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=cb10f63d-def4-4be8-b18d-28bb6393ed2c
snaptime: 1710395847
sockets: 1
tpmstate0: local-zfs:vm-111-disk-2,size=1M,version=v2.0
usb0: host=067b:2731
vga: none
vmgenid: 2655db95-eadb-4ebf-ab20-a286a80384d2
 
hi,

how do you measure the performance? are you sure it's the GPU that's the problem and not e.g. the CPU?
AFAIR Windows sometimes moves some processes into transparent VMs if Hyper-V is enabled (for security, I guess?)

do you need Hyper-V in that VM, though?
 
> how do you measure the performance? are you sure it's the GPU that's the problem and not e.g. the CPU?
> AFAIR Windows sometimes moves some processes into transparent VMs if Hyper-V is enabled (for security, I guess?)
> do you need Hyper-V in that VM, though?
The CPU could be the problem. I will try to measure the performance more precisely.

> transparent vms
Is it called Core Isolation?

Yes, I want Hyper-V in that VM for WSA and WSL.
 
mhmm, no good idea sadly, but it seems more like a Hyper-V/Windows problem...
 
Have you tried enabling Hyper-V Enlightenments (see my other post here) by adding something like the following to your VM config:
Code:
args: -cpu host,hv_passthrough
I'm using nested Hyper-V with approx. 10-15% CPU performance loss in Windows 10 Pro and no measurable GPU performance impact with an RTX 4070 + Intel 13700T (6P):
[screenshot: benchmark results]
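
For completeness, the same args line can also be set from the CLI; a small example for the VM in this thread (VMID 111), which simply writes the args entry into the config file shown above:
Code:
# qm set 111 --args '-cpu host,hv_passthrough'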
 
great, would be interesting which of the hv flags makes the real difference (instead of using the catch-all hv_passthrough) so we could maybe enable it for future machine types (in combination with PCI passthrough)
but i understand if you don't have the time to test all from the list ;)
 
> great, would be interesting which of the hv flags makes the real difference (instead of using the catch-all hv_passthrough) so we could maybe enable it for future machine types (in combination with PCI passthrough)
> but i understand if you don't have the time to test all from the list ;)
Attached is a file with benchmark results for several Hyper-V enlightenment combinations for QEMU/KVM. Below is a summary of my findings compared to the Proxmox defaults. They are based on a Windows 10 Pro VM with q35-8.0 machine type and host CPU type, running on a 13th Gen Raptor Lake Intel Core i7-13700T (Intel W680 chipset), 12 vCPUs (exclusive cpuset of P-cores only), 32 GB assigned VM RAM, an encrypted+compressed ZFS zvol on a Samsung SSD 980 Pro, and a passed-through PCIe NVIDIA RTX 4070 GPU. An example args line combining the strong- and medium-effect flags follows the list below.

Strong effect (must-haves, ~50-60% graphics performance gain):
  • hv-evmcs [requires: hv-vapic, nested specific, Intel only]
  • hv-reenlightenment [requires: hv-frequencies, nested specific, breaks migration]
Medium effect:
  • hv-stimer-direct [requires: hv-stimer, nested related]
Low or no effect:
  • hv-emsr-bitmap [requires: hv-evmcs (Intel only), nested specific]
  • hv-frequencies
  • hv-tlbflush [requires: hv-vpindex]
  • hv-tlbflush-ext [requires: hv-tlbflush]
  • hv-tlbflush-direct [requires: hv-vapic, hv-evmcs (Intel only), nested specific]
  • hv-xmm-input
Negative (unclear) effect:
  • hv-avic/hv-apicv
Untested effect:
  • hv-no-nonarch-coresharing=on/off/auto
  • hv-crash
  • hv-syndbg
  • hv-vendor-id
  • hv-version-id-*
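For reference, a sketch of what enabling only the strong- and medium-effect flags explicitly (instead of hv_passthrough) might look like; the dependency flags are filled in from the QEMU documentation, and this exact combination is an untested assumption:
Code:
args: -cpu host,hv-vapic,hv-evmcs,hv-frequencies,hv-reenlightenment,hv-vpindex,hv-synic,hv-time,hv-stimer,hv-stimer-direct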
If I understood it correctly, running Windows 10/11 in Proxmox with virtualization-based security (VBS) enabled means running it as an L2 VM under an L1 Hyper-V hypervisor, whereas on bare metal Windows 10/11 would run as an L1 VM on an L0 hypervisor. To get good performance for the L2 VMs in Proxmox, the nesting-related/specific enlightenments are of utmost interest here.

I will most likely keep using/recommending hv-passthrough, since the host CPU type also passes through all flags and usage is convenient and simple. Migration will break anyway if hv-reenlightenment is enabled, and VM performance with hv-passthrough is similar to enabling only the enlightenments with strong and medium effect. In Windows 11 Pro it seems possible to reach bare-metal performance with hv-passthrough. Would be awesome if Hyper-V Enlightenment Passthrough were available from the Proxmox web interface / "GUI" within the CPU settings. ;)
 


> In Windows 11 Pro it seems possible to reach bare-metal performance with hv-passthrough
I enabled hv-passthrough and disabled waitpkg. I also assigned dedicated cores to the VM using the CPU affinity option, but I still get occasional lag spikes on Windows 11 Pro. Do you use other performance optimizations, like hugepages?
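
For context, the CPU affinity option can also be set from the CLI; the core list below is only an illustrative assumption and has to match your own topology:
Code:
# qm set 111 --affinity 0-9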
 
> I enabled hv-passthrough and disabled waitpkg. I also assigned dedicated cores to the VM using the CPU affinity option, but I still get occasional lag spikes on Windows 11 Pro. Do you use other performance optimizations, like hugepages?
Hi. I'm not exactly sure what kind of "lag spikes" you get, but apart from CPU isolation and Hyper-V Enlightenments I have not applied further optimizations. I have not tested Windows 11 for long yet; I just ran several benchmarks and (to my surprise) got results as if the OS were running directly on the hardware.

However, roughly a year ago I had massive latency issues (freezes, sound distortion, lags, ...) with my desktop Windows 10 VM until I found this bug report about kernel freezes in KVM and applied the workaround proposed there. It still seems to be an unsolved problem and reportedly affects even the latest 6.5 kernel. I haven't tested other settings since then.

Have you already tried applying their mitigation proposal to all storages connected to your VM, like below?
[screenshot: disk options with the proposed mitigation applied]
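
A hedged sketch, assuming the proposed mitigation is switching the disk's asynchronous I/O mode from the io_uring default to aio=threads (the screenshot itself is not preserved); for the VM in this thread that would look something like:
Code:
# qm set 111 --scsi0 local-zfs:vm-111-disk-1,discard=on,iothread=1,ssd=1,aio=threads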

Edit: When you asked about other performance optimizations, I thought of "CPU affinity" in terms of CPU isolation. But there is an option in the Proxmox CPU settings that sets the CPU affinity for a VM (using taskset, according to the help). So if you are referring "only" to this CPU affinity option, there is indeed some additional performance optimization that I have applied:

Setting the CPU affinity of a single VM does not help much unless you set the affinity of all other processes and VMs, too. In addition to processes and cores, IRQ affinity also plays a role if you'd like to avoid freezes/lags. If you really want a low-latency VM, I'd recommend using the cgroup v2 interface in combination with SMP IRQ affinity. There is a quite extensive but very useful thread/tutorial about resource isolation here in the forum. The ideal layout depends a bit on how many physical cores you have and what exactly you want to achieve. :cool:

For my Core i7-13700T (8x2 P-cores, 8 E-cores), I found the layout below to be most efficient for having a single low-latency VM (6x2 P-cores) next to several (10+) background VMs (8 E-cores) and some exclusive resources for the host (2x2 P-cores). A dedicated disk for the desktop VM is also beneficial to avoid interruptions from the host or other VMs reading/writing on the same disk (e.g. during backups, migration, etc.).
[screenshot: CPU layout for host, low-latency VM, and background VMs]
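
As a rough sketch of the cgroup v2 approach (the slice names are standard systemd ones, but the CPU list is only an assumption and must match your topology, e.g. CPUs 0-3 being the first two P-cores plus their SMT siblings; check lscpu -e), the host can be confined so the VM cores stay quiet:
Code:
# systemctl set-property --runtime -- system.slice AllowedCPUs=0-3
# systemctl set-property --runtime -- user.slice AllowedCPUs=0-3
# systemctl set-property --runtime -- init.scope AllowedCPUs=0-3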

But this has gotten far away from Hyper-V now...
 
