vGPU / host CPU setting / Win11 Auto repair loops

scyto

Here are the top-line facts:
  • I have i915 vGPU working on my host.
  • I install a new Win11 VM.
  • I get the Win11 VM working with vGPU.
  • At some point later, on a reboot, the Win11 machine goes into automatic repair (this seems to be time-based; it's not due to Windows updates, not from installing anything, not from changing anything).
  • If I set the CPU type to x86-64-v2-AES then the system boots...
  • ...but this breaks vGPU with the dreaded Code 43.
My gut is that this is related to some of the issues seen with WSL on Windows 11 with the CPU type set to host - I assume because both use VT-d between host and guest.
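
For anyone reproducing this: toggling between the two CPU types is just the standard Proxmox setting (102 is my VMID, per the config further down):
Code:
# boots reliably, but breaks vGPU with Code 43
qm set 102 --cpu x86-64-v2-AES

# needed for working vGPU, but eventually triggers the automatic repair loop
qm set 102 --cpu host,hidden=1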

I have found other QEMU posts (not Proxmox) where this host passthrough on Win11 seems to cause issues.
Things I have tried:

I need a way to keep the host CPU type enabled and avoid the reboot loops,
OR
I need a way, with a virtualized CPU type, to stop the Windows Intel drivers from failing.
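
One avenue I haven't fully explored is Proxmox's custom CPU model support, which lets you build on a virtualized base model and hand extra flags back to the guest, in case the Intel driver is probing for something specific. A rough sketch - the model name and flag choices here are illustrative guesses, not a verified fix:
Code:
# /etc/pve/virtual-guest/cpu-models.conf  (create the file if it doesn't exist)
cpu-model: win11-vgpu
    flags +aes;+avx;+avx2
    reported-model kvm64

# then point the VM at it:
qm set 102 --cpu custom-win11-vgpu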

I realize not everyone has this issue, so I wonder if it is something to do with the specifics of my CPU?

Code:
    root@pve1:/etc/pve/qemu-server# cat /proc/cpuinfo
    processor       : 0
    vendor_id       : GenuineIntel
    cpu family      : 6
    model           : 186
    model name      : 13th Gen Intel(R) Core(TM) i7-1360P
    stepping        : 2
    microcode       : 0x410e
    cpu MHz         : 691.790
    cache size      : 18432 KB
    physical id     : 0
    siblings        : 16
    core id         : 0
    cpu cores       : 12
    apicid          : 0
    initial apicid  : 0
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 32
    wp              : yes
    flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves split_lock_detect avx_vnni dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req hfi vnmi umip pku ospke waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize arch_lbr ibt flush_l1d arch_capabilities
    vmx flags       : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple shadow_vmcs ept_mode_based_exec tsc_scaling usr_wait_pause
    bugs            : spectre_v1 spectre_v2 spec_store_bypass swapgs eibrs_pbrsb
    bogomips        : 5222.40
    clflush size    : 64
    cache_alignment : 64
    address sizes   : 39 bits physical, 48 bits virtual
    power management:
 
This is my VM config:
Code:
agent: 1
bios: ovmf
boot: order=scsi0;ide2
cores: 4
cpu: host,hidden=1
efidisk0: vDisks:vm-102-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
hostpci0: 0000:00:02.1,x-vga=1
hotplug: disk,network,usb
ide2: ISOs-Templates:iso/virtio-win-0.1.240.iso,media=cdrom,size=612812K
machine: pc-q35-8.1
memory: 4096
meta: creation-qemu=8.1.5,ctime=1712446324
net0: virtio=BC:24:11:6F:38:6F,bridge=vmbr0,firewall=1
numa: 0
ostype: win11
scsi0: vDisks:vm-102-disk-1,cache=writeback,iothread=1,size=64G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=4a38e490-22ea-4c79-954f-514de484091e
sockets: 1
tpmstate0: vDisks:vm-102-disk-2,size=4M,version=v2.0
vga: virtio
vmgenid: 61c12c7c-2778-4db7-a46f-1046278510a3
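
For context, hostpci0 above is an SR-IOV VF of the iGPU, not the physical GPU itself. A quick host-side sanity check (device addresses match my system; 0000:00:02.0 is the physical function):
Code:
# the VF handed to the VM (matches hostpci0 above)
lspci -s 0000:00:02.1

# how many VFs the physical iGPU is currently exposing
cat /sys/bus/pci/devices/0000:00:02.0/sriov_numvfs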
 
When I have the CPU set to x86-64-v2-AES, I believe this is the root issue stopping vGPU from working.
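
A quick way to confirm the Code 43 from inside the guest without opening Device Manager (plain cmd):
Code:
rem ConfigManagerErrorCode 43 corresponds to the Device Manager "Code 43"
wmic path Win32_VideoController get Name,Status,ConfigManagerErrorCode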

 
I have recreated the VM from scratch:
  • CPU type set to host
  • installed Windows, bypassing joining AAD or using an MS account (my thesis is that Windows Hello enabling VBS causes an issue on QEMU-based systems, similar to those having issues with WSL2)
  • DNS was up long enough to download the Intel drivers
  • then disabled DNS in the guest, then enabled RDP
  • removed the default video device and added the vGPU PCIe device
  • rebooted
  • RDPed in and installed the drivers
  • installed no VirtIO drivers whatsoever and did not install the guest tools
  • ran bcdedit /set hypervisorlaunchtype off for later - I am hoping this prevents things installing that I don't want...
The system is working with vGPU.
The only difference I can see between this working system and the system that went into automatic repair is that VM security and nested virtualization are all turned off in this working system.
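
If you want to check the same thing in your own guest, the Device Guard WMI class reports whether VBS is off (0), enabled but not running (1), or running (2) - my thesis is that state 2 is what triggers the repair loops on host CPU:
Code:
rem run inside the guest: 0 = VBS off, 1 = enabled but not running, 2 = running
wmic /namespace:\\root\Microsoft\Windows\DeviceGuard path Win32_DeviceGuard get VirtualizationBasedSecurityStatus,SecurityServicesRunning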

Now for a few reboots to see if that causes the failure that leads to the automatic repair loops.
 

After a few reboots:
  1. worked on first start after rebooting the PVE node
  2. Code 43 appeared on the Intel device that had been working, after 1 reboot
  3. then after another VM reboot the Intel device was working fine
  4. then after another 4 VM reboots, still fine

Code:
$ dmesg | grep -i kvm

[  215.914651] x86/split lock detection: #AC: CPU 2/KVM/4941 took a split_lock trap at address: 0x7ef1d050
[  215.914651] x86/split lock detection: #AC: CPU 3/KVM/4942 took a split_lock trap at address: 0x7ef1d050
[  215.914653] x86/split lock detection: #AC: CPU 1/KVM/4940 took a split_lock trap at address: 0x7ef1d050
[  221.196100] x86/split lock detection: #AC: CPU 1/KVM/5096 took a split_lock trap at address: 0x7ef3d050
[  233.697853] kvm: kvm [5026]: ignored rdmsr: 0xc0011029 data 0x0
[  235.680886] kvm: kvm [5026]: ignored rdmsr: 0x309 data 0x0
[  235.680917] kvm: kvm [5026]: ignored rdmsr: 0x30a data 0x0
[  235.680928] kvm: kvm [5026]: ignored rdmsr: 0x30b data 0x0
[  235.680938] kvm: kvm [5026]: ignored rdmsr: 0x38d data 0x0
[  235.680948] kvm: kvm [5026]: ignored rdmsr: 0x38e data 0x0
[  235.680959] kvm: kvm [5026]: ignored rdmsr: 0x38f data 0x0
[  235.680969] kvm: kvm [5026]: ignored rdmsr: 0x390 data 0x0
[  235.680981] kvm: kvm [5026]: ignored rdmsr: 0xc3 data 0x0
[  235.680991] kvm: kvm [5026]: ignored rdmsr: 0xc4 data 0x0
[  240.804382] x86/split lock detection: #AC: CPU 0/KVM/4939 took a split_lock trap at address: 0xfffff80534c316bd

I get the split_lock traps when the VM and vGPU work OK.
I saw the ignored rdmsr messages on the boot where the vGPU had a Code 43.
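
As an aside, the split_lock traps can be silenced host-side if they bother you; a sketch (I have not verified this changes the vGPU behavior in any way):
Code:
# GRUB systems: append split_lock_detect=off to GRUB_CMDLINE_LINUX_DEFAULT
# in /etc/default/grub, then regenerate the config:
update-grub
# systemd-boot / ZFS-root systems use /etc/kernel/cmdline instead, then:
proxmox-boot-tool refresh
reboot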

Oh, as a point of note, I also updated the Intel microcode just in case...

Code:
root@pve1:~# dmesg | grep -i microcode
[    0.000000] microcode: updated early: 0x410e -> 0x411c, date = 2023-08-30
[    1.345209] microcode: Microcode Update Driver: v2.2.
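
For anyone wondering how: roughly like this on a PVE 8 / Debian 12 host, assuming the non-free-firmware component is enabled in your APT sources:
Code:
apt update
apt install intel-microcode
reboot    # the new microcode is early-loaded on the next boot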
 
  • enabled DNS, applied all Windows updates, and rebooted when prompted, then did 4 more reboots - no adverse issues
  • installed the VirtIO guest tools and then did 4 more reboots - no adverse issues
  • bound my Microsoft account - 2 reboots later, automatic repair
tl;dr -
  1. anything that enables virtualization protection (WSL, Windows Hello, etc.) will cause the repair boot-loop issue when the CPU type is set to host
  2. yes, this can be mitigated with CPU args; however, this breaks vGPU passthrough for me
I hope that finally answers, for many, the mystery of what is causing the issue: don't use the host CPU type, and don't enroll the guest OS in AAD, use a Microsoft account, or use anything that relies on Windows Hello / Hyper-V / WSL.

If you want to use nested virtualization or Windows Hello, accept that you are not using vGPU.
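
To verify the guest's own hypervisor is staying off (run inside the guest; expect Off if the earlier bcdedit change stuck):
Code:
bcdedit /enum {current} | findstr /i hypervisorlaunchtype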
 
I am having the same issue on a Minisforum MS-01: installed all BIOS and microcode updates, fully updated Proxmox to the latest version, and enabled Secure Boot on the Proxmox host.

Windows 11 VM will boot normally with CPU set to x86-64-v2-AES but not when set to host.

It only seems to occur after adding a Microsoft account; it definitely seems to be related to some kind of nested virtualization being done by Windows 11.

I confirmed memory isolation is disabled in the VM, but it's still not working.

I spent hours troubleshooting what I thought was a failed VM migration; it turns out it's an issue with Proxmox/KVM.

Really disappointing, as I don't want to have to use x86-64-v2-AES, and none of the CPU arguments I've tried work for me.

I don't care much about vGPU but not being able to boot the VM at all is a problem.
 
Thanks for validating that I am not crazy! I think others don't hit it because they don't use Microsoft accounts.

It's definitely to do with the secure enclave that's created in a variety of scenarios (MS account, WHfB, WSL, etc.). I must admit I haven't tested without a TPM, but that's out for me as I need WHfB to work. This has been going on so long that I suspect it's a fundamental upstream KVM bug, possibly related to the type of hardware you and I have (I am on a NUC 13, which is very similar to your MS-01). I haven't had the time or will to prove this with plain Debian and Ubuntu installations of KVM; I just used a separate Hyper-V node for my testing instead :-(
 
@n64thewin, you did try the args in the linked thread, right? args: -cpu host,hv_passthrough,level=30,-waitpkg
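
For anyone else following along, those args are raw QEMU arguments appended to the VM config, e.g.:
Code:
qm set <vmid> --args '-cpu host,hv_passthrough,level=30,-waitpkg'
# equivalent to adding this line to /etc/pve/qemu-server/<vmid>.conf:
#   args: -cpu host,hv_passthrough,level=30,-waitpkg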
 
Just to add, for whatever it is worth: many vGPUs do NOT work well with nested virtualization and will disable themselves at some point, literally because of WSL/the hypervisor being enabled in the VM. I have run into this before with NVIDIA vGPU etc. too; it just will not work and is not designed to support nesting at all (a quick host-side check for nested virt is sketched below).

Windows 11 24H2 also shows many issues with GPUs in general. I am having a lot of trouble with Windows 11 and vGPU: despite the vGPU working properly, applications that are supposed to use the GPU and that worked fine in Windows 10 are hit and miss in 11; some just will not run right anymore.
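
For what it's worth, you can check host-side whether nested virtualization is even being offered to guests. A sketch assuming an Intel host (kvm_intel); note that disabling it is untested as a fix for this thread's issue:
Code:
# Y (or 1) means guests with CPU type "host" can run a nested hypervisor (WSL2, VBS, ...)
cat /sys/module/kvm_intel/parameters/nested

# to disable nesting persistently:
echo "options kvm_intel nested=0" > /etc/modprobe.d/kvm-intel-nested.conf
# stop all VMs, then reload the module (or simply reboot):
modprobe -r kvm_intel && modprobe kvm_intel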
 
Thanks for validating that I am not crazy! I think others don't hit it because they don't use Microsoft accounts.

It's definitely to do with the secure enclave that's created in a variety of scenarios (MS account, WHfB, WSL, etc.). I must admit I haven't tested without a TPM, but that's out for me as I need WHfB to work. This has been going on so long that I suspect it's a fundamental upstream KVM bug, possibly related to the type of hardware you and I have (I am on a NUC 13, which is very similar to your MS-01). I haven't had the time or will to prove this with plain Debian and Ubuntu installations of KVM; I just used a separate Hyper-V node for my testing instead :-(
You're definitely not crazy mate!

What is crazy though is the fact that the latest version of the Windows OS is seemingly unusable in KVM with host CPU passthrough in a pretty standard configuration.

A lot of the testing I had hoped to use Proxmox for would involve AAD-joining the VM - or at the very least signing in with a Microsoft account... which breaks the VM... meaning I am now in a worse position than I was back when I was running VMs under VMware Workstation on my main PC.

I have run into similar "weirdness" with Windows 11 on KVM when I previously tried running it in Unraid on another system; I didn't have these sorts of issues on Windows 10.

Ironically enough, VMware Workstation also seems to be more performant than Proxmox/KVM has been, particularly with GPU emulation, as I notice random stutters when using Proxmox and VirtIO-GPU (which is odd considering VMware Workstation is a type-2 hypervisor that runs INSIDE Windows, as opposed to bare-metal like Proxmox...).

I have also been considering Hyper-V, since it "just works" and I've always been impressed with its performance for Windows VMs, but I can't justify running Windows Server + Hyper-V (or Hyper-V Core) on my MS-01, as this presents its own challenges. I'm not sure if it still requires an AD join on the device you want to manage Hyper-V from, but I remember going through this a few years ago; I don't want to have to spin up AD just for this, nor do I want to hack it together to work with local accounts.

I really wanted to find a good use for this MS-01. I even tried installing XCP-ng yesterday, since I have experience with XenServer in the enterprise space, but that too had its own issues.

My goal was to use Proxmox for test VMs plus as my daily driver for work, which I have been doing with VMware Workstation (and Hyper-V before that, which I had to uninstall as it was negatively impacting gaming performance on my main PC).

So much potential in this little PC, but at this rate I may just end up returning it :(
 
@n64thewin, you did try the args in the linked thread, right? args: -cpu host,hv_passthrough,level=30,-waitpkg
Yep - I've tried all the different args I could find in my research. They did allow the VM to get slightly further along in the boot process (I saw a few spins of the Windows loading circle), but it ultimately still crashed (actually resulting in a kernel panic in the VM - go figure!).
 
Just to add, for whatever it is worth: many vGPUs do NOT work well with nested virtualization and will disable themselves at some point, literally because of WSL/the hypervisor being enabled in the VM. I have run into this before with NVIDIA vGPU etc. too; it just will not work and is not designed to support nesting at all.

Windows 11 24H2 also shows many issues with GPUs in general. I am having a lot of trouble with Windows 11 and vGPU: despite the vGPU working properly, applications that are supposed to use the GPU and that worked fine in Windows 10 are hit and miss in 11; some just will not run right anymore.
Sounds to me like Microsoft is implementing all this virtualization-based security and these "features", and that just isn't playing nice with KVM. It seems to work perfectly fine on VMware and Hyper-V... which makes sense considering Azure is just Hyper-V, so I guess they kind of needed to ensure it played nice with Windows.
 
