New Windows 11 VM Fails Boot After Update

DFMurphy33
New Member
Dec 3, 2023
Hi all,

I'm brand new to Proxmox, and so far I've been really impressed. It was relatively easy to get up and running, and the UI is pretty straightforward. It took me a little while to find all the right references to get GPU passthrough working but, after some tinkering, I seemingly had a working Windows 11 VM that recognized the graphics card. I didn't install all of the drivers I should have during the installation process, so I figured there might be some issues there... but all in all, I was happy with my progress for the evening.

The next day, I decided to start a new Win 11 VM from scratch so I could get a good clean install without any possible remnants of the tinkering I did to get the first one working, and to make sure I understood the process. Everything seemed to go well... until I attempted to update Windows. After the update completes, the VM needs to reboot, after which it fails to boot and brings me to the Windows Automatic Repair screen. Luckily, I took a snapshot prior to pulling updates, so I was able to roll back... but nothing I've done since will get me past this point. If I don't check for updates, I can use the VM and reboot it as many times as I want... no issues. But as soon as I check for updates, install, and reboot... it fails to boot and I'm right back at the Automatic Repair screen.

Since then, I've read every blog/post I can find and tried the following, rolling back each time it didn't work...
  • Installing one update at a time, in different orders.
  • Running the VirtIO driver update wizard (even though there don't appear to be any missing drivers)... same result.
  • Rebooting with and without the VirtIO and Win11 installation media ISOs.
  • Stopping/starting the VM multiple times.
  • Installing applicable drivers via CMD, then running Startup Repair.
  • Launching Windows CMD from the recovery tools, installing the appropriate drivers, and confirming all partitions are intact and the EFI partition is still there.
  • Rebuilding the EFI boot partition by copying the applicable files from the C: drive to the EFI partition.
  • Deleting and recreating the EFI disk from PVE.
Each time, I get the same result, with little more information to go on.

After all of this, the only additional indicator I was able to find was when running the Windows Automatic Repair. After it fails, the "SrtTrail.txt" file has a line at the bottom that says "a recently serviced boot binary is corrupt"... which makes me think one of the KBs is modifying the EFI boot file/partition in some way that's causing issues with Proxmox?

Host Specs:

Proxmox 8.1.3
Motherboard: ASUS PRIME Z790-V WIFI D5 ATX
CPU: Intel 13th Gen i9-13900K
RAM: 128GB (4x 32GB) CORSAIR VENGEANCE DDR5 6400 XMP
GPU: EVGA RTX 3080 FTW3
Hypervisor Drive: Samsung SSD 980 PRO with Heatsink 2TB (MZ-V8P2T0)
VM Storage Drive: SK hynix Gold S31 1TB SSD

Linux MSOL-PVE 6.5.11-4-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.11-4 (2023-11-20T10:19Z) x86_64


root@MSOL-PVE:~# pveversion -v
proxmox-ve: 8.1.0 (running kernel: 6.5.11-4-pve)
pve-manager: 8.1.3 (running version: 8.1.3/b46aac3b42da5d15)
proxmox-kernel-helper: 8.0.9
proxmox-kernel-6.5.11-4-pve-signed: 6.5.11-4
proxmox-kernel-6.5: 6.5.11-4
ceph-fuse: 17.2.7-pve1
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx7
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.7
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.1.0
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.5
libpve-network-perl: 0.9.4
libpve-rs-perl: 0.8.7
libpve-storage-perl: 8.0.5
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.0.4-1
proxmox-backup-file-restore: 3.0.4-1
proxmox-kernel-helper: 8.0.9
proxmox-mail-forward: 0.2.2
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.2
proxmox-widget-toolkit: 4.1.3
pve-cluster: 8.0.5
pve-container: 5.0.8
pve-docs: 8.1.3
pve-edk2-firmware: 4.2023.08-1
pve-firewall: 5.0.3
pve-firmware: 3.9-1
pve-ha-manager: 4.0.3
pve-i18n: 3.1.2
pve-qemu-kvm: 8.1.2-4
pve-xtermjs: 5.3.0-2
qemu-server: 8.0.10
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.0-pve3

root@MSOL-PVE:~# qm config 113
agent: 1
bios: ovmf
boot: order=scsi0;ide0;ide2;net0
cores: 8
cpu: host
efidisk0: VM_SSD:vm-113-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
ide0: local:iso/virtio-win-0.1.240.iso,media=cdrom,size=612812K
ide2: local:iso/Win11_22H2_English_x64v1.iso,media=cdrom,size=5427180K
machine: pc-q35-8.1
memory: 16384
meta: creation-qemu=8.1.2,ctime=1701527894
name: MSOL-VM-1700
net0: virtio=BC:24:11:B2:7F:A0,bridge=vmbr113,firewall=1
numa: 0
ostype: win11
parent: CMD_w_Drivers
scsi0: VM_SSD:vm-113-disk-1,cache=writeback,discard=on,iothread=1,size=150G
scsihw: virtio-scsi-single
smbios1: uuid=e7bcbaf7-feac-422e-ae14-afa7907ee7b7
sockets: 1
tpmstate0: VM_SSD:vm-113-disk-2,size=4M,version=v2.0
unused1: VM_SSD:vm-113-disk-3
vga: virtio
vmgenid: 1ee7b917-1a2d-43a7-ab81-34b044bffc9e


Any help is greatly appreciated. I had/have hopes of replacing my current workstation with a Win 11 VM hosted on Proxmox... but I don't seem to be off to a very good start here... o_O
 
Try: remove the VM CDROM.
Thanks for the reply,

I tried that. In addition to the list of things I tried above, I've also now tried...
- Using a newer Win 11 23H2 ISO for installation
- Changing the order in which I install various drivers/updates

All with the same result. Even with the network disconnected and both ISOs ejected, if I do a fresh install and give it a reboot or two, the system goes into Automatic Repair.

In contrast, I also created a Win 10 VM with the same settings, and that's been running just fine since my first post.
 
I decided to start from scratch again and go as slowly as possible to identify the root cause of the issue... and the results are maddening...
- Built a new VM.
- Installed Win 11 23H2.
- The moment the installation finished and the desktop was available, I shut down the VM from the guest OS.
- Set the media in each of the virtual CD ROMs to "none".
- Set the network adapter to "disconnected". (to prevent new updates from downloading/installing)
- Started the VM and rebooted from the Guest OS 3x, giving it a few minutes to sit there between reboots...
- After the 3rd reboot, the system goes into the "Automatic Repair"...

So... it would seem whatever is going on is present during/after the initial install and may be just waiting to fail? Not sure what else to try or where else to look for more clues.

Current VM config is as follows, the rest of the system is still the same as my initial post.

root@MSOL-PVE:~# qm config 102
agent: 1
bios: ovmf
boot: order=scsi0
cores: 8
cpu: host
efidisk0: VM_SSD:vm-102-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
ide0: none,media=cdrom
ide2: none,media=cdrom
machine: pc-q35-8.1
memory: 8192
meta: creation-qemu=8.1.2,ctime=1702254435
name: Win11-Test
net0: virtio=BC:24:11:DF:40:08,bridge=vmbr113,firewall=1,link_down=1
numa: 0
ostype: win11
parent: Removed-Devices
scsi0: VM_SSD:vm-102-disk-1,cache=writeback,discard=on,iothread=1,size=150G
scsihw: virtio-scsi-single
smbios1: uuid=74b0a916-5851-45ee-abbb-3c2540caa218
sockets: 1
tpmstate0: VM_SSD:vm-102-disk-2,size=4M,version=v2.0
vga: virtio
vmgenid: 1658ae19-2b68-4785-9f65-cac62b03c37c
 
Try disabling the writeback cache feature for the VM's vHDD (default: none).
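For example, on the host you can re-specify the drive options without cache= (VMID and volume taken from the OP's config above; when cache is unset, it falls back to the default, none):

```shell
# Re-specify scsi0 without cache=writeback so it uses the default
# cache mode (none). VMID/volume are from the OP's qm config above.
qm set 113 --scsi0 VM_SSD:vm-113-disk-1,discard=on,iothread=1,size=150G
```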
 
Try disabling the writeback cache feature for the VM's vHDD (default: none).
Thanks for the reply,

I rolled back and changed the setting as suggested this morning... and after 3x reboots (waiting a few minutes in-between), the system fails to boot and goes into automatic recovery.

I'm not really seeing anyone else, at least on this forum, who's having this issue... so I can only assume that either there aren't too many people running Win11 VMs, or there's some subtle nuance to my build/config that's causing issues no one else has seen yet?
 
Try graphics: default
Try cpu: x86-64-v2-AES

Delete the 2nd IDE too.

Try this without reinstalling.

It looks like this was the culprit... or at least an indicator to the root cause.

I decided to make one change at a time, changed "cpu: host" to "cpu: x86-64-v2-AES", and it came right back up.

I've since downloaded all Windows updates, added PCI passthrough for the GPU, installed the GPU drivers, rebooted several times... and so far it's stable. Still not sure why this would be the case, since it installs and reboots fine the first 2x with the CPU set to "host". You'd think it would fail during install, fail after the first reboot, or be good to go.

Reading through the forum and other how-to guides leads me to believe there may be a performance impact from changing the CPU from "host", so I'd definitely still be interested in a fix that doesn't require me to change the CPU type, but I'm happy to have this VM running for now.
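In case it helps anyone else, the change is a one-liner on the Proxmox host (VMID 102 from my test VM config above; substitute your own):

```shell
# Switch the guest CPU model from "host" to the generic x86-64-v2-AES
# baseline; takes effect on the next full VM stop/start.
qm set 102 --cpu x86-64-v2-AES
qm config 102 | grep '^cpu:'   # confirm the new setting
```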
 
I'm glad it worked. It will also make the VM portable if you ever switch host computers, which is nice. You'd be surprised how fast x86-64-v2-AES is. I think you can try v3 too; it may give you a bonus.

Enjoy! Glad it helped; this setting solved something completely unrelated for me that took me 4 days to figure out...
 
I also had this exact same problem on Proxmox 8.1 with similar QEMU settings, using all the VirtIO drivers in the Windows 11 VM (scsi, network, balloon, vioserial, qxldod).
I also already tried everything the OP tried.

For me it began like this:
In November, I think, just when the Windows 11 23H2 ISO was released, Windows Update within the VM also offered the version update. This went "too fast", but apparently succeeded, and there were no problems with reboots after it. The problems came precisely after trying to install the *cumulative updates* specifically; back then it was an optional update, but this week it became mandatory, instantly beginning to download and install itself along with other pending updates.
Afterwards, same results as OP.
What I additionally tried was manually updating by mounting the Windows ISO within the VM and running the installer. Same results, except that the installer showed an error code at the end, which from googling seemed to point to a fatal error related to drivers or an outdated BIOS, according to MS docs.

Haven't tried changing the CPU setting and going again yet; I think I'll need to try that as well.

But in another forum, where the OP was using pure QEMU on another distro, I read that when creating the VM from scratch and installing Windows 11 23H2 from zero, the Windows installer did copy files to the virtual disk and reboot, but seemingly always failed to properly create the ESP (EFI) partition, which resulted in just a UEFI shell prompt instead of the Windows Boot Manager.

I myself wanted to try pure QEMU, but it seems to lack the virtio-scsi device type for some reason (it just gives errors when I try to use it).

Has anyone had this last described issue?
What could be the problem with the latest Windows 11 version and the cpu: host setting in QEMU?
I've never seen this issue before
 
It looks like this was the culprit... or at least an indicator to the root cause.
Ignore my PM @DFMurphy33, this solved it! For now...
 
Unfortunately, setting "x86-64-v2-AES" breaks vGPU, so while it fixes some scenarios, it isn't a practical fix for others.
 
I've never seen this issue before
I have been seeing this for over a year... it's a persistent issue... what's interesting about the OP's experience is that it's time-based.

What's interesting to me is that in repair mode the C:\ drive is inaccessible. It is possible to load the driver from the VirtIO ISO using the drvload command; however, on reboot and repair the drive is once more inaccessible... which makes me wonder if something is changing for the virtual SCSI / virtual block driver...
 
I'm not using Proxmox, but I use the same QEMU/KVM backend as Proxmox, and for me, my Windows 11 21H2 VM couldn't take 22H2. Back then I had neither the patience nor the time to invest in finding a fix. However, when I couldn't install 23H2 either, it was time to resolve the issue. I had to disable CPU passthrough and also clear the CPU configuration before 23H2 could successfully complete the second reboot during install. Without this measure, it always rolled back when it failed to complete that second reboot.

I noticed the following addition in the XML file:

XML:
<cpu mode="custom" match="exact" check="none">
  <model fallback="forbid">qemu64</model>
</cpu>

Oddly, if there's been a performance impact, I haven't noticed it, as I only use the VM for Office applications - no graphics-intensive stuff at all.
 
I'm in the same boat on this.

  • Windows 10 VMs working fine with "host" CPU "emulation"
  • Upgrade to Windows 11 - keeps working
  • Reboot a few times - system boots 2-3 times, then won't boot (potentially a Windows update?)
  • Remove PCI passthrough device - system still won't boot
  • Change CPU to emulated (x86-64-v2-AES) - system will now boot
It seems to have something to do with the specific combination of host CPU and some Windows 11 changes that happen to the OS after an update or upgrade, since it works for a few boots at first.

Have tried about 7 VMs and different combinations of getting to Windows 11 now, and it seems to be consistent. I even re-installed Proxmox from scratch, and the same issue emerges shortly after a Windows 11 upgrade (or a direct install), but NOT with Windows 10 guests.
 
I had the same issue, and using "x86-64-v2-AES" as the CPU type resolved it. But now I can't use nested virtualization, which really is a pity because I use the machine for devops stuff and I really need Docker to work.
What I've tried is this:
https://forum.proxmox.com/threads/selecting-cpu-type-x86-64-v2-aes.142869/
So for my Intel CPU i created this profile:

Code:
cpu-model: x86-64-v2-AES-nested
   flags +aes;+popcnt;+pni;+sse4.1;+sse4.2;+ssse3;+vmx
   reported-model qemu64
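and then referenced it from the VM config (custom models defined in /etc/pve/virtual-guest/cpu-models.conf get a "custom-" prefix when used as a CPU type, per the Proxmox docs; the VMID here is a placeholder):

```shell
# Reference the custom model from the VM config; custom models are
# always prefixed with "custom-" when set as the cpu type.
qm set <vmid> --cpu custom-x86-64-v2-AES-nested
```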

But this didn't help :( If I switch the CPU type back to "host", I get a bluescreen when booting... Do you have any suggestions what I can do?

Edit: lscpu says this:

Code:
Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          39 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   20
  On-line CPU(s) list:    0-19
Vendor ID:                GenuineIntel
  BIOS Vendor ID:         Intel(R) Corporation
  Model name:             Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz
    BIOS Model name:      Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz To Be Filled By O.E.M. CPU @ 3.5GHz
    BIOS CPU family:      207
    CPU family:           6
    Model:                165
    Thread(s) per core:   2
    Core(s) per socket:   10
    Socket(s):            1
    Stepping:             5
    CPU(s) scaling MHz:   65%
    CPU max MHz:          5300.0000
    CPU min MHz:          800.0000
    BogoMIPS:             7399.70
    Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm
                          2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx r
                          dseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp vnmi pku ospke md_clear flush_l1d arch_capabilities
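For reference, the flags above do cover the whole x86-64-v2-AES baseline; here's a quick sketch to check (the required flag set follows the x86-64 psABI microarchitecture-level definitions, plus "aes" for the -AES variant):

```shell
# Flags excerpted from the lscpu output above (i9-10900K).
flags="pni ssse3 cx16 sse4_1 sse4_2 popcnt aes lahf_lm vmx"
for f in cx16 lahf_lm pni popcnt sse4_1 sse4_2 ssse3 aes; do
    case " $flags " in
        *" $f "*) : ;;                # present on this host
        *) echo "missing: $f" ;;      # would need hiding/emulation
    esac
done
# prints nothing: the host exposes the full x86-64-v2-AES set
```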
 
