New Windows 11 VM Fails Boot After Update

DFMurphy33

Dec 3, 2023
Hi all,

I'm brand new to Proxmox, and so far I've been really impressed. It was relatively easy to get up and running, and the UI is pretty straightforward. It took me a little while to find all the right references to get GPU passthrough working, but after some tinkering I seemingly had a working Windows 11 VM that recognized the graphics card. I didn't install all of the drivers I should have during the installation process, so I figured there might be some issues there... but all in all, I was happy with my progress for the evening.

The next day, I decided to start a new Win 11 VM from scratch so I could get a good clean install without any possible remnants of the tinkering I did to get the first one working, and to make sure I understood the process. Everything seemed to go well... until I attempted to update Windows. After the update completes, the VM needs to reboot, after which it fails to boot and brings me to the Windows Automatic Repair screen. Luckily, I took a snapshot prior to pulling updates, so I was able to roll back... but nothing I've done since will get me past this point. If I don't check for updates, I can use the VM and reboot it as many times as I want... no issues. But as soon as I check for updates, install, and reboot... it fails to boot and I'm right back at the Automatic Repair screen.
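For anyone following along, the snapshot-and-rollback cycle can be driven from the PVE shell as well; a rough sketch (the VM ID 113 and the snapshot name "preupdate" are just examples):

```shell
# Take a snapshot before checking for updates (name is arbitrary)
qm snapshot 113 preupdate --description "clean install, before Windows Update"

# ...run Windows Update in the guest, reboot, observe the failure...

# List the VM's snapshots, then roll back to the pre-update state
qm listsnapshot 113
qm rollback 113 preupdate
```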

Since then, I've read every blog/post I can find and tried the following, rolling back each time it didn't work...
  • Installing one update at a time, in different orders.
  • Running the VirtIO driver update wizard (even though there don't appear to be any missing drivers)... same result.
  • Rebooting with and without the VirtIO and Win11 installation media ISOs attached.
  • Stopping/starting the VM multiple times.
  • Installing the applicable drivers via CMD, then running Startup Repair.
  • Launching Windows CMD from the recovery tools, installing the appropriate drivers, and confirming all partitions are intact and the EFI partition is still there.
  • Rebuilding the EFI boot partition by copying the applicable files from the C: drive to the EFI partition.
  • Deleting and recreating the EFI disk from PVE.
Each time, I get the same result, with little more information to go on.
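For reference, the EFI rebuild in the next-to-last bullet was essentially the standard bcdboot approach from the recovery command prompt; a sketch only (the S: drive letter is just whatever you assign to the EFI partition in diskpart, and the disk/partition numbers will vary per system):

```shell
rem From the WinRE command prompt, give the EFI partition a drive letter first
rem (inside diskpart: list disk, select disk 0, list part, select the EFI
rem partition, assign letter=S, exit)
diskpart

rem Recreate the UEFI boot files on the EFI partition from the Windows install
bcdboot C:\Windows /s S: /f UEFI
```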

After all of this, the only additional indicator I was able to find was when running the Windows Automatic Repair. After it fails, the "SrtTrail.txt" file has a line at the bottom that says "a recently serviced boot binary is corrupt"... which makes me think one of the KBs is modifying the EFI boot file/partition in some way that's causing issues with Proxmox?

Host Specs:

Proxmox 8.1.3
Motherboard: ASUS PRIME Z790-V WIFI D5 ATX
CPU: Intel 13th Gen i9-13900K
RAM: 128GB (4x 32GB) CORSAIR VENGEANCE DDR5 6400 XMP
GPU: EVGA RTX 3080 FTW3
Hypervisor Drive: Samsung SSD 980 PRO with Heatsink 2TB (MZ-V8P2T0)
VM Storage Drive: SK hynix Gold S31 1TB SSD

Linux MSOL-PVE 6.5.11-4-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.11-4 (2023-11-20T10:19Z) x86_64

root@MSOL-PVE:~# pveversion -v
proxmox-ve: 8.1.0 (running kernel: 6.5.11-4-pve)
pve-manager: 8.1.3 (running version: 8.1.3/b46aac3b42da5d15)
proxmox-kernel-helper: 8.0.9
proxmox-kernel-6.5.11-4-pve-signed: 6.5.11-4
proxmox-kernel-6.5: 6.5.11-4
ceph-fuse: 17.2.7-pve1
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx7
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.7
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.1.0
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.5
libpve-network-perl: 0.9.4
libpve-rs-perl: 0.8.7
libpve-storage-perl: 8.0.5
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.0.4-1
proxmox-backup-file-restore: 3.0.4-1
proxmox-kernel-helper: 8.0.9
proxmox-mail-forward: 0.2.2
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.2
proxmox-widget-toolkit: 4.1.3
pve-cluster: 8.0.5
pve-container: 5.0.8
pve-docs: 8.1.3
pve-edk2-firmware: 4.2023.08-1
pve-firewall: 5.0.3
pve-firmware: 3.9-1
pve-ha-manager: 4.0.3
pve-i18n: 3.1.2
pve-qemu-kvm: 8.1.2-4
pve-xtermjs: 5.3.0-2
qemu-server: 8.0.10
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.0-pve3

root@MSOL-PVE:~# qm config 113
agent: 1
bios: ovmf
boot: order=scsi0;ide0;ide2;net0
cores: 8
cpu: host
efidisk0: VM_SSD:vm-113-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
ide0: local:iso/virtio-win-0.1.240.iso,media=cdrom,size=612812K
ide2: local:iso/Win11_22H2_English_x64v1.iso,media=cdrom,size=5427180K
machine: pc-q35-8.1
memory: 16384
meta: creation-qemu=8.1.2,ctime=1701527894
name: MSOL-VM-1700
net0: virtio=BC:24:11:B2:7F:A0,bridge=vmbr113,firewall=1
numa: 0
ostype: win11
parent: CMD_w_Drivers
scsi0: VM_SSD:vm-113-disk-1,cache=writeback,discard=on,iothread=1,size=150G
scsihw: virtio-scsi-single
smbios1: uuid=e7bcbaf7-feac-422e-ae14-afa7907ee7b7
sockets: 1
tpmstate0: VM_SSD:vm-113-disk-2,size=4M,version=v2.0
unused1: VM_SSD:vm-113-disk-3
vga: virtio
vmgenid: 1ee7b917-1a2d-43a7-ab81-34b044bffc9e


Any help greatly appreciated. I had/have hopes of replacing my current workstation with a Win 11 VM hosted on Proxmox... but I don't seem to be off to a very good start here... o_O
 
Try: remove the VM CD-ROM.
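(For anyone else trying this: the ISOs can be ejected from the PVE shell without deleting the drives; VM ID 113 here, adjust to yours.)

```shell
# Empty both virtual CD-ROM drives but leave them attached
qm set 113 --ide0 none,media=cdrom
qm set 113 --ide2 none,media=cdrom
```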
Thanks for the reply,

I tried that. In addition to the list of things I tried above, I've also now tried...
- Using a newer Win 11 23H2 ISO for installation
- Changing the order in which I install various drivers/updates

All with the same result. Even with the network disconnected and both ISOs ejected... if I do a fresh install and give it a reboot or two, the system goes into Automatic Repair.

In contrast, I also created a Win 10 VM with the same settings, and that's been running just fine since my first post.
 
I decided to start from scratch again and go as slowly as possible to identify the root cause of the issue... and the results are maddening...
- Built a new VM.
- Installed Win 11 23H2.
- The moment the installation finished and the desktop was available, I shut down the VM from the guest OS.
- Set the media in each of the virtual CD-ROM drives to "none".
- Set the network adapter to "disconnected" (to prevent new updates from downloading/installing).
- Started the VM and rebooted from the guest OS 3x, giving it a few minutes to sit there between reboots...
- After the 3rd reboot, the system goes into the "Automatic Repair"...

So... it would seem whatever is going on is present during/after the initial install and may be just waiting to fail? Not sure what else to try or where else to look for more clues.
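For reference, the "disconnected" adapter state maps to the link_down flag on the NIC; from the PVE shell it would be roughly (VM 102, reusing the existing net0 values from the config below):

```shell
# Keep the NIC attached but report the link as down to the guest
# (same net0 line as in the VM config, with link_down=1 added)
qm set 102 --net0 virtio=BC:24:11:DF:40:08,bridge=vmbr113,firewall=1,link_down=1
```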

Current VM config is as follows, the rest of the system is still the same as my initial post.

root@MSOL-PVE:~# qm config 102
agent: 1
bios: ovmf
boot: order=scsi0
cores: 8
cpu: host
efidisk0: VM_SSD:vm-102-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
ide0: none,media=cdrom
ide2: none,media=cdrom
machine: pc-q35-8.1
memory: 8192
meta: creation-qemu=8.1.2,ctime=1702254435
name: Win11-Test
net0: virtio=BC:24:11:DF:40:08,bridge=vmbr113,firewall=1,link_down=1
numa: 0
ostype: win11
parent: Removed-Devices
scsi0: VM_SSD:vm-102-disk-1,cache=writeback,discard=on,iothread=1,size=150G
scsihw: virtio-scsi-single
smbios1: uuid=74b0a916-5851-45ee-abbb-3c2540caa218
sockets: 1
tpmstate0: VM_SSD:vm-102-disk-2,size=4M,version=v2.0
vga: virtio
vmgenid: 1658ae19-2b68-4785-9f65-cac62b03c37c
 
Try disabling the writeback cache feature for the VM's virtual hard disk (default: none).
 
Try disabling the writeback cache feature for the VM's virtual hard disk (default: none).
Thanks for the reply,

I rolled back and changed the setting as suggested this morning... and after 3x reboots (waiting a few minutes in-between), the system fails to boot and goes into automatic recovery.
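For reference, the change amounts to rewriting scsi0 without the cache option (cache=none is the default when it's omitted); roughly:

```shell
# Drop cache=writeback from the disk line; the default cache mode is none
qm set 102 --scsi0 VM_SSD:vm-102-disk-1,discard=on,iothread=1,size=150G
```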

I'm not really seeing anyone else, at least on this forum, having this issue... so I can only assume that either there aren't too many people running Win11 VMs, or there's some subtle nuance to my build/config that's causing issues no one else has seen yet?
 
Try graphics: default.
Try cpu: x86-64-v2-AES.

Delete the 2nd IDE too.

Try this without reinstalling.

It looks like this was the culprit... or at least an indicator to the root cause.

I decided to make one change at a time, changed "cpu: host" to "cpu: x86-64-v2-AES", and it came right back up.
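For anyone hitting the same wall, the whole fix was this one-line change (VM 102 in my test case; adjust the ID for yours):

```shell
# Switch the vCPU model from host passthrough to the generic v2 profile
qm set 102 --cpu x86-64-v2-AES
```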

I've since downloaded all Windows updates, added PCI passthrough for the GPU, installed the GPU drivers, rebooted several times... and so far it's stable. Still not sure why this would be the case, since it installs and reboots fine the first 2x with the CPU set to "host". You would think it would fail to install, fail after the first reboot, or be good to go.

Reading through the forum and other how-to guides leads me to believe there may be a performance impact from changing the CPU from "host", so I'd definitely still be interested in a fix that doesn't require me to change the CPU type, but I'm happy to have this VM running for now.
 
I'm glad it worked. It will also make the VM portable if you ever switch host computers, which is nice. You'd be surprised how fast this x86-64-v2-AES is. I think you can try v3 too; it may give you a bonus.

Enjoy! Glad it helped; this setting solved something completely unrelated for me that took me 4 days to figure out...
 
I also had this exact same problem on Proxmox 8.1 with similar QEMU settings, using all VirtIO drivers in the Windows 11 VM (scsi, network, balloon, vioserial, qxldod).
I've also already tried everything OP tried.

For me it began like this:
In November, I think, just when the Windows 11 23H2 ISO was released, Windows Update within the VM also offered the version update. This went "too fast", but apparently succeeded, and there were no problems with reboots afterward. The problems came precisely after trying to install the *cumulative updates* specifically; back then it was an optional update, but this week it became mandatory, instantly beginning to download and install itself along with other pending updates.
Afterwards, same results as OP.
What I additionally tried was updating manually, by mounting the Windows ISO within the VM and running the installer. Same results, except that the installer showed an error code at the end, which by googling seemed to point to a fatal error related to drivers or an outdated BIOS, according to MS docs.

I haven't tried changing the CPU setting and going again yet; I think I'd need to try that as well.

But in another forum, where the OP was using pure QEMU on another distro, I read that when creating the VM from scratch and installing Windows 11 23H2 from zero, the Windows installer did copy files to the virtual disk and reboot, but seemingly always failed to properly create the ESP (EFI) partition, which resulted in just a UEFI shell prompt instead of the Windows Boot Manager.

I wanted to try pure QEMU myself, but it seems to lack the virtio-scsi device type for some reason (it just gives errors when I try to use it).

Has anyone had this last described issue?
What could be the problem with the latest Windows 11 version and the cpu: host setting in QEMU?
I've never seen this issue before
 
It looks like this was the culprit... or at least an indicator to the root cause.

I decided to make one change at a time, changed "cpu: host" to "cpu: x86-64-v2-AES", and it came right back up.

I've since downloaded all Windows updates, added PCI passthrough for the GPU, installed the GPU drivers, rebooted several times... and so far it's stable. Still not sure why this would be the case, since it installs and reboots fine the first 2x with the CPU set to "host". You would think it would fail to install, fail after the first reboot, or be good to go.

Reading through the forum and other how-to guides leads me to believe there may be a performance impact from changing the CPU from "host", so I'd definitely still be interested in a fix that doesn't require me to change the CPU type, but I'm happy to have this VM running for now.
Ignore my PM @DFMurphy33, this solved it! For now...
 
