Windows Server 2022 Crashing Randomly

bilalwaheedch

May 20, 2022
Hello! I am facing a rather unusual issue, or at least one that is unusual for a newcomer like me. I hope someone here can help me out.

I have a few Windows Server 2022 VMs that randomly shut down.

Here is the VM config file:

Code:
agent: 1
bios: ovmf
boot: order=scsi0;net0;ide0
cores: 4
efidisk0: local-lvm:vm-101-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
ide0: local:iso/virtio-win-0.1.215.iso,media=cdrom,size=528322K
machine: pc-q35-6.2
memory: 8192
meta: creation-qemu=6.2.0,ctime=1652014079
name: W22-DC
net0: virtio=C6:C8:FE:41:14:87,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: win11
scsi0: local-lvm:vm-101-disk-1,cache=writeback,discard=on,size=100G
scsi1: local-ssd:vm-101-disk-0,cache=writeback,discard=on,size=800G
scsihw: virtio-scsi-pci
smbios1: uuid=f59f73de-77b5-4448-a612-4303fff00528
sockets: 1
tpmstate0: local-lvm:vm-101-disk-2,size=4M,version=v2.0

The log below seems to point to the cause of the crash:

Code:
May 20 04:53:45 bigbro QEMU[517464]: KVM: entry failed, hardware error 0x80000021
May 20 04:53:45 bigbro QEMU[517464]: If you're running a guest on an Intel machine without unrestricted mode
May 20 04:53:45 bigbro QEMU[517464]: support, the failure can be most likely due to the guest entering an invalid
May 20 04:53:45 bigbro QEMU[517464]: state for Intel VT. For example, the guest maybe running in big real mode
May 20 04:53:45 bigbro QEMU[517464]: which is not supported on less recent Intel processors.
May 20 04:53:45 bigbro kernel: set kvm_intel.dump_invalid_vmcs=1 to dump internal KVM state.
May 20 04:53:45 bigbro QEMU[517464]: EAX=000059ef EBX=ea9e2180 ECX=00000001 EDX=00000000
May 20 04:53:45 bigbro QEMU[517464]: ESI=a8fe5280 EDI=ea9ee140 EBP=00000000 ESP=d9e37d40
May 20 04:53:45 bigbro QEMU[517464]: EIP=00008000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=1 HLT=0
May 20 04:53:45 bigbro QEMU[517464]: ES =0000 00000000 ffffffff 00809300
May 20 04:53:45 bigbro QEMU[517464]: CS =c200 7ffc2000 ffffffff 00809300
May 20 04:53:45 bigbro QEMU[517464]: SS =0000 00000000 ffffffff 00809300
May 20 04:53:45 bigbro QEMU[517464]: DS =0000 00000000 ffffffff 00809300
May 20 04:53:45 bigbro QEMU[517464]: FS =0000 00000000 ffffffff 00809300
May 20 04:53:45 bigbro QEMU[517464]: GS =0000 00000000 ffffffff 00809300
May 20 04:53:45 bigbro QEMU[517464]: LDT=0000 00000000 000fffff 00000000
May 20 04:53:45 bigbro QEMU[517464]: TR =0040 ea9f1000 00000067 00008b00
May 20 04:53:45 bigbro QEMU[517464]: GDT=     ea9f2fb0 00000057
May 20 04:53:45 bigbro QEMU[517464]: IDT=     00000000 00000000
May 20 04:53:45 bigbro QEMU[517464]: CR0=00050032 CR2=22fcadb8 CR3=001ae000 CR4=00000000
May 20 04:53:45 bigbro QEMU[517464]: DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
May 20 04:53:45 bigbro QEMU[517464]: DR6=00000000ffff0ff0 DR7=0000000000000400
May 20 04:53:45 bigbro QEMU[517464]: EFER=0000000000000000
May 20 04:53:45 bigbro QEMU[517464]: Code=kvm: ../hw/core/cpu-sysemu.c:77: cpu_asidx_from_attrs: Assertion `ret < cpu->num_ases && ret >= 0' failed.
May 20 04:54:00 bigbro kernel: fwbr101i0: port 2(tap101i0) entered disabled state
May 20 04:54:00 bigbro kernel: fwbr101i0: port 2(tap101i0) entered disabled state
May 20 04:54:00 bigbro systemd[1]: 101.scope: Succeeded.
May 20 04:54:00 bigbro systemd[1]: 101.scope: Consumed 15h 34min 50.984s CPU time.
May 20 04:54:02 bigbro qmeventd[1074241]: Starting cleanup for 101
May 20 04:54:02 bigbro kernel: fwbr101i0: port 1(fwln101i0) entered disabled state
May 20 04:54:02 bigbro kernel: vmbr0: port 3(fwpr101p0) entered disabled state
May 20 04:54:02 bigbro kernel: device fwln101i0 left promiscuous mode
May 20 04:54:02 bigbro kernel: fwbr101i0: port 1(fwln101i0) entered disabled state
May 20 04:54:02 bigbro kernel: device fwpr101p0 left promiscuous mode
May 20 04:54:02 bigbro kernel: vmbr0: port 3(fwpr101p0) entered disabled state
May 20 04:54:03 bigbro qmeventd[1074241]: Finished cleanup for 101
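For anyone who wants to check their own host logs for the same signature, here is a minimal sketch in Python. It only looks for the "KVM: entry failed" line and the SMM flag in the register dump that follows it, both of which appear in the excerpt above. The function name and the returned dict layout are made up for illustration, and the log text is assumed to have been saved from the host's syslog or journal.

```python
import re

# Patterns copied from the log excerpt above; the helper names are hypothetical.
ENTRY_FAILED = re.compile(r"KVM: entry failed, hardware error (0x[0-9a-fA-F]+)")
QEMU_PID = re.compile(r"QEMU\[(\d+)\]")
SMM_FLAG = re.compile(r"\bSMM=([01])\b")


def find_kvm_entry_failures(log_text):
    """Return one dict per 'KVM: entry failed' event found in log_text."""
    events = []
    current = None
    for line in log_text.splitlines():
        pid_match = QEMU_PID.search(line)
        pid = int(pid_match.group(1)) if pid_match else None
        failed = ENTRY_FAILED.search(line)
        if failed:
            # Start a new event; the SMM flag is filled in from a later line.
            current = {"pid": pid, "error": failed.group(1), "smm": None}
            events.append(current)
        elif current is not None and current["smm"] is None and pid == current["pid"]:
            # The register dump from the same QEMU process carries the SMM flag.
            smm = SMM_FLAG.search(line)
            if smm:
                current["smm"] = smm.group(1) == "1"
    return events
```

Feeding it the excerpt above should yield a single event for QEMU process 517464 with error 0x80000021 and the SMM flag set.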

Only Windows Server 2022 seems to act this way; the Windows 10 and Linux VMs are working flawlessly.

Thank you in advance!
 
Same problem here, on Proxmox Virtual Environment 7.2-3.

Code:
May 22 04:19:36 HPDL350 QEMU[1536739]: KVM: entry failed, hardware error 0x80000021
May 22 04:19:36 HPDL350 QEMU[1536739]: If you're running a guest on an Intel machine without unrestricted mode
May 22 04:19:36 HPDL350 QEMU[1536739]: support, the failure can be most likely due to the guest entering an invalid
May 22 04:19:36 HPDL350 QEMU[1536739]: state for Intel VT. For example, the guest maybe running in big real mode
May 22 04:19:36 HPDL350 QEMU[1536739]: which is not supported on less recent Intel processors.
May 22 04:19:36 HPDL350 QEMU[1536739]: EAX=e6a8e1bd EBX=fc4622f0 ECX=c2925520 EDX=c292c000
May 22 04:19:36 HPDL350 QEMU[1536739]: ESI=c26e81a0 EDI=fc4622f0 EBP=bc022ca0 ESP=b9897fb0
May 22 04:19:36 HPDL350 QEMU[1536739]: EIP=00008000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=1 HLT=0
May 22 04:19:36 HPDL350 QEMU[1536739]: ES =0000 00000000 ffffffff 00809300
May 22 04:19:36 HPDL350 QEMU[1536739]: CS =a800 7ffa8000 ffffffff 00809300
May 22 04:19:36 HPDL350 QEMU[1536739]: SS =0000 00000000 ffffffff 00809300
May 22 04:19:36 HPDL350 QEMU[1536739]: DS =0000 00000000 ffffffff 00809300
May 22 04:19:36 HPDL350 QEMU[1536739]: FS =0000 00000000 ffffffff 00809300
May 22 04:19:36 HPDL350 QEMU[1536739]: GS =0000 00000000 ffffffff 00809300
May 22 04:19:36 HPDL350 QEMU[1536739]: LDT=0000 00000000 000fffff 00000000
May 22 04:19:36 HPDL350 QEMU[1536739]: TR =0040 b97f9000 00000067 00008b00
May 22 04:19:36 HPDL350 QEMU[1536739]: GDT=     b97fafb0 00000057
May 22 04:19:36 HPDL350 QEMU[1536739]: IDT=     00000000 00000000
May 22 04:19:36 HPDL350 QEMU[1536739]: CR0=00050032 CR2=438feff8 CR3=1c2bc000 CR4=00000000
May 22 04:19:36 HPDL350 QEMU[1536739]: DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
May 22 04:19:36 HPDL350 QEMU[1536739]: DR6=00000000ffff0ff0 DR7=0000000000000400
May 22 04:19:36 HPDL350 QEMU[1536739]: EFER=0000000000000000
May 22 04:19:36 HPDL350 QEMU[1536739]: Code=kvm: ../hw/core/cpu-sysemu.c:77: cpu_asidx_from_attrs: Assertion `ret < cpu->num_ases && ret >= 0' failed.
May 22 04:19:36 HPDL350 kernel: [475631.763211] set kvm_intel.dump_invalid_vmcs=1 to dump internal KVM state.
May 22 04:19:36 HPDL350 kernel: [475631.801644] fwbr103i0: port 2(tap103i0) entered disabled state
May 22 04:19:36 HPDL350 kernel: [475631.802277] fwbr103i0: port 2(tap103i0) entered disabled state
May 22 04:19:36 HPDL350 systemd[1]: 103.scope: Succeeded.
May 22 04:19:36 HPDL350 systemd[1]: 103.scope: Consumed 1h 47min 1.308s CPU time.
May 22 04:19:37 HPDL350 qmeventd[1638295]: Starting cleanup for 103
May 22 04:19:37 HPDL350 kernel: [475632.995950] fwbr103i0: port 1(fwln103i0) entered disabled state
May 22 04:19:37 HPDL350 kernel: [475632.996122] vmbr0: port 3(fwpr103p0) entered disabled state
May 22 04:19:37 HPDL350 kernel: [475632.996651] device fwln103i0 left promiscuous mode
May 22 04:19:37 HPDL350 kernel: [475632.996658] fwbr103i0: port 1(fwln103i0) entered disabled state
May 22 04:19:37 HPDL350 kernel: [475633.035401] device fwpr103p0 left promiscuous mode
May 22 04:19:37 HPDL350 kernel: [475633.035405] vmbr0: port 3(fwpr103p0) entered disabled state
May 22 04:19:37 HPDL350 qmeventd[1638295]: Finished cleanup for 103
May 22 05:12:39 HPDL350 pmxcfs[1153]: [dcdb] notice: data verification successful
May 22 05:17:01 HPDL350 CRON[1650104]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
May 22 05:42:37 HPDL350 smartd[850]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 74 to 72
May 22 06:12:37 HPDL350 smartd[850]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 72 to 74
May 22 06:12:39 HPDL350 pmxcfs[1153]: [dcdb] notice: data verification successful
May 22 06:17:01 HPDL350 CRON[1662448]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
May 22 06:17:14 HPDL350 systemd[1]: Starting Daily apt upgrade and clean activities...
May 22 06:17:14 HPDL350 systemd[1]: apt-daily-upgrade.service: Succeeded.
May 22 06:17:14 HPDL350 systemd[1]: Finished Daily apt upgrade and clean activities.
May 22 06:25:01 HPDL350 CRON[1664149]: (root) CMD (test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily ))
May 22 06:42:37 HPDL350 smartd[850]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 74 to 72
May 22 06:47:01 HPDL350 CRON[1668684]: (root) CMD (test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.weekly ))
May 22 07:12:39 HPDL350 pmxcfs[1153]: [dcdb] notice: data verification successful
 
Uninstall the May 2022 Windows Server cumulative update,
2022-05 Cumulative Update for Microsoft server operating system version 21H2 for x64-based Systems (KB5013944),
and check whether the issue still exists.
 
That update is not installed yet.
Did you install the server with "TPM" enabled? Is it possible to remove the virtual TPM and test again?
 
After removing the TPM it lasted a while, but it did the same thing again today.

Code:
Jun 05 02:59:18 HPDL350 QEMU[25898]: KVM: entry failed, hardware error 0x80000021
Jun 05 02:59:18 HPDL350 QEMU[25898]: If you're running a guest on an Intel machine without unrestricted mode
Jun 05 02:59:18 HPDL350 QEMU[25898]: support, the failure can be most likely due to the guest entering an invalid
Jun 05 02:59:18 HPDL350 QEMU[25898]: state for Intel VT. For example, the guest maybe running in big real mode
Jun 05 02:59:18 HPDL350 QEMU[25898]: which is not supported on less recent Intel processors.
Jun 05 02:59:18 HPDL350 kernel: set kvm_intel.dump_invalid_vmcs=1 to dump internal KVM state.
Jun 05 02:59:18 HPDL350 QEMU[25898]: EAX=d3acdf93 EBX=00000001 ECX=caff0360 EDX=caff1000
Jun 05 02:59:18 HPDL350 QEMU[25898]: ESI=caed1e20 EDI=00000001 EBP=7a6ed7f0 ESP=795bbfb0
Jun 05 02:59:18 HPDL350 QEMU[25898]: EIP=00008000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=1 HLT=0
Jun 05 02:59:18 HPDL350 QEMU[25898]: ES =0000 00000000 ffffffff 00809300
Jun 05 02:59:18 HPDL350 QEMU[25898]: CS =9e00 7ff9e000 ffffffff 00809300
Jun 05 02:59:18 HPDL350 QEMU[25898]: SS =0000 00000000 ffffffff 00809300
Jun 05 02:59:18 HPDL350 QEMU[25898]: DS =0000 00000000 ffffffff 00809300
Jun 05 02:59:18 HPDL350 QEMU[25898]: FS =0000 00000000 ffffffff 00809300
Jun 05 02:59:18 HPDL350 QEMU[25898]: GS =0000 00000000 ffffffff 00809300
Jun 05 02:59:18 HPDL350 QEMU[25898]: LDT=0000 00000000 000fffff 00000000
Jun 05 02:59:18 HPDL350 QEMU[25898]: TR =0040 795b0000 00000067 00008b00
Jun 05 02:59:18 HPDL350 QEMU[25898]: GDT=     795b1fb0 00000057
Jun 05 02:59:18 HPDL350 QEMU[25898]: IDT=     00000000 00000000
Jun 05 02:59:18 HPDL350 QEMU[25898]: CR0=00050032 CR2=6830c000 CR3=1c741000 CR4=00000000
Jun 05 02:59:18 HPDL350 QEMU[25898]: DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
Jun 05 02:59:18 HPDL350 QEMU[25898]: DR6=00000000ffff0ff0 DR7=0000000000000400
Jun 05 02:59:18 HPDL350 QEMU[25898]: EFER=0000000000000000
Jun 05 02:59:18 HPDL350 QEMU[25898]: Code=kvm: ../hw/core/cpu-sysemu.c:77: cpu_asidx_from_attrs: Assertion `ret < cpu->num_ases && ret >= 0' failed.
Jun 05 02:59:19 HPDL350 kernel: fwbr103i0: port 2(tap103i0) entered disabled state
Jun 05 02:59:19 HPDL350 kernel: fwbr103i0: port 2(tap103i0) entered disabled state
Jun 05 02:59:19 HPDL350 systemd[1]: 103.scope: Succeeded.
Jun 05 02:59:19 HPDL350 systemd[1]: 103.scope: Consumed 6h 36min 4.842s CPU time.
Jun 05 02:59:20 HPDL350 qmeventd[458341]: Starting cleanup for 103
Jun 05 02:59:20 HPDL350 kernel: fwbr103i0: port 1(fwln103i0) entered disabled state
Jun 05 02:59:20 HPDL350 kernel: vmbr0: port 4(fwpr103p0) entered disabled state
Jun 05 02:59:20 HPDL350 kernel: device fwln103i0 left promiscuous mode
Jun 05 02:59:20 HPDL350 kernel: fwbr103i0: port 1(fwln103i0) entered disabled state
Jun 05 02:59:20 HPDL350 kernel: device fwpr103p0 left promiscuous mode
Jun 05 02:59:20 HPDL350 kernel: vmbr0: port 4(fwpr103p0) entered disabled state
Jun 05 02:59:20 HPDL350 qmeventd[458341]: Finished cleanup for 103
 
I have the same issue with Windows Server 2022. I also had the issue with Windows 11. When I changed the disk type to IDE with SSD emulation during setup, it seemed to resolve the issue, so you can give that a try.
 
I can report the same issue with similar log messages.
For me, the Windows Server 2022 VM crashes when it uses more RAM than usual, mostly during updates.
The VM instantly shuts off.
I tried freeing up more RAM on the host, deactivating the TPM, and trying different q35 versions; nothing helped.
 
mitigations=off? Intel microcode? What have you tested so far after reading the thread?
 
Not sure if this is helpful, but I came to this thread because I have the same issue on a system that is not running Proxmox. I have tried everything I have found across the web on this. The main suggestion has been to disable nested virtualization, which I have done, but I still get this error.
 
Hi guys, the same random shutdowns with the same logs brought me here. Like the OP, only my Windows Server 2022 VM is unhappy with its host.

The problem has been seen on a host with an Intel Xeon D-1541 CPU.

On another host I have, with an AMD CPU, Windows Server 2022 behaves normally.

"... on a system that is not running Proxmox."

What do you mean? Are you running your VM with QEMU from the command line? What is your host OS?

To all: what is your host CPU?
Let's see if there is a pattern here...
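To make the CPU reports comparable, something like the following Python sketch could be used to pull the model name and microcode revision out of /proc/cpuinfo on the host. It assumes a Linux x86 host where /proc/cpuinfo has "model name" and "microcode" fields; the function name is made up for illustration.

```python
def summarize_cpuinfo(cpuinfo_text):
    """Collect the distinct 'model name' and 'microcode' values from /proc/cpuinfo text."""
    wanted = {"model name": set(), "microcode": set()}
    for line in cpuinfo_text.splitlines():
        # Each line looks like "key<tab>: value"; blank lines separate cores.
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in wanted:
            wanted[key].add(value.strip())
    # Sorted lists keep the output stable when posting it to the thread.
    return {key: sorted(values) for key, values in wanted.items()}


# Example use on the host (not executed here):
#   with open("/proc/cpuinfo") as fh:
#       print(summarize_cpuinfo(fh.read()))
```

Posting that output together with the running kernel version would make it easier to see whether only certain Intel models are affected.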
 
I have found that I only have the issue with the TPM and the UEFI disk configured; if I don't select those, I am good. Also, I am running on a Dell OptiPlex 7040, so I am not sure everything is completely compatible. It runs well for my purposes in my home lab. I have an i5-6500.
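If you want to check quickly whether a VM has the TPM and EFI disk entries mentioned here, a rough Python sketch along these lines could read the VM config text shown earlier in the thread. It assumes the simple "key: value" layout of those config listings and stops at the first "[...]" section header, on the assumption that such headers start snapshot sections. The function names are made up for illustration.

```python
def parse_vm_config(text):
    """Parse 'key: value' lines from a VM config into a dict."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("["):
            # Assumed to start a snapshot section; only the live config is wanted.
            break
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so MAC addresses in values stay intact.
        key, sep, value = line.partition(":")
        if sep:
            config[key.strip()] = value.strip()
    return config


def firmware_summary(config):
    """Report the firmware-related settings discussed in this thread."""
    return {
        "bios": config.get("bios"),
        "machine": config.get("machine"),
        "has_efidisk": "efidisk0" in config,
        "has_tpm": "tpmstate0" in config,
    }
```

For the first config in this thread, the summary would show bios ovmf, machine pc-q35-6.2, and both the EFI disk and the TPM state present.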
 
For those returning to this thread after experiencing this issue in the future: the fix is included in all kernels newer than 5.15.39-4. Run your apt upgrade, reboot, and you should be set.
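A small Python sketch for comparing a running kernel against that threshold is below. It assumes the "5.15.39-4" in the post can be compared directly with the leading numbers of a release string such as "5.15.39-4-pve" (as printed by uname -r); the post does not say whether it means the release string or the package version, so treat the result as a hint only. The function names are made up for illustration.

```python
import re

# Threshold quoted in the post above; the fix is said to be in kernels newer than this.
FIX_THRESHOLD = (5, 15, 39, 4)


def parse_kernel_release(release):
    """Turn a release string like '5.15.39-4-pve' into (5, 15, 39, 4)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def is_newer_than_fix_threshold(release):
    """True if the release sorts after 5.15.39-4 (assumed comparison, see above)."""
    return parse_kernel_release(release) > FIX_THRESHOLD


# Example use on the host (not executed here):
#   import platform
#   print(is_newer_than_fix_threshold(platform.release()))
```

Note that this returns True for 6.2.6-1-pve, the kernel reported in a later post in this thread.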
 
Hello, I have Proxmox with kernel 6.2 and a Windows Server 2022 VM that randomly crashes every day. What can I do?

Code:
pve-manager/7.4-3/9002ab8a (running kernel: 6.2.6-1-pve)

Code:
agent: 1
balloon: 0
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 8
cpu: host
efidisk0: local-zfs:vm-100-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
ide2: none,media=cdrom
machine: pc-q35-7.1
memory: 32768
meta: creation-qemu=7.1.0,ctime=1677141982
name: windows2022srv
net0: virtio=F2:AE:5A:AA:FB:4B,bridge=vmbr10,firewall=1
numa: 0
ostype: win11
scsi0: local-zfs:vm-100-disk-2,cache=writeback,discard=on,size=200G
scsihw: virtio-scsi-pci
smbios1: uuid=5b06996b-c2d6-4c0b-b3fa-bca5b6c616f0
sockets: 2
tpmstate0: local-zfs:vm-100-disk-1,size=4M,version=v2.0
unused0: disk4:100/vm-100-disk-0.qcow2
 