VM shutdown, KVM: entry failed, hardware error 0x80000021

After changing the power plan in Windows from Balanced to Performance I reached 3 days of uptime; I will update if anything changes.
Thanks for this tip! I've had issues with Windows 11 VMs for a few weeks and looked absolutely everywhere until I found these conversations and saw that I'm not the only one having the issue.

At the time of writing, the power plan change on the Windows VM side has brought stability back to normal. No microcode update or anything else mentioned here has brought a solution for me so far... We'll see over time whether this is truly the fix.
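
For reference, the power plan can also be switched from an elevated command prompt inside the guest; a rough sketch (the GUID is the stock High performance scheme on a default install, so double-check it with the list command first):

Code:
REM list the available power schemes and their GUIDs
powercfg /list
REM activate the built-in High performance plan
powercfg /setactive 8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c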
 
I have changed and pinned the kernel to 5.13.19-6 (from 5.15.35-2) following @nick.kopas' advice, and Windows 11 has not crashed for more than a day now; I'll keep observing. There is also another OpenWrt VM instance that used to soft-reboot internally within 24 hours (the VM itself did not crash) alongside the Windows crashes, and it now also seems stabilized for 1+ day; we'll see how long it holds. I suspect both are related to the 5.15 kernel. Just for your reference.
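
In case it helps others, pinning the older kernel went roughly like this (a sketch; assumes the 5.13 kernel package is still installed or available from the repo):

Code:
# make sure the older kernel is present
apt install pve-kernel-5.13.19-6-pve
# pin it so it stays the default across updates, then reboot into it
proxmox-boot-tool kernel pin 5.13.19-6-pve
reboot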
 
One thing that would help us tremendously here is if someone who has this issue reproducibly (meaning it occurs deterministically on one of their VMs when certain actions are performed) could explain how they get there and/or share the VM with us (I'm hoping the issue then occurs on our host as well).

Step 1

Load the Windows Server 2022 ISO from the Microsoft Eval Center and the virtio-win ISO from the Fedora people onto your PVE node
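
A sketch of pulling the ISOs straight onto the node (assuming the default 'local' storage path; the Windows ISO URL has to be copied from the Eval Center page):

Code:
# default ISO location for the "local" storage
cd /var/lib/vz/template/iso
# virtio-win ISO from the usual Fedora direct-download location
wget https://fedorapeople.org/groups/virt/virtio-win/direct-downloads/stable-virtio/virtio-win.iso
# then fetch the Windows Server 2022 eval ISO the same way, using the URL from the Microsoft Eval Center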

Step 2

Create a new VM via the GUI

OS tab: Windows ISO and 11/2022 as guest OS

System tab :
(screenshot: System tab settings)

Disk tab :

(screenshot: Disk tab settings)

CPU tab :
(host is a Xeon D-1541, which is Broadwell; extra flags left as is)

(screenshot: CPU tab settings)

Memory Tab :

(screenshot: Memory tab settings)

Network :

(screenshot: Network tab settings)
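
For reference, the GUI choices above roughly map to a CLI creation along these lines (a sketch only; the VMID, storage name and disk sizes are just examples, the ISO names match what I downloaded in step 1):

Code:
qm create 101 --name test-win2022 --ostype win11 \
  --machine pc-q35-6.2 --bios ovmf \
  --cores 6 --memory 8192 \
  --efidisk0 local-zfs:1,efitype=4m,pre-enrolled-keys=1 \
  --tpmstate0 local-zfs:1,version=v2.0 \
  --scsihw virtio-scsi-pci --scsi0 local-zfs:32,cache=writeback,discard=on \
  --ide2 local:iso/SERVER_EVAL_x64FRE_en-us.iso,media=cdrom \
  --net0 virtio,bridge=vmbr0,firewall=1 \
  --agent 1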

Step 3

Via the GUI, attach the virtio ISO:

(screenshot: virtio ISO attached as a second CD-ROM)


Step 4

Boot the VM, press a key to boot from the ISO, follow the instructions until you can add the virtio drivers for SCSI, ballooning and NetKVM, click install and let Windows do its voodoo (install prep, updates and all...)
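
For anyone repeating this, the drivers I loaded should be under these folders on the virtio-win ISO (written from memory, so treat the exact paths as an assumption):

Code:
vioscsi\2k22\amd64    (SCSI controller)
Balloon\2k22\amd64    (memory ballooning)
NetKVM\2k22\amd64     (network)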

And voilà: witness the crash, then head to the syslog section to find this thread's title in the logs.


Happened 2 times in a row (on the same host though...)

A few more details regarding this host :

Code:
proxmox-ve: 7.2-1 (running kernel: 5.15.35-2-pve)
pve-manager: 7.2-4 (running version: 7.2-4/ca9d43cc)
pve-kernel-5.15: 7.2-4
pve-kernel-helper: 7.2-4
pve-kernel-5.15.35-2-pve: 5.15.35-5
pve-kernel-5.15.30-2-pve: 5.15.30-3
pve-kernel-5.13.19-6-pve: 5.13.19-15
ceph-fuse: 15.2.16-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-2
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-2
libpve-storage-perl: 7.2-4
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.12-1
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.3-1
proxmox-backup-file-restore: 2.2.3-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-1
pve-container: 4.2-1
pve-docs: 7.2-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.4-2
pve-ha-manager: 3.3-4
pve-i18n: 2.7-2
pve-qemu-kvm: 6.2.0-10
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.4-pve1


Let me know if you need more details, I'll happily share any input you might need!


Edit :
My small and silly brain got mixed up between VirtIO Block (unwanted) and SCSI (wanted) during VM creation... Obviously, when you do this the VM crashes...
 
I have been following this thread for a while. Just disabled nested virtualization last week. Running Windows 11 (One Louder).
Code:
Jun 16 15:00:59 PVE QEMU[28320]: KVM: entry failed, hardware error 0x80000021
Jun 16 15:00:59 PVE QEMU[28320]: If you're running a guest on an Intel machine without unrestricted mode
Jun 16 15:00:59 PVE QEMU[28320]: support, the failure can be most likely due to the guest entering an invalid
Jun 16 15:00:59 PVE QEMU[28320]: state for Intel VT. For example, the guest maybe running in big real mode
Jun 16 15:00:59 PVE QEMU[28320]: which is not supported on less recent Intel processors.
Jun 16 15:00:59 PVE QEMU[28320]: EAX=000001ac EBX=00c33010 ECX=80002eb8 EDX=80002f1c
Jun 16 15:00:59 PVE QEMU[28320]: ESI=00000000 EDI=00c33010 EBP=59eba6d0 ESP=59eba650
Jun 16 15:00:59 PVE QEMU[28320]: EIP=00008000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=1 HLT=0
Jun 16 15:00:59 PVE QEMU[28320]: ES =0000 00000000 ffffffff 00809300
Jun 16 15:00:59 PVE QEMU[28320]: CS =c200 7ffc2000 ffffffff 00809300
Jun 16 15:00:59 PVE QEMU[28320]: SS =0000 00000000 ffffffff 00809300
Jun 16 15:00:59 PVE QEMU[28320]: DS =0000 00000000 ffffffff 00809300
Jun 16 15:00:59 PVE QEMU[28320]: FS =0000 00000000 ffffffff 00809300
Jun 16 15:00:59 PVE QEMU[28320]: GS =0000 00000000 ffffffff 00809300
Jun 16 15:00:59 PVE QEMU[28320]: LDT=0000 00000000 000fffff 00000000
Jun 16 15:00:59 PVE QEMU[28320]: TR =0040 e47e9000 00000067 00008b00
Jun 16 15:00:59 PVE QEMU[28320]: GDT= e47eafb0 00000057
Jun 16 15:00:59 PVE QEMU[28320]: IDT= 00000000 00000000
Jun 16 15:00:59 PVE QEMU[28320]: CR0=00050032 CR2=8f97dfe8 CR3=150c1000 CR4=00000000
Jun 16 15:00:59 PVE QEMU[28320]: DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
Jun 16 15:00:59 PVE QEMU[28320]: DR6=00000000ffff0ff0 DR7=0000000000000400
Jun 16 15:00:59 PVE QEMU[28320]: EFER=0000000000000000
Jun 16 15:00:59 PVE QEMU[28320]: Code=kvm: ../hw/core/cpu-sysemu.c:77: cpu_asidx_from_attrs: Assertion `ret < cpu->num_ases && ret >= 0' failed.
Some additional information from the Windows OS running at the time; this audit-success event was captured right before the crash:
XML:
- <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
- <System>
  <Provider Name="Microsoft-Windows-Security-Auditing" Guid="{54849625-5478-4994-a5ba-3e3b0328c30d}" />
  <EventID>4672</EventID>
  <Version>0</Version>
  <Level>0</Level>
  <Task>12548</Task>
  <Opcode>0</Opcode>
  <Keywords>0x8020000000000000</Keywords>
  <TimeCreated SystemTime="2022-06-16T21:59:47.4646524Z" />
  <EventRecordID>76064</EventRecordID>
  <Correlation ActivityID="{be222f70-805b-0001-ee2f-22be5b80d801}" />
  <Execution ProcessID="760" ThreadID="4680" />
  <Channel>Security</Channel>
  <Computer>Cappy</Computer>
  <Security />
  </System>
- <EventData>
  <Data Name="SubjectUserSid">S-1-5-18</Data>
  <Data Name="SubjectUserName">SYSTEM</Data>
  <Data Name="SubjectDomainName">NT AUTHORITY</Data>
  <Data Name="SubjectLogonId">0x3e7</Data>
  <Data Name="PrivilegeList">SeAssignPrimaryTokenPrivilege SeTcbPrivilege SeSecurityPrivilege SeTakeOwnershipPrivilege SeLoadDriverPrivilege SeBackupPrivilege SeRestorePrivilege SeDebugPrivilege SeAuditPrivilege SeSystemEnvironmentPrivilege SeImpersonatePrivilege SeDelegateSessionUserImpersonatePrivilege</Data>
  </EventData>
  </Event>
During the subsequent boot, the following errors were recorded in the event log:
XML:
- <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
- <System>
  <Provider Name="EventLog" />
  <EventID Qualifiers="32768">6008</EventID>
  <Version>0</Version>
  <Level>2</Level>
  <Task>0</Task>
  <Opcode>0</Opcode>
  <Keywords>0x80000000000000</Keywords>
  <TimeCreated SystemTime="2022-06-16T22:34:58.5483672Z" />
  <EventRecordID>16147</EventRecordID>
  <Correlation />
  <Execution ProcessID="0" ThreadID="0" />
  <Channel>System</Channel>
  <Computer>Cappy</Computer>
  <Security />
  </System>
- <EventData>
  <Data>3:00:18 PM</Data>
  <Data>‎6/‎16/‎2022</Data>
  <Data />
  <Data />
  <Data>158366</Data>
  <Data />
  <Data />
  <Binary>E6070600040010000F00000012005603E60706000400100016000000120056033C0000003C000000000000000000000000000000000000000100000000000000</Binary>
  </EventData>
  </Event>
XML:
- <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
- <System>
  <Provider Name="Microsoft-Windows-Kernel-Power" Guid="{331c3b3a-2005-44c2-ac5e-77220c37d6b4}" />
  <EventID>41</EventID>
  <Version>8</Version>
  <Level>1</Level>
  <Task>63</Task>
  <Opcode>0</Opcode>
  <Keywords>0x8000400000000002</Keywords>
  <TimeCreated SystemTime="2022-06-16T22:34:55.3127681Z" />
  <EventRecordID>16158</EventRecordID>
  <Correlation />
  <Execution ProcessID="4" ThreadID="8" />
  <Channel>System</Channel>
  <Computer>Cappy</Computer>
  <Security UserID="S-1-5-18" />
  </System>
- <EventData>
  <Data Name="BugcheckCode">0</Data>
  <Data Name="BugcheckParameter1">0x0</Data>
  <Data Name="BugcheckParameter2">0x0</Data>
  <Data Name="BugcheckParameter3">0x0</Data>
  <Data Name="BugcheckParameter4">0x0</Data>
  <Data Name="SleepInProgress">0</Data>
  <Data Name="PowerButtonTimestamp">0</Data>
  <Data Name="BootAppStatus">0</Data>
  <Data Name="Checkpoint">0</Data>
  <Data Name="ConnectedStandbyInProgress">false</Data>
  <Data Name="SystemSleepTransitionsToOn">0</Data>
  <Data Name="CsEntryScenarioInstanceId">0</Data>
  <Data Name="BugcheckInfoFromEFI">false</Data>
  <Data Name="CheckpointStatus">0</Data>
  <Data Name="CsEntryScenarioInstanceIdV2">0</Data>
  <Data Name="LongPowerButtonPressDetected">false</Data>
  </EventData>
  </Event>
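
For reference, the nested-virtualization change mentioned at the top of this post was done on the PVE host; a rough sketch of the steps (the modprobe.d file name is arbitrary, and all VMs need to be stopped before reloading the module):

Code:
# check whether nested virtualization is currently enabled
cat /sys/module/kvm_intel/parameters/nested
# disable it persistently
echo "options kvm_intel nested=0" > /etc/modprobe.d/kvm-intel.conf
# takes effect after a reboot, or after: modprobe -r kvm_intel && modprobe kvm_intel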
 
Let me try to make up for my previous mistake...

Here is my syslog from the period during which I created a VM (with the right settings this time...), managed to install the virtio drivers and then let Windows install up to the "Need to restart" screen.

The VM then starts to boot, reboots one more time and then crashes.

Code:
Jun 17 00:09:44 pve-axsp pvedaemon[2734428]: <root@pam> starting task UPID:pve-axsp:002A2979:034D7EF6:62ABAA28:download:SERVER_EVAL_x64FRE_en-us.iso:root@pam:

Jun 17 00:10:28 pve-axsp pvedaemon[2734428]: <root@pam> end task UPID:pve-axsp:002A2979:034D7EF6:62ABAA28:download:SERVER_EVAL_x64FRE_en-us.iso:root@pam: OK

Jun 17 00:11:04 pve-axsp pveproxy[2740103]: worker exit

Jun 17 00:11:04 pve-axsp pveproxy[1782]: worker 2740103 finished

Jun 17 00:11:04 pve-axsp pveproxy[1782]: starting 1 worker(s)

Jun 17 00:11:04 pve-axsp pveproxy[1782]: worker 2777889 started

Jun 17 00:11:34 pve-axsp pvedaemon[2675704]: <root@pam> successful auth for user 'root@pam'

Jun 17 00:13:17 pve-axsp pvedaemon[2737894]: <root@pam> starting task UPID:pve-axsp:002A712C:034DD1F9:62ABAAFD:qmcreate:101:root@pam:

Jun 17 00:13:20 pve-axsp pvedaemon[2737894]: <root@pam> end task UPID:pve-axsp:002A712C:034DD1F9:62ABAAFD:qmcreate:101:root@pam: OK

Jun 17 00:13:32 pve-axsp pvedaemon[2734428]: <root@pam> update VM 101: -ide0 local:iso/virtio-win.iso,media=cdrom

Jun 17 00:13:52 pve-axsp pvedaemon[2737894]: <root@pam> starting task UPID:pve-axsp:002A7604:034DDFD7:62ABAB20:qmstart:101:root@pam:

Jun 17 00:13:52 pve-axsp pvedaemon[2782724]: start VM 101: UPID:pve-axsp:002A7604:034DDFD7:62ABAB20:qmstart:101:root@pam:

Jun 17 00:13:53 pve-axsp systemd[1]: Started 101.scope.

Jun 17 00:13:53 pve-axsp systemd-udevd[2783083]: Using default interface naming scheme 'v247'.

Jun 17 00:13:53 pve-axsp systemd-udevd[2783083]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.

Jun 17 00:13:53 pve-axsp kernel: device tap101i0 entered promiscuous mode

Jun 17 00:13:54 pve-axsp systemd-udevd[2783183]: Using default interface naming scheme 'v247'.

Jun 17 00:13:54 pve-axsp systemd-udevd[2783183]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.

Jun 17 00:13:54 pve-axsp systemd-udevd[2783183]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.

Jun 17 00:13:54 pve-axsp systemd-udevd[2783083]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.

Jun 17 00:13:54 pve-axsp kernel: vmbr0: port 3(fwpr101p0) entered blocking state

Jun 17 00:13:54 pve-axsp kernel: vmbr0: port 3(fwpr101p0) entered disabled state

Jun 17 00:13:54 pve-axsp kernel: device fwpr101p0 entered promiscuous mode

Jun 17 00:13:54 pve-axsp kernel: vmbr0: port 3(fwpr101p0) entered blocking state

Jun 17 00:13:54 pve-axsp kernel: vmbr0: port 3(fwpr101p0) entered forwarding state

Jun 17 00:13:54 pve-axsp kernel: fwbr101i0: port 1(fwln101i0) entered blocking state

Jun 17 00:13:54 pve-axsp kernel: fwbr101i0: port 1(fwln101i0) entered disabled state

Jun 17 00:13:54 pve-axsp kernel: device fwln101i0 entered promiscuous mode

Jun 17 00:13:54 pve-axsp kernel: fwbr101i0: port 1(fwln101i0) entered blocking state

Jun 17 00:13:54 pve-axsp kernel: fwbr101i0: port 1(fwln101i0) entered forwarding state

Jun 17 00:13:54 pve-axsp kernel: fwbr101i0: port 2(tap101i0) entered blocking state

Jun 17 00:13:54 pve-axsp kernel: fwbr101i0: port 2(tap101i0) entered disabled state

Jun 17 00:13:54 pve-axsp kernel: fwbr101i0: port 2(tap101i0) entered blocking state

Jun 17 00:13:54 pve-axsp kernel: fwbr101i0: port 2(tap101i0) entered forwarding state

Jun 17 00:13:54 pve-axsp pvedaemon[2737894]: <root@pam> end task UPID:pve-axsp:002A7604:034DDFD7:62ABAB20:qmstart:101:root@pam: OK

Jun 17 00:13:54 pve-axsp pvedaemon[2783265]: starting vnc proxy UPID:pve-axsp:002A7821:034DE098:62ABAB22:vncproxy:101:root@pam:

Jun 17 00:13:54 pve-axsp pvedaemon[2734428]: <root@pam> starting task UPID:pve-axsp:002A7821:034DE098:62ABAB22:vncproxy:101:root@pam:

Jun 17 00:14:44 pve-axsp pvedaemon[2737894]: <root@pam> successful auth for user 'root@pam'

Jun 17 00:17:01 pve-axsp CRON[2824612]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)

Jun 17 00:17:01 pve-axsp CRON[2824613]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)

Jun 17 00:17:01 pve-axsp CRON[2824612]: pam_unix(cron:session): session closed for user root

Jun 17 00:19:56 pve-axsp pveproxy[2740105]: worker exit

Jun 17 00:19:56 pve-axsp pveproxy[1782]: worker 2740105 finished

Jun 17 00:19:56 pve-axsp pveproxy[1782]: starting 1 worker(s)

Jun 17 00:19:56 pve-axsp pveproxy[1782]: worker 3053389 started

Jun 17 00:19:56 pve-axsp pveproxy[1782]: worker 2740104 finished

Jun 17 00:19:56 pve-axsp pveproxy[1782]: starting 1 worker(s)

Jun 17 00:19:56 pve-axsp pveproxy[1782]: worker 3053390 started

Jun 17 00:19:57 pve-axsp pveproxy[3053388]: got inotify poll request in wrong process - disabling inotify

Jun 17 00:20:01 pve-axsp QEMU[2783177]: KVM: entry failed, hardware error 0x80000021

Jun 17 00:20:01 pve-axsp QEMU[2783177]: If you're running a guest on an Intel machine without unrestricted mode

Jun 17 00:20:01 pve-axsp QEMU[2783177]: support, the failure can be most likely due to the guest entering an invalid

Jun 17 00:20:01 pve-axsp QEMU[2783177]: state for Intel VT. For example, the guest maybe running in big real mode

Jun 17 00:20:01 pve-axsp QEMU[2783177]: which is not supported on less recent Intel processors.

Jun 17 00:20:01 pve-axsp QEMU[2783177]: EAX=000b40b4 EBX=affe2180 ECX=00000000 EDX=00000000

Jun 17 00:20:01 pve-axsp QEMU[2783177]: ESI=affee1c0 EDI=09cb00c0 EBP=0ad46690 ESP=0ad464b0

Jun 17 00:20:01 pve-axsp QEMU[2783177]: EIP=00008000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=1 HLT=0

Jun 17 00:20:01 pve-axsp QEMU[2783177]: ES =0000 00000000 ffffffff 00809300

Jun 17 00:20:01 pve-axsp QEMU[2783177]: CS =be00 7ffbe000 ffffffff 00809300

Jun 17 00:20:01 pve-axsp QEMU[2783177]: SS =0000 00000000 ffffffff 00809300

Jun 17 00:20:01 pve-axsp QEMU[2783177]: DS =0000 00000000 ffffffff 00809300

Jun 17 00:20:01 pve-axsp QEMU[2783177]: FS =0000 00000000 ffffffff 00809300

Jun 17 00:20:01 pve-axsp QEMU[2783177]: GS =0000 00000000 ffffffff 00809300

Jun 17 00:20:01 pve-axsp QEMU[2783177]: LDT=0000 00000000 000fffff 00000000

Jun 17 00:20:01 pve-axsp QEMU[2783177]: TR =0040 afff1000 00000067 00008b00

Jun 17 00:20:01 pve-axsp QEMU[2783177]: GDT=     afff2fb0 00000057

Jun 17 00:20:01 pve-axsp QEMU[2783177]: IDT=     00000000 00000000

Jun 17 00:20:01 pve-axsp QEMU[2783177]: CR0=00050032 CR2=1f368000 CR3=15d4f000 CR4=00000000

Jun 17 00:20:01 pve-axsp QEMU[2783177]: DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000

Jun 17 00:20:01 pve-axsp QEMU[2783177]: DR6=00000000ffff0ff0 DR7=0000000000000400

Jun 17 00:20:01 pve-axsp QEMU[2783177]: EFER=0000000000000000

Jun 17 00:20:01 pve-axsp QEMU[2783177]: Code=kvm: ../hw/core/cpu-sysemu.c:77: cpu_asidx_from_attrs: Assertion `ret < cpu->num_ases && ret >= 0' failed.

Jun 17 00:20:01 pve-axsp kernel: set kvm_intel.dump_invalid_vmcs=1 to dump internal KVM state.

Jun 17 00:20:02 pve-axsp pvedaemon[2734428]: <root@pam> end task UPID:pve-axsp:002A7821:034DE098:62ABAB22:vncproxy:101:root@pam: OK

Jun 17 00:20:02 pve-axsp pveproxy[3053388]: worker exit

Jun 17 00:20:03 pve-axsp kernel: fwbr101i0: port 2(tap101i0) entered disabled state

Jun 17 00:20:03 pve-axsp kernel:  zd80: p1 p2 p3 p4

Jun 17 00:20:03 pve-axsp kernel: fwbr101i0: port 2(tap101i0) entered disabled state

Jun 17 00:20:03 pve-axsp systemd[1]: 101.scope: Succeeded.

Jun 17 00:20:03 pve-axsp systemd[1]: 101.scope: Consumed 8min 40.324s CPU time.

Jun 17 00:20:04 pve-axsp qmeventd[3053923]: Starting cleanup for 101

Jun 17 00:20:04 pve-axsp kernel: fwbr101i0: port 1(fwln101i0) entered disabled state

Jun 17 00:20:04 pve-axsp kernel: vmbr0: port 3(fwpr101p0) entered disabled state

Jun 17 00:20:04 pve-axsp kernel: device fwln101i0 left promiscuous mode

Jun 17 00:20:04 pve-axsp kernel: fwbr101i0: port 1(fwln101i0) entered disabled state

Jun 17 00:20:04 pve-axsp kernel: device fwpr101p0 left promiscuous mode

Jun 17 00:20:04 pve-axsp kernel: vmbr0: port 3(fwpr101p0) entered disabled state

Jun 17 00:20:04 pve-axsp qmeventd[3053923]: Finished cleanup for 101

Code:
root@pve-axsp:/etc/pve/qemu-server# cat /etc/pve/qemu-server/101.conf 
agent: 1
bios: ovmf
boot: order=scsi0;ide2;net0;ide0
cores: 6
efidisk0: local-zfs:vm-101-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
ide0: local:iso/virtio-win.iso,media=cdrom,size=519172K
ide2: local:iso/SERVER_EVAL_x64FRE_en-us.iso,media=cdrom,size=4925874K
machine: pc-q35-6.2
memory: 8192
meta: creation-qemu=6.2.0,ctime=1655417597
name: test-WIn2022
net0: virtio=02:00:00:62:12:1f,bridge=vmbr0,firewall=1
numa: 0
ostype: win11
scsi0: local-zfs:vm-101-disk-1,cache=writeback,discard=on,size=32G
scsihw: virtio-scsi-pci
smbios1: uuid=24184bf1-086e-451c-a719-a8c6a4ad120f
sockets: 1
tpmstate0: local-zfs:vm-101-disk-2,size=4M,version=v2.0
vmgenid: e10257ab-4cbf-4a58-a8e2-e2623b201416



A few minutes later, I destroyed the VM and created a new one with the exact same settings: the VM made it to the admin password setup and I successfully logged into Windows. It seems I simply can't replicate the issue with consistency...
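
If it helps for the next crash: the kernel line in the log above suggests enabling a VMCS dump. A sketch of how that could be set (parameter name taken from the log; untested here, and it may only be writable at runtime on some kernels):

Code:
# runtime, if the parameter is writable on your kernel:
echo 1 > /sys/module/kvm_intel/parameters/dump_invalid_vmcs
# or persistently via modprobe.d (file name arbitrary), effective after reboot/module reload:
echo "options kvm_intel dump_invalid_vmcs=1" > /etc/modprobe.d/kvm-intel-debug.conf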
 
I spoke too soon, one just crashed...

Jun 17 08:12:36 SRV-01 QEMU[96489]: KVM: entry failed, hardware error 0x80000021
 
Going on 4 days now, and 2 more servers are at 2 days.
The higher uptime with the "Performance" power plan (even if it ultimately crashed after some time) could point to what I mentioned previously: that the issue comes from the Windows kernel scheduler doing something, when switching between its internal idle/non-idle states, that KVM does not like. This would also neatly explain why the issue is so hard to reproduce reliably, and especially why the PVE team has trouble reproducing it on freshly booted/installed OSes, as those generally tend to do more background work.

It might not necessarily be a regression either; it might be something new that KVM implemented.
 
Tonight another Proxmox VE server, updated two days ago and running two Windows Server 2022 VMs, gave problems: one machine shut down at 01:43 AM. I will now go back to the previous kernel (proxmox-boot-tool kernel pin 5.13.19-6-pve), which has solved the problem on the other three Proxmox VE servers I manage. But I would really like to understand what the current official fixes for this problem are, or the series of steps that need to be done. Since these updates are available in the enterprise repo I expected them to be tested a lot, and that these things would not happen, or at least that a fix would be produced quickly; we have been living with this problem for more than 20 days now. I am waiting for a comment from the Proxmox staff, thanks.

I enclose the configuration with which I have the blocking problem:

proxmox-ve: 7.2-1 (running kernel: 5.15.35-2-pve)
pve-manager: 7.2-4 (running version: 7.2-4/ca9d43cc)
pve-kernel-5.15: 7.2-4
pve-kernel-helper: 7.2-4
pve-kernel-5.13: 7.1-9
pve-kernel-5.15.35-2-pve: 5.15.35-5
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-2-pve: 5.13.19-4
ceph-fuse: 15.2.15-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-2
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-2
libpve-storage-perl: 7.2-4
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.12-1
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.1-1
proxmox-backup-file-restore: 2.2.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-1
pve-container: 4.2-1
pve-docs: 7.2-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.4-2
pve-ha-manager: 3.3-4
pve-i18n: 2.7-2
pve-qemu-kvm: 6.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.4-pve1

----------------------------------------------
And this is the syslog:


Jun 17 00:02:45 italprox smartd[808]: Device: /dev/bus/0 [megaraid_disk_02] [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 65 to 64
Jun 17 00:10:26 italprox rsyslogd[806]: [origin software="rsyslogd" swVersion="8.2102.0" x-pid="806" x-info="https://www.rsyslog.com"] rsyslogd was HUPed
Jun 17 00:13:41 italprox pvescheduler[233409]: INFO: Finished Backup of VM 1050 (00:26:31)
Jun 17 00:13:41 italprox pvescheduler[233409]: INFO: Backup job finished successfully
Jun 17 00:17:01 italprox CRON[241732]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jun 17 00:17:01 italprox CRON[241733]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jun 17 00:17:01 italprox CRON[241732]: pam_unix(cron:session): session closed for user root
Jun 17 00:32:45 italprox smartd[808]: Device: /dev/bus/0 [megaraid_disk_02] [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 64 to 65
Jun 17 01:17:01 italprox CRON[252475]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jun 17 01:17:01 italprox CRON[252476]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jun 17 01:17:01 italprox CRON[252475]: pam_unix(cron:session): session closed for user root
Jun 17 01:43:37 italprox kernel: set kvm_intel.dump_invalid_vmcs=1 to dump internal KVM state.
Jun 17 01:43:37 italprox QEMU[1187]: KVM: entry failed, hardware error 0x80000021
Jun 17 01:43:37 italprox QEMU[1187]: If you're running a guest on an Intel machine without unrestricted mode
Jun 17 01:43:37 italprox QEMU[1187]: support, the failure can be most likely due to the guest entering an invalid
Jun 17 01:43:37 italprox QEMU[1187]: state for Intel VT. For example, the guest maybe running in big real mode
Jun 17 01:43:37 italprox QEMU[1187]: which is not supported on less recent Intel processors.
Jun 17 01:43:37 italprox QEMU[1187]: EAX=001a0f30 EBX=c29a0180 ECX=00000000 EDX=00000000
Jun 17 01:43:37 italprox QEMU[1187]: ESI=c29ac440 EDI=698ea080 EBP=226b5690 ESP=226b54b0
Jun 17 01:43:37 italprox QEMU[1187]: EIP=00008000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=1 HLT=0
Jun 17 01:43:37 italprox QEMU[1187]: ES =0000 00000000 ffffffff 00809300
Jun 17 01:43:37 italprox QEMU[1187]: CS =ae00 7ffae000 ffffffff 00809300
Jun 17 01:43:37 italprox QEMU[1187]: SS =0000 00000000 ffffffff 00809300
Jun 17 01:43:37 italprox QEMU[1187]: DS =0000 00000000 ffffffff 00809300
Jun 17 01:43:37 italprox QEMU[1187]: FS =0000 00000000 ffffffff 00809300
Jun 17 01:43:37 italprox QEMU[1187]: GS =0000 00000000 ffffffff 00809300
Jun 17 01:43:37 italprox QEMU[1187]: LDT=0000 00000000 000fffff 00000000
Jun 17 01:43:37 italprox QEMU[1187]: TR =0040 c29b0000 00000067 00008b00
Jun 17 01:43:37 italprox QEMU[1187]: GDT= c29b1fb0 00000057
Jun 17 01:43:37 italprox QEMU[1187]: IDT= 00000000 00000000
Jun 17 01:43:37 italprox QEMU[1187]: CR0=00050032 CR2=cc7de000 CR3=001ae000 CR4=00000000
Jun 17 01:43:37 italprox QEMU[1187]: DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
Jun 17 01:43:37 italprox QEMU[1187]: DR6=00000000ffff0ff0 DR7=0000000000000400
Jun 17 01:43:37 italprox QEMU[1187]: EFER=0000000000000000
Jun 17 01:43:37 italprox QEMU[1187]: Code=kvm: ../hw/core/cpu-sysemu.c:77: cpu_asidx_from_attrs: Assertion `ret < cpu->num_ases && ret >= 0' failed.
Jun 17 01:43:37 italprox kernel: fwbr1010i0: port 2(tap1010i0) entered disabled state
Jun 17 01:43:37 italprox kernel: fwbr1010i0: port 2(tap1010i0) entered disabled state
Jun 17 01:43:37 italprox systemd[1]: 1010.scope: Succeeded.
Jun 17 01:43:37 italprox systemd[1]: 1010.scope: Consumed 4h 49min 19.701s CPU time.
Jun 17 01:43:38 italprox qmeventd[257151]: Starting cleanup for 1010
Jun 17 01:43:38 italprox kernel: fwbr1010i0: port 1(fwln1010i0) entered disabled state
Jun 17 01:43:38 italprox kernel: vmbr0: port 2(fwpr1010p0) entered disabled state
Jun 17 01:43:38 italprox kernel: device fwln1010i0 left promiscuous mode
Jun 17 01:43:38 italprox kernel: fwbr1010i0: port 1(fwln1010i0) entered disabled state
Jun 17 01:43:38 italprox kernel: device fwpr1010p0 left promiscuous mode
Jun 17 01:43:38 italprox kernel: vmbr0: port 2(fwpr1010p0) entered disabled state
Jun 17 01:43:38 italprox qmeventd[257151]: Finished cleanup for 1010
 
Step 1

Load the Windows Server 2022 ISO from the Microsoft Eval Center and the virtio-win ISO from the Fedora people onto your PVE node

Step 2

Create a new VM via the GUI

OS tab: Windows ISO and 11/2022 as guest OS
...
Pretty much my test-scenario

Happened 2 times in a row (on the same host though...)
Sadly for me it happens once every 2-10 installs only

Thanks for the input in any case!
 
but I would really like to understand what the current official fixes for this problem are, or the series of steps that need to be done. Since these updates are available in the enterprise repo I expected them to be tested a lot, and that these things would not happen, or at least that a fix would be produced quickly; we have been living with this problem for more than 20 days now. I am waiting for a comment from the Proxmox staff, thanks.
I did comment on Wednesday (and yesterday was a public holiday here)...
As said before, we're working on isolating the commit that introduced this issue in the kernel. The problem while trying this is that we have no reliable way of triggering it, so we are installing Windows 2k22 10-15 times in a row before considering a commit as potentially not affected.

For the time being the mitigations we can offer are the ones discussed here in this thread.

I hope this helps!
 
If someone who can reproduce this issue reliably would be willing to test the older pve-kernel-5.15 packages:
* pve-kernel-5.15.19-2-pve
* pve-kernel-5.15.12-1-pve
* pve-kernel-5.15.5-1-pve

this would help us quite a bit in narrowing this down!
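
For anyone willing to try, installing and booting one of these could look roughly like this (a sketch; adjust the version to whichever package you are testing):

Code:
apt install pve-kernel-5.15.19-2-pve
proxmox-boot-tool kernel pin 5.15.19-2-pve
reboot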

Thanks!!
 
I thought this problem was isolated to the "Coffee Lake" family; I have a Lenovo with 2 x Xeon Gold 6234 and today I got:

"Jun 19 02:59:17 pve QEMU[2610]: KVM: entry failed, hardware error 0x80000021"
 
I thought this problem was isolated to the "Coffee Lake" family; I have a Lenovo with 2 x Xeon Gold 6234 and today I got:
No - sadly it happens on all kinds of (to my knowledge, only Intel) CPUs:
* the one host that shows this for us is an older Ivy Bridge (with an outdated BIOS)
* but in this thread we also have reports from Broadwell systems, and newer ones as well
 
At least for me, the power plan change to Performance seems to have brought stability. No crashes since enabling it a few days ago; before that, the crashes occurred multiple times per day.
 
Hi, I installed PVE 7.1.2 in April with kernel 5.15.30, on an ASUS TUF B660M motherboard with an Intel 12400, running the March build of Windows Server 2022 with the SATA controller passed through and file sharing enabled, and there were no problems. On May 7th I wanted to upgrade to 7.2.3, so I backed up the virtual machine and then upgraded; the kernel did not change. This problem first appeared on May 8th and I don't know what happened. I searched Google for the error, found this thread, and have been following it. During this period I tried reinstalling the system and swapping kernels (I am on a 12400 and need the integrated graphics, so I couldn't use the 5.13 kernel and have stayed on 5.15).

I don't know much about the inner workings of the system and the kernel, but based on my attempts I have drawn the following inferences; I hope they help you fix the error:
1. I think that when PVE is upgraded from 7.1 to 7.2 there are some changes in the virtualization environment, and the abnormal stops of Windows Server 2022 may be caused by those changes rather than by the kernel itself, because I was already using the 5.15 kernel for about two weeks before upgrading to 7.2 and no such bug appeared.
2. It has nothing to do with the Windows Server build. I tried the first release of 2022 and the latest one, and the bug appears with both.
3. It may be related to the CPU type. With "host" there is a high probability of the bug, sometimes once a day; with the QEMU "max" type it takes three or four days to appear once (see the sketch below for switching the type).
4. This bug may be easier to reproduce on Intel 12th-generation processors...
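
A sketch of how the CPU type mentioned in point 3 could be switched from the host CLI (VM ID 100 is just an example; the "max" type is what I selected in the GUI, and the VM needs a full stop/start to pick up the change):

Code:
qm set 100 --cpu max     # or back to: qm set 100 --cpu host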
 
At least for me, the power plan change to Performance seems to have brought stability. No crashes since enabling it a few days ago; before that, the crashes occurred multiple times per day.
I did this same change on my Server 2022 install and it is still crashing.
 
I did this same change on my Server 2022 install and it is still crashing.
So far the 5.13.19-6 kernel is the most reliable solution/workaround; my WS 2022 VM has been running for 2 weeks now with no crash.
Seems for me the latest 5.15.35-3-pve (package version 5.15.35-6; by the way, the version numbering really confused me) does not crash either; I have run Win 11 for almost 4 days without a crash.
 
