VM shutdown, KVM: entry failed, hardware error 0x80000021

The same problem started for me after upgrading pve-kernel to version 5.15.35-2 (from 5.13.19-6).
  • Host
    • CPU: Intel Xeon E-2146G (Coffee Lake)
    • Nested virtualization: On (/sys/module/kvm_intel/parameters/nested = Y)
    • Kernel: 5.15.35-2
  • Guest
    • Machine: pc-q35-6.1
    • OS: Windows 11 (Insider build 22616)
    • Hyper-V: On (via Windows Subsystem for Android)
All other things unchanged, the problem did not occur with kernel version 5.13.19-6.
 
On kernel 5.13.19-6 it seems to happen less frequently, but I double-checked the logs and it did crash for me once on that kernel.
Since I haven't figured out any way to reproduce the behavior manually, I'm on a hunt for white ravens here.

I checked syslog (and syslog.1, syslog.2.gz, syslog.3.gz) for occurrences of said QEMU error, but couldn't find anything before May 10.
Bash:
root@tom:~# zcat /var/log/syslog.3.gz | grep "0x80000021"
root@tom:~# zcat /var/log/syslog.2.gz | grep "0x80000021"
root@tom:~# cat /var/log/syslog.1 | grep "0x80000021"
May 10 22:30:17 tom QEMU[651818]: KVM: entry failed, hardware error 0x80000021
May 12 14:30:24 tom QEMU[1400269]: KVM: entry failed, hardware error 0x80000021
May 12 15:30:20 tom QEMU[2757184]: KVM: entry failed, hardware error 0x80000021
May 12 18:55:07 tom QEMU[12386]: KVM: entry failed, hardware error 0x80000021
May 13 14:30:14 tom QEMU[12008]: KVM: entry failed, hardware error 0x80000021
May 13 19:00:15 tom QEMU[8912]: KVM: entry failed, hardware error 0x80000021
May 13 22:38:42 tom QEMU[11597]: KVM: entry failed, hardware error 0x80000021
May 14 00:04:40 tom QEMU[12048]: KVM: entry failed, hardware error 0x80000021
root@tom:~# cat /var/log/syslog | grep "0x80000021"
May 15 10:30:20 tom QEMU[2043102]: KVM: entry failed, hardware error 0x80000021
May 15 11:00:20 tom QEMU[1325855]: KVM: entry failed, hardware error 0x80000021

Bash:
root@tom:~# cat /var/log/syslog.1 | grep "Linux version"
May  8 12:01:06 tom kernel: [    0.000000] Linux version 5.15.35-1-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PVE 5.15.35-2 (Thu, 05 May 2022 13:54:35 +0200) ()
May 12 17:31:42 tom kernel: [    0.000000] Linux version 5.15.35-1-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PVE 5.15.35-2 (Thu, 05 May 2022 13:54:35 +0200) ()
May 12 21:43:17 tom kernel: [    0.000000] Linux version 5.15.35-1-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PVE 5.15.35-2 (Thu, 05 May 2022 13:54:35 +0200) ()
May 13 18:25:33 tom kernel: [    0.000000] Linux version 5.15.35-1-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PVE 5.15.35-3 (Wed, 11 May 2022 07:57:51 +0200) ()
May 13 20:06:07 tom kernel: [    0.000000] Linux version 5.15.35-1-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PVE 5.15.35-3 (Wed, 11 May 2022 07:57:51 +0200) ()
May 13 22:26:50 tom kernel: [    0.000000] Linux version 5.15.30-2-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PVE 5.15.30-3 (Fri, 22 Apr 2022 18:08:27 +0200) ()
May 13 22:41:58 tom kernel: [    0.000000] Linux version 5.13.19-6-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PVE 5.13.19-15 (Tue, 29 Mar 2022 15:59:50 +0200) ()
May 14 13:34:19 tom kernel: [    0.000000] Linux version 5.15.35-1-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PVE 5.15.35-3 (Wed, 11 May 2022 07:57:51 +0200) ()
root@tom:~# cat /var/log/syslog | grep "Linux version"
May 15 21:24:41 tom kernel: [    0.000000] Linux version 5.13.19-6-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PVE 5.13.19-15 (Tue, 29 Mar 2022 15:59:50 +0200) ()
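
The per-file greps for 0x80000021 in this post could be folded into a single pass over all rotated logs. A minimal sketch, assuming GNU gzip (whose -f flag passes uncompressed files through unchanged); the function name crashes_per_day is made up for illustration:

```bash
#!/usr/bin/env bash
# Count "hardware error 0x80000021" entries per day across plain and gzipped logs.
# crashes_per_day is a hypothetical helper name, not part of any Proxmox tooling.
crashes_per_day() {
    # gzip -cdf decompresses .gz inputs and copies plain files through unchanged.
    gzip -cdf -- "$@" 2>/dev/null \
        | grep -F 'hardware error 0x80000021' \
        | awk '{ count[$1 " " $2]++ } END { for (d in count) print d, count[d] }' \
        | sort
}
```

Called as crashes_per_day /var/log/syslog /var/log/syslog.1 /var/log/syslog.*.gz, it prints one line per day (month, day, number of matching entries), which makes it easier to line the crashes up against the boot entries listed above.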
 
Another crash yesterday evening. Nested virtualization is turned off: cat /sys/module/kvm_intel/parameters/nested prints N.
What's the best way to switch to an older kernel version?
 
proxmox-boot-tool is a handy option:
proxmox-boot-tool kernel list (to list the available kernels)
proxmox-boot-tool kernel pin 5.13.19-6-pve (to choose 5.13.19-6-pve, for example)
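
Wrapped into a small helper, the two commands above might look like this. It is only a sketch: pin_kernel is a hypothetical name, and it assumes that the output of proxmox-boot-tool kernel list contains the version string exactly as it is passed to kernel pin.

```bash
#!/usr/bin/env bash
# Pin a kernel version, but only if it shows up in the list of available kernels.
# pin_kernel is a hypothetical wrapper; the list output format is an assumption.
pin_kernel() {
    local version="$1"
    if [ -z "$version" ]; then
        echo "usage: pin_kernel <kernel-version>" >&2
        return 2
    fi
    # Refuse to pin something that is not listed as available.
    if ! proxmox-boot-tool kernel list | grep -qF -- "$version"; then
        echo "kernel $version is not in the list of available kernels" >&2
        return 1
    fi
    proxmox-boot-tool kernel pin "$version"
}
```

For example: pin_kernel 5.13.19-6-pve, followed by a reboot so that the pinned kernel actually gets booted.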
 
We upgraded our cluster to 7.2, and out of all our VMs one has this problem too. It crashes every day.

May 17 04:27:21 prox-s23 QEMU[3387740]: KVM: entry failed, hardware error 0x80000021
May 17 04:27:21 prox-s23 QEMU[3387740]: If you're running a guest on an Intel machine without unrestricted mode
May 17 04:27:21 prox-s23 QEMU[3387740]: support, the failure can be most likely due to the guest entering an invalid
May 17 04:27:21 prox-s23 QEMU[3387740]: state for Intel VT. For example, the guest maybe running in big real mode
May 17 04:27:21 prox-s23 QEMU[3387740]: which is not supported on less recent Intel processors.
May 17 04:27:21 prox-s23 kernel: set kvm_intel.dump_invalid_vmcs=1 to dump internal KVM state.
May 17 04:27:21 prox-s23 QEMU[3387740]: EAX=00000080 EBX=0f67e954 ECX=00000000 EDX=335f5000
May 17 04:27:21 prox-s23 QEMU[3387740]: ESI=22f00f90 EDI=0020a000 EBP=de692b60 ESP=2047efb0
May 17 04:27:21 prox-s23 QEMU[3387740]: EIP=00008000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=1 HLT=0
May 17 04:27:21 prox-s23 QEMU[3387740]: ES =0000 00000000 ffffffff 00809300
May 17 04:27:21 prox-s23 QEMU[3387740]: CS =b600 7ffb6000 ffffffff 00809300
May 17 04:27:21 prox-s23 QEMU[3387740]: SS =0000 00000000 ffffffff 00809300
May 17 04:27:21 prox-s23 QEMU[3387740]: DS =0000 00000000 ffffffff 00809300
May 17 04:27:21 prox-s23 QEMU[3387740]: FS =0000 00000000 ffffffff 00809300
May 17 04:27:21 prox-s23 QEMU[3387740]: GS =0000 00000000 ffffffff 00809300
May 17 04:27:21 prox-s23 QEMU[3387740]: LDT=0000 00000000 000fffff 00000000
May 17 04:27:21 prox-s23 QEMU[3387740]: TR =0040 20465000 00000067 00008b00
May 17 04:27:21 prox-s23 QEMU[3387740]: GDT= 20466fb0 00000057
May 17 04:27:21 prox-s23 QEMU[3387740]: IDT= 00000000 00000000
May 17 04:27:21 prox-s23 QEMU[3387740]: CR0=00050032 CR2=0a96afe0 CR3=3fe65002 CR4=00000000
May 17 04:27:21 prox-s23 QEMU[3387740]: DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
May 17 04:27:21 prox-s23 QEMU[3387740]: DR6=00000000ffff0ff0 DR7=0000000000000400
May 17 04:27:21 prox-s23 QEMU[3387740]: EFER=0000000000000000
May 17 04:27:21 prox-s23 QEMU[3387740]: Code=kvm: ../hw/core/cpu-sysemu.c:77: cpu_asidx_from_attrs: Assertion `ret < cpu->num_ases && ret >= 0' failed.
May 17 04:27:21 prox-s23 pvestatd[3372]: VM 2023 qmp command failed - VM 2023 not running
May 17 04:27:21 prox-s23 kernel: fwbr2023i0: port 2(tap2023i0) entered disabled state
May 17 04:27:22 prox-s23 kernel: fwbr2023i0: port 2(tap2023i0) entered disabled state
May 17 04:27:22 prox-s23 systemd[1]: 2023.scope: Succeeded.
May 17 04:27:22 prox-s23 systemd[1]: 2023.scope: Consumed 1d 6h 48min 837ms CPU time.
May 17 04:27:22 prox-s23 qmeventd[3897486]: Starting cleanup for 2023
May 17 04:27:22 prox-s23 kernel: fwbr2023i0: port 1(fwln2023i0) entered disabled state
May 17 04:27:22 prox-s23 kernel: vmbr1v49: port 5(fwpr2023p0) entered disabled state
May 17 04:27:23 prox-s23 kernel: device fwln2023i0 left promiscuous mode
May 17 04:27:23 prox-s23 kernel: fwbr2023i0: port 1(fwln2023i0) entered disabled state
May 17 04:27:23 prox-s23 kernel: device fwpr2023p0 left promiscuous mode
May 17 04:27:23 prox-s23 kernel: vmbr1v49: port 5(fwpr2023p0) entered disabled state
May 17 04:27:23 prox-s23 qmeventd[3897486]: Finished cleanup for 2023
 
May 10 22:30:17 tom QEMU[651818]: KVM: entry failed, hardware error 0x80000021
May 12 14:30:24 tom QEMU[1400269]: KVM: entry failed, hardware error 0x80000021
May 12 15:30:20 tom QEMU[2757184]: KVM: entry failed, hardware error 0x80000021
May 13 14:30:14 tom QEMU[12008]: KVM: entry failed, hardware error 0x80000021
This looks like a pattern.
Is there anything running at those times?
Backup, replication on the PVE host?
Something in the VM?
 
Yes, I do run replication on a */30 schedule. It sometimes lines up with the crashes and sometimes does not, and my VM isn't crashing every 30 minutes.

I'll evaluate the hint with SMM to see what my options are.
 
After disabling nested virtualization (via /etc/modprobe.d/kvm-intel.conf), my problematic VM (Windows Server 2022, MS SQL Server) on a Xeon W-2125 has been running for "1d 14h" with no crashes yet. Keeping my fingers crossed that this lasts...

I hope this helps narrow down the problematic behaviour for some people.
 
And is it simply possible to migrate the VM off the host, change the option, reboot the host and then migrate it back?
Or in what order did you do it?
 
I didn't migrate anything (I just took a backup to be sure nothing gets broken).

But if you need the VM to keep running, sure: migrate it to a secondary node, make the changes on the "affected" node and migrate it back.

BUT I'm not sure whether the VM will notice the changed CPU options (like the disabled nested virtualization) without a reboot, so I would hard-reboot the VM if you can.

Someone earlier described the options, but here you are :)
To make the changes, just run these in the PVE CLI:

# create the file with the desired option
echo 'options kvm-intel nested=0' > /etc/modprobe.d/kvm-intel.conf
# check the contents
cat /etc/modprobe.d/kvm-intel.conf
# update the initramfs to apply the config change, then reboot the node
update-initramfs -u
reboot

After the reboot, check that nested is REALLY OFF with:
cat /sys/module/kvm_intel/parameters/nested

It MUST print N.
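
To make that last check scriptable, something like the following could be used. A rough sketch only: nested_is_off is a hypothetical name, and accepting 0 in addition to N is a precaution on my part, since some module parameters print 1/0 instead of Y/N.

```bash
#!/usr/bin/env bash
# Return 0 if nested virtualization is off, 1 if it is on, 2 if the parameter is unreadable.
# nested_is_off is a hypothetical helper name.
nested_is_off() {
    # An optional first argument overrides the sysfs path (handy for testing).
    local param="${1:-/sys/module/kvm_intel/parameters/nested}"
    local value
    # A missing file usually means kvm_intel is not loaded.
    value=$(cat "$param" 2>/dev/null) || return 2
    case "$value" in
        N|0) return 0 ;;   # nested virtualization disabled
        *)   return 1 ;;   # Y, 1, or anything unexpected
    esac
}
```

Usage after the reboot: nested_is_off && echo "nested is off" || echo "nested is still on (or kvm_intel is not loaded)".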
 
Okay, never mind, the VM crashed again this morning at 5:15, roughly around the time of the previous crashes. Going back to the 5.13.x kernel.
 
I think I'm having this issue with my PowerEdge R730XD, which has dual E5-2640 v3 CPUs. I get the same "hardware error 0x80000021" as the OP. I am also using 5.15.35-1-pve.
 
Can you try disabling SMM?
To do so you'll have to run the VM manually. First run qm showcmd <VMID> --pretty and copy the content to a file.
Modify the -machine line by adding ,smm=off.
Then run that command.
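
The manual edit of the -machine line could also be scripted. A rough sketch, assuming the --pretty output puts each option on its own line and single-quotes the -machine value (for example -machine 'type=pc-q35-6.2+pve0'); that format and the function name add_smm_off are assumptions, so check the result before running it:

```bash
#!/usr/bin/env bash
# Read a "qm showcmd <VMID> --pretty" dump on stdin and append ,smm=off to the
# single-quoted -machine value. All other lines pass through unchanged.
# add_smm_off is a hypothetical helper; the quoted format is an assumption.
add_smm_off() {
    sed -E "s/^([[:space:]]*-machine[[:space:]]+'[^']*)'/\1,smm=off'/"
}
```

Usage: qm showcmd <VMID> --pretty | add_smm_off > /tmp/vm-smm-off.sh, then inspect the file (the -machine line should now end in ,smm=off') before running it.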


One more question, do all of you use UEFI with pre-enrolled keys and perhaps even secure boot?
 
I'm having the same issue once every 1 to 4 days. What I've already tried:
Disabling/enabling nested virtualization
Disabling/enabling ignore_msrs
Disabling/enabling CPU microcode
Disabling/enabling CFG (Control flow guard)

Code:
May 20 04:42:07 m2404 QEMU[2035]: KVM: entry failed, hardware error 0x80000021
May 20 04:42:07 m2404 QEMU[2035]: If you're running a guest on an Intel machine without unrestricted mode
May 20 04:42:07 m2404 QEMU[2035]: support, the failure can be most likely due to the guest entering an invalid
May 20 04:42:07 m2404 QEMU[2035]: state for Intel VT. For example, the guest maybe running in big real mode
May 20 04:42:07 m2404 QEMU[2035]: which is not supported on less recent Intel processors.
May 20 04:42:07 m2404 QEMU[2035]: EAX=00177df0 EBX=e02d7180 ECX=00000000 EDX=00000000
May 20 04:42:07 m2404 QEMU[2035]: ESI=e02e31c0 EDI=0c9f6080 EBP=7d8d8870 ESP=7d8d8690
May 20 04:42:07 m2404 QEMU[2035]: EIP=00008000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=1 HLT=0
May 20 04:42:07 m2404 QEMU[2035]: ES =0000 00000000 ffffffff 00809300
May 20 04:42:07 m2404 kernel: [132860.740916] set kvm_intel.dump_invalid_vmcs=1 to dump internal KVM state.
May 20 04:42:07 m2404 QEMU[2035]: CS =c000 7ffc0000 ffffffff 00809300
May 20 04:42:07 m2404 QEMU[2035]: SS =0000 00000000 ffffffff 00809300
May 20 04:42:07 m2404 QEMU[2035]: DS =0000 00000000 ffffffff 00809300
May 20 04:42:07 m2404 QEMU[2035]: FS =0000 00000000 ffffffff 00809300
May 20 04:42:07 m2404 QEMU[2035]: GS =0000 00000000 ffffffff 00809300
May 20 04:42:07 m2404 QEMU[2035]: LDT=0000 00000000 000fffff 00000000
May 20 04:42:07 m2404 QEMU[2035]: TR =0040 e02e6000 00000067 00008b00
May 20 04:42:07 m2404 QEMU[2035]: GDT=     e02e7fb0 00000057
May 20 04:42:07 m2404 QEMU[2035]: IDT=     00000000 00000000
May 20 04:42:07 m2404 QEMU[2035]: CR0=00050032 CR2=6ceab000 CR3=3f935000 CR4=00000000
May 20 04:42:07 m2404 QEMU[2035]: DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
May 20 04:42:07 m2404 QEMU[2035]: DR6=00000000ffff0ff0 DR7=0000000000000400
May 20 04:42:07 m2404 QEMU[2035]: EFER=0000000000000000
May 20 04:42:07 m2404 QEMU[2035]: Code=kvm: ../hw/core/cpu-sysemu.c:77: cpu_asidx_from_attrs: Assertion `ret < cpu->num_ases && ret >= 0' failed.

pveversion
pve-manager/7.2-4/ca9d43cc (running kernel: 5.15.35-1-pve)

CPU
Code:
processor       : [0 to 11]
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz
stepping        : 4
microcode       : 0x42e
cpu MHz         : 3400.000
cache size      : 12288 KB
physical id     : 0
siblings        : 12
core id         : [0 to 5]
cpu cores       : 6
apicid          : [0 to 11]
initial apicid  : [0 to 11]
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm arat pln pts md_clear flush_l1d
vmx flags       : vnmi preemption_timer invvpid ept_x_only ept_1gb flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest ple
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips        : 6803.85
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

qm conf:
Code:
agent: 1,fstrim_cloned_disks=1
bios: ovmf
boot: order=ide2;scsi0
cores: 6
cpu: IvyBridge
efidisk0: local:100/vm-100-disk-0.vmdk,efitype=4m,pre-enrolled-keys=1,size=528K
hotplug: disk,network,usb
ide0: local:iso/virtio-win.iso,media=cdrom,size=519096K
ide2: local:iso/20348.169.210806-2348.fe_release_svc_refresh_SERVER_EVAL_x64FRE_en-us.iso,media=cdrom,size=5420734K
machine: pc-q35-6.2
memory: 51200
meta: creation-qemu=6.2.0,ctime=1652199152
name: win
net0: e1000=CA:56:64:87:DE:71,bridge=vmbr1
numa: 0
onboot: 1
ostype: win11
protection: 1
scsi0: local:100/vm-100-disk-3.qcow2,cache=writeback,discard=on,size=100G
scsi1: sdb:vm-100-disk-0,backup=0,cache=writethrough,size=1T
scsihw: virtio-scsi-pci
smbios1: uuid=8227f300-01f3-45bf-b649-96106ef72970
sockets: 1
tpmstate0: local:100/vm-100-disk-2.raw,size=4M,version=v2.0
vmgenid: e259b066-0776-432a-baa6-74638f2313d9
 
A manual run can't find the TPM device:
kvm: -chardev socket,id=tpmchar,path=/var/run/qemu-server/100.swtpm: Failed to connect to '/var/run/qemu-server/100.swtpm': No such file or directory
 
Has anyone tested whether the kernel update to 5.15.35-3 fixes this issue? I am having the same issue with the 5.15.35-1 kernel, and there are two updates available.
Maybe the updates solve the problem?
 
I upgraded to the latest -3 at the same time as I disabled nested virtualization, and there has been no crash since then.
It could have been fixed in -3, or the nested option works.