VM shutdown during backup to QNAP, KVM: entry failed, hardware error 0x80000021

BeWu

Hello,

Last night one of my backups didn't go well. Its destination was a QNAP. A second backup of the same machine, destined for PBS, went fine.
Kernel 5.15.30-2, QNAP NFS version set to 4. I attach the email from Proxmox and short excerpts from syslog, lscpu and the VM configuration.

That's the first time in 2 years that I have had such an issue, which is why I find it strange, especially since there is the kernel-QNAP case.
 


Hm -
May 06 03:17:59 cssrv kernel: set kvm_intel.dump_invalid_vmcs=1 to dump internal KVM state.
May 06 03:17:59 cssrv QEMU[5460]: KVM: entry failed, hardware error 0x80000021
May 06 03:17:59 cssrv QEMU[5460]: If you're running a guest on an Intel machine without unrestricted mode
May 06 03:17:59 cssrv QEMU[5460]: support, the failure can be most likely due to the guest entering an invalid
May 06 03:17:59 cssrv QEMU[5460]: state for Intel VT. For example, the guest maybe running in big real mode
May 06 03:17:59 cssrv QEMU[5460]: which is not supported on less recent Intel processors.
May 06 03:17:59 cssrv QEMU[5460]: EAX=00000000 EBX=0fba3fb0 ECX=0fba3fb0 EDX=00000000
May 06 03:17:59 cssrv QEMU[5460]: ESI=78b9f080 EDI=0ff088e0 EBP=0fba4000 ESP=0ff088e0
May 06 03:17:59 cssrv QEMU[5460]: EIP=00008000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=1 HLT=0
May 06 03:17:59 cssrv QEMU[5460]: ES =0000 00000000 ffffffff 00809300
May 06 03:17:59 cssrv QEMU[5460]: CS =c000 7ffc0000 ffffffff 00809300
May 06 03:17:59 cssrv QEMU[5460]: SS =0000 00000000 ffffffff 00809300
May 06 03:17:59 cssrv QEMU[5460]: DS =0000 00000000 ffffffff 00809300
May 06 03:17:59 cssrv QEMU[5460]: FS =0000 00000000 ffffffff 00809300
May 06 03:17:59 cssrv QEMU[5460]: GS =0000 00000000 ffffffff 00809300
May 06 03:17:59 cssrv QEMU[5460]: LDT=0000 00000000 00000000 00000000
May 06 03:17:59 cssrv QEMU[5460]: TR =0040 cb8a0000 00000067 00008b00
May 06 03:17:59 cssrv QEMU[5460]: GDT= cb8a1fb0 00000057
May 06 03:17:59 cssrv QEMU[5460]: IDT= 00000000 00000000
May 06 03:17:59 cssrv QEMU[5460]: CR0=00050032 CR2=2dd81aa0 CR3=254a9002 CR4=00000000
May 06 03:17:59 cssrv QEMU[5460]: DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
May 06 03:17:59 cssrv QEMU[5460]: DR6=00000000ffff4ff0 DR7=0000000000000400
May 06 03:17:59 cssrv QEMU[5460]: EFER=0000000000000000
May 06 03:17:59 cssrv QEMU[5460]: Code=kvm: ../hw/core/cpu-sysemu.c:77: cpu_asidx_from_attrs: Assertion `ret < cpu->num_ases && ret >= 0' failed.


this would indicate some issue in your cpu, the vm config, the kernel or qemu ...

Things I'd suggest to pin this down a bit more:
* try installing the latest available BIOS/Firmware version for your system
* make sure to have installed the latest intel-microcode package and reboot
* the VM's config looks a bit specific - but if you can, try booting it with the cpu type kvm64

If this does not help:
* try reproducing the issue with an older 5.13 kernel
* if it still happens - try downgrading pve-qemu-kvm: `apt install pve-qemu-kvm=6.1.1-2`

what kind of workload is running in the VM?



I did not manage to find out if the Ivy Bridge processors in your system do support the mentioned unrestricted mode.
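
For anyone reading the register dump in the log quoted above, here is a minimal Python sketch (not from the thread) that pulls the `KEY=hex` fields out of the dump and decodes the CR0 flags. The helper names `parse_dump`, `cr0_flags` and `summarize` are made up for illustration, and the CR0 bit positions are the architectural ones from the Intel SDM. It is a reading aid only and does not diagnose the cause.

```python
import re

# Architectural CR0 bit positions (Intel SDM); only the named flags.
CR0_FLAGS = {0: "PE", 1: "MP", 2: "EM", 3: "TS", 4: "ET", 5: "NE",
             16: "WP", 18: "AM", 29: "NW", 30: "CD", 31: "PG"}

# Matches fields such as EIP=00008000, SMM=1, CR0=00050032.
# Segment lines like "CS =c000 ..." are deliberately skipped.
FIELD_RE = re.compile(r"\b([A-Z][A-Z0-9]{1,3})=([0-9a-fA-F]+)\b")


def parse_dump(text):
    """Collect KEY=hex pairs from a QEMU register dump into a dict of ints."""
    regs = {}
    for key, value in FIELD_RE.findall(text):
        regs.setdefault(key, int(value, 16))  # first occurrence wins
    return regs


def cr0_flags(cr0):
    """Return the names of the CR0 flags that are set, lowest bit first."""
    return [name for bit, name in sorted(CR0_FLAGS.items()) if (cr0 >> bit) & 1]


def summarize(text):
    """Summarize the CPU mode described by a register dump."""
    regs = parse_dump(text)
    if "CR0" not in regs:
        raise ValueError("no CR0 field found in dump")
    flags = cr0_flags(regs["CR0"])
    return {
        "in_smm": regs.get("SMM") == 1,
        "eip_is_0x8000": regs.get("EIP") == 0x8000,
        "protected_mode": "PE" in flags,
        "paging": "PG" in flags,
        "cr0_flags": flags,
    }
```

Fed the `EIP=...` and `CR0=...` lines from the dump, this reports `SMM=1`, `EIP=0x8000`, and CR0 with both PE and PG clear. As far as I know that is the state a CPU is in right at SMI handler entry, so the failed VM entry probably happened while the guest was entering system management mode. Treat that as a hint, not a diagnosis.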
 
Sorry for the late answer - I wasn't able to reproduce the problem so far \-("_")-/

what kind of workload is running in the VM?
The VM runs Windows Server 2022 with an accounting app (many small databases on MSSQL) and IIS for the app's web services.

The configuration worked great for 2 years and now this one isolated case happened... probably some unexpected disturbance in the force ;-).
Will post here if it happens again.
 
Sorry for the late answer - I wasn't able to reproduce the problem so far \-("_")-/
no stress - if the issue did not reappear that also might point to it not being something really broken

Will post here if it happens again.
Thanks!
 
I actually saw my Windows 11 VM crash twice with the same error in the log after upgrading to 7.2; I never saw it on 7.1.
thanks for the report - please try the suggestions from above - and tell us if they improve the situation
 
I have seen the same error 3 times since upgrading to 7.2. PVE 7.2 host and Windows Server 2022 guest with kvm64.
After the first error, I installed the intel-microcode package and checked my host's VT settings in the BIOS.

My host:

CPU(s): 8 x Intel(R) Xeon(R) CPU E3-1241 v3 @ 3.50GHz (1 Socket)
Kernel Version: Linux 5.15.35-1-pve #1 SMP PVE 5.15.35-3 (Wed, 11 May 2022 07:57:51 +0200)
PVE Manager Version: pve-manager/7.2-4/ca9d43cc


 
I have an update on the situation... The unexpected VM stop happened 2 more times, on another host. Each time it was during a Windows Server 2022 update.
It was ONLY these two more times, and I haven't observed it since then.

@Stoiko Ivanov :
  • I cannot test the kvm64 CPU type, as this is a production VM and it slows it down very much
  • All microcode and BIOS versions are up to date
  • As it has happened only a few times so far, I didn't test on the 5.13 kernel or with pve-qemu-kvm=6.1.1-2
Kernel:
Linux cssrv 5.15.35-1-pve #1 SMP PVE 5.15.35-3 (Wed, 11 May 2022 07:57:51 +0200) x86_64 GNU/Linux



---------EDIT 30.05.2022-----------

And almost right after posting the above, the VM crashed again, 15 minutes after the backup had finished, so possibly unrelated to it.
Going to switch back to the 5.13 kernel.
 

Me too, same thing.
The strange thing is that the crashes follow patterns.
Backup runs at 00:30 a.m. It usually finishes at 3 a.m.
During backup, machine 103 crashes sometimes.
By 6 a.m., machine 100 and 103 are always crashed.
Sometimes, they crash again at 11 a.m.
Machine 102 hardly ever crashes.
Machine 101 mostly crashes on reboots.
They are all Windows Server 2019. 100 DC with CA, 101 DC, 102 RDS, 103 File and Print

After deleting the EFI disks and recreating them, the problem disappeared for three days, maybe by coincidence.
Two hours ago, I saw it occur again.
 
Having the exact same error right here. Windows server keeps on crashing with the same errors every 2 nights or so after backup.

Did the OP already try the suggestions quoted earlier in this thread (latest BIOS/firmware, latest intel-microcode package, cpu type kvm64, an older 5.13 kernel, downgrading pve-qemu-kvm to 6.1.1-2)?


Edit: I was running the KVM64 type from the beginning. So that is probably not it.
 
I upgraded two hosts to 7.2 today, and on both hosts a VM has randomly powered off.

One seemed to happen just as the backup window started, the other did not.

Need to do some further digging as it is late but seeing this on both hosts so far:

Jun 2 19:01:10 pve-fd QEMU[3658]: KVM: entry failed, hardware error 0x80000021
Jun 2 19:01:10 pve-fd QEMU[3658]: If you're running a guest on an Intel machine without unrestricted mode
Jun 2 19:01:10 pve-fd QEMU[3658]: support, the failure can be most likely due to the guest entering an invalid
Jun 2 19:01:10 pve-fd QEMU[3658]: state for Intel VT. For example, the guest maybe running in big real mode
Jun 2 19:01:10 pve-fd QEMU[3658]: which is not supported on less recent Intel processors.
Jun 2 19:01:10 pve-fd QEMU[3658]: EAX=6e813550 EBX=00000102 ECX=00000000 EDX=00000000
Jun 2 19:01:10 pve-fd QEMU[3658]: ESI=00000000 EDI=e1a58040 EBP=3a4f43f0 ESP=3a4f4370
Jun 2 19:01:10 pve-fd QEMU[3658]: EIP=00008000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=1 HLT=0
Jun 2 19:01:10 pve-fd QEMU[3658]: ES =0000 00000000 ffffffff 00809300
Jun 2 19:01:10 pve-fd QEMU[3658]: CS =be00 7ffbe000 ffffffff 00809300
Jun 2 19:01:10 pve-fd QEMU[3658]: SS =0000 00000000 ffffffff 00809300
Jun 2 19:01:10 pve-fd QEMU[3658]: DS =0000 00000000 ffffffff 00809300
Jun 2 19:01:10 pve-fd QEMU[3658]: FS =0000 00000000 ffffffff 00809300
Jun 2 19:01:10 pve-fd QEMU[3658]: GS =0000 00000000 ffffffff 00809300
Jun 2 19:01:10 pve-fd QEMU[3658]: LDT=0000 00000000 000fffff 00000000
Jun 2 19:01:10 pve-fd QEMU[3658]: TR =0040 6afe8000 00000067 00008b00
Jun 2 19:01:10 pve-fd QEMU[3658]: GDT= 6afe9fb0 00000057
Jun 2 19:01:10 pve-fd QEMU[3658]: IDT= 00000000 00000000
Jun 2 19:01:10 pve-fd QEMU[3658]: CR0=00050032 CR2=68329190 CR3=001ae000 CR4=00000000
Jun 2 19:01:10 pve-fd QEMU[3658]: DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
Jun 2 19:01:10 pve-fd QEMU[3658]: DR6=00000000ffff0ff0 DR7=0000000000000400
Jun 2 19:01:10 pve-fd kernel: [17530.825546] set kvm_intel.dump_invalid_vmcs=1 to dump internal KVM state.
Jun 2 19:01:10 pve-fd QEMU[3658]: EFER=0000000000000000
Jun 2 19:01:10 pve-fd QEMU[3658]: Code=kvm: ../hw/core/cpu-sysemu.c:77: cpu_asidx_from_attrs: Assertion `ret < cpu->num_ases && ret >= 0' failed.
Jun 2 19:01:11 pve-fd systemd[1]: 110.scope: Succeeded.
Jun 2 19:01:11 pve-fd systemd[1]: 110.scope: Consumed 30min 5.328s CPU time.
Jun 2 19:01:12 pve-fd qmeventd[154004]: Starting cleanup for 110
Jun 2 19:01:12 pve-fd qmeventd[154004]: Finished cleanup for 110

Jun 2 18:17:44 pve-pa QEMU[3302]: KVM: entry failed, hardware error 0x80000021
Jun 2 18:17:44 pve-pa QEMU[3302]: If you're running a guest on an Intel machine without unrestricted mode
Jun 2 18:17:44 pve-pa QEMU[3302]: support, the failure can be most likely due to the guest entering an invalid
Jun 2 18:17:44 pve-pa QEMU[3302]: state for Intel VT. For example, the guest maybe running in big real mode
Jun 2 18:17:44 pve-pa QEMU[3302]: which is not supported on less recent Intel processors.
Jun 2 18:17:44 pve-pa QEMU[3302]: EAX=00024ca5 EBX=4019e180 ECX=00000000 EDX=00000000
Jun 2 18:17:44 pve-pa QEMU[3302]: ESI=401aa440 EDI=1e9690c0 EBP=ac82b0f0 ESP=ac82af10
Jun 2 18:17:44 pve-pa QEMU[3302]: EIP=00008000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=1 HLT=0
Jun 2 18:17:44 pve-pa QEMU[3302]: ES =0000 00000000 ffffffff 00809300
Jun 2 18:17:44 pve-pa QEMU[3302]: CS =b400 7ffb4000 ffffffff 00809300
Jun 2 18:17:44 pve-pa QEMU[3302]: SS =0000 00000000 ffffffff 00809300
Jun 2 18:17:44 pve-pa QEMU[3302]: DS =0000 00000000 ffffffff 00809300
Jun 2 18:17:44 pve-pa QEMU[3302]: FS =0000 00000000 ffffffff 00809300
Jun 2 18:17:44 pve-pa QEMU[3302]: GS =0000 00000000 ffffffff 00809300
Jun 2 18:17:44 pve-pa QEMU[3302]: LDT=0000 00000000 00000000 00000000
Jun 2 18:17:44 pve-pa QEMU[3302]: TR =0040 401ae000 00000067 00008b00
Jun 2 18:17:44 pve-pa QEMU[3302]: GDT= 401affb0 00000057
Jun 2 18:17:44 pve-pa QEMU[3302]: IDT= 00000000 00000000
Jun 2 18:17:44 pve-pa QEMU[3302]: CR0=00050032 CR2=0179fd20 CR3=4335b000 CR4=00000000
Jun 2 18:17:44 pve-pa QEMU[3302]: DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
Jun 2 18:17:44 pve-pa QEMU[3302]: DR6=00000000ffff0ff0 DR7=0000000000000400
Jun 2 18:17:44 pve-pa QEMU[3302]: EFER=0000000000000000
Jun 2 18:17:44 pve-pa QEMU[3302]: Code=kvm: ../hw/core/cpu-sysemu.c:77: cpu_asidx_from_attrs: Assertion `ret < cpu->num_ases && ret >= 0' failed.
Jun 2 18:17:44 pve-pa kernel: [14473.555487] set kvm_intel.dump_invalid_vmcs=1 to dump internal KVM state.
Jun 2 18:17:44 pve-pa systemd[1]: 110.scope: Succeeded.
Jun 2 18:17:44 pve-pa systemd[1]: 110.scope: Consumed 55min 28.794s CPU time.
Jun 2 18:17:45 pve-pa qmeventd[46147]: Starting cleanup for 110
Jun 2 18:17:45 pve-pa qmeventd[46147]: Finished cleanup for 110
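
For anyone trying to line these stops up against backup windows or guest updates, here is a small Python sketch (not from the thread) that extracts every `KVM: entry failed` event from a syslog excerpt. The function name `find_entry_failures` is hypothetical, and the regular expression assumes the syslog layout seen in the logs posted here (`Mon D HH:MM:SS host QEMU[pid]: ...`); other log formats would need a different pattern.

```python
import re

# Assumes the classic syslog prefix seen in this thread's logs.
ENTRY_FAILED_RE = re.compile(
    r"(?P<time>[A-Z][a-z]{2}\s+\d{1,2} \d\d:\d\d:\d\d) "
    r"(?P<host>\S+) QEMU\[(?P<pid>\d+)\]: "
    r"KVM: entry failed, hardware error (?P<error>0x[0-9a-fA-F]+)"
)


def find_entry_failures(log_text):
    """Return one dict per 'KVM: entry failed' event found in log_text.

    Works on normal one-entry-per-line logs and on pasted logs where
    several entries ended up on a single line.
    """
    events = []
    for match in ENTRY_FAILED_RE.finditer(log_text):
        events.append({
            "time": match.group("time"),
            "host": match.group("host"),
            "pid": int(match.group("pid")),
            "error": match.group("error"),
        })
    return events
```

Running it over a saved syslog excerpt gives a list of timestamp, host and QEMU PID per crash, which can then be compared by hand with the backup schedule.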
 
Hi everyone,

I managed to run some experiments: WITHOUT rolling back to the 5.13 kernel, I changed the machine type to i440fx, and have had none of these VM stops since.
In the Windows Event Viewer there are still some kernel errors, but I am not sure whether they are related.

Most important is that none of the machines set to i440fx has crashed since the change.
 