VM shutdown, KVM: entry failed, hardware error 0x80000021

Hello everyone, we have a similar problem since the kernel update to 5.15.39 and since the upgrade to 7.2.7.
To explain: our Windows Server 2019 VMs shut down for no apparent reason around 22:00, following a failure of the backup or of a SQL update. Have you been able to correct this problem?
 
No, currently you need to disable the two-dimensional paging MMU (tdp_mmu) manually if your setup is affected. Or better, first check that you have the newest BIOS/firmware and CPU microcode installed, as then you may not even require the workaround anymore.

It's not an easy choice, but the TDP feature brings a non-negligible performance gain and works well on most HW (more likely so if released in the last 8, maybe even 10, years), so we want to opt in to it sooner or later anyway. We'll still try to find a better way of making this transparent, as we naturally understand that users ideally shouldn't have to do anything.
Thanks. I started checking whether I had installed the latest updates, but apparently I had forgotten to do a full host reboot and then saw the errors. Also, Lenovo, Fujitsu, etc. are blocking driver/BIOS/firmware downloads in my country, which causes some inconvenience.

ThinkSystem SR630 (2x Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz)

 
Many thanks for this information; it can be valuable for nailing down the actual range of models possibly affected, and possibly the underlying issue, which could help in either avoiding the bug or at least automatically disabling the new feature.

That said, this is two-year-old consumer HW. Definitely not bad for home lab usage, and also not old, but not exactly new either.
Note also that Comet Lake is the last iteration of the Skylake microarchitecture, so it's based on a somewhat older design (even if it definitely went through quite a few evolutionary changes and improvements), so it could seem like all/more of the Skylake-derived (consumer?) models (6th to 10th gen) may be affected.
That's a pretty huge range, and since it also covers most Silver/Gold Xeons from the past few years (almost no company replaces its servers every year for a newer CPU), it becomes quite a big issue. It definitely needs to be resolved one way or another, preferably upstream, as I doubt this is acceptable behavior for KVM on such a huge chunk of CPUs.

Also, there are at least two reports (I have not checked all posts, only about the most recent half) with 11th and 12th gen CPUs:
- https://forum.proxmox.com/threads/v...re-error-0x80000021.109410/page-6#post-476432 (i5-11400)
- https://forum.proxmox.com/threads/v...re-error-0x80000021.109410/page-8#post-478903 (i5-12400)
 
Last edited:
FYI, I've been having this issue on an 11th gen mobile CPU (i5-1145G7) before applying the workaround, and no issues on an i7-9700. Both run the latest Proxmox and Windows Server 2022 Standard. The one with the 9700 still has no workaround applied and has been rock solid.
 

Older Hardware and New 5.15 Kernel

KVM: entry failed, hardware error 0x80000021

Background

With the 5.15 kernel, the two-dimensional paging (TDP) memory management unit (MMU) implementation got activated by default. The new implementation reduces the complexity of mapping the guest OS virtual memory address to the host's physical memory address and improves performance, especially during live migrations for VMs with a lot of memory and many CPU cores. However, the new TDP MMU feature has been shown to cause regressions on some (mostly) older hardware, likely due to assumptions about when the fallback is required not being met by that HW.

The problem manifests as crashes of the VM, with a kernel (dmesg) or journalctl log entry on the host containing, among others, a line like this:

KVM: entry failed, hardware error 0x80000021

Normally there's also an assert error message logged from the QEMU process around the same time. Windows VMs are the most commonly affected in the user reports.

The affected models could not be pinpointed exactly, but it seems CPUs launched over 8 years ago are the most likely to trigger the issue. Note that there are known cases where updating to the latest available firmware (BIOS/EFI) and CPU microcode fixed the regression. Thus, before trying the workaround below, we recommend ensuring that you have the latest firmware and CPU microcode installed.
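As a quick sanity check (a sketch; the exact wording of the log line varies by platform), the microcode revision loaded at boot can be read from the kernel log, and on Debian-based hosts the intel-microcode / amd64-microcode packages from the non-free component provide microcode updates:
Bash:
# Show the microcode revision the kernel loaded at boot
dmesg | grep -i microcode
# Install microcode updates on Debian/Proxmox VE hosts (non-free component required)
apt install intel-microcode    # or amd64-microcode for AMD CPUs
A reboot is required afterwards so the new microcode gets loaded early during boot.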

Workaround: Disable tdp_mmu

The tdp_mmu kvm module option can be used to force disabling the usage of the two-dimensional paging (TDP) MMU.

  • You can either add that parameter to the PVE host's kernel command line as kvm.tdp_mmu=N, see this reference documentation section (a sketch for GRUB-booted hosts is shown below).
  • Alternatively, set the module option using a modprobe config, for example:
echo "options kvm tdp_mmu=N" >/etc/modprobe.d/kvm-disable-tdp-mmu.conf
To finish applying the workaround, always run update-initramfs -k all -u to update the initramfs for all kernels and then reboot the Proxmox VE host.

You can confirm that the change is active by checking that the output of cat /sys/module/kvm/parameters/tdp_mmu is N.
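As a sketch of the kernel command line variant for hosts booting via GRUB (systemd-boot/ZFS setups differ, see the referenced documentation for those):
Bash:
# Append kvm.tdp_mmu=N to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, e.g.:
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet kvm.tdp_mmu=N"
# then regenerate the GRUB config and reboot the host
update-grub
reboot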
Good afternoon, can you tell me how to change this parameter to N on Debian?
When executing the commands from the manual, this is the output:

Code:
root@Line-host:~# cat /sys/module/kvm/parameters/tdp_mmu
Y
root@Line-host:~# echo "options kvm tdp_mmu=N" >/etc/modprobe.d/kvm-disable-tdp-mmu.conf
root@Line-host:~# update-initramfs -k all -u
update-initramfs: Generating /boot/initrd.img-5.15.39-1-pve
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.
update-initramfs: Generating /boot/initrd.img-5.15.35-2-pve
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.
update-initramfs: Generating /boot/initrd.img-5.13.19-6-pve
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.
update-initramfs: Generating /boot/initrd.img-5.13.19-2-pve
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.
 
Looks OK at first glance.
You do need to reboot the PVE node for the new parameter setting to take effect.
 
Hi all,

there is a new kernel available on pvetest that includes a patch series which may alleviate this issue without disabling tdp_mmu; it's in the pve-kernel-5.15.39-3-pve package with version 5.15.39-3.

At least the reproducer we got doesn't trigger a crash/kernel error on that kernel after running for more than 2 hours. For context, the typical time required to trigger the issue was below half an hour, and even the rare occurrences where it took longer stayed below 2 hours.

The backport of the six-patch series wasn't trivial, but it also wasn't hard. Upstream may still want to adapt the approach a bit, so while we did not notice any regression and our reproducer now runs stably, I wouldn't yet recommend this for production workloads. But if you have a test instance that can trigger this, it would be great to update to the newer pve-kernel-5.15.39-3-pve package (version 5.15.39-3), drop the tdp_mmu workaround, and reboot to see if that also is a valid workaround; we'd appreciate any feedback.

For the pvetest repo details see https://pve.proxmox.com/wiki/Package_Repositories#sysadmin_test_repo

FYI, the following command's output must contain the line below with a 1:1 match for your system to ensure that it actually runs the kernel with the alternative fix, with which one doesn't have to disable tdp_mmu.
Code:
dmesg | head -2
[    0.000000] Linux version 5.15.39-3-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #2 SMP PVE 5.15.39-3 (Wed, 27 Jul 2022 13:45:39 +0200) ()
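For reference, a minimal sketch of pulling that kernel from pvetest on a Proxmox VE 7.x (Debian Bullseye) host; double-check the repository line against the wiki page linked above:
Bash:
# Enable the pvetest repository (PVE 7.x is based on Debian Bullseye)
echo "deb http://download.proxmox.com/debian/pve bullseye pvetest" > /etc/apt/sources.list.d/pvetest.list
apt update
# Install the test kernel, then reboot into it
apt install pve-kernel-5.15.39-3-pve
reboot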
 
So I think that the two-dimensional paging (TDP) problem is linked to those Windows features ... it's probably all related to nested virtualization causing problems for TDP in the 5.15.x Linux kernel.
Nested virtualization also doesn't work anymore with kernel 5.15, not only with modern OSes like Windows 11 or Server 2022, but also with rather dated OSes like Windows 8.1 and XP running Microsoft Virtual PC inside them to run even older OSes like Windows 95, 98, 2000 and the like. This worked perfectly on 5.13 but not anymore on 5.15.
 
It works here on an E5-2620 v3, a 12th gen Intel (i7-12700K, Alder Lake) workstation, and my old i9-9900K; a dual-socket EPYC 7351 box also runs fine w.r.t. nested virtualization. Most of my colleagues also use nested virt., and I don't know of any problems there...
FWIW, with the aforementioned kernel I also tested a nested VM in Hyper-V (Windows Server 2022) on Proxmox VE just today; it worked out fine.

In any case, this is unrelated, as the issue is definitely not directly related to nesting, just as it wasn't to the Spectre mitigations; both just make it trigger more easily. SMM (System Management Mode), required for Secure Boot in UEFI, is more likely to be the underlying issue.

Open a new thread for your nested virt issues and include the VM config, CPU details, and an issue description, plus any other info possibly relevant for reproducing it; the commands below gather most of that on a Proxmox VE host.
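A hedged sketch of those commands (replace <VMID> with the actual VM ID):
Bash:
qm config <VMID>                              # VM configuration
pveversion -v                                 # Proxmox VE package versions
lscpu                                         # host CPU details
cat /sys/module/kvm_intel/parameters/nested   # nested virt enabled? (kvm_amd for AMD hosts)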
 

The new 5.15.39-3 test kernel finally fixes the problem for me without having to turn off tdp_mmu.

Bash:
# dmesg | head -2
[    0.000000] microcode: microcode updated early to revision 0xf0, date = 2021-11-15
[    0.000000] Linux version 5.15.39-3-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #2 SMP PVE 5.15.39-3 (Wed, 27 Jul 2022 13:45:39 +0200) ()

Bash:
# cat /sys/module/kvm/parameters/tdp_mmu
Y

Thank you for making this happen!
 
Installed and am testing. My knee-jerk reaction is that this helps, but there are some performance issues. The impacted machines no longer crash under load, but their network performance now struggles to keep up. Local speed tests from the machine show swings in performance and throughput (kind of critical when running a file server, as in this case). I'm not sure if it is a CPU issue or a network issue.

But, I concur, I am not seeing the VMs crash under load anymore.

Will continue to keep an eye on this thread. Thank you for the work resolving this!
 
That's probably the new Retbleed mitigations, which came in with that kernel but aren't themselves part of the fix for this specific issue. Check lscpu to see if that theory holds up; it'll show something like
Vulnerability Retbleed: Mitigation; untrained return thunk; SMT enabled with STIBP protection
if that HW issue was detected and the mitigation is active, or "Not affected" if it isn't.
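For a quick check from the shell (the sysfs entry only exists on kernels that know about Retbleed, so treat this as a sketch):
Bash:
# Mitigation state as reported by the kernel
cat /sys/devices/system/cpu/vulnerabilities/retbleed
# Or via lscpu on reasonably recent util-linux versions
lscpu | grep -i retbleed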
 
Sorry for dragging my feet on this. I've been testing to see what is working and what isn't.

1. Windows workloads have issues. I run several Windows workloads and find the odd network slowdown happens on both 2019 and 2022 servers. It looks like a bunch of interrupts stack up and overwhelm the VM, causing degradation in network performance. In fairness, I'm not sure if the issues existed before the patch. I'm also not sure if it is the patch or the network drivers (paravirtual, RedHat) causing the problem. The point of the patch (for me) was to get a 2022 _FILE_ server online which would crash under load. Other servers - like a domain controller - never experienced network loads like the file server, which was new and currently runs on Hyper-V boxes.

2. I don't believe Linux-y/Unix-y workloads are impacted. I also run a firewall (OPNsense/FreeBSD) on these servers and do not _THINK_ I see network performance issues (more below). I get near line rate (1 Gbps) on the box running routing, Suricata (in IPS mode), and a firewall.

Note: While performance is near line rates, I do occasionally see dips that I can't attribute to anything. The dips COULD be the perf server on the other side, a network traffic blip, or networking issues similar to the Windows servers. I don't know.

3. While I don't see networking issues (edit: on Linux workloads), I am seeing other issues I've not seen before. On the FreeBSD/OPNsense firewall, I am seeing the clock run backwards:

Code:
calcru: runtime went backwards from 1373029 usec to 763269 usec for pid 14053 (dpinger)
calcru: runtime went backwards from 96102 usec to 53228 usec for pid 14053 (dpinger)

The system is using KVMCLOCK as the time counter. I'm also seeing something weird with the disk:

Code:
(da0:vtscsi0:0:0:0): WRITE(10). CDB: 2a 00 01 c9 73 28 00 01 00 00
(da0:vtscsi0:0:0:0): CAM status: Command timeout
(da0:vtscsi0:0:0:0): Retrying command, 3 more tries remain
(da0:vtscsi0:0:0:0): WRITE(10). CDB: 2a 00 01 ba 96 a8 00 00 40 00
(da0:vtscsi0:0:0:0): CAM status: Command timeout
(da0:vtscsi0:0:0:0): Retrying command, 3 more tries remain

Neither of these issues happened before the update (I've run the firewall for 5+ years without ever seeing this), so I'm leaning towards the patch as the cause.

4. lscpu does not report a Retbleed vulnerability on the CPUs in the affected Proxmox hosts. Obviously I am not checking the guests.

Again, I apologize for the delay; I wanted to run tests as I was able.
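In case the KVMCLOCK detail above matters: this is how I check which timecounter the OPNsense/FreeBSD guest selected (standard FreeBSD sysctls; switching the source is just an experiment on my side, not a confirmed fix):
Bash:
# On the FreeBSD/OPNsense guest: list the available timecounters and show the active one
sysctl kern.timecounter.choice
sysctl kern.timecounter.hardware
# Switching at runtime is possible, e.g. to HPET if the VM exposes it:
# sysctl kern.timecounter.hardware=HPET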
 
Code:
calcru: runtime went backwards from 1373029 usec to 763269 usec for pid 14053 (dpinger)
calcru: runtime went backwards from 96102 usec to 53228 usec for pid 14053 (dpinger)

I also run OPNsense on Proxmox and have been seeing these for as long as I can remember. Just a friendly FYI; I'm not sure whether it affects anything or what causes it, though.
 
I don't get 5.15.39-3 on the enterprise repository. Will it be pushed soon?
 
Hello, I don't use Proxmox yet but have the same issue here:
The server runs Ubuntu 22.04 with kernel 5.15.0-40-generic. The CPU is a Xeon(R) E5-2640 v4. Only one VM is affected, running Windows Server 2022 with SQL Server 2019. The error happened 2 times in 8 days. No workaround enabled yet. This message is to help you determine the affected CPUs. I also use:
<type arch='x86_64' machine='pc-q35-5.2'>hvm</type>
<loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE_4M.ms.fd</loader>
<nvram template='/usr/share/OVMF/OVMF_VARS_4M.ms.fd'>/var/lib/libvirt/qemu/nvram/win2k22-sqlserver_VARS.fd</nvram>
With pc-q35-6.? I can't use the network, neither with virtio nor e1000.
 
I'd guess the error happens either inside the Ubuntu host, or at least due to it, as IIRC Ubuntu has not yet backported any mitigations for this issue, unlike the Proxmox kernel, so its KVM stack is still susceptible. Either use PVE in the nested VM or report the issue to Ubuntu, recommending the following in-development patch series:
https://lore.kernel.org/kvm/20220803155011.43721-1-mlevitsk@redhat.com/
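FWIW, the tdp_mmu workaround described earlier in this thread should apply the same way on a plain Ubuntu/libvirt host (a sketch, assuming the stock Ubuntu kernel and initramfs tooling):
Bash:
# Disable the TDP MMU via a modprobe option, then rebuild the initramfs and reboot
echo "options kvm tdp_mmu=N" > /etc/modprobe.d/kvm-disable-tdp-mmu.conf
update-initramfs -u -k all
reboot
# Afterwards, /sys/module/kvm/parameters/tdp_mmu should read N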
 
Hello

I have the same issue with a Windows Server 2022 VM. I've read that the 5.15.39-3 kernel seems to solve this.

Will it be released soon on the enterprise repository?

Regards

Vincent
 
Hi,

experience with the package, after about two weeks on the testing repo and one week on no-subscription, has been good so far, so from that POV it would now be good to go for enterprise.

But upstream sent out a slightly revised fix with some development feedback addressed, which we'd like to check out first. The closer we are to the patch series that actually goes in upstream, the less friction potential there is in the longer run.

Testing that version with our reproducer will take a few hours; if that goes well, we'll move the package a bit faster through the open repos, and as the change is relatively small compared to the last kernel build, it should get into the enterprise repo no later than the start of next week, naturally only if nothing comes up. Depending on upstream feedback and further feedback from the community here, we may be able to shortcut that, but no promises, and if we're not very sure about the impact potential we'll lean towards the slower and safer option. Anyhow, we'll keep you posted when the new kernel becomes available in the public repos.
 
