Older Hardware and New 5.15 Kernel
KVM: entry failed, hardware error 0x80000021
Background
With the 5.15 kernel, the two-dimensional paging (TDP) memory management unit (MMU) implementation is enabled by default. The new implementation reduces the complexity of mapping guest OS virtual memory addresses to the host's physical memory addresses and improves performance, especially during live migration of VMs with a lot of memory and many CPU cores. However, the TDP MMU has been shown to cause regressions on some, mostly older, hardware, likely because that hardware does not meet the implementation's assumptions about when a fallback is required.
The problem manifests as crashes of the machine, with the kernel log (dmesg) or journalctl output containing, among others, a line like this:
KVM: entry failed, hardware error 0x80000021
Normally, the QEMU process also logs an assertion error around the same time. In user reports, Windows VMs are the most commonly affected.
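To check a host for this signature, the kernel log can be searched for the exact line. The following is a minimal sketch, not an official tool: the function name has_kvm_entry_failure is made up for illustration, and it only matches the specific error code shown above.

```shell
#!/bin/sh
# Succeeds (exit status 0) if the text on stdin contains the
# KVM entry-failure line, fails otherwise.
# -F matches the string literally, -q suppresses output.
has_kvm_entry_failure() {
    grep -qF 'KVM: entry failed, hardware error 0x80000021'
}

# Usage on a live host (needs permission to read the kernel log):
#   dmesg | has_kvm_entry_failure && echo "found in dmesg"
#   journalctl -k --no-pager | has_kvm_entry_failure && echo "found in journal"
```

Note that journalctl -k only covers the current boot; if the host itself went down, the previous boot's kernel messages can be inspected with journalctl -k -b -1, provided the journal is persistent.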
The affected models could not be pinpointed exactly, but CPUs launched more than 8 years ago seem most likely to trigger the issue. Note that there are known cases where updating to the latest available firmware (BIOS/EFI) and CPU microcode fixed the regression. Thus, before trying the workaround below, we recommend ensuring that you have the latest firmware and CPU microcode installed.
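As a starting point for that check, the microcode revision reported by the running kernel can be read from /proc/cpuinfo on x86 hosts. This is only a sketch: the function name cpu_microcode_revision is hypothetical, and whether the reported revision is the latest one available still has to be compared by hand against your CPU vendor's or distribution's microcode packages.

```shell
#!/bin/sh
# Print the microcode revision of the first CPU listed in a
# cpuinfo-style file. The optional first argument is the file to read
# and defaults to /proc/cpuinfo.
cpu_microcode_revision() {
    # Lines look like "microcode<TAB>: 0xf0"; split on ": " and
    # stop after the first match.
    awk -F': ' '/^microcode/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}

# Usage on the host:
#   cpu_microcode_revision
```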
Workaround: Disable tdp_mmu
The tdp_mmu option of the kvm module can be used to forcibly disable the two-dimensional paging (TDP) MMU.
- You can either add that parameter to the PVE host's kernel command line as kvm.tdp_mmu=N (see this reference documentation section).
- Alternatively, set the module option using a modprobe config, for example:
echo "options kvm tdp_mmu=N" >/etc/modprobe.d/kvm-disable-tdp-mmu.conf
To finish applying the workaround, always run update-initramfs -k all -u to update the initramfs for all kernels, and then reboot the Proxmox VE host.
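For the modprobe variant, the steps can be collected into a small script. This is a sketch under assumptions rather than a supported procedure: the function name write_tdp_mmu_override and its optional directory argument are introduced here for illustration, so the function can be tried against a scratch directory before touching /etc/modprobe.d.

```shell
#!/bin/sh
# Write the modprobe override that disables the TDP MMU.
# The optional first argument is the target directory and defaults
# to /etc/modprobe.d.
write_tdp_mmu_override() {
    conf_dir="${1:-/etc/modprobe.d}"
    printf 'options kvm tdp_mmu=N\n' > "$conf_dir/kvm-disable-tdp-mmu.conf"
}

# On the real host, as root:
#   write_tdp_mmu_override
#   update-initramfs -k all -u
#   reboot
```

Running the function again simply overwrites the file with the same single line, so it is safe to repeat.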
You can confirm that the change is active by checking that the output of cat /sys/module/kvm/parameters/tdp_mmu is N.
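If the check should be scripted, for example to run it after each reboot, it could look roughly like this. The function name tdp_mmu_disabled is hypothetical, and the optional path argument exists only so the logic can be exercised without a loaded kvm module.

```shell
#!/bin/sh
# Succeeds only if the kvm module's tdp_mmu parameter reads "N".
# The optional first argument overrides the sysfs path.
tdp_mmu_disabled() {
    param_file="${1:-/sys/module/kvm/parameters/tdp_mmu}"
    # Fails if the file is missing or unreadable (for example, when
    # the kvm module is not loaded) or if the value is not N.
    [ -r "$param_file" ] && [ "$(cat "$param_file")" = "N" ]
}

# Usage on the host:
#   if tdp_mmu_disabled; then
#       echo "TDP MMU is disabled"
#   else
#       echo "TDP MMU is still enabled or the parameter is unavailable"
#   fi
```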