VM shutdown, KVM: entry failed, hardware error 0x80000021

alfe · Jul 16, 2022

itNGO said:
And did you follow the guide?
https://pve.proxmox.com/mediawiki/index.php?title=Upgrade_from_6.x_to_7.0&action=history

Setting tdp_mmu=N fixes the issue with the newest kernel 5.15 and Windows 11 Insider Builds.

Sp00nman · Jul 19, 2022

alfe said:
Setting tdp_mmu=N fixes the issue with the newest kernel 5.15 and Windows 11 Insider Builds.

Bad news - over the last few days I have had 3 separate instances of Win 2022 VMs and 1 Win 11 VM shutting themselves down with tdp_mmu=N configured on my 3-node lab cluster running kernel 5.15.39-1.

Going to repin kernel 5.13 for now

mira · Jul 19, 2022

Did you get an error when those shut down?

t.lamprecht · Jul 19, 2022

Sp00nman said:
Bad news - over the last few days I have had 3 separate instances of Win 2022 VMs and 1 Win 11 VM shutting themselves down with tdp_mmu=N configured on my 3-node lab cluster running kernel 5.15.39-1.

Shutting themselves down and crashing with KVM: entry failed, hardware error 0x80000021 are two quite different things. If the PVE host's kernel log doesn't show said error, it is definitely another issue and should go in its own thread.
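
A minimal sketch of that check, assuming the host writes its log to a plain-text file such as /var/log/syslog. The function name kvm_entry_failures is made up for illustration; it only wraps a fixed-string grep for the error message from this thread's title.

```shell
# Print the log lines that report the failed VM entry.
# Returns non-zero when the given log file has no such line,
# which would point to a different problem than this thread's.
kvm_entry_failures() {
    grep -F 'KVM: entry failed, hardware error 0x80000021' "$1"
}
```

On an affected host, `kvm_entry_failures /var/log/syslog` should print one line per crashed VM. An empty result suggests the shutdown had another cause.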

proteus · Jul 19, 2022

20+ servers with tdp_mmu=N and no crashes, so this is definitely a different problem.

Sp00nman · Jul 19, 2022

t.lamprecht said:
Shutting themselves down and crashing with KVM: entry failed, hardware error 0x80000021 are two quite different things. If the PVE host's kernel log doesn't show said error, it is definitely another issue and should go in its own thread.

Apologies, I should have been more specific. They are definitely crashing:

root@prox-lab-host-01:~# uname -a
Linux prox-lab-host-01 5.15.39-1-pve #1 SMP PVE 5.15.39-1 (Wed, 22 Jun 2022 17:22:00 +0200) x86_64 GNU/Linux
root@prox-lab-host-01:~# cat /etc/kernel/cmdline
root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet intel_iommu=on iommu=pt options kvm tdp_mmu=N

root@prox-lab-host-01:~# cat /var/log/syslog | grep 0x80000021
Jul 19 11:48:48 prox-lab-host-01 QEMU[6807]: KVM: entry failed, hardware error 0x80000021

root@prox-lab-host-02:~# cat /var/log/syslog | grep 0x80000021
Jul 18 15:52:54 prox-lab-host-02 QEMU[43162]: KVM: entry failed, hardware error 0x80000021

alfe · Jul 19, 2022

Sp00nman said:
Bad news - over the last few days I have had 3 separate instances of Win 2022 VMs and 1 Win 11 VM shutting themselves down with tdp_mmu=N configured on my 3-node lab cluster running kernel 5.15.39-1.

Going to repin kernel 5.13 for now

What type of CPU? It works on an Intel(R) Xeon(R) CPU E5-2660 v2 at least, on
Linux 5.15.35-2-pve #1 SMP PVE 5.15.35-5 (Wed, 08 Jun 2022 15:02:51 +0200) x86_64 GNU/Linux.

Sp00nman · Jul 19, 2022

alfe said:
What type of CPU? It works on an Intel(R) Xeon(R) CPU E5-2660 v2 at least, on
Linux 5.15.35-2-pve #1 SMP PVE 5.15.35-5 (Wed, 08 Jun 2022 15:02:51 +0200) x86_64 GNU/Linux.

My lab hosts are all single-socket Intel Xeon E-2186G (12) @ 4.700GHz.

fransesco · Jul 19, 2022

I ran
echo "options kvm tdp_mmu=N" >/etc/modprobe.d/kvm-disable-tdp-mmu.conf
here, as instructed at https://pve.proxmox.com/mediawiki/i...ldid=11400#Older_Hardware_and_New_5.15_Kernel

Running Win11 on a Xeon E5-2620 v3 with Linux 5.15.39-1-pve and no issues so far. Looking good. (Fingers crossed, knock on wood, etc.)

Hotsticker · Jul 20, 2022

I came across an interesting discovery. I have 2 Proxmox servers with identical CPUs. One had Windows 2022 server VMs randomly crashing on it, sometimes a few times a day. I set tdp_mmu=N on that one and the crashes stopped. A few weeks later, I needed to migrate the VMs from that server to the other one, which has the identical CPU but didn't have the tdp_mmu=N option set. What's interesting is that the VMs have been running stable without any crashes for a few weeks now on this new Proxmox server.

The only difference between the 2 servers is that one was upgraded to Proxmox v7 from v6, while the other was a brand new install of v7. The one upgraded from v6 had random crashes happening. I was puzzled by the difference but was surprised to find everything is running ok, and running well, on the brand new install of v7 without tdp_mmu=N set. I'm not sure why, but that's what I found

bingsin · Jul 20, 2022

I agree with you.
We have a Huawei h1288V3 (C612) server at our company. I couldn't install PVE 7 with the ISO installer, so I installed 5.4 and upgraded to 7.0. It worked well for four months.
Yesterday I upgraded to 7.1 and received this error.
Same as you: I have two home servers with fresh installs of 7.1, and they are running well.

Hotsticker said:
I came across an interesting discovery. I have 2 Proxmox servers with identical CPUs. One had Windows 2022 server VMs randomly crashing on it, sometimes a few times a day. I set tdp_mmu=N on that one and the crashes stopped. A few weeks later, I needed to migrate the VMs from that server to the other one, which has the identical CPU but didn't have the tdp_mmu=N option set. What's interesting is that the VMs have been running stable without any crashes for a few weeks now on this new Proxmox server.

The only difference between the 2 servers is that one was upgraded to Proxmox v7 from v6, while the other was a brand new install of v7. The one upgraded from v6 had random crashes happening. I was puzzled by the difference but was surprised to find everything is running ok, and running well, on the brand new install of v7 without tdp_mmu=N set. I'm not sure why, but that's what I found

Stoiko Ivanov · Jul 20, 2022

Sp00nman said:
kvm tdp_mmu=N

If you add the option to the kernel command line (as opposed to adding it in a file in /etc/modprobe.d), you need to put a dot between the module name and the option: kvm.tdp_mmu=N
You can verify that the setting is applied correctly with

Code:

cat /sys/module/kvm/parameters/tdp_mmu

I hope this helps!
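
To make the difference between the two syntaxes concrete, here is a small sketch that classifies a kernel command line string. The function name check_tdp_mmu_cmdline and its three result labels are invented for this example, and it only looks for the exact spellings discussed in this thread.

```shell
# Classify how tdp_mmu=N appears in a kernel command line string.
# Prints "ok" for the dotted form the kernel understands,
# "modprobe-syntax" when the modprobe.d spelling was pasted in,
# and "missing" when neither form is present.
check_tdp_mmu_cmdline() {
    case " $1 " in
        *" kvm.tdp_mmu=N "*) echo "ok" ;;
        *" options kvm tdp_mmu=N "*|*" kvm tdp_mmu=N "*) echo "modprobe-syntax" ;;
        *) echo "missing" ;;
    esac
}
```

For example, `check_tdp_mmu_cmdline "$(cat /etc/kernel/cmdline)"` checks the configured command line, and `check_tdp_mmu_cmdline "$(cat /proc/cmdline)"` checks what the running kernel was actually booted with. The sysfs file above remains the authoritative check of the effective value.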

Sp00nman · Jul 20, 2022

Stoiko Ivanov said:
If you add the option to the kernel command line (as opposed to adding it in a file in /etc/modprobe.d), you need to put a dot between the module name and the option: kvm.tdp_mmu=N
You can verify that the setting is applied correctly with

Code:

cat /sys/module/kvm/parameters/tdp_mmu

I hope this helps!

Hi Stoiko -

You are 100% correct - the kvm.tdp_mmu=N flag wasn't set correctly in /etc/kernel/cmdline.

I have now rectified the problem and will monitor:

root@prox-lab-host-02:~# cat /sys/module/kvm/parameters/tdp_mmu
N

Thanks for your assistance

ToqQrrl · Jul 23, 2022

I seem to have the same problem on one of 2 Windows 11 Pro VMs.

The one that has the problem has the following features enabled:

- Virtual Machine Platform
- Windows Hypervisor Platform
- Windows Subsystem for Linux

And the VM that is stable doesn't have those 3 features enabled.

I will soon uninstall those 3 features on the faulty Windows 11 Pro VM and report back in a couple of days.

I will not be disabling -- tdp_mmu -- on my Proxmox server because I want to check if it's really those three features that cause the problem ... because right now those 3 features are the only difference between the 2 Windows 11 Pro VMs.

My Proxmox Host runs on:

- Intel(R) Core(TM) i7-10700F CPU @ 2.90GHz
- Linux 5.15.39-1-pve #1 SMP PVE 5.15.39-1
- 128 GB RAM
- 2 TB Samsung 980 Pro NVME

mishki · Jul 24, 2022

Are there any changes to the issue on 5.15.39-2-pve?

t.lamprecht · Jul 25, 2022

mishki said:
Are there any changes to the issue on 5.15.39-2-pve?

No. Currently you need to disable two-dimensional paging for the MMU (tdp_mmu) manually if your setup is affected. Better still, first check that you have the newest BIOS/firmware and CPU microcode installed, as you may then not need the workaround at all.

It's not an easy choice, but that TDP feature brings a non-negligible performance gain and works well on most HW (more likely if released in the last 8 to maybe even 10 years), so we want to opt in to it sooner or later anyway. We'll still try to find a better way of making this transparent, as we naturally understand that users ideally wouldn't have to do anything.
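
As a rough aid for the microcode check, the sketch below lists the microcode revisions reported in cpuinfo-formatted text. It assumes the usual x86 /proc/cpuinfo layout with one "microcode : 0x..." line per logical CPU; the function name microcode_revisions is hypothetical, and whether a given revision is the newest one still has to be looked up for your CPU model.

```shell
# Print the distinct microcode revisions found in cpuinfo-formatted text.
# Expects lines of the form "microcode<TAB>: 0x...", as on x86 Linux.
microcode_revisions() {
    awk -F': *' '/^microcode/ { print $2 }' "$1" | sort -u
}
```

Running `microcode_revisions /proc/cpuinfo` before and after a firmware or microcode package update shows whether the loaded revision actually changed.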

rursache · Jul 25, 2022

I'm also getting constant freezes and lockups on one of my Linux VMs. However I don't see

KVM: entry failed, hardware error 0x80000021

in logs but the result is the same as people experience in this thread.

I made a separate thread with all the details of the problem here.

TomFIT · Jul 25, 2022

Hi,
I just want to add to this thread:
I'm having the 0x80000021 problem on a new server too.
Only the VM running Windows Server 2022 is crashing, VMs with Windows 10 Pro or Debian had no problems.
A BIOS update didn't help. I had to disable tdp_mmu as a workaround.

Server
20 x Intel(R) Core(TM) i9-10900 CPU @ 2.80GHz (1 Socket), 48 GB RAM
Linux 5.15.39-1-pve #1 SMP PVE 5.15.39-1 (Wed, 22 Jun 2022 17:22:00 +0200)
pve-manager/7.2-7/d0dd0e85
PVE+VM-Storage is a ZFS (Mirror) on 2x Enterprise SSD

VM:
Windows 2022 Server Std, 16 GB RAM, 8 CPUs

After disabling tdp_mmu, the server VM has now been running for a week without crashing.

Best regards,
Tom

t.lamprecht · Jul 25, 2022

TomFIT said:
20 x Intel(R) Core(TM) i9-10900 CPU @ 2.80GHz (1 Socket), 48 GB RAM

Many thanks for this information. It can be valuable for nailing down the actual range of models possibly affected, and possibly also the underlying issue, which could help in either avoiding the bug or at least automatically disabling the new feature.

That said, this is two-year-old consumer HW: definitely not bad for home lab usage and not old, but not exactly new either.
Note also that Comet Lake is the last iteration of the Skylake microarchitecture, so it's based on a somewhat older design (even if it definitely went through quite a few evolutionary changes and improvements). It could therefore be that all or more of the Skylake-derived (consumer?) models (6th to 10th gen) are affected.

ToqQrrl · Jul 25, 2022

ToqQrrl said:
I seem to have the same problem on one of 2 Windows 11 Pro VMs.

The one that has the problem has the following features enabled:

- Virtual Machine Platform
- Windows Hypervisor Platform
- Windows Subsystem for Linux

And the VM that is stable doesn't have those 3 features enabled.

I will soon uninstall those 3 features on the faulty Windows 11 Pro VM and report back in a couple of days.

I will not be disabling -- tdp_mmu -- on my Proxmox server because I want to check if it's really those three features that cause the problem ... because right now those 3 features are the only difference between the 2 Windows 11 Pro VMs.

My Proxmox Host runs on:

- Intel(R) Core(TM) i7-10700F CPU @ 2.90GHz
- Linux 5.15.39-1-pve #1 SMP PVE 5.15.39-1
- 128 GB RAM
- 2 TB Samsung 980 Pro NVME

I'm replying to my own message as a follow-up for you guys.

I haven't had another 0x80000021 crash since removing the 3 Windows features mentioned in my original post.

So I think that the two-dimensional paging (TDP) problem is linked to those Windows features ... it's probably nested virtualization that is causing problems for TDP in the 5.15.x Linux kernel.

I'll report back again in a couple of days.
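
For anyone wanting to compare hosts while testing this theory, here is a small sketch that reads KVM module parameters from sysfs. The helper name kvm_param is made up for illustration, and the assumption that the host-side kvm_intel "nested" parameter is relevant to these guest features is mine, not something confirmed in this thread.

```shell
# Print a KVM module parameter value, or "unavailable" if the file is absent.
# The optional second argument overrides the sysfs root and exists only
# so the helper can be tried against a test directory.
kvm_param() {
    file="${2:-/sys/module}/$1"
    if [ -r "$file" ]; then
        cat "$file"
    else
        echo "unavailable"
    fi
}
```

On an Intel host, `kvm_param kvm_intel/parameters/nested` and `kvm_param kvm/parameters/tdp_mmu` show the two settings side by side, which makes it easier to report exactly which combination a crashing VM ran under.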

