VM shutdown, KVM: entry failed, hardware error 0x80000021

Thanks to all those who provided the fix and confirmed it's working for them. I was also getting repeated random shutdowns, and the kvm.tdp_mmu=N setting worked for me.

Question - Does it mean this isn't an issue for anyone who's set up a new Proxmox 7.2 install, and only a problem for those who've upgraded from 6.x to 7?
 
Can confirm newest Kernel and kvm.tdp_mmu=N for older CPUs finally helps....
 
What's considered an older CPU? Also out of curiosity what does that setting do?
I think the statement about older CPUs is misleading. I have an 11th gen Intel host with this issue, yet another system with a 9th gen Intel CPU is fine.
kvm.tdp_mmu=N seems to have helped though. It's been 2 days now.

CPU usage went up a little after applying kvm.tdp_mmu=N, though. Not sure if it's related, but it happened right when I went back to the 5.15* kernel from 5.13 and applied the kvm.tdp_mmu=N workaround.
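
For anyone unsure whether the workaround is actually active after a reboot or kernel switch, here is a minimal sketch that reads the kvm module parameter back from sysfs. It assumes the parameter is exposed at /sys/module/kvm/parameters/tdp_mmu and reads as Y or N; the function names are made up for illustration.

```python
from pathlib import Path

# Assumed sysfs location of the kvm module parameter (not stated in this thread).
TDP_MMU_PARAM = Path("/sys/module/kvm/parameters/tdp_mmu")


def tdp_mmu_state(param_path=TDP_MMU_PARAM):
    """Return the parameter's text (expected 'Y' or 'N'), or None if unreadable."""
    try:
        return Path(param_path).read_text().strip()
    except OSError:
        # File missing (kvm module not loaded, different kernel) or not readable.
        return None


def describe(state):
    """Turn the raw parameter value into a short human-readable message."""
    if state == "N":
        return "tdp_mmu=N: the workaround is active"
    if state == "Y":
        return "tdp_mmu=Y: the workaround is NOT active"
    return "could not read the kvm tdp_mmu parameter"


if __name__ == "__main__":
    print(describe(tdp_mmu_state()))
```

Run it on the PVE host itself, not inside a guest. If it reports Y after you applied the setting, the option was probably not picked up at boot.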

Interesting, good noticing the higher CPU usage. I didn't pay attention to if there's a difference in that
 
Since kernel PVE 5.15.35-6 (Fri, 17 Jun 2022 13:42:35 +0200), no more crashes have happened :D
With kvm.tdp_mmu set? Only the new kernel, or something else?
 
Only with the new kernel.
My machine setting is pc-q35-5.2; I don't know if version 6 works too.

I see in the changelogs of the kernel and the backup package a fix for io_uring in backup, and I think it may be what fixed the bug.

It used to crash every day, and it has now been running for 10 days.
 
Can confirm newest Kernel and kvm.tdp_mmu=N for older CPUs finally helps....
Still stable, so it looks like this or PVE 5.15.35-6 (Fri, 17 Jun 2022 13:42:35 +0200) finally fixes it.
 
I also have this issue with two up-to-date Dell R720s in a cluster. Occasionally some VMs fail with that error.
This only happens to certain VMs, not to all Windows VMs. The hosts run Xeon E5-2640 and E5-2643 CPUs in a cluster with Ceph.
I will try applying the kvm.tdp_mmu=N option and see if that fixes it.
 
Did you get an error when those shut down?
 
Bad news - over the last few days I have had 3 separate instances of Win 2022 VMs and 1 Win 11 VM shutting themselves down with tdp_mmu=N configured on my 3-node lab cluster running kernel 5.15.39-1.
Shutting themselves down and crashing with KVM: entry failed, hardware error 0x80000021 are two quite different things. If the PVE host's kernel log doesn't show said error, it is definitely another issue and should go in its own thread.
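
To tell the two cases apart, a rough sketch like the following can scan a saved copy of the host's log for the error string. The marker text is taken from this thread's title, so the exact wording in your log is an assumption, and the function names and the kernel.log file name are hypothetical.

```python
import sys

# Error text as quoted in the thread title; the exact log wording is an assumption.
ERROR_MARKER = "KVM: entry failed, hardware error 0x80000021"


def find_entry_failures(lines, marker=ERROR_MARKER):
    """Return every line that contains the marker, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if marker in line]


def summarize(lines, marker=ERROR_MARKER):
    """Return a one-line verdict followed by any matching log lines."""
    hits = find_entry_failures(lines, marker)
    if not hits:
        return "no matching entry found: likely a different issue"
    return "\n".join([f"{len(hits)} matching entries found:"] + hits)


# Only runs when a log file path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log_file:
        print(summarize(log_file))
```

For example, save the kernel log with `journalctl -k > kernel.log` on the PVE host and pass kernel.log as the only argument. If nothing matches around the time the VM went down, the shutdown most likely belongs in a separate thread.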
 