VM shutdown, KVM: entry failed, hardware error 0x80000021

After 2 weeks, my Windows Server 2019 VM was shut down again :(
Update: I am sorry, false alarm. Proxmox itself was rebooted because of a power loss, and a second disk on an NFS share was unavailable.
 
After 2 weeks, my Windows Server 2019 VM was shut down again :(

Code:
proxmox-ve: 7.2-1 (running kernel: 5.15.35-2-pve)
pve-manager: 7.2-4 (running version: 7.2-4/ca9d43cc)
pve-kernel-5.15: 7.2-4
pve-kernel-helper: 7.2-4
pve-kernel-5.13: 7.1-9
pve-kernel-5.11: 7.0-10
pve-kernel-5.15.35-2-pve: 5.15.35-5
pve-kernel-5.15.35-1-pve: 5.15.35-3
pve-kernel-5.15.30-2-pve: 5.15.30-3
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-2-pve: 5.13.19-4
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.11.22-4-pve: 5.11.22-9
ceph-fuse: 15.2.14-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-2
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-2
libpve-storage-perl: 7.2-4
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.12-1
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.3-1
proxmox-backup-file-restore: 2.2.3-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-1
pve-container: 4.2-1
pve-docs: 7.2-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.4-2
pve-ha-manager: 3.3-4
pve-i18n: 2.7-2
pve-qemu-kvm: 6.2.0-10
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.4-pve1
And did you follow the guide?
https://pve.proxmox.com/mediawiki/index.php?title=Upgrade_from_6.x_to_7.0&action=history
 
Giving it a go... Unpinned kernel 5.13.19-6-pve and set kvm.tdp_mmu=N.
I don't have a reliable way to recreate the issue, but my Windows 11 VM would usually dump on me within a few days' time. I'll report back!
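For anyone else applying the same workaround, here is a sketch of the two steps as shell commands on the Proxmox host. The config file name `kvm-tdp-mmu.conf` is my own choice, and the pinned kernel version is from my setup, so adjust both for yours:

```shell
# Make the kvm.tdp_mmu=N workaround persistent across reboots
# (alternatively, append "kvm.tdp_mmu=N" to the kernel command line).
echo "options kvm tdp_mmu=N" > /etc/modprobe.d/kvm-tdp-mmu.conf
update-initramfs -u -k all

# Remove the kernel pin so the host boots the newest installed
# kernel again (5.13.19-6-pve was the version I had pinned).
proxmox-boot-tool kernel unpin
proxmox-boot-tool refresh

reboot
```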
@Stoiko Ivanov
It's been about a week and I haven't had any problems with normal usage after applying the workaround.

Here's a breakdown of my setup:
  • ASUS ROG Strix Z490-E Gaming Motherboard (BIOS 2403 - 2021/12/16)
  • Intel Core i9-10850K Processor
  • G.Skill Ripjaws V 64 GB (4 x 16 GB) DDR4-3600 CL16 Memory
  • EVGA GeForce RTX 3060 Ti 8 GB XC GAMING Video Card
  • intel-microcode is installed
  • Latest updates from enterprise repository
  • Windows 11 VM with GPU passthrough used as "daily driver"
  • Mix of Ubuntu VMs and LXC containers
 
Thanks to all those who provided the fix and confirmed it's working for them. After repeatedly having random shutdowns as well, the kvm.tdp_mmu=N setting worked for me.

Question - does this mean it isn't an issue for anyone who has set up a fresh Proxmox 7.2 install, and only a problem for those who upgraded from 6.x to 7?
 
Can confirm: the newest kernel and kvm.tdp_mmu=N for older CPUs finally help...
 
What's considered an older CPU? Also out of curiosity what does that setting do?
I think the statement about older CPUs is misleading. I have an 11th-gen Intel host with this issue, yet another system with a 9th-gen Intel is fine.
kvm.tdp_mmu=N seems to have helped, though. It's been 2 days now.

Somehow CPU usage went up a little after applying kvm.tdp_mmu=N. Not sure if it's related, but it happened right when I went back to a 5.15 kernel from 5.13 and applied the kvm.tdp_mmu=N workaround.
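In case it helps anyone double-check their own host, the currently active value of the parameter can be read back from sysfs; "N" means the workaround took effect, "Y" means the TDP MMU is still enabled:

```shell
# Read the live value of the kvm module parameter.
cat /sys/module/kvm/parameters/tdp_mmu

# Confirm which kernel the host actually booted.
uname -r
```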
 
I think the statement about older CPUs is misleading. I have an 11th-gen Intel host with this issue, yet another system with a 9th-gen Intel is fine.
kvm.tdp_mmu=N seems to have helped, though. It's been 2 days now.

Somehow CPU usage went up a little after applying kvm.tdp_mmu=N. Not sure if it's related, but it happened right when I went back to a 5.15 kernel from 5.13 and applied the kvm.tdp_mmu=N workaround.

Interesting, good catch on the higher CPU usage. I didn't pay attention to whether there was a difference there.
 
Since PVE kernel 5.15.35-6 (Fri, 17 Jun 2022 13:42:35 +0200), no more crashes have happened :D
With kvm.tdp_mmu=N? Or only the new kernel, or something else?
 
With kvm.tdp_mmu=N? Or only the new kernel, or something else?
Only with the new kernel.
My machine type is pc-q35-5.2; I don't know if version 6 works too.

I see in the changelogs of the kernel and the backup packages a fix for io_uring during backups, and I think it's possible that fixes the bug.

It used to crash every day, and now it has been running for 10 days.
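For anyone wanting to compare machine versions, the machine type can be inspected and changed per VM with qm; 100 below is a placeholder VMID, and the target version is just an example:

```shell
# Show the machine type currently configured for VM 100 (placeholder VMID).
qm config 100 | grep machine

# Pin the VM to a newer q35 machine version, e.g. 6.2
# (do this only while the VM is stopped).
qm set 100 --machine pc-q35-6.2
```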
 
Can confirm: the newest kernel and kvm.tdp_mmu=N for older CPUs finally help...
Still stable... so it looks like this, or PVE kernel 5.15.35-6 (Fri, 17 Jun 2022 13:42:35 +0200), finally fixes it...
 
I also have this issue with 2 up-to-date Dell R720s in a cluster. Occasionally some VMs would fail with that error.
This only happens to certain VMs, and not all Windows VMs. They're running Xeon E5-2640 and E5-2643 CPUs in a cluster with Ceph.
I will try applying the kvm.tdp_mmu=N option and see if that fixes it.
 