VM shutdown, KVM: entry failed, hardware error 0x80000021

After 2 weeks, my Windows Server 2019 VM shut down again :(
Update: Sorry, false alarm. Proxmox itself had rebooted because of a power loss, and a second disk on an NFS share was unavailable.

Code:
proxmox-ve: 7.2-1 (running kernel: 5.15.35-2-pve)
pve-manager: 7.2-4 (running version: 7.2-4/ca9d43cc)
pve-kernel-5.15: 7.2-4
pve-kernel-helper: 7.2-4
pve-kernel-5.13: 7.1-9
pve-kernel-5.11: 7.0-10
pve-kernel-5.15.35-2-pve: 5.15.35-5
pve-kernel-5.15.35-1-pve: 5.15.35-3
pve-kernel-5.15.30-2-pve: 5.15.30-3
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-2-pve: 5.13.19-4
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.11.22-4-pve: 5.11.22-9
ceph-fuse: 15.2.14-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-2
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-2
libpve-storage-perl: 7.2-4
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.12-1
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.3-1
proxmox-backup-file-restore: 2.2.3-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-1
pve-container: 4.2-1
pve-docs: 7.2-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.4-2
pve-ha-manager: 3.3-4
pve-i18n: 2.7-2
pve-qemu-kvm: 6.2.0-10
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.4-pve1
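The `proxmox-ve` line of this output names the kernel that is actually running, and the matching `pve-kernel-<version>-pve` line gives that kernel's package version (here kernel `5.15.35-2-pve` from package `5.15.35-5`). A minimal Python sketch that pulls both out, assuming the `pveversion -v` text is passed in as a string in the layout shown above; the function name `running_kernel_package` is made up for illustration:

```python
import re
from typing import Optional, Tuple


def running_kernel_package(pveversion_output: str) -> Tuple[str, Optional[str]]:
    """Return (running kernel name, its pve-kernel package version or None).

    Assumes the `pveversion -v` layout: one "name: version" pair per line,
    with the running kernel named on the proxmox-ve line.
    """
    packages = {}
    running = None
    for line in pveversion_output.splitlines():
        name, sep, rest = line.partition(": ")
        if not sep:
            continue  # skip lines that are not "name: version" pairs
        packages[name.strip()] = rest.strip()
        match = re.search(r"running kernel: ([^)\s]+)", rest)
        if match:
            running = match.group(1)
    if running is None:
        raise ValueError("no 'running kernel' entry found")
    # e.g. kernel "5.15.35-2-pve" is packaged as "pve-kernel-5.15.35-2-pve"
    return running, packages.get(f"pve-kernel-{running}")


sample = """proxmox-ve: 7.2-1 (running kernel: 5.15.35-2-pve)
pve-kernel-5.15: 7.2-4
pve-kernel-5.15.35-2-pve: 5.15.35-5"""
print(running_kernel_package(sample))  # ('5.15.35-2-pve', '5.15.35-5')
```

A `None` package version means the running kernel has no matching `pve-kernel-*` line, for example if the output was trimmed.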
And did you follow the guide?
https://pve.proxmox.com/mediawiki/index.php?title=Upgrade_from_6.x_to_7.0&action=history
 
Giving it a go... Unpinned kernel 5.13.19-6-pve and set kvm.tdp_mmu=N.
I don't have a reliable way to reproduce the issue, but my Windows 11 VM would usually crash on me within a few days' time. I'll report back!
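On a GRUB-booted host, `kvm.tdp_mmu=N` goes on the kernel command line via `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub`. A minimal Python sketch of that edit, assuming the value is double-quoted as on a default install; it only transforms the file's text (reading and writing it is left to the caller), and `add_kernel_param` is a hypothetical helper name:

```python
import re

PARAM = "kvm.tdp_mmu=N"  # the workaround discussed in this thread


def add_kernel_param(grub_defaults: str, param: str = PARAM) -> str:
    """Return /etc/default/grub text with `param` in GRUB_CMDLINE_LINUX_DEFAULT.

    Any existing value for the same key (e.g. kvm.tdp_mmu=Y) is replaced, so
    running it twice gives the same result. Only double-quoted values are
    handled.
    """
    key = param.split("=", 1)[0]
    pattern = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=)"([^"]*)"', re.MULTILINE)

    def rewrite(match):
        # Drop any earlier setting of the same parameter, then append ours.
        args = [a for a in match.group(2).split() if a.split("=", 1)[0] != key]
        args.append(param)
        return match.group(1) + '"' + " ".join(args) + '"'

    new_text, count = pattern.subn(rewrite, grub_defaults)
    if count == 0:
        raise ValueError("GRUB_CMDLINE_LINUX_DEFAULT not found")
    return new_text


before = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet"\n'
print(add_kernel_param(before), end="")
# GRUB_DEFAULT=0
# GRUB_CMDLINE_LINUX_DEFAULT="quiet kvm.tdp_mmu=N"
```

After saving the result, a GRUB-booted host needs `update-grub` and a reboot. Hosts that boot through systemd-boot (e.g. ZFS on UEFI) keep the command line in `/etc/kernel/cmdline` instead and need `proxmox-boot-tool refresh`.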
@Stoiko Ivanov
It's been about a week and I haven't had any problems with normal usage after applying the workaround.

Here's a breakdown of my setup:
  • ASUS ROG Strix Z490-E Gaming Motherboard (BIOS 2403 - 2021/12/16)
  • Intel Core i9-10850K Processor
  • G.Skill Ripjaws V 64 GB (4 x 16 GB) DDR4-3600 CL16 Memory
  • EVGA GeForce RTX 3060 Ti 8 GB XC GAMING Video Card
  • intel-microcode is installed
  • Latest updates from enterprise repository
  • Windows 11 VM with GPU passthrough used as "daily driver"
  • Mix of Ubuntu VMs and LXC containers
Thanks to everyone who provided the fix and confirmed it's working for them. I was also getting repeated random shutdowns, and the kvm.tdp_mmu=N setting fixed them for me.

Question: does this mean it isn't an issue for anyone who set up a fresh Proxmox 7.2 install, and is only a problem for those who upgraded from 6.x to 7?
 
I can confirm that the newest kernel plus kvm.tdp_mmu=N finally helps on older CPUs.
 
What's considered an older CPU? Also out of curiosity what does that setting do?
I think the statement about older CPUs is misleading. I have 11th gen Intel host with this issue, yet another system with 9th gen Intel is fine.
kvm.tdp_mmu=N seems to have helped though. It's been 2 days now.

Somehow CPU usage went up a little after applying kvm.tdp_mmu=N, though. I'm not sure it's related, but it happened right when I went back from the 5.13 kernel to 5.15 and applied the kvm.tdp_mmu=N workaround.
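`kvm.tdp_mmu` is a parameter of the `kvm` kernel module that switches KVM's newer TDP MMU (its page-table handling for EPT/NPT two-dimensional paging) on or off; `N` falls back to the older MMU code. After rebooting, the live value can be read back from sysfs, which helps when tying a change such as the CPU usage above to the moment the setting actually took effect. A small Python sketch, assuming an x86 host where the module exposes `/sys/module/kvm/parameters/tdp_mmu`; `tdp_mmu_enabled` is an illustrative name:

```python
from pathlib import Path
from typing import Optional

# sysfs file for the kvm module's tdp_mmu parameter on x86 hosts.
TDP_MMU_PARAM = Path("/sys/module/kvm/parameters/tdp_mmu")


def tdp_mmu_enabled(param_file: Path = TDP_MMU_PARAM) -> Optional[bool]:
    """Return the live tdp_mmu setting, or None if the parameter is unavailable."""
    try:
        value = param_file.read_text().strip()
    except OSError:
        return None  # kvm not loaded, file unreadable, or no such parameter
    # Boolean module parameters are reported as "Y" or "N".
    return value == "Y"


state = tdp_mmu_enabled()
if state is None:
    print("tdp_mmu parameter not found")
elif state:
    print("TDP MMU is enabled (workaround not active)")
else:
    print("TDP MMU is disabled (kvm.tdp_mmu=N is in effect)")
```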
Interesting, good catch on the higher CPU usage. I didn't pay attention to whether there's a difference there.
Since kernel PVE 5.15.35-6 (Fri, 17 Jun 2022 13:42:35 +0200), no more crashes have happened :D
With kvm.tdp_mmu=N? Or only the new kernel, or something else?
Only with the new kernel.
My machine type is pc-q35-5.2; I don't know whether version 6 works too.

In the changelogs of the kernel and the backup package I saw an io_uring fix for backups, and I think it may have fixed the bug.

It crashed every day, and now it has been running for 10 days.
Still stable, so it looks like this or PVE 5.15.35-6 (Fri, 17 Jun 2022 13:42:35 +0200) finally fixes it.
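Posters here report no more crashes from pve-kernel package 5.15.35-6 (17 Jun 2022) onward. A minimal Python sketch for checking whether a host's kernel package is at least that version; the threshold comes from these reports rather than from an official changelog, and `version_key`/`has_reported_fix` are illustrative names. The package version is the right-hand value of the `pve-kernel-<running kernel>` line in `pveversion -v`:

```python
import re
from typing import Tuple

# Package version that posters in this thread report as stable.
REPORTED_FIX = "5.15.35-6"


def version_key(version: str) -> Tuple[int, ...]:
    """Turn a purely numeric version such as "5.15.35-5" into a comparable tuple.

    Simplified on purpose: only digits separated by "." or "-" are accepted,
    not full Debian version syntax (epochs, "~", "+", letters).
    """
    if not re.fullmatch(r"\d+(?:[.-]\d+)*", version):
        raise ValueError(f"unsupported version string: {version!r}")
    return tuple(int(part) for part in re.split(r"[.-]", version))


def has_reported_fix(package_version: str, fixed: str = REPORTED_FIX) -> bool:
    """True if the kernel package is at or above the version reported as fixed."""
    return version_key(package_version) >= version_key(fixed)


print(has_reported_fix("5.15.35-5"))  # False
print(has_reported_fix("5.15.35-6"))  # True
```

For exact Debian ordering, `dpkg --compare-versions` is the authoritative check; the tuple comparison above only covers plain numeric versions like the ones in this thread.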
I also have this issue with two up-to-date Dell R720s in a cluster. Occasionally some VMs would fail with that error.
This only happens to certain VMs, not to all Windows VMs. The hosts run Xeon E5-2640 and E5-2643 CPUs in a cluster with Ceph.
I will try applying the kvm.tdp_mmu=N option and see whether that fixes it.