Thank you! Do I have to do this on all three nodes, or can I just test it on the node where the affected VM runs?
Did you try the workaround that addresses our reproducer for this?
If the cluster nodes have similar/identical hardware, I would recommend disabling tdp_mmu on all of them (by setting it on the kernel command line or in /etc/modprobe.d/ and rebooting afterwards).
Thank you! Do I have to do this on all three nodes, or can I just test it on the node where the affected VM runs?
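Regarding the /etc/modprobe.d/ variant of that recommendation, a minimal sketch (the file name kvm-options.conf is just an example; any .conf file in that directory will do):
echo "options kvm tdp_mmu=N" > /etc/modprobe.d/kvm-options.conf
# make sure the option is also picked up if kvm gets loaded from the initramfs
update-initramfs -u -k all
reboot
# verify after the reboot
cat /sys/module/kvm/parameters/tdp_mmu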
As you suggested, I've tried the workaround:
Did you try the workaround that addresses our reproducer for this? Namely:
vi /etc/kernel/cmdline
root=ZFS=rpool/ROOT/pve-1 boot=zfs kvm.tdp_mmu=N
proxmox-boot-tool refresh
reboot
cat /sys/module/kvm/parameters/tdp_mmu
N
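In case a node boots via GRUB rather than proxmox-boot-tool, the equivalent would presumably be to append the parameter to GRUB_CMDLINE_LINUX_DEFAULT instead; a sketch, where "quiet" just stands in for whatever is already in that variable:
vi /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet kvm.tdp_mmu=N"
update-grub
reboot
cat /sys/module/kvm/parameters/tdp_mmu
N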
Giving it a go... Unpinned kernel 5.13.19-6-pve and set kvm.tdp_mmu=N.
Could someone who is affected by the `KVM: entry failed, hardware error 0x80000021` issue please try setting the tdp_mmu module parameter for kvm to 'N'?
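For reference, a minimal sketch of how a kernel can be pinned or unpinned while testing (assuming a proxmox-boot-tool recent enough to provide the 'kernel pin'/'kernel unpin' subcommands):
proxmox-boot-tool kernel list
# pin the known-good kernel while testing
proxmox-boot-tool kernel pin 5.13.19-6-pve
# remove the pin again to boot the newest installed kernel by default
proxmox-boot-tool kernel unpin
reboot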
We tried internal builds of 5.18 a while ago to test for a possible fix in newer kernel versions, and booting with 5.18 didn't fix (or improve) triggering our reproducer, so I'd figure no.
cpuid_data is full, no space for cpuid(eax:0x8000001d,ecx:0x3e)
proxmox-ve: 7.2-1 (running kernel: 5.15.35-2-pve)
pve-manager: 7.2-4 (running version: 7.2-4/ca9d43cc)
pve-kernel-5.15: 7.2-4
pve-kernel-helper: 7.2-4
pve-kernel-5.15.35-2-pve: 5.15.35-5
pve-kernel-5.15.30-2-pve: 5.15.30-3
pve-kernel-5.13.19-6-pve: 5.13.19-15
ceph-fuse: 15.2.16-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-2
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-2
libpve-storage-perl: 7.2-4
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.12-1
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.3-1
proxmox-backup-file-restore: 2.2.3-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-1
pve-container: 4.2-1
pve-docs: 7.2-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.4-2
pve-ha-manager: 3.3-4
pve-i18n: 2.7-2
pve-qemu-kvm: 6.2.0-10
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.4-pve1
Jul 01 19:07:26 matrasprox QEMU[79174]: KVM: entry failed, hardware error 0x80000021
Jul 01 19:07:26 matrasprox kernel: set kvm_intel.dump_invalid_vmcs=1 to dump internal KVM state.
Jul 01 19:07:26 matrasprox QEMU[79174]: If you're running a guest on an Intel machine without unrestricted mode
Jul 01 19:07:26 matrasprox QEMU[79174]: support, the failure can be most likely due to the guest entering an invalid
Jul 01 19:07:26 matrasprox QEMU[79174]: state for Intel VT. For example, the guest maybe running in big real mode
Jul 01 19:07:26 matrasprox QEMU[79174]: which is not supported on less recent Intel processors.
Jul 01 19:07:26 matrasprox QEMU[79174]: EAX=000022e2 EBX=63d9e180 ECX=00000001 EDX=00000000
Jul 01 19:07:26 matrasprox QEMU[79174]: ESI=bc81b140 EDI=63daa340 EBP=00000000 ESP=65453d40
Jul 01 19:07:26 matrasprox QEMU[79174]: EIP=00008000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=1 HLT=0
Jul 01 19:07:26 matrasprox QEMU[79174]: ES =0000 00000000 ffffffff 00809300
Jul 01 19:07:26 matrasprox QEMU[79174]: CS =b600 7ffb6000 ffffffff 00809300
Jul 01 19:07:26 matrasprox QEMU[79174]: SS =0000 00000000 ffffffff 00809300
Jul 01 19:07:26 matrasprox QEMU[79174]: DS =0000 00000000 ffffffff 00809300
Jul 01 19:07:26 matrasprox QEMU[79174]: FS =0000 00000000 ffffffff 00809300
Jul 01 19:07:26 matrasprox QEMU[79174]: GS =0000 00000000 ffffffff 00809300
Jul 01 19:07:26 matrasprox QEMU[79174]: LDT=0000 00000000 000fffff 00000000
Jul 01 19:07:26 matrasprox QEMU[79174]: TR =0040 63dad000 00000067 00008b00
Jul 01 19:07:26 matrasprox QEMU[79174]: GDT= 63daefb0 00000057
Jul 01 19:07:26 matrasprox QEMU[79174]: IDT= 00000000 00000000
Jul 01 19:07:26 matrasprox QEMU[79174]: CR0=00050032 CR2=39f33dcc CR3=001ae000 CR4=00000000
Jul 01 19:07:26 matrasprox QEMU[79174]: DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
Jul 01 19:07:26 matrasprox QEMU[79174]: DR6=00000000ffff0ff0 DR7=0000000000000400
Jul 01 19:07:26 matrasprox QEMU[79174]: EFER=0000000000000000
Jul 01 19:07:26 matrasprox QEMU[79174]: Code=kvm: ../hw/core/cpu-sysemu.c:77: cpu_asidx_from_attrs: Assertion `ret < cpu->num_ases && ret >= 0' failed.
Jul 01 19:07:26 matrasprox kernel: fwbr1010i0: port 2(tap1010i0) entered disabled state
Jul 01 19:07:26 matrasprox kernel: fwbr1010i0: port 2(tap1010i0) entered disabled state
Jul 01 19:07:26 matrasprox systemd[1]: 1010.scope: Succeeded.
Jul 01 19:07:26 matrasprox systemd[1]: 1010.scope: Consumed 4min 45.838s CPU time.
Jul 01 19:07:26 matrasprox qmeventd[81331]: Starting cleanup for 1010
Jul 01 19:07:26 matrasprox kernel: fwbr1010i0: port 1(fwln1010i0) entered disabled state
Jul 01 19:07:26 matrasprox kernel: vmbr0: port 2(fwpr1010p0) entered disabled state
Jul 01 19:07:26 matrasprox kernel: device fwln1010i0 left promiscuous mode
Jul 01 19:07:26 matrasprox kernel: fwbr1010i0: port 1(fwln1010i0) entered disabled state
Jul 01 19:07:27 matrasprox kernel: device fwpr1010p0 left promiscuous mode
Jul 01 19:07:27 matrasprox kernel: vmbr0: port 2(fwpr1010p0) entered disabled state
Jul 01 19:07:27 matrasprox qmeventd[81331]: Finished cleanup for 1010
You can still revert to the older kernel, which is a valid workaround, and wait for the fix without having problems until it is released.
Hi,
Unfortunately, yesterday on a fresh Proxmox VE 7.2.4 install with only one VM (a new Windows Server 2022) we hit the same problem ... the VM crashed and was found turned off for no reason. The server is a new Dell T340 with a PERC 330 controller .... Now I am installing the old kernel and hoping it won't happen again, as these are production servers! Why can't this serious problem be solved?
The package versions and part of the syslog are attached; if you need anything else, let me know!
... Maybe I am repeating myself, but we have had this problem for more than a month now across all our customers who run an updated Proxmox VE with Windows Server 2022 VMs ... and these are 4 different installations on different servers ... How can we fix it, apart from using the old kernel? Thank you.
And attached here are the VM config and the error shown in Event Viewer on Windows Server 2022 after boot.
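If it helps anyone compare setups, the VM configuration can be dumped on the node with qm (VMID 1010 below is just the one visible in the log above; substitute your own):
qm config 1010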
Hi,
Hi all, we have many PVE servers. Recently I upgraded all of them to the latest PVE 7.2.4/7.2.5 along with the latest 5.15 kernel.
All of those servers have Xeon(R) Silver 41XX processors, and we have NO issue with the VM crashes mentioned here. The servers are mostly Supermicro or HP.
But we also have a PVE cluster with Intel(R) Xeon(R) Gold 5218 CPUs, and on this cluster we were forced to downgrade to the 5.13.19-6-pve kernel because of the same error mentioned here ... Those servers are Supermicro ...
And we have one Supermicro server with an Intel(R) Xeon(R) Gold 6226R CPU, which also seems to be rock stable ...
Can this help? Does anyone have an Intel Xeon(R) Silver 41XX CPU with this issue?
Thanks
OK. To be more specific, we use for example the 4210, which is the same "Cascade Lake" generation as your 4215 ... OK, thanks for the info ...
Hi,
We experience the same issue on a NODE with this CPU: Intel(R) Xeon(R) Silver 4215 CPU
Hi! (in reply to the post above about the Xeon(R) Silver 41XX servers)
Nodes from my previous posts where the problem happened frequently (in reply to the post above about the Xeon(R) Silver 41XX servers):
Model name: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
kvm.tdp_mmu=N
~# uname -a
Linux pve02 5.15.35-3-pve #1 SMP PVE 5.15.35-6 (Fri, 17 Jun 2022 13:42:35 +0200) x86_64 GNU/Linux
~# cat /sys/module/kvm/parameters/tdp_mmu
N
With kvm.tdp_mmu=N set, the VMs have never stopped again.
Model name: Intel(R) Xeon(R) Bronze 3106 CPU @ 1.70GHz
Small update: last week I unpinned kernel 5.13.19-6-pve and switched back to the latest kernel version, but together with the option kvm.tdp_mmu=N, and rebooted as suggested.
Before I downgraded the kernel to 5.13.19-6, I got this issue regularly during backups of an Exchange 2016 server on Windows Server 2016, and in the last days also during the day (with no backup task running).
My home lab is a Z390 chipset (Gigabyte) with an Intel i7-8700 (Coffee Lake), 128 GB RAM and 2x 2 TB NVMe. There are 24 VMs running on it, but only the Exchange VM and sometimes a Windows 11 Insider VM were affected by this issue. All the other machines have never crashed so far (Windows 2008 R2, 2016/19 and 22, Win7, Win11, FreeBSD, Ubuntu, Debian, nested ESXi 6.7 with macOS and Android VMs).
Now, with the older kernel version, everything has been running stably for the last week.
After reading this, and after confirming that our older Intel CPUs have the microcode add-on installed and "/etc/modprobe.d/intel-microcode-blacklist.conf" set to NOT blacklist, we found one node where we had missed configuring it correctly.
Herewith confirmation as well that running with TDP_MMU disabled resolves this problem for us. It appears to primarily affect Windows 2019 and Windows 2022 guests. It also took us a while to identify, as most VMs got hit by this after hours, when they are more idle than during office hours. The restarts were attributed to VMs restarting to install Windows updates, whereas searching for 'hardware error' revealed the intermittent and random pattern:
[attachment 38725: screenshot of the 'hardware error' occurrences found with the command below]
for f in /var/log/syslog*; do zgrep 'hardware error' $f; done | sort -k1M -k2n -k3
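A rough journal-based equivalent, in case the rotated syslog files have already been pruned (assuming the entries are still in the systemd journal; adjust the time window as needed):
journalctl --since "-30 days" | grep 'hardware error 0x80000021'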
I see the following note in the PVE 6 to 7 upgrade notes; I'm not sure how long it has been there:
https://pve.proxmox.com/wiki/Upgrade_from_6.x_to_7.0
[attachment 38724: screenshot of the relevant section of the upgrade notes]
I can only presume that this issue became more prevalent with the latest series of Intel microcode updates which were released in response to additional Spectre vulnerabilities and mitigations from May 2022...
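As a quick sanity check of that theory on a given node, one could compare the microcode revision actually in use and whether late loading is blacklisted; a minimal sketch using standard Debian/PVE tooling, nothing Proxmox-specific assumed:
# microcode revision the CPUs are currently running
grep -m1 microcode /proc/cpuinfo
# microcode messages from the current boot (early-load updates show up here)
dmesg | grep -i microcode
# check whether late loading of the microcode module is blacklisted
grep -r microcode /etc/modprobe.d/ 2>/dev/null
# installed intel-microcode package version, if any
dpkg -l intel-microcode 2>/dev/null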