[BUG] TrueNAS VM fails to reboot, KVM: entry failed, hardware error 0x80000021

SteelBlade

Member
May 6, 2017
Hi
I'm moving some VMs from an old Proxmox VE 5.4-2 installation to a fresh 6.3-1 one.

I've run into an issue with a TrueNAS machine: on the old host it reboots normally, but on the new one, every time the guest is rebooted its status goes to "running (internal-error)" with a yellow triangle, and the guest then has to be stopped and started again manually.
Does anyone have an idea what the problem could be?

Some facts:
The error never happens at fresh boot of the guest or at graceful shutdown, only at reboot.
The guest is configured with cpu: host; with the default kvm64 it works as expected (a quick way to switch the CPU type for testing is sketched below).
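For a quick test, the CPU type can also be switched from the CLI instead of the GUI (a minimal sketch, assuming the VM ID 601 from the config below; the guest has to be stopped and started again for the change to take effect):
Code:
# switch the guest to the default kvm64 CPU type
qm set 601 --cpu kvm64
# or back to the host CPU type
qm set 601 --cpu host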

VM configuration:
Code:
bootdisk: scsi0
cores: 2
cpu: host
ide2: none,media=cdrom
memory: 16384
name: FreeNAS
net0: virtio=62:88:EC:CD:99:A2,bridge=vmbr2
numa: 0
ostype: other
scsi0: local:601/vm-601-disk-0.qcow2,discard=on,size=10G
scsihw: virtio-scsi-pci
smbios1: uuid=377ec163-fac3-4eea-b94a-ae341bee5a6e
sockets: 1
tablet: 0

Host 1 (old):
CPU: Intel Xeon E5-1630 v4
intel-microcode 3.20191115.2~deb9u1

lscpu:
Code:
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-1630 v4 @ 3.70GHz
Stepping:              1
CPU MHz:               3543.630
CPU max MHz:           4000.0000
CPU min MHz:           1200.0000
BogoMIPS:              7399.40
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              10240K
NUMA node0 CPU(s):     0-7
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts md_clear flush_l1d

pveversion:
Code:
proxmox-ve: 5.4-2 (running kernel: 4.15.18-25-pve)
pve-manager: 5.4-13 (running version: 5.4-13/aee6f0ec)
pve-kernel-4.15: 5.4-13
pve-kernel-4.15.18-25-pve: 4.15.18-53
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-12
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-56
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-14
libpve-storage-perl: 5.0-44
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-7
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
proxmox-widget-toolkit: 1.0-28
pve-cluster: 5.0-38
pve-container: 2.0-41
pve-docs: 5.4-2
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-22
pve-firmware: 2.0-7
pve-ha-manager: 2.0-9
pve-i18n: 1.1-4
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 3.0.1-4
pve-xtermjs: 3.12.0-1
qemu-server: 5.0-55
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3

Host 2 (new):
CPU: Intel Xeon E5-1650 v3
intel-microcode 3.20200616.1~deb10u1

lscpu:
Code:
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       46 bits physical, 48 bits virtual
CPU(s):              12
On-line CPU(s) list: 0-11
Thread(s) per core:  2
Core(s) per socket:  6
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               63
Model name:          Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
Stepping:            2
CPU MHz:             3104.187
CPU max MHz:         3800.0000
CPU min MHz:         1200.0000
BogoMIPS:            7000.03
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            15360K
NUMA node0 CPU(s):   0-11
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts md_clear flush_l1d

pveversion:
Code:
proxmox-ve: 6.3-1 (running kernel: 5.4.78-2-pve)
pve-manager: 6.3-3 (running version: 6.3-3/eee5f901)
pve-kernel-5.4: 6.3-3
pve-kernel-helper: 6.3-3
pve-kernel-5.4.78-2-pve: 5.4.78-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.7
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-2
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-3
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.6-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.3-2
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-7
pve-xtermjs: 4.7.0-3
pve-zsync: 2.0-4
qemu-server: 6.3-2
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1

The syslog shows this error when the guest tries to reboot:
Code:
Jan  8 15:29:07 hyper QEMU[13192]: KVM: entry failed, hardware error 0x80000021
Jan  8 15:29:07 hyper QEMU[13192]: If you're running a guest on an Intel machine without unrestricted mode
Jan  8 15:29:07 hyper QEMU[13192]: support, the failure can be most likely due to the guest entering an invalid
Jan  8 15:29:07 hyper QEMU[13192]: state for Intel VT. For example, the guest maybe running in big real mode
Jan  8 15:29:07 hyper QEMU[13192]: which is not supported on less recent Intel processors.
Jan  8 15:29:07 hyper QEMU[13192]: EAX=00000000 EBX=00000000 ECX=00000000 EDX=000306f2
Jan  8 15:29:07 hyper QEMU[13192]: ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
Jan  8 15:29:07 hyper QEMU[13192]: EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
Jan  8 15:29:07 hyper QEMU[13192]: ES =0000 00000000 0000ffff 00009300
Jan  8 15:29:07 hyper QEMU[13192]: CS =f000 ffff0000 0000ffff 00009b00
Jan  8 15:29:07 hyper QEMU[13192]: SS =0000 00000000 0000ffff 00009300
Jan  8 15:29:07 hyper QEMU[13192]: DS =0000 00000000 0000ffff 00009300
Jan  8 15:29:07 hyper QEMU[13192]: FS =0000 00000000 0000ffff 00009300
Jan  8 15:29:07 hyper QEMU[13192]: GS =0000 00000000 0000ffff 00009300
Jan  8 15:29:07 hyper QEMU[13192]: LDT=0000 00000000 0000ffff 00008200
Jan  8 15:29:07 hyper QEMU[13192]: TR =0000 00000000 0000ffff 00008b00
Jan  8 15:29:07 hyper QEMU[13192]: GDT=     00000000 0000ffff
Jan  8 15:29:07 hyper QEMU[13192]: IDT=     00000000 0000ffff
Jan  8 15:29:07 hyper QEMU[13192]: CR0=60000010 CR2=00000000 CR3=00000000 CR4=001726e0
Jan  8 15:29:07 hyper QEMU[13192]: DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
Jan  8 15:29:07 hyper QEMU[13192]: DR6=00000000ffff0ff0 DR7=0000000000000400
Jan  8 15:29:07 hyper QEMU[13192]: EFER=0000000000000000
Jan  8 15:29:07 hyper QEMU[13192]: Code=00 66 89 d8 66 e8 e1 a3 ff ff 66 83 c4 0c 66 5b 66 5e 66 c3 <ea> 5b e0 00 f0 30 36 2f 32 33 2f 39 39 00 fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Jan  8 15:29:07 hyper kernel: [17678.626163] set kvm_intel.dump_invalid_vmcs=1 to dump internal KVM state.
Jan  8 15:29:24 hyper pvedaemon[582]: <root@pam> end task UPID:hyper:000033A1:001AD58B:5FF879E7:vncproxy:601:root@pam: OK
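To gather more detail on the next failure, the dump mentioned in the last kernel line can be enabled at runtime (a minimal sketch, assuming the kvm_intel module exposes the parameter as writable on this kernel):
Code:
# dump the invalid VMCS/guest state into the kernel log on the next failed VM entry
echo 1 > /sys/module/kvm_intel/parameters/dump_invalid_vmcs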
 
Some updates. It looks like a bug, since the issue doesn't happen on the old PVE:
- All hosts have nested virtualization enabled
- Disabling nested virtualization on host 2 (PVE 6.3) makes the VM behave correctly (a sketch of how to do this follows the list)
- The issue happens even if the VM is created from scratch with a fresh install of TrueNAS
- The issue happens even with the older FreeNAS versions 11.2 and 11.3
- The issue also happens on a third host; its details follow below:
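For reference, disabling nested virtualization on an Intel host can be done roughly like this (a sketch; note that it overwrites /etc/modprobe.d/kvm-intel.conf, and all VMs must be stopped before the module can be reloaded):
Code:
# check whether nested virtualization is currently enabled (Y/1 = enabled)
cat /sys/module/kvm_intel/parameters/nested
# disable it persistently and reload the module
echo "options kvm-intel nested=0" > /etc/modprobe.d/kvm-intel.conf
modprobe -r kvm_intel
modprobe kvm_intel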

Host 3 (pve 6.2):
CPU: 2x Intel Xeon E5-2670 v2
intel-microcode 3.20200616.1~deb10u1

lscpu:
Code:
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       46 bits physical, 48 bits virtual
CPU(s):              40
On-line CPU(s) list: 0-39
Thread(s) per core:  2
Core(s) per socket:  10
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               62
Model name:          Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Stepping:            4
CPU MHz:             2494.082
CPU max MHz:         3300.0000
CPU min MHz:         1200.0000
BogoMIPS:            4987.93
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            25600K
NUMA node0 CPU(s):   0-9,20-29
NUMA node1 CPU(s):   10-19,30-39
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear flush_l1d

pveversion:
Code:
proxmox-ve: 6.2-2 (running kernel: 5.4.65-1-pve)
pve-manager: 6.2-14 (running version: 6.2-14/2001b502)
pve-kernel-5.4: 6.2-7
pve-kernel-helper: 6.2-7
pve-kernel-5.4.65-1-pve: 5.4.65-1
ceph: 14.2.11-pve1
ceph-fuse: 14.2.11-pve1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-2
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.2-9
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 0.9.4-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.3-6
pve-cluster: 6.2-1
pve-container: 3.2-2
pve-docs: 6.2-6
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-3
pve-xtermjs: 4.7.0-2
qemu-server: 6.2-17
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.4-pve2
 
The guest is configured with cpu: host; with the default kvm64 it works as expected.
Disabling nested virtualization on host 2 (PVE 6.3) makes the VM behave correctly.
Thanks for sharing your findings! :)

Workaround for my setup: use e.g. "Westmere" instead of "host" as the CPU type. I don't need the VMX flag in TrueNAS. Performance seems to be identical and the reboot issue is gone. "kvm64" as the CPU type is just too slow, because it lacks AES-NI by default.
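A minimal way to apply that workaround from the CLI (assuming the VM ID 601 from the config in the first post; the guest needs a full stop/start afterwards):
Code:
qm set 601 --cpu Westmere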
 
