For me it did not work: crash after approx. 2.5 days. I upgraded to the latest -3 at the same time as I disabled nested virtualization, and there has been no crash since.
It could have been fixed in -3, or the nested option works.
Can you try disabling SMM?
To do so you'll have to run the VM manually. First run qm showcmd <VMID> --pretty and copy the content to a file.
Modify the -machine line by adding ,smm=off.
Then run that command.
One more question, do all of you use UEFI with pre-enrolled keys and perhaps even secure boot?
The VM is running Server 2022, so yes, UEFI with pre-enrolled keys.
Same here, VM created through the Proxmox wizard for Win Server 2022.
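The SMM test described above could be scripted roughly as follows. This is a minimal sketch, not an official procedure: the function name add_smm_off and the output file name are made up for illustration, and it assumes the -machine option sits on its own line in the --pretty output, with its value either bare or single-quoted. Check the edited command by hand before running it.

```shell
#!/bin/sh
# Sketch only: append ",smm=off" to the value of the -machine option in the
# output of `qm showcmd <VMID> --pretty` (read on stdin, written to stdout).
# Assumption: the option is on its own line, value optionally single-quoted.
add_smm_off() {
  sed -E "s/^([[:space:]]*-machine[[:space:]]+'?)([^'[:space:]]+)/\1\2,smm=off/"
}

# Demo on a -machine line of the kind posted in this thread:
printf '%s\n' "-machine type=pc-q35-6.1+pve0" | add_smm_off
# prints: -machine type=pc-q35-6.1+pve0,smm=off

# On a Proxmox host (hypothetical file name, with the VM stopped):
#   qm showcmd <VMID> --pretty | add_smm_off > /root/vm-smm-off.sh
#   sh /root/vm-smm-off.sh
```

All other lines of the command pass through unchanged, so the result can be diffed against the original output to confirm that only the -machine line differs.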
May 20 03:29:52 proxmox postfix/qmgr[2909]: 9B83980D67: removed
May 20 03:46:01 proxmox smartd[2401]: Device: /dev/sdg [SAT], CHECK POWER STATUS spins up disk (0x82 -> 0xff)
May 20 03:46:01 proxmox smartd[2401]: Device: /dev/sdg [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 84 to 74
May 20 03:46:01 proxmox smartd[2401]: Device: /dev/sdg [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 71 to 70
May 20 03:46:01 proxmox smartd[2401]: Device: /dev/sdg [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 29 to 30
May 20 03:46:58 proxmox kernel: [68465.912927] set kvm_intel.dump_invalid_vmcs=1 to dump internal KVM state.
May 20 03:46:58 proxmox QEMU[3054]: KVM: entry failed, hardware error 0x80000021
May 20 03:46:58 proxmox QEMU[3054]: If you're running a guest on an Intel machine without unrestricted mode
May 20 03:46:58 proxmox QEMU[3054]: support, the failure can be most likely due to the guest entering an invalid
May 20 03:46:58 proxmox QEMU[3054]: state for Intel VT. For example, the guest maybe running in big real mode
May 20 03:46:58 proxmox QEMU[3054]: which is not supported on less recent Intel processors.
May 20 03:46:58 proxmox QEMU[3054]: EAX=00127c6a EBX=7d183180 ECX=00000000 EDX=00000000
May 20 03:46:58 proxmox QEMU[3054]: ESI=7d18f240 EDI=dafd80c0 EBP=68dc6470 ESP=68dc6290
May 20 03:46:58 proxmox QEMU[3054]: EIP=00008000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=1 HLT
Turning off SMM causes the VM to start but not POST (yes, I did launch swtpm manually), so I'm afraid this is not a viable test for VMs that require secure boot.
kvm arguments you may want to use in trying to reproduce this problem:
-machine type=pc-q35-6.1+pve0
-smp 1,sockets=1,cores=12,maxcpus=12
-cpu host,+hv-tlbflush,hv_ipi,hv_relaxed,hv_reset,hv_runtime,hv_spinlocks=0x1fff,hv_stimer,hv_synic,hv_time,hv_vapic,hv_vpindex,+kvm_pv_eoi,+kvm_pv_unhalt
-numa node,nodeid=0,cpus=0-11,memdev=ram-node0
-drive if=pflash,unit=0,format=raw,readonly=on,file=/usr/share/pve-edk2-firmware//OVMF_CODE_4M.secboot.fd
-tpmdev emulator,id=tpmdev,chardev=tpmchar
-device tpm-tis,tpmdev=tpmdev
We're now trying to reproduce it here with Windows Server 2022 and pre-enrolled keys.
Is there any specific software you're running in the VMs that run into this assertion?
Any load we could try to reproduce?
Nothing special. Even OS installation gets irrecoverably interrupted; I had to repeat it 3 times. It looks like disabling CFG (Control Flow Guard) in the guest makes crashes less frequent, maybe by a factor of three. There may be more types of illegal instructions causing the VM crash.
Kernel 5.13.19-6, with all other things unchanged, completely resolves the issue for me. On 5.15.35-1 my VM would invariably crash during Windows startup or shortly thereafter (before I could even log in), so in my particular case it's very quick and easy to know whether the problem is still there or not.
Now that I think about it, I had vanilla Windows Server 2019 VMs crash on me too, but I don't know if those crashes were caused by the same thing. I'll spin up a couple and see if any of them crash.
Kernel 5.13.19 is a solution that works for me (been over two weeks now since going back).
May 4 19:01:15 riko kernel: [ 0.000000] microcode: microcode updated early to revision 0xec, date = 2021-04-29
May 4 19:01:15 riko kernel: [ 0.000000] Linux version 5.15.30-2-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PVE 5.15.30-3 (Fri, 22 Apr 2022 18:08:27 +0200) ()
May 4 19:01:15 riko kernel: [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.15.30-2-pve root=/dev/mapper/pve-root ro quiet console=tty0 console=ttyS1,115200n8 fsck.mode=force fsck.repair=preen nomodeset nmi_watchdog=0 intel_iommu=on l1tf=off mds=off tsx=on tsx_async_abort=off aacraid.expose_physicals=0 aacraid.dacmode=1 cpufreq.default_governor=schedutil
May 4 21:05:35 riko QEMU[4093]: KVM: entry failed, hardware error 0x80000021
May 4 21:05:35 riko QEMU[4093]: If you're running a guest on an Intel machine without unrestricted mode
May 4 21:05:35 riko QEMU[4093]: support, the failure can be most likely due to the guest entering an invalid
May 4 21:05:35 riko QEMU[4093]: state for Intel VT. For example, the guest maybe running in big real mode
May 4 21:05:35 riko QEMU[4093]: which is not supported on less recent Intel processors.
May 4 21:05:35 riko QEMU[4093]: EAX=00000000 EBX=00000000 ECX=40000070 EDX=00000000
May 4 21:05:35 riko QEMU[4093]: ESI=00000000 EDI=00326000 EBP=414bc476 ESP=00605920
May 4 21:05:35 riko QEMU[4093]: EIP=00008000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=1 HLT=0
May 4 21:05:35 riko QEMU[4093]: ES =0000 00000000 ffffffff 00809300
May 4 21:05:35 riko QEMU[4093]: CS =c400 7ffc4000 ffffffff 00809300
May 4 21:05:35 riko QEMU[4093]: SS =0000 00000000 ffffffff 00809300
May 4 21:05:35 riko QEMU[4093]: DS =0000 00000000 ffffffff 00809300
May 4 21:05:35 riko QEMU[4093]: FS =0000 00000000 ffffffff 00809300
May 4 21:05:35 riko QEMU[4093]: GS =0000 00000000 ffffffff 00809300
May 4 21:05:35 riko QEMU[4093]: LDT=0000 00000000 00000000 00000000
May 4 21:05:35 riko QEMU[4093]: TR =0030 00356040 00000067 00008b00
May 4 21:05:35 riko QEMU[4093]: GDT= 00356000 0000ffff
May 4 21:05:35 riko QEMU[4093]: IDT= 00000000 00000000
May 4 21:05:35 riko QEMU[4093]: CR0=00010030 CR2=385dda78 CR3=f9cc3000 CR4=00000000
May 4 21:05:35 riko QEMU[4093]: DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
May 4 21:05:35 riko QEMU[4093]: DR6=00000000ffff0ff0 DR7=0000000000000400
May 4 21:05:35 riko QEMU[4093]: EFER=0000000000000000
May 4 21:05:35 riko QEMU[4093]: Code=kvm: ../hw/core/cpu-sysemu.c:77: cpu_asidx_from_attrs: Assertion `ret < cpu->num_ases && ret >= 0' failed.
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 39 bits physical, 48 bits virtual
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 158
Model name: Intel(R) Xeon(R) CPU E3-1275 v6 @ 3.80GHz
Stepping: 9
CPU MHz: 3800.000
CPU max MHz: 3800.0000
CPU min MHz: 800.0000
BogoMIPS: 7599.80
Virtualization: VT-x
L1d cache: 128 KiB
L1i cache: 128 KiB
L2 cache: 1 MiB
L3 cache: 8 MiB
NUMA node0 CPU(s): 0-7
Vulnerability Itlb multihit: KVM: Mitigation: Split huge pages
Vulnerability L1tf: Mitigation; PTE Inversion; VMX vulnerable
Vulnerability Mds: Vulnerable; SMT vulnerable
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
Vulnerability Srbds: Mitigation; Microcode
Vulnerability Tsx async abort: Vulnerable
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities
# cat /sys/module/kvm_intel/parameters/nested
Y
# cat /etc/pve/qemu-server/109.conf
agent: 1
args: -cpu host,+aes,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,hv-vpindex,hv-runtime,hv-time,hv-synic,hv-stimer,+hv-tlbflush,hv-ipi,hv-frequencies,hv-stimer-direct,hv-reenlightenment,hv-no-nonarch-coresharing=on,+kvm_pv_unhalt,+pcid,+pdpe1gb,+spec-ctrl,+ssbd
balloon: 4096
bios: ovmf
bootdisk: scsi0
cores: 4
cpu: host,flags=+pcid;+spec-ctrl;+ssbd;+pdpe1gb;+hv-tlbflush;+hv-evmcs;+aes
efidisk0: local:109/vm-109-disk-3.raw,efitype=4m,pre-enrolled-keys=1,size=528K
hotplug: disk,network,usb
ide0: none,media=cdrom
localtime: 1
machine: pc-q35-6.1
memory: 6144
name: nanoha
net0: virtio=66:04:FA:3A:15:D1,bridge=vmbr0
numa: 0
onboot: 1
ostype: win10
scsi0: local:109/vm-109-disk-1.qcow2,discard=on,size=128G,ssd=1,aio=native
scsihw: virtio-scsi-pci
smbios1: uuid=9cc7744e-098e-488c-a175-37a4a36d6135
sockets: 1
tablet: 0
tpmstate0: local:109/vm-109-disk-2.raw,size=4M,version=v2.0
usb0: spice,usb3=1
vga: qxl
vmgenid: d2f5ea32-9be4-4c64-96d6-09d49cbe6487
# cat /etc/pve/qemu-server/100.conf
agent: 1
args: -machine kernel_irqchip=on -cpu host,+aes,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,hv-vpindex,hv-runtime,hv-time,hv-synic,hv-stimer,+hv-tlbflush,hv-ipi,hv-frequencies,hv-stimer-direct,hv-reenlightenment,hv-no-nonarch-coresharing=on,+kvm_pv_unhalt,+pcid,+pdpe1gb,+spec-ctrl,+ssbd
balloon: 8192
bios: ovmf
boot: cdn
bootdisk: scsi0
cores: 4
cpu: host,flags=+pcid;+spec-ctrl;+ssbd;+pdpe1gb;+hv-tlbflush;+hv-evmcs;+aes
efidisk0: local:100/vm-100-disk-3.raw,efitype=4m,pre-enrolled-keys=1,size=528K
hostpci0: 06:00,pcie=1
hotplug: disk,network,usb,cpu
ide0: none,media=cdrom
localtime: 1
machine: pc-q35-6.1
memory: 12288
name: azusa
net0: virtio=72:9F:D9:C6:42:AA,bridge=vmbr0
numa: 0
onboot: 1
ostype: win10
scsi0: local:100/vm-100-disk-0.qcow2,discard=on,iothread=1,size=128G,ssd=1,aio=native
scsi1: /dev/mapper/tank-veeam,backup=0,iothread=1,size=4T
scsihw: virtio-scsi-single
serial0: socket
smbios1: uuid=bd604984-7547-449e-8d30-10b815c8bf22
sockets: 1
startup: order=1
tablet: 0
tpmstate0: local:100/vm-100-disk-2.raw,size=4M,version=v2.0
usb0: spice,usb3=1
vga: qxl
vmgenid: a3fbf8ce-21e9-4f85-ae05-d036f173a51e
I'm running Windows Server 2022 with MS SQL Server 2019 and a small database (under 10G), and that VM crashed regularly. Another VM on the same Proxmox instance, running Windows Server 2022 as a Windows file server, does not crash.
5.13.19-6-pve and disabled nested virtualization. Will monitor the situation.
As I wrote in the German forum: I have two Windows Server 2022 VMs on the PVE host with the 5.15.35-1 kernel, one with and one without the Microsoft May updates. The VM with the updates crashes; the other one is stable. Could someone confirm that the crashes only occur with the Microsoft May updates?
The following ISO has this issue even during OS installation, with the network not configured and the internet not yet reachable. The only thing to select/customize is Datacenter with GUI (last in the list). It's the 180-day evaluation version of WS 2022. I doubt it already has any May updates.
5.13.19-6-pve to resolve some issues with GPU passthrough.
5.15.35-2.)
5.15.35-2.
5.13.19-6-pve kernel unpinned, running the WindowsImageBackup will always make it crash.
May 22 11:38:07 pve QEMU[3291]: KVM: entry failed, hardware error 0x80000021
May 22 11:38:07 pve kernel: [ 380.096623] set kvm_intel.dump_invalid_vmcs=1 to dump internal KVM state.
May 22 11:38:07 pve QEMU[3291]: If you're running a guest on an Intel machine without unrestricted mode
May 22 11:38:07 pve QEMU[3291]: support, the failure can be most likely due to the guest entering an invalid
May 22 11:38:07 pve QEMU[3291]: state for Intel VT. For example, the guest maybe running in big real mode
May 22 11:38:07 pve QEMU[3291]: which is not supported on less recent Intel processors.
May 22 11:38:07 pve QEMU[3291]: EAX=00000000 EBX=00000000 ECX=40000070 EDX=00000000
May 22 11:38:07 pve QEMU[3291]: ESI=00000000 EDI=0037c000 EBP=003b2d59 ESP=003b2cb0
May 22 11:38:07 pve QEMU[3291]: EIP=00008000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=1 HLT=0
May 22 11:38:07 pve QEMU[3291]: ES =0000 00000000 ffffffff 00809300
May 22 11:38:07 pve QEMU[3291]: CS =be00 7ffbe000 ffffffff 00809300
May 22 11:38:07 pve QEMU[3291]: SS =0000 00000000 ffffffff 00809300
May 22 11:38:07 pve QEMU[3291]: DS =0000 00000000 ffffffff 00809300
May 22 11:38:07 pve QEMU[3291]: FS =0000 00000000 ffffffff 00809300
May 22 11:38:07 pve QEMU[3291]: GS =0000 00000000 ffffffff 00809300
May 22 11:38:07 pve QEMU[3291]: LDT=0000 00000000 00000000 00000000
May 22 11:38:07 pve QEMU[3291]: TR =0030 003ac040 00000067 00008b00
May 22 11:38:07 pve QEMU[3291]: GDT= 003ac000 0000ffff
May 22 11:38:07 pve QEMU[3291]: IDT= 00000000 00000000
May 22 11:38:07 pve QEMU[3291]: CR0=00010030 CR2=00000000 CR3=17786000 CR4=00000000
May 22 11:38:07 pve QEMU[3291]: DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
May 22 11:38:07 pve QEMU[3291]: DR6=00000000ffff0ff0 DR7=0000000000000400
May 22 11:38:07 pve QEMU[3291]: EFER=0000000000000000
May 22 11:38:07 pve QEMU[3291]: Code=kvm: ../hw/core/cpu-sysemu.c:77: cpu_asidx_from_attrs: Assertion `ret < cpu->num_ases && ret >= 0' failed.
May 22 11:38:08 pve kernel: [ 380.262131] vmbr0: port 2(tap100i0) entered disabled state
May 22 11:38:08 pve qmeventd[14863]: Starting cleanup for 100
May 22 11:38:08 pve qmeventd[14863]: Finished cleanup for 100
May 22 11:38:11 pve systemd[1]: 100.scope: Succeeded.
May 22 11:38:11 pve systemd[1]: 100.scope: Consumed 18min 48.635s CPU time.
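The logs posted in this thread all share the line "KVM: entry failed, hardware error 0x80000021". To check whether a host has hit the same failure, and when, something like the following might help. It is only a sketch: the function name kvm_entry_failures is invented here, and it assumes the classic syslog layout seen in the pasted logs (month, day, time, hostname as the first four fields).

```shell
#!/bin/sh
# Sketch: print "Mon DD HH:MM:SS host" for every syslog line on stdin that
# carries the KVM entry-failure signature seen in this thread.
kvm_entry_failures() {
  awk '/KVM: entry failed, hardware error 0x80000021/ { print $1, $2, $3, $4 }'
}

# Possible uses on the host (pick whichever log source is available):
#   kvm_entry_failures < /var/log/syslog
#   journalctl --since "7 days ago" | kvm_entry_failures
```

An empty result only means the signature was not found in the log that was fed in, not that the host is unaffected.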
Is there any way to automatically restart the VM on failure until this gets resolved?
Can anyone point me in the right direction on this?
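One possible stopgap for the auto-restart question is a small watchdog run from cron. This is a sketch under assumptions, not an official Proxmox mechanism: the function name restart_if_stopped, the QM variable and the script path are made up for illustration, and it assumes that qm status reports the crashed VM as stopped. It also cannot tell a crash from a deliberate shutdown, so a VM that was stopped on purpose would be started again too.

```shell
#!/bin/sh
# Hypothetical stopgap watchdog: start a VM again if it is reported stopped.
# QM is overridable so the logic can be exercised without a Proxmox host.
QM="${QM:-qm}"

restart_if_stopped() {
  vmid="$1"
  # `qm status <vmid>` prints a line such as "status: running" or "status: stopped".
  status=$("$QM" status "$vmid" 2>/dev/null) || return 1
  case "$status" in
    *stopped*)
      echo "VM $vmid is stopped, starting it again"
      "$QM" start "$vmid"
      ;;
  esac
}

# Example cron line (assumed script path), checking VM 100 once a minute:
#   * * * * * /usr/local/sbin/vm-watchdog.sh 100
# For that, the script would end with:  restart_if_stopped "$1"
```

This only shortens the downtime after a crash. The guest still loses whatever it was doing when QEMU aborted.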