[SOLVED] With latest 5.15.104-1-pve, Windows Server VM freeze/stuck

jf2021

Hi. After upgrading to 5.15.104-1-pve (no-subscription package), Windows Server VMs boot but freeze before the login screen. I got some errors in kern.log. It seems to happen on different machines with different hardware (Intel or AMD CPUs).
Rolling back to 5.15.102-1-pve solved the problem.

Is this a bug?

Code:
# pveversion
pve-manager/7.4-3/9002ab8a (running kernel: 5.15.104-1-pve)

Code:
# qm config 105
agent: 0
bootdisk: ide0
cores: 4
cpu: host,flags=+hv-tlbflush
ide0: local:105/vm-105-disk-0.raw,size=200G
ide3: none,media=cdrom
localtime: 1
memory: 16384
name: winsrv105
net0: virtio=56:2A:FD:1F:DE:35,bridge=vmbr1
net1: virtio=B2:45:43:0A:34:FD,bridge=vmbr2
numa: 0
onboot: 0
ostype: win8
protection: 1
scsihw: virtio-scsi-pci
smbios1: uuid=cc12c0a7-5eb9-4f44-bc89-c6c6c1ed5804
sockets: 2
vmgenid: 40ae3691-3c2a-427c-8957-8e81421ea236

Code:
# lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          16
On-line CPU(s) list:             0-15
Thread(s) per core:              2
Core(s) per socket:              8
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           85
Model name:                      Intel(R) Xeon(R) D-2141I CPU @ 2.20GHz
Stepping:                        4
CPU MHz:                         2249.792
CPU max MHz:                     3000,0000
CPU min MHz:                     1000,0000
BogoMIPS:                        4400.00
Virtualization:                  VT-x
L1d cache:                       256 KiB
L1i cache:                       256 KiB
L2 cache:                        8 MiB
L3 cache:                        11 MiB
NUMA node0 CPU(s):               0-15
Vulnerability Itlb multihit:     KVM: Mitigation: Split huge pages
Vulnerability L1tf:              Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Mmio stale data:   Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Retbleed:          Mitigation; IBRS
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; IBRS, IBPB conditional, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Mitigation; Clear CPU buffers; SMT vulnerable
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke md_clear flush_l1d

Code:
Apr  4 02:25:41 ns3192824 kernel: [  154.793788] BUG: kernel NULL pointer dereference, address: 0000000000000000
Apr  4 02:25:41 ns3192824 kernel: [  154.803631] #PF: supervisor read access in kernel mode
Apr  4 02:25:41 ns3192824 kernel: [  154.811691] #PF: error_code(0x0000) - not-present page
Apr  4 02:25:41 ns3192824 kernel: [  154.819794] PGD 0 P4D 0
Apr  4 02:25:41 ns3192824 kernel: [  154.825292] Oops: 0000 [#1] SMP PTI
Apr  4 02:25:41 ns3192824 kernel: [  154.831716] CPU: 7 PID: 14578 Comm: CPU 0/KVM Tainted: P           O      5.15.104-1-pve #1
Apr  4 02:25:41 ns3192824 kernel: [  154.843077] Hardware name: Supermicro Super Server/X11SDV-8C-TLN2F, BIOS 1.3a 07/13/2020
Apr  4 02:25:41 ns3192824 kernel: [  154.854269] RIP: 0010:_find_first_bit+0x19/0x40
Apr  4 02:25:41 ns3192824 kernel: [  154.861880] Code: 5d 41 5e 41 5f 5d c3 cc cc cc cc cc cc cc cc cc cc 49 89 f0 48 85 f6 74 28 31 c0 eb 0d 48 83 c0 40 48 83 c7 08 4c 39 c0 73 17 <48> 8b 17 48 85 d2 74 eb f3 48 0f bc d2 48 01 d0 49 39 c0 4c 0f 47
Apr  4 02:25:41 ns3192824 kernel: [  154.887177] RSP: 0018:ffffa8d74af3b788 EFLAGS: 00010246
Apr  4 02:25:41 ns3192824 kernel: [  154.895697] RAX: 0000000000000000 RBX: ffffa8d74a705000 RCX: 0000000000000000
Apr  4 02:25:41 ns3192824 kernel: [  154.906157] RDX: 0000000000000000 RSI: 0000000000000400 RDI: 0000000000000000
Apr  4 02:25:41 ns3192824 kernel: [  154.916626] RBP: ffffa8d74af3b7d0 R08: 0000000000000400 R09: ffff8eb23d2e3728
Apr  4 02:25:41 ns3192824 kernel: [  154.927065] R10: ffff8eb2e86f0170 R11: 000000000000002c R12: 0000000000000000
Apr  4 02:25:41 ns3192824 kernel: [  154.937467] R13: ffff8eb23d2e3728 R14: 0000000000000323 R15: 0000000000000003
Apr  4 02:25:41 ns3192824 kernel: [  154.947884] FS:  00007fe867d95700(0000) GS:ffff8ebf7ffc0000(0000) knlGS:fffff80320963000
Apr  4 02:25:41 ns3192824 kernel: [  154.959280] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr  4 02:25:41 ns3192824 kernel: [  154.968365] CR2: 0000000000000000 CR3: 00000002fd010001 CR4: 00000000007726e0
Apr  4 02:25:41 ns3192824 kernel: [  154.978920] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr  4 02:25:41 ns3192824 kernel: [  154.989471] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Apr  4 02:25:41 ns3192824 kernel: [  154.999990] PKRU: 55555554
Apr  4 02:25:41 ns3192824 kernel: [  155.006087] Call Trace:
Apr  4 02:25:41 ns3192824 kernel: [  155.011864]  <TASK>
Apr  4 02:25:41 ns3192824 kernel: [  155.017222]  ? kvm_make_vcpus_request_mask+0x3d/0x130 [kvm]
Apr  4 02:25:41 ns3192824 kernel: [  155.026189]  kvm_hv_flush_tlb.isra.0+0x116/0x540 [kvm]
Apr  4 02:25:41 ns3192824 kernel: [  155.034697]  ? vmx_read_guest_seg_ar+0x37/0x130 [kvm_intel]
Apr  4 02:25:41 ns3192824 kernel: [  155.043617]  ? kvm_page_track_is_active+0x16/0x60 [kvm]
Apr  4 02:25:41 ns3192824 kernel: [  155.052233]  ? mmu_try_to_unsync_pages+0x35/0x210 [kvm]
Apr  4 02:25:41 ns3192824 kernel: [  155.060823]  ? make_spte+0x165/0x3e0 [kvm]
Apr  4 02:25:41 ns3192824 kernel: [  155.068245]  ? kvm_tdp_mmu_map+0x3bd/0x6a0 [kvm]
Apr  4 02:25:41 ns3192824 kernel: [  155.076257]  ? kvm_is_reserved_pfn+0x2f/0x80 [kvm]
Apr  4 02:25:41 ns3192824 kernel: [  155.084364]  ? kvm_release_pfn_clean+0x3d/0x50 [kvm]
Apr  4 02:25:41 ns3192824 kernel: [  155.092601]  ? direct_page_fault+0x543/0xbd0 [kvm]
Apr  4 02:25:41 ns3192824 kernel: [  155.100701]  ? vmx_read_guest_seg_ar+0x37/0x130 [kvm_intel]
Apr  4 02:25:41 ns3192824 kernel: [  155.110161]  kvm_hv_hypercall+0x3af/0x880 [kvm]
Apr  4 02:25:41 ns3192824 kernel: [  155.117983]  ? sysvec_call_function_single+0x4e/0x90
Apr  4 02:25:41 ns3192824 kernel: [  155.126224]  ? asm_sysvec_call_function_single+0x1b/0x20
Apr  4 02:25:41 ns3192824 kernel: [  155.134796]  ? vmx_vmexit+0x7c/0xdad [kvm_intel]
Apr  4 02:25:41 ns3192824 kernel: [  155.142707]  ? vmx_vmexit+0x76/0xdad [kvm_intel]
Apr  4 02:25:41 ns3192824 kernel: [  155.150547]  ? vmx_vmexit+0x90/0xdad [kvm_intel]
Apr  4 02:25:41 ns3192824 kernel: [  155.158380]  ? kvm_emulate_hypercall.part.0+0x6e0/0x6e0 [kvm]
Apr  4 02:25:41 ns3192824 kernel: [  155.167420]  kvm_emulate_hypercall+0x51/0x60 [kvm]
Apr  4 02:25:41 ns3192824 kernel: [  155.175487]  ? kvm_emulate_hypercall+0x51/0x60 [kvm]
Apr  4 02:25:41 ns3192824 kernel: [  155.183734]  vmx_handle_exit+0x1fc/0x8d0 [kvm_intel]
Apr  4 02:25:41 ns3192824 kernel: [  155.191978]  kvm_arch_vcpu_ioctl_run+0xdd6/0x1730 [kvm]
Apr  4 02:25:41 ns3192824 kernel: [  155.200556]  kvm_vcpu_ioctl+0x252/0x6b0 [kvm]
Apr  4 02:25:41 ns3192824 kernel: [  155.208290]  ? fpu__restore_sig+0x55/0xa0
Apr  4 02:25:41 ns3192824 kernel: [  155.215590]  ? __fget_files+0x86/0xc0
Apr  4 02:25:41 ns3192824 kernel: [  155.222488]  __x64_sys_ioctl+0x92/0xd0
Apr  4 02:25:41 ns3192824 kernel: [  155.229473]  do_syscall_64+0x59/0xc0
Apr  4 02:25:41 ns3192824 kernel: [  155.236276]  ? __x64_sys_futex+0x81/0x1d0
Apr  4 02:25:41 ns3192824 kernel: [  155.243499]  ? exit_to_user_mode_prepare+0x37/0x1b0
Apr  4 02:25:41 ns3192824 kernel: [  155.251574]  ? exit_to_user_mode_prepare+0x37/0x1b0
Apr  4 02:25:41 ns3192824 kernel: [  155.259558]  ? syscall_exit_to_user_mode+0x27/0x50
Apr  4 02:25:41 ns3192824 kernel: [  155.267371]  ? do_syscall_64+0x69/0xc0
Apr  4 02:25:41 ns3192824 kernel: [  155.274040]  ? do_syscall_64+0x69/0xc0
Apr  4 02:25:41 ns3192824 kernel: [  155.280592]  ? irqentry_exit+0x1d/0x30
Apr  4 02:25:41 ns3192824 kernel: [  155.287054]  ? exc_page_fault+0x89/0x170
Apr  4 02:25:41 ns3192824 kernel: [  155.293632]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
Apr  4 02:25:41 ns3192824 kernel: [  155.301387] RIP: 0033:0x7fe87340d5f7
Apr  4 02:25:41 ns3192824 kernel: [  155.307647] Code: 00 00 00 48 8b 05 99 c8 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 69 c8 0d 00 f7 d8 64 89 01 48
Apr  4 02:25:41 ns3192824 kernel: [  155.332230] RSP: 002b:00007fe867d90408 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Apr  4 02:25:41 ns3192824 kernel: [  155.342755] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007fe87340d5f7
Apr  4 02:25:41 ns3192824 kernel: [  155.352840] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000001c
Apr  4 02:25:41 ns3192824 kernel: [  155.362886] RBP: 000055ec457e3e10 R08: 000055ec44507240 R09: 0000000000000000
Apr  4 02:25:41 ns3192824 kernel: [  155.372937] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
Apr  4 02:25:41 ns3192824 kernel: [  155.383026] R13: 000055ec44c12020 R14: 00007fe867d906c0 R15: 000055ec4562d330
Apr  4 02:25:41 ns3192824 kernel: [  155.393066]  </TASK>
Apr  4 02:25:41 ns3192824 kernel: [  155.398139] Modules linked in: nft_compat nft_counter nf_tables rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache netfs veth ebtable_filter ebtables ip6table_raw ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables iptable_raw xt_mac ipt_REJECT nf_reject_ipv4 xt_mark xt_set xt_physdev xt_addrtype xt_comment xt_tcpudp xt_multiport xt_conntrack ip_set_hash_net ip_set iptable_filter iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bpfilter softdog nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ipmi_ssif irqbypass drm_vram_helper drm_ttm_helper ttm irdma drm_kms_helper cec crct10dif_pclmul ghash_clmulni_intel aesni_intel rc_core crypto_simd cryptd i2c_algo_bit ioatdma fb_sys_fops ice rapl syscopyarea ib_uverbs joydev input_leds mei_me sysfillrect sysimgblt intel_cstate mei dca efi_pstore acpi_ipmi intel_pch_thermal ipmi_si ipmi_devintf mac_hid
Apr  4 02:25:41 ns3192824 kernel: [  155.398203]  ipmi_msghandler acpi_power_meter acpi_pad zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drm sunrpc ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear simplefb hid_generic usbkbd usbmouse usbhid hid raid1 xhci_pci crc32_pclmul xhci_pci_renesas i40e i2c_i801 ahci i2c_smbus xhci_hcd libahci wmi
Apr  4 02:25:41 ns3192824 kernel: [  155.584197] CR2: 0000000000000000
Apr  4 02:25:41 ns3192824 kernel: [  155.591871] ---[ end trace 4815489ecd5be884 ]---
Apr  4 02:25:41 ns3192824 kernel: [  155.649933] RIP: 0010:_find_first_bit+0x19/0x40
Apr  4 02:25:41 ns3192824 kernel: [  155.658828] Code: 5d 41 5e 41 5f 5d c3 cc cc cc cc cc cc cc cc cc cc 49 89 f0 48 85 f6 74 28 31 c0 eb 0d 48 83 c0 40 48 83 c7 08 4c 39 c0 73 17 <48> 8b 17 48 85 d2 74 eb f3 48 0f bc d2 48 01 d0 49 39 c0 4c 0f 47
Apr  4 02:25:41 ns3192824 kernel: [  155.686863] RSP: 0018:ffffa8d74af3b788 EFLAGS: 00010246
Apr  4 02:25:41 ns3192824 kernel: [  155.696769] RAX: 0000000000000000 RBX: ffffa8d74a705000 RCX: 0000000000000000
Apr  4 02:25:41 ns3192824 kernel: [  155.708658] RDX: 0000000000000000 RSI: 0000000000000400 RDI: 0000000000000000
Apr  4 02:25:41 ns3192824 kernel: [  155.720505] RBP: ffffa8d74af3b7d0 R08: 0000000000000400 R09: ffff8eb23d2e3728
Apr  4 02:25:41 ns3192824 kernel: [  155.732351] R10: ffff8eb2e86f0170 R11: 000000000000002c R12: 0000000000000000
Apr  4 02:25:41 ns3192824 kernel: [  155.744199] R13: ffff8eb23d2e3728 R14: 0000000000000323 R15: 0000000000000003
Apr  4 02:25:41 ns3192824 kernel: [  155.756024] FS:  00007fe867d95700(0000) GS:ffff8ebf7ffc0000(0000) knlGS:fffff80320963000
Apr  4 02:25:41 ns3192824 kernel: [  155.768864] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr  4 02:25:41 ns3192824 kernel: [  155.779372] CR2: 0000000000000000 CR3: 00000002fd010001 CR4: 00000000007726e0
Apr  4 02:25:41 ns3192824 kernel: [  155.791341] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr  4 02:25:41 ns3192824 kernel: [  155.803289] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Apr  4 02:25:41 ns3192824 kernel: [  155.815233] PKRU: 55555554
Apr  4 04:01:54 ns3192824 kernel: [ 5928.173178] perf: interrupt took too long (2529 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
 
cpu: host,flags=+hv-tlbflush

Apr 4 02:25:41 ns3192824 kernel: [ 155.026189] kvm_hv_flush_tlb.isra.0+0x116/0x540 [kvm]

Try removing the +hv-tlbflush flag (using the GUI, or by removing ,flags=+hv-tlbflush from the cpu: line in the config file) and see if that makes a difference.
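
As a minimal sketch of the config-file route, assuming hv-tlbflush is the only entry in flags= (as in the qm config 105 output in the first post), the change to the cpu: line could look like this. The function name strip_hv_tlbflush is made up for illustration; it only filters text on stdin and does not touch any VM.

```shell
# Hypothetical helper: drop ",flags=+hv-tlbflush" from the cpu: line of a
# VM config read on stdin. Assumes hv-tlbflush is the only flag set; all
# other lines pass through unchanged.
strip_hv_tlbflush() {
    sed -E '/^cpu:/ s/,flags=\+hv-tlbflush$//'
}

# Example: preview the change for the line posted in this thread.
printf 'cpu: host,flags=+hv-tlbflush\n' | strip_hv_tlbflush
```

On the host itself, the same result should be reachable with qm set 105 --cpu host, followed by a full stop and start of the VM so that the new CPU definition is actually used.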
 
@milsav92 Thanks a lot for your suggestion; removing this flag seems to work. I still want to see how stable it is over the long term, but I was able to log into the Windows VM and use it.
I'm also wondering why this broke, since the option has been enabled for a long time and never caused a problem with previous kernels.
Anyway, thanks for your help and the workaround!
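
For anyone landing here with the same symptoms, a rough way to check whether a host hit the same crash is to look for the two markers from the trace in this thread: the NULL pointer dereference and the kvm_hv_flush_tlb frame. This is only a sketch; the function name has_hv_tlbflush_oops and the log path are assumptions, and a match does not prove the same root cause.

```shell
# Hypothetical helper: succeed if a kernel log contains both markers seen
# in this thread's trace (a NULL pointer dereference and kvm_hv_flush_tlb).
# grep -s keeps it quiet if the file is missing or unreadable.
has_hv_tlbflush_oops() {
    log_file=$1
    grep -qs 'BUG: kernel NULL pointer dereference' "$log_file" &&
        grep -qs 'kvm_hv_flush_tlb' "$log_file"
}

# Example: check the kernel log (path is an assumption for this setup).
if has_hv_tlbflush_oops /var/log/kern.log; then
    echo "kvm_hv_flush_tlb oops found"
fi
```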
 