PVE 8 Upgrade: Kernel 6.2.16-*-pve causing consistent instability not present on 5.15.*-pve

We are seeing consistent instability with one guest since upgrading to PVE 8 with the 6.2 kernel. I've seen various other threads describing similar symptoms, but none of the solutions mentioned have solved our issue. Furthermore, we can consistently stop the issues by rolling back to the 5.15 kernel (and consistently reproduce them by switching back to the 6.2 kernel), so I believe this is a serious regression.
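(For anyone wanting to do the same rollback, here is a minimal sketch of how a known-good kernel can be pinned on a PVE 8 host; it assumes the 5.15 package is still installed and that the host boots via proxmox-boot-tool:)
Code:
# list installed kernels, then pin the last known-good 5.15 build as the default boot entry
proxmox-boot-tool kernel list
proxmox-boot-tool kernel pin 5.15.116-1-pve
reboot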

Kernel versions:

Tried, all showing the same behaviour:
{release="6.2.16-12-pve", version="#1 SMP PREEMPT_DYNAMIC PMX 6.2.16-12 (2023-09-04T13:21Z)"}
{release="6.2.16-6-pve", version="#1 SMP PREEMPT_DYNAMIC PMX 6.2.16-7 (2023-08-01T11:23Z)"}
{release="6.2.16-5-pve", version="#1 SMP PREEMPT_DYNAMIC PVE 6.2.16-6 (2023-07-25T15:33Z)"}
{release="6.2.16-4-pve", version="#1 SMP PREEMPT_DYNAMIC PVE 6.2.16-5 (2023-07-14T17:53Z)"}

Kernels we've switched back to without any instability:
{release="5.15.116-1-pve", version="#1 SMP PVE 5.15.116-1 (2023-08-29T13:46Z)"}
{release="5.15.108-1-pve", version="#1 SMP PVE 5.15.108-1 (2023-06-17T09:41Z)"}

Symptoms:

One of our Linux guests (Debian 11, recently upgraded to Debian 12) starts experiencing significant soft lockups, leading to extensive instability (and eventually crashes).

e.g.
Code:
15:09:41+01:00    watchdog: BUG: soft lockup - CPU#6 stuck for 24s! [loki:37022]
15:09:41+01:00    watchdog: BUG: soft lockup - CPU#3 stuck for 24s! [loki:36312]
15:09:41+01:00    watchdog: BUG: soft lockup - CPU#12 stuck for 24s! [caddy:38428]
15:09:41+01:00    watchdog: BUG: soft lockup - CPU#2 stuck for 24s! [loki:39941]
15:10:23+01:00    watchdog: BUG: soft lockup - CPU#8 stuck for 24s! [caddy:36942]
15:10:23+01:00    watchdog: BUG: soft lockup - CPU#14 stuck for 23s! [loki:36502]
15:10:23+01:00    watchdog: BUG: soft lockup - CPU#7 stuck for 24s! [containerd-shim:35661]
15:10:23+01:00    watchdog: BUG: soft lockup - CPU#23 stuck for 24s! [containerd-shim:35881]
15:10:23+01:00    watchdog: BUG: soft lockup - CPU#13 stuck for 24s! [caddy:36932]
15:10:23+01:00    watchdog: BUG: soft lockup - CPU#22 stuck for 24s! [promtail:1106]
15:10:23+01:00    watchdog: BUG: soft lockup - CPU#9 stuck for 24s! [loki:39437]
15:10:23+01:00    watchdog: BUG: soft lockup - CPU#10 stuck for 24s! [promtail:1111]
15:10:23+01:00    watchdog: BUG: soft lockup - CPU#17 stuck for 24s! [loki:36435]
15:10:23+01:00    watchdog: BUG: soft lockup - CPU#1 stuck for 24s! [containerd-shim:35832]
15:10:23+01:00    watchdog: BUG: soft lockup - CPU#19 stuck for 24s! [dockerd:30801]
15:10:23+01:00    watchdog: BUG: soft lockup - CPU#11 stuck for 24s! [loki:39685]
15:10:23+01:00    watchdog: BUG: soft lockup - CPU#16 stuck for 24s! [dockerd:829]

Guest unique features:
The guest impacted by these lockups:
  • Has a large (7TB) ZFS volume (although the root partition is on Ceph, which most of our other guests also run on without experiencing this issue)
  • Is doing lots of sustained IO (about 15Mb/s write average)

Host unique features:
  • Large ZFS pool on slow large spinning disks (9 striped mirrors)
  • 120 x Intel(R) Xeon(R) CPU E7-4880 v2 @ 2.50GHz (4 sockets) (SuperMicro CSE-848X X10QBi, 24x 3.5")

Workaround attempts (applied roughly as sketched below):
  • Switching async IO from io_uring to native/threads
  • Turning on NUMA
  • Increasing CPU cores, splitting between sockets
  • Switching CPU type to 'host'
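A rough sketch of how these were applied; the VM ID (102) and the disk line are illustrative, not our actual values:
Code:
# per-disk async IO mode (options: io_uring | native | threads)
qm set 102 --scsi0 local-zfs:vm-102-disk-1,iothread=1,size=90G,aio=native
# enable NUMA awareness for the guest
qm set 102 --numa 1
# adjust topology: more cores, split across two sockets
qm set 102 --cores 24 --sockets 2
# pass through the host CPU type
qm set 102 --cpu host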

We'd really like to get to the bottom of this and not be stuck on an old kernel, so we're happy to gather any debugging information that would help.
 
I was investigating another issue and had left an open SSH connection to one of the VMs.

I have the same error:
Code:
Message from syslogd@kube-node-11 at Sep 26 15:13:24 ...
 kernel:[426489.912429] watchdog: BUG: soft lockup - CPU#31 stuck for 22s! [Engine_Simulato:1069183]

Message from syslogd@kube-node-11 at Sep 26 15:31:43 ...
 kernel:[427588.877068] watchdog: BUG: soft lockup - CPU#11 stuck for 22s! [containerd:1096697]

Message from syslogd@kube-node-11 at Sep 26 15:31:43 ...
 kernel:[427588.957414] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [Engine_Simulato:1098497]

Message from syslogd@kube-node-11 at Sep 26 15:31:43 ...
 kernel:[427589.105411] watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [Engine_Simulato:1097505]

Message from syslogd@kube-node-11 at Sep 26 15:31:47 ...
 kernel:[427589.365409] watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [Engine_Simulato:1099132]

Message from syslogd@kube-node-11 at Sep 26 15:31:47 ...
 kernel:[427589.601419] watchdog: BUG: soft lockup - CPU#15 stuck for 21s! [Engine_Simulato:1098495]

Message from syslogd@kube-node-11 at Sep 26 15:31:47 ...
 kernel:[427589.649406] watchdog: BUG: soft lockup - CPU#16 stuck for 22s! [Engine_Simulato:1098444]

Message from syslogd@kube-node-11 at Sep 26 15:31:47 ...
 kernel:[427592.381572] watchdog: BUG: soft lockup - CPU#39 stuck for 22s! [systemd-udevd:1097148]

Message from syslogd@kube-node-11 at Sep 26 15:31:47 ...
 kernel:[427592.381577] watchdog: BUG: soft lockup - CPU#30 stuck for 21s! [systemd-udevd:1097335]

Message from syslogd@kube-node-11 at Sep 26 15:31:47 ...
 kernel:[427592.381589] watchdog: BUG: soft lockup - CPU#48 stuck for 22s! [Engine_Simulato:1098651]

Message from syslogd@kube-node-11 at Sep 26 15:31:47 ...
 kernel:[427592.381596] watchdog: BUG: soft lockup - CPU#40 stuck for 23s! [Engine_Simulato:1096711]

Message from syslogd@kube-node-11 at Sep 26 15:31:47 ...
 kernel:[427592.381604] watchdog: BUG: soft lockup - CPU#42 stuck for 23s! [systemd-udevd:1098219]

Message from syslogd@kube-node-11 at Sep 26 15:31:47 ...
 kernel:[427592.381609] watchdog: BUG: soft lockup - CPU#36 stuck for 22s! [Engine_Simulato:1097726]

Message from syslogd@kube-node-11 at Sep 26 15:31:47 ...
 kernel:[427592.381633] watchdog: BUG: soft lockup - CPU#38 stuck for 22s! [Engine_Simulato:1096500]

Message from syslogd@kube-node-11 at Sep 26 15:31:47 ...
 kernel:[427592.381648] watchdog: BUG: soft lockup - CPU#32 stuck for 21s! [systemd-udevd:1098215]

Message from syslogd@kube-node-11 at Sep 26 15:31:47 ...
 kernel:[427592.381663] watchdog: BUG: soft lockup - CPU#52 stuck for 22s! [systemd-udevd:1097515]

Message from syslogd@kube-node-11 at Sep 26 15:31:47 ...
 kernel:[427592.381669] watchdog: BUG: soft lockup - CPU#25 stuck for 21s! [k3s-agent:764383]

Message from syslogd@kube-node-11 at Sep 26 15:31:47 ...
 kernel:[427592.381685] watchdog: BUG: soft lockup - CPU#19 stuck for 22s! [systemd-udevd:1097313]

Message from syslogd@kube-node-11 at Sep 26 15:31:47 ...
 kernel:[427592.381690] watchdog: BUG: soft lockup - CPU#33 stuck for 22s! [kworker/33:2:1092752]

Message from syslogd@kube-node-11 at Sep 26 15:31:47 ...
 kernel:[427592.381693] watchdog: BUG: soft lockup - CPU#21 stuck for 21s! [Engine_Simulato:1098485]

Message from syslogd@kube-node-11 at Sep 26 15:31:47 ...
 kernel:[427592.381707] watchdog: BUG: soft lockup - CPU#9 stuck for 23s! [Engine_Simulato:1097522]

Message from syslogd@kube-node-11 at Sep 26 15:31:47 ...
 kernel:[427592.381712] watchdog: BUG: soft lockup - CPU#28 stuck for 21s! [Engine_Simulato:1097751]

Message from syslogd@kube-node-11 at Sep 26 15:31:47 ...
 kernel:[427592.381718] watchdog: BUG: soft lockup - CPU#47 stuck for 22s! [Engine_Simulato:1095113]

Message from syslogd@kube-node-11 at Sep 26 15:31:47 ...
 kernel:[427592.381738] watchdog: BUG: soft lockup - CPU#27 stuck for 21s! [Engine_Simulato:1098296]

Message from syslogd@kube-node-11 at Sep 26 15:31:47 ...
 kernel:[427592.381747] watchdog: BUG: soft lockup - CPU#51 stuck for 22s! [Engine_Simulato:1098481]

Message from syslogd@kube-node-11 at Sep 26 15:31:47 ...
 kernel:[427592.381759] watchdog: BUG: soft lockup - CPU#49 stuck for 22s! [systemd-resolve:1228]

Message from syslogd@kube-node-11 at Sep 26 15:31:47 ...
 kernel:[427592.381765] watchdog: BUG: soft lockup - CPU#35 stuck for 22s! [systemd-udevd:1097322]

Message from syslogd@kube-node-11 at Sep 26 15:31:47 ...
 kernel:[427592.381772] watchdog: BUG: soft lockup - CPU#31 stuck for 22s! [Engine_Simulato:1098640]

Message from syslogd@kube-node-11 at Sep 26 15:32:27 ...
 kernel:[427629.057029] watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [Engine_Simulato:1097751]

Message from syslogd@kube-node-11 at Sep 26 15:32:27 ...
 kernel:[427629.076083] watchdog: BUG: soft lockup - CPU#34 stuck for 21s! [Engine_Simulato:1098444]

Message from syslogd@kube-node-11 at Sep 26 15:32:27 ...
 kernel:[427629.409031] watchdog: BUG: soft lockup - CPU#11 stuck for 22s! [Engine_Simulato:1100985]

Message from syslogd@kube-node-11 at Sep 26 15:32:27 ...
 kernel:[427633.036157] watchdog: BUG: soft lockup - CPU#24 stuck for 22s! [Engine_Simulato:1098296]

Message from syslogd@kube-node-11 at Sep 26 15:32:27 ...
 kernel:[427633.036163] watchdog: BUG: soft lockup - CPU#25 stuck for 22s! [Engine_Simulato:1098497]

Message from syslogd@kube-node-11 at Sep 26 15:32:27 ...
 kernel:[427633.036169] watchdog: BUG: soft lockup - CPU#53 stuck for 22s! [Engine_Simulato:1095215]

Message from syslogd@kube-node-11 at Sep 26 15:32:27 ...
 kernel:[427633.036172] watchdog: BUG: soft lockup - CPU#46 stuck for 21s! [k3s-agent:780936]

Message from syslogd@kube-node-11 at Sep 26 15:32:27 ...
 kernel:[427633.036177] watchdog: BUG: soft lockup - CPU#48 stuck for 22s! [Engine_Simulato:1098479]

Message from syslogd@kube-node-11 at Sep 26 15:32:27 ...
 kernel:[427633.036181] watchdog: BUG: soft lockup - CPU#20 stuck for 21s! [Engine_Simulato:1097505]

Message from syslogd@kube-node-11 at Sep 26 15:32:27 ...
 kernel:[427633.036196] watchdog: BUG: soft lockup - CPU#45 stuck for 21s! [k3s-agent:764383]

Message from syslogd@kube-node-11 at Sep 26 15:32:27 ...
 kernel:[427633.036201] watchdog: BUG: soft lockup - CPU#36 stuck for 22s! [k3s-agent:1101117]

Message from syslogd@kube-node-11 at Sep 26 15:32:27 ...
 kernel:[427633.036205] watchdog: BUG: soft lockup - CPU#47 stuck for 22s! [Engine_Simulato:1095113]

Message from syslogd@kube-node-11 at Sep 26 15:32:27 ...
 kernel:[427633.036213] watchdog: BUG: soft lockup - CPU#52 stuck for 23s! [Engine_Simulato:1098483]

Message from syslogd@kube-node-11 at Sep 26 15:32:27 ...
 kernel:[427633.036223] watchdog: BUG: soft lockup - CPU#38 stuck for 23s! [Engine_Simulato:1097726]

Message from syslogd@kube-node-11 at Sep 26 15:32:27 ...
 kernel:[427633.036242] watchdog: BUG: soft lockup - CPU#14 stuck for 22s! [containerd-shim:1095882]

Message from syslogd@kube-node-11 at Sep 26 15:32:27 ...
 kernel:[427633.036247] watchdog: BUG: soft lockup - CPU#33 stuck for 21s! [k3s-agent:788136]

Message from syslogd@kube-node-11 at Sep 26 15:32:27 ...
 kernel:[427633.036250] watchdog: BUG: soft lockup - CPU#44 stuck for 24s! [kworker/44:2:1071825]

Message from syslogd@kube-node-11 at Sep 26 15:32:27 ...
 kernel:[427633.036271] watchdog: BUG: soft lockup - CPU#49 stuck for 22s! [Engine_Simulato:1096500]

Message from syslogd@kube-node-11 at Sep 26 15:32:27 ...
 kernel:[427633.036278] watchdog: BUG: soft lockup - CPU#16 stuck for 21s! [containerd-shim:1095497]

Message from syslogd@kube-node-11 at Sep 26 15:32:27 ...
 kernel:[427633.036310] watchdog: BUG: soft lockup - CPU#21 stuck for 22s! [swapper/21:0]

Message from syslogd@kube-node-11 at Sep 26 15:32:27 ...
 kernel:[427633.036347] watchdog: BUG: soft lockup - CPU#37 stuck for 25s! [k3s-agent:801361]

Message from syslogd@kube-node-11 at Sep 26 15:32:27 ...
 kernel:[427633.036422] watchdog: BUG: soft lockup - CPU#18 stuck for 22s! [k3s-agent:892447]

Message from syslogd@kube-node-11 at Sep 26 15:32:27 ...
 kernel:[427633.036535] watchdog: BUG: soft lockup - CPU#19 stuck for 23s! [Engine_Simulato:1097733]

Host is:
proxmox 8.0.4
kernel 6.2.16-12

Single VM based on Ubuntu 20.04 with CPU type set to 'host'.
The VM is under heavy load (used for compute) and is allocated 54 out of 56 available cores, so it should not hang.

The system is stable and nothing has crashed, but it does not look healthy.
 
Came here to say "me too."

I've also tried 6.2.16-11-bpo11-pve with no success.

Host:
HPE ProLiant DL360 Gen10, 2x Xeon Gold 5220R, 96GB RAM
Proxmox VE 8.0.6 (I believe I was on 8.0.4, tried the testing repo)

Guest (exhibiting the issues, though this is the only one with a heavy load):
Debian 12 - Kernel 6.1.0-12
Guest has 48 CPUs (2 sockets, 24 cores each), 32GB RAM


Stable with 5.15.116-1-pve.

It's been a frustrating weekend trying to figure this out. Thanks for the info you've provided. Hopefully this will get me by until someone far smarter than myself can nail this down.
 
Hi,
could you share the VM configuration of the guest with the issue? Is there anything in the host system logs/journal around the time the issue occurs? Could you share the output of lscpu on the host?
 
Hi,
could you share the VM configuration of the guest with the issue? Is there anything in the host system logs/journal around the time the issue occurs? Could you share the output of lscpu on the host?
sure here:

Both host and VM have mitigations=off in GRUB (the error was more frequent before setting this).
The VM runs a high CPU load when the error occurs.
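(For reference, a minimal sketch of that GRUB change; the existing command line options are an assumption:)
Code:
# /etc/default/grub -- append mitigations=off to the kernel command line
GRUB_CMDLINE_LINUX_DEFAULT="quiet mitigations=off"
# apply and reboot
update-grub
reboot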

proxmox host:
version:
Code:
proxmox-ve: 8.0.2 (running kernel: 6.2.16-14-pve)
pve-manager: 8.0.4 (running version: 8.0.4/d258a813cfa6b390)
proxmox-kernel-helper: 8.0.3
pve-kernel-5.15: 7.4-6
proxmox-kernel-6.2.16-14-pve: 6.2.16-14
proxmox-kernel-6.2: 6.2.16-14
proxmox-kernel-6.2.16-12-pve: 6.2.16-12
pve-kernel-5.15.116-1-pve: 5.15.116-1
pve-kernel-5.15.111-1-pve: 5.15.111-1
pve-kernel-5.15.107-2-pve: 5.15.107-2
pve-kernel-5.15.83-1-pve: 5.15.83-1
pve-kernel-5.15.74-1-pve: 5.15.74-1
pve-kernel-5.15.39-2-pve: 5.15.39-2
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph-fuse: 17.2.6-pve1+3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx4
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.25-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.5
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.8
libpve-guest-common-perl: 5.0.4
libpve-http-server-perl: 5.0.4
libpve-rs-perl: 0.8.5
libpve-storage-perl: 8.0.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 3.0.2-1
proxmox-backup-file-restore: 3.0.2-1
proxmox-kernel-helper: 8.0.3
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.2
proxmox-widget-toolkit: 4.0.6
pve-cluster: 8.0.3
pve-container: 5.0.4
pve-docs: 8.0.4
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.3
pve-firmware: 3.8-2
pve-ha-manager: 4.0.2
pve-i18n: 3.0.7
pve-qemu-kvm: 8.0.2-6
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.7
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.12-pve1

lscpu host:
Code:
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  56
  On-line CPU(s) list:   0-55
Vendor ID:               GenuineIntel
  BIOS Vendor ID:        Intel
  Model name:            Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
    BIOS Model name:     Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz  CPU @ 2.4GHz
    BIOS CPU family:     179
    CPU family:          6
    Model:               79
    Thread(s) per core:  2
    Core(s) per socket:  14
    Socket(s):           2
    Stepping:            1
    CPU(s) scaling MHz:  91%
    CPU max MHz:         3300.0000
    CPU min MHz:         1200.0000
    BogoMIPS:            4788.95
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts re
                         p_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadli
                         ne_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_a
                         djust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts md_clear flush_l1d
Virtualization features:
  Virtualization:        VT-x
Caches (sum of all):     
  L1d:                   896 KiB (28 instances)
  L1i:                   896 KiB (28 instances)
  L2:                    7 MiB (28 instances)
  L3:                    70 MiB (2 instances)
NUMA:                   
  NUMA node(s):          2
  NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54
  NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55
Vulnerabilities:         
  Gather data sampling:  Not affected
  Itlb multihit:         KVM: Vulnerable
  L1tf:                  Mitigation; PTE Inversion; VMX vulnerable
  Mds:                   Vulnerable; SMT vulnerable
  Meltdown:              Vulnerable
  Mmio stale data:       Vulnerable
  Retbleed:              Not affected
  Spec rstack overflow:  Not affected
  Spec store bypass:     Vulnerable
  Spectre v1:            Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
  Spectre v2:            Vulnerable, IBPB: disabled, STIBP: disabled, PBRSB-eIBRS: Not affected
  Srbds:                 Not affected
  Tsx async abort:       Vulnerable


vm ubuntu 20.04, processor set to host
lscpu vm:
Code:
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Byte Order:                         Little Endian
Address sizes:                      46 bits physical, 48 bits virtual
CPU(s):                             54
On-line CPU(s) list:                0-53
Thread(s) per core:                 1
Core(s) per socket:                 27
Socket(s):                          2
NUMA node(s):                       1
Vendor ID:                          GenuineIntel
CPU family:                         6
Model:                              79
Model name:                         Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
Stepping:                           1
CPU MHz:                            2394.454
BogoMIPS:                           4788.90
Virtualization:                     VT-x
Hypervisor vendor:                  KVM
Virtualization type:                full
L1d cache:                          1.7 MiB
L1i cache:                          1.7 MiB
L2 cache:                           216 MiB
L3 cache:                           32 MiB
NUMA node0 CPU(s):                  0-53
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit:        Not affected
Vulnerability L1tf:                 Mitigation; PTE Inversion; VMX vulnerable, SMT disabled
Vulnerability Mds:                  Vulnerable; SMT Host state unknown
Vulnerability Meltdown:             Vulnerable
Vulnerability Mmio stale data:      Vulnerable
Vulnerability Retbleed:             Not affected
Vulnerability Spec store bypass:    Vulnerable
Vulnerability Spectre v1:           Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
Vulnerability Spectre v2:           Vulnerable, IBPB: disabled, STIBP: disabled, PBRSB-eIBRS: Not affected
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Vulnerable
Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xt
                                    opology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3
                                    dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed a
                                    dx smap xsaveopt arat umip md_clear arch_capabilities
 
sure here:

both host and vm have mitigations=off in grub (error was more frequent before settings this configuration)
my vm is host for high cpu load when the error occurs

What does the load on the host look like while the issue happens?

Did you also not have the issue on kernel 5.15 and only after upgrading to 6.2? Otherwise, you could try assigning fewer cores to the VM and see if the issue persists. Other QEMU threads (not for the vCPUs) and host/network/IO need some CPU themselves after all.
 
Like this (attached screenshot 1696235746415.png showing the host CPU load):

Before, on Proxmox 7.4, I never had this issue; the VM was exactly the same.
 
Hi,
could you share the VM configuration of the guest with the issue? Is there anything in the host system logs/journal around the time the issue occurs? Could you share the output of lscpu on the host?

pveversion: (The only thing that changes between kernels is the running kernel version)

Code:
# pveversion --verbose
proxmox-ve: 8.0.2 (running kernel: 5.15.116-1-pve)
pve-manager: 8.0.6 (running version: 8.0.6/57490ff2c6a38448)
proxmox-kernel-helper: 8.0.3
proxmox-kernel-6.2.16-14-pve: 6.2.16-14
proxmox-kernel-6.2: 6.2.16-14
pve-kernel-6.2.16-11-bpo11-pve: 6.2.16-11~bpo11+2
pve-kernel-5.15.116-1-pve: 5.15.116-1
ceph-fuse: 17.2.6-pve1+3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx5
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.26-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.5
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.9
libpve-guest-common-perl: 5.0.5
libpve-http-server-perl: 5.0.4
libpve-rs-perl: 0.8.6
libpve-storage-perl: 8.0.3
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 3.0.2-1
proxmox-backup-file-restore: 3.0.2-1
proxmox-kernel-helper: 8.0.3
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.0.8
pve-cluster: 8.0.4
pve-container: 5.0.5
pve-docs: 8.0.5
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.3
pve-firmware: 3.8-2
pve-ha-manager: 4.0.2
pve-i18n: 3.0.7
pve-qemu-kvm: 8.0.2-6
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.7
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.12-pve1


Guest config:
Code:
# cat /etc/pve/qemu-server/102.conf
agent: 1
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 24
cpu: host
efidisk0: local-zfs:vm-102-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
ide2: local:iso/debian-12.1.0-amd64-DVD-1.iso,media=cdrom,size=3900480K
machine: q35
memory: 32768
meta: creation-qemu=8.0.2,ctime=1693164173
name: docker-2
net0: virtio=E2:6B:70:6C:BF:B0,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: l26
scsi0: local-zfs:vm-102-disk-1,iothread=1,size=90G
scsihw: virtio-scsi-single
smbios1: uuid=e34bed8b-254a-4bbd-b4e3-5d42e3752024
sockets: 2
usb0: host=2-14
vmgenid: 95326ca4-9658-4b46-a40e-b98133ab8411

Host lscpu with 5.15.116-1-pve:

Code:
# lscpu
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Address sizes:                      46 bits physical, 48 bits virtual
Byte Order:                         Little Endian
CPU(s):                             96
On-line CPU(s) list:                0-95
Vendor ID:                          GenuineIntel
BIOS Vendor ID:                     Intel(R) Corporation
Model name:                         Intel(R) Xeon(R) Gold 5220R CPU @ 2.20GHz
BIOS Model name:                    Intel(R) Xeon(R) Gold 5220R CPU @ 2.20GHz  CPU @ 2.2GHz
BIOS CPU family:                    179
CPU family:                         6
Model:                              85
Thread(s) per core:                 2
Core(s) per socket:                 24
Socket(s):                          2
Stepping:                           7
CPU(s) scaling MHz:                 62%
CPU max MHz:                        4000.0000
CPU min MHz:                        1000.0000
BogoMIPS:                           4400.00
Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_pkg_req pku ospke avx512_vnni md_clear flush_l1d arch_capabilities
Virtualization:                     VT-x
L1d cache:                          1.5 MiB (48 instances)
L1i cache:                          1.5 MiB (48 instances)
L2 cache:                           48 MiB (48 instances)
L3 cache:                           71.5 MiB (2 instances)
NUMA node(s):                       2
NUMA node0 CPU(s):                  0-23,48-71
NUMA node1 CPU(s):                  24-47,72-95
Vulnerability Gather data sampling: Mitigation; Microcode
Vulnerability Itlb multihit:        KVM: Mitigation: Split huge pages
Vulnerability L1tf:                 Not affected
Vulnerability Mds:                  Not affected
Vulnerability Meltdown:             Not affected
Vulnerability Mmio stale data:      Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Retbleed:             Mitigation; Enhanced IBRS
Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:           Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Mitigation; TSX disabled


Host lscpu with 6.2.16-14-pve:

Code:
# lscpu
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Address sizes:                      46 bits physical, 48 bits virtual
Byte Order:                         Little Endian
CPU(s):                             96
On-line CPU(s) list:                0-95
Vendor ID:                          GenuineIntel
BIOS Vendor ID:                     Intel(R) Corporation
Model name:                         Intel(R) Xeon(R) Gold 5220R CPU @ 2.20GHz
BIOS Model name:                    Intel(R) Xeon(R) Gold 5220R CPU @ 2.20GHz  CPU @ 2.2GHz
BIOS CPU family:                    179
CPU family:                         6
Model:                              85
Thread(s) per core:                 2
Core(s) per socket:                 24
Socket(s):                          2
Stepping:                           7
CPU(s) scaling MHz:                 76%
CPU max MHz:                        4000.0000
CPU min MHz:                        1000.0000
BogoMIPS:                           4400.00
Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_pkg_req pku ospke avx512_vnni md_clear flush_l1d arch_capabilities
Virtualization:                     VT-x
L1d cache:                          1.5 MiB (48 instances)
L1i cache:                          1.5 MiB (48 instances)
L2 cache:                           48 MiB (48 instances)
L3 cache:                           71.5 MiB (2 instances)
NUMA node(s):                       2
NUMA node0 CPU(s):                  0-23,48-71
NUMA node1 CPU(s):                  24-47,72-95
Vulnerability Gather data sampling: Mitigation; Microcode
Vulnerability Itlb multihit:        KVM: Mitigation: Split huge pages
Vulnerability L1tf:                 Not affected
Vulnerability Mds:                  Not affected
Vulnerability Meltdown:             Not affected
Vulnerability Mmio stale data:      Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Retbleed:             Mitigation; Enhanced IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:           Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Mitigation; TSX disabled


lscpu guest (cpu set to host) - Debian 12 - 6.1.0-12-amd64

Code:
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  48
  On-line CPU(s) list:   0-47
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) Gold 5220R CPU @ 2.20GHz
    CPU family:          6
    Model:               85
    Thread(s) per core:  1
    Core(s) per socket:  24
    Socket(s):           2
    Stepping:            7
    BogoMIPS:            4389.68
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2
                         ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_kno
                         wn_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_
                         timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ss
                         bd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust
                         bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx
                         512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat umip pku ospke avx512_vnni md_clear arch_capabi
                         lities
Virtualization features:
  Virtualization:        VT-x
  Hypervisor vendor:     KVM
  Virtualization type:   full
Caches (sum of all):
  L1d:                   1.5 MiB (48 instances)
  L1i:                   1.5 MiB (48 instances)
  L2:                    192 MiB (48 instances)
  L3:                    32 MiB (2 instances)
NUMA:
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-47
Vulnerabilities:
  Gather data sampling:  Unknown: Dependent on hypervisor status
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
  Retbleed:              Mitigation; Enhanced IBRS
  Spec rstack overflow:  Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
  Srbds:                 Not affected
  Tsx async abort:       Mitigation; TSX disabled


There isn't anything showing up in the host logs. The guest reboots/crashes when the CPU load is high. My experience seems to align with the other two reports in this thread. My other VMs are basically idling, and there are no issues with them.


The lscpu differences:

5.15.116-1-pve:
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp


6.2.16-14-pve:
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
 
Finally I have encountered a side effect:

The Ceph mount inside the VM crashed (umount and mount -a fixes it 95% of the time).
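(A minimal sketch of that recovery, with /mnt/ceph as a hypothetical mountpoint:)
Code:
# unmount the crashed Ceph mount, falling back to a lazy unmount if it hangs,
# then remount everything from /etc/fstab
umount /mnt/ceph 2>/dev/null || umount -l /mnt/ceph
mount -a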
 
I've tried reproducing the issue, but haven't had success yet. The next time it happens, can you provide the output of
Code:
cat /sys/fs/cgroup/qemu.slice/<ID>.scope/cpu.pressure
cat /sys/fs/cgroup/qemu.slice/<ID>.scope/io.pressure
cat /sys/fs/cgroup/qemu.slice/<ID>.scope/memory.pressure
replacing <ID> with the ID of the VM? And also provide the VM configuration if you haven't done so already.
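(A small sketch for capturing those PSI files repeatedly while reproducing the issue; the VM ID 102 and the 10-second interval are assumptions:)
Code:
# log CPU/IO/memory pressure for VM 102 every 10 seconds
VMID=102
while true; do
    date
    for f in cpu io memory; do
        echo "== ${f}.pressure =="
        cat "/sys/fs/cgroup/qemu.slice/${VMID}.scope/${f}.pressure"
    done
    sleep 10
done | tee "psi-vm${VMID}.log"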

Is there anything in the host's system logs/journal around the time the issue happens?
 
Setting mitigations off reduced the number of errors (still, the bigger the load, the more errors), but going back to kernel 5.15 removed the issue entirely.
 
Setting mitigations off reduced the number of errors (still, the bigger the load, the more errors), but going back to kernel 5.15 removed the issue entirely.

Unfortunately, the PVE devs keep rejecting requests to release a 5.15 opt-in kernel for PVE 8, even though the amount of negative feedback on the 6.2 kernel keeps growing.
 
Setting mitigations off reduced the number of errors (still, the bigger the load, the more errors), but going back to kernel 5.15 removed the issue entirely.
The instabilities might be related to the issues reported in another thread [1]. There, users reported that disabling mitigations and KSM seems to avoid the freezes. If I understand correctly, you already disabled mitigations but still see the instabilities. Can you check whether KSM is active? You can do so by running:
Code:
cat /sys/kernel/mm/ksm/pages_shared
If this prints any number greater than 0, then KSM is active. In that case (and if the host has enough RAM), you can try disabling it [2] and see if the situation improves.
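(A minimal sketch of disabling KSM on the host along the lines of [2]; the unmerge step via echo 2 is the same one mentioned later in this thread:)
Code:
# stop the KSM tuning daemon and unmerge the already-shared pages
systemctl disable --now ksmtuned
echo 2 > /sys/kernel/mm/ksm/run
cat /sys/kernel/mm/ksm/pages_shared   # should now report 0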

Also, if you can reliably reproduce the intermittent freezes, it would be great if you could gather the data I mentioned in the other thread [3] (in addition to the information that @fiona asked for earlier here [4]).

[1] https://forum.proxmox.com/threads/130727/
[2] https://pve.proxmox.com/wiki/Kernel_Samepage_Merging_(KSM)
[3] https://forum.proxmox.com/threads/130727/post-596802
[4] https://forum.proxmox.com/threads/133848/post-595898
 
The instabilities might be related to the issues reported in another thread [1]. There, users reported that disabling mitigations and KSM seems to avoid the freezes. If I understand correctly, you already disabled mitigations but still see the instabilities. Can you check whether KSM is active? You can do so by running:
Code:
cat /sys/kernel/mm/ksm/pages_shared
If this prints any number greater than 0, then KSM is active. In that case (and if the host has enough RAM), you can try disabling it [2] and see if the situation improves.

Also, if you can reliably reproduce the intermittent freezes, it would be great if you could gather the data I mentioned in the other thread [3] (in addition to the information that @fiona asked for earlier here [4]).

[1] https://forum.proxmox.com/threads/130727/
[2] https://pve.proxmox.com/wiki/Kernel_Samepage_Merging_(KSM)
[3] https://forum.proxmox.com/threads/130727/post-596802
[4] https://forum.proxmox.com/threads/133848/post-595898
Unfortunately I rolled back to kernel 5.15 on all hosts with VMs (after the rollback, no issues at all).
We use the servers in production, so I cannot risk another downtime.

The effect only hits VMs, not LXC containers.
 
I have the exact same problem on all my hosts that I upgraded to kernel 6.*

Easy way to reproduce:
- Install proxmox 8 with kernel 6
- Install a VM with Photon OS
- Setup some docker images that use a bunch of CPU (docker run --rm -it progrium/stress --cpu 24 --io 1 --vm 2 --vm-bytes 128M --timeout 10s)
- Wait and see the CPU hang
 
Hi, can you post the output of the following commands?
Code:
cat /sys/kernel/mm/ksm/pages_shared
lscpu
Also, is there anything in the journal of the host or the VM at the time of the hang? You can also send an excerpt of journalctl -b.
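(A sketch of grabbing such an excerpt; the time window is an assumption and should bracket the hang:)
Code:
# full journal for the current boot
journalctl -b > journal-current-boot.txt
# or just a window around the hang
journalctl --since "2023-11-14 06:00" --until "2023-11-14 06:15" > journal-excerpt.txt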

Are the Docker images run on the host or in the VM? Note that running Docker directly on a PVE host is not recommended, see FAQ #13 [2].

[1] https://forum.proxmox.com/threads/130727/page-7#post-603096
[2] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_frequently_asked_questions_2
 
Docker is running inside the VM, not on the PVE host hardware.

The host has 144 cores; the VM is set up with 64 cores (4 sockets, 16 cores per socket).

Host machine's kernel logs don't show anything specifically interesting.

The soft lockups happen whenever the VM is under high CPU stress; the `stress` command makes it easy to trigger, but it happens all the time as the VMs are in use. In the test below, the host CPUs are not all used (only 64 out of 144) and there are no other VMs on the host. When running on kernel 5.* (Proxmox 7) this never happened. After upgrading to Proxmox 8, it happens often enough to be problematic (the VM whose kernel is affected hangs when this happens).

Code:
(This is on the PVE Host)

root@DrDoom02:~# lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  144
  On-line CPU(s) list:   0-143
Vendor ID:               GenuineIntel
  BIOS Vendor ID:        Intel(R) Corporation
  Model name:            Intel(R) Xeon(R) CPU E7-8880 v3 @ 2.30GHz
    BIOS Model name:     Intel(R) Xeon(R) CPU E7-8880 v3 @ 2.30GHz  CPU @ 2.3GHz
    BIOS CPU family:     179
    CPU family:          6
    Model:               63
    Thread(s) per core:  2
    Core(s) per socket:  18
    Socket(s):           4
    Stepping:            4
    CPU(s) scaling MHz:  98%
    CPU max MHz:         3100.0000
    CPU min MHz:         1200.0000
    BogoMIPS:            4589.04
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pd
                         pe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor
                         ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c
                          rdrand lahf_lm abm cpuid_fault epb invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad f
                         sgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts md_clear flush
                         _l1d
Virtualization features:
  Virtualization:        VT-x
Caches (sum of all):
  L1d:                   2.3 MiB (72 instances)
  L1i:                   2.3 MiB (72 instances)
  L2:                    18 MiB (72 instances)
  L3:                    180 MiB (4 instances)
NUMA:            
  NUMA node(s):          4
  NUMA node0 CPU(s):     0-17,72-89
  NUMA node1 CPU(s):     18-35,90-107
  NUMA node2 CPU(s):     36-53,108-125
  NUMA node3 CPU(s):     54-71,126-143
Vulnerabilities:  
  Gather data sampling:  Not affected
  Itlb multihit:         KVM: Mitigation: Split huge pages
  L1tf:                  Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
  Mds:                   Mitigation; Clear CPU buffers; SMT vulnerable
  Meltdown:              Mitigation; PTI
  Mmio stale data:       Mitigation; Clear CPU buffers; SMT vulnerable
  Retbleed:              Not affected
  Spec rstack overflow:  Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Mitigation; Clear CPU buffers; SMT vulnerable
 
 
 
root@DrDoom02:~# cat /sys/kernel/mm/ksm/pages_shared
0


When running the command (in the VM), the VM's kernel shows:
Code:
(This is in the VM, in this case I didn't use docker, just ran at the console of the VM)

root@thanos:~# stress -c 90

Message from syslogd@thanos at Nov 14 06:06:32 ...
kernel: [194.789336] watchdog: BUG: soft lockup - CPU#30 stuck for 36s! [stress:2171]

Message from syslogd@thanos at Nov 14 06:06:32 ...
kernel: [194.789303] watchdog: BUG: soft lockup - CPU#6 stuck for 36s! [stress:2132]

Message from syslogd@thanos at Nov 14 06:06:32 ...
kernel: [194.789348] watchdog: BUG: soft lockup - CPU#21 stuck for 36s! [stress:21871]

Message from syslogd@thanos at Nov 14 06:06:32 ...
kernel: [194.789336] watchdog: BUG: soft lockup - CPU#59 stuck for 35s! [stress:2152]

Message from syslogd@thanos at Nov 14 06:06:32 ...
kernel: [194.789336] watchdog: BUG: soft lockup - CPU#52 stuck for 35s! [stress:2162]
 
I can confirm that mitigations=off and deactivated KSM mostly fix the CPU soft lockups we experienced before with the current PVE version (pve-manager/8.0.4/d258a813cfa6b390, running kernel 6.2.16-19-pve).

After applying the workaround, my lscpu looks like this:

Code:
# lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  80
  On-line CPU(s) list:   0-79
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) Gold 6242R CPU @ 3.10GHz
    CPU family:          6
    Model:               85
    Thread(s) per core:  2
    Core(s) per socket:  20
    Socket(s):           2
    Stepping:            7
    BogoMIPS:            6200.00
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke avx512_vnni md_clear flush_l1d arch_capabilities
Virtualization features:
  Virtualization:        VT-x
Caches (sum of all):
  L1d:                   1.3 MiB (40 instances)
  L1i:                   1.3 MiB (40 instances)
  L2:                    40 MiB (40 instances)
  L3:                    71.5 MiB (2 instances)
NUMA:
  NUMA node(s):          2
  NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78
  NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,79
Vulnerabilities:
  Gather data sampling:  Vulnerable: No microcode
  Itlb multihit:         KVM: Vulnerable
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Vulnerable
  Retbleed:              Vulnerable
  Spec rstack overflow:  Not affected
  Spec store bypass:     Vulnerable
  Spectre v1:            Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
  Spectre v2:            Vulnerable, IBPB: disabled, STIBP: disabled, PBRSB-eIBRS: Vulnerable
  Srbds:                 Not affected
  Tsx async abort:       Mitigation; TSX disabled

In our setup we can tolerate mitigations=off, but I would be happy to use KSM again.

From the kernel log in the guest I can also confirm that it seems to get stuck most often in page-fault handling routines, but not consistently in a single function.

Maybe a hint, or only a curiosity: I currently run only one VM with nearly all the resources of the host, and it triggers the soft lockups. KSM manages to share over 50GB of the 500GB given to that VM, but since only one VM is running, it shares the pages with itself. I also tried deactivating only KSM, without mitigations=off, which did not fix the problem.

Currently I am testing whether activating KSM again triggers the problems, but even after some runtime I have no shared pages; maybe I can report more in the next few days.
 
Okay, after some runtime KSM was finally kicking in and the soft lockups occurred again.

So KSM needs to be disabled to fix that problem.

An additional observation: when KSM is sharing pages, the soft lockups mostly report times in the 20s, like watchdog: BUG: soft lockup - CPU#47 stuck for 23s!. But after deactivating KSM (and removing shared pages via echo 2 > /sys/kernel/mm/ksm/run), the numbers rose to 120s and more.

I am now letting my workload run without KSM to see whether the soft lockups are really gone.
 
