[SOLVED] Proxmox 8.0 / Kernel 6.2.x 100% CPU issue with Windows Server 2019 VMs

Thanks for all the reports and discussions. We have tried to reproduce the intermittent freezes (CPU spikes / lost pings) reported in this thread in our test environment, but have not succeeded so far.

Hence, the root cause of the intermittent freezes is unfortunately still unclear. Let me try to summarize the reports from this thread:
  • All freezes were reported on host kernel 6.2; no freezes were reported on kernel 5.15
  • There are no reports on whether kernels 5.19 or 6.1 are affected
  • On kernel 6.2, disabling KSM and mitigations fixes the issue. Obviously, disabling mitigations is not advisable in the general case.
  • Most reported freezes mention a dual-socket CPU
  • The issue primarily affects Windows guests, more specifically Win2019
  • Freezes become more likely with higher amounts (> 128G) of configured guest memory
  • The intermittent freezes seem unrelated to the (permanent) 100% CPU freeze issue that was discussed over at [1], as that one is fixed in kernels >= 6.2.16-12, but e.g. @Whatever and @Sebi-S still report intermittent freezes here with 6.2.16-15 and 6.2.16-12.
Please let me know if your observations contradict anything I wrote above, or if I missed anything.

Since @Neobin @Whatever @mygeeknc mentioned the KSM regressions on dual-socket machines with kernel 6.2 discussed at [2], we also tried to reproduce the intermittent freezes on the dual-socket test machine which does exhibit the KSM regressions, but no luck so far.

To everyone who can easily reproduce the freezes on a test machine:
  • Could you check whether there is anything in the (host) journal during the freezes?
  • Please fill in $YOUR_VMID in the following script, save it and run it during an intermittent freeze.
    Code:
    #!/bin/bash
    VMID=$YOUR_VMID
    PID=$(cat /var/run/qemu-server/$VMID.pid)
    timeout 5 strace -c -p $PID
    grep '' /sys/fs/cgroup/qemu.slice/$VMID.scope/*.pressure
    for _ in {1..5}; do
        grep '' /proc/$PID/ksm*;
        sleep 1
    done
    Please post the output here -- it might give some clue about what's going on.
  • If you do *not* use ZFS, there may be one interesting (but very hacky, so beware!) thing you could try to see if there is any connection to the KSM regressions [2]:
    • 1) Install and boot Ubuntu mainline kernel 6.4.12 on the host, check whether the intermittent freezes are reproducible
    • 2) Install and boot Ubuntu mainline kernel 6.4.13 on the host, check whether they are still reproducible.
    If it turns out they are reproducible on 6.4.12 but not 6.4.13, this would hint at a connection to the KSM regressions, because as @aaron notes [3] the KSM regressions are fixed with Ubuntu mainline kernel >= 6.4.13.

    If you do want to try this: To install a Ubuntu mainline kernel, download the linux-image-unsigned-[...].deb and linux-modules-[...].deb from https://kernel.ubuntu.com/mainline/ (for 6.4.12 see [4], for 6.4.13 [5]), and install them with one apt command, i.e., apt install ./linux-image-unsigned-[...].deb ./linux-modules-[...].deb. Note that running PVE with a Ubuntu mainline kernel 6.4 is definitely not a supported setup, but this experiment could potentially be helpful in debugging this issue. :)
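    For concreteness, here is a minimal sketch of that procedure for 6.4.13 (the exact .deb filenames vary per build, so substitute the ones actually listed at [5]):
    Code:
    # sketch: download the two .debs from [5] into an empty directory first
    mkdir -p /tmp/mainline && cd /tmp/mainline
    # (fetch linux-image-unsigned-[...].deb and linux-modules-[...].deb here)
    apt install ./linux-image-unsigned-*.deb ./linux-modules-*.deb
    reboot
    # after selecting the 6.4.13 kernel in the boot menu, verify with:
    uname -r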

[1] https://forum.proxmox.com/threads/vms-freeze-with-100-cpu.127459/
[2] https://forum.proxmox.com/threads/ksm-memory-sharing-not-working-as-expected-on-6-2-x-kernel.131082/
[3] https://forum.proxmox.com/threads/k...ted-on-6-2-x-kernel.131082/page-3#post-595600
[4] https://kernel.ubuntu.com/mainline/v6.4.12/amd64/
[5] https://kernel.ubuntu.com/mainline/v6.4.13/amd64/
 
Setting mitigations=off did not solve the issue; it just reduced how often it occurs (perhaps due to a more efficient kernel).

I have around 10 Ubuntu VMs where the error occurs repeatedly under load (while running all nodes at 70% CPU capacity); in less than an hour I had the error on at least one of the nodes.

I think the errors are related; in my case they occur at exactly the same time.
 

Two small notes here:
1) CPU spikes and lost ICMP replies appear when a number of RDS 2019 users log in (5-10 and more).
If I just boot up a fresh Windows Server 2019 without RDS and workload, neither CPU spikes nor lost ICMP replies are observed.

2) I install intel-microcode on all my nodes
 
Here we go...
root@pve-node-03486:~# uname -a
Linux pve-node-03486 6.2.16-16-pve #1 SMP PREEMPT_DYNAMIC PMX 6.2.16-16 (2023-10-03T05:42Z) x86_64 GNU/Linux

mitigations=off, KSM disabled (booted without KSM at all)
Code:
root@pve-node-03486:~# ./ctrace.sh  901
strace: Process 5349 attached
strace: Process 5349 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.92   67.257657      117995       570           ppoll
  0.04    0.028500          13      2104           write
  0.02    0.014489          28       515           recvmsg
  0.02    0.012439          22       548           read
  0.00    0.000347         173         2           close
  0.00    0.000133          66         2           accept4
  0.00    0.000044           7         6           ioctl
  0.00    0.000044           4        10           sendmsg
  0.00    0.000003           0         4           fcntl
  0.00    0.000002           1         2           getsockname
  0.00    0.000002           1         2           futex
------ ----------- ----------- --------- --------- ----------------
100.00   67.313660       17878      3765           total
/sys/fs/cgroup/qemu.slice/901.scope/cgroup.pressure:1
/sys/fs/cgroup/qemu.slice/901.scope/cpu.pressure:some avg10=0.04 avg60=0.02 avg300=0.00 total=1169038941
/sys/fs/cgroup/qemu.slice/901.scope/cpu.pressure:full avg10=0.00 avg60=0.00 avg300=0.00 total=1164140765
/sys/fs/cgroup/qemu.slice/901.scope/io.pressure:some avg10=0.00 avg60=0.00 avg300=0.00 total=117116
/sys/fs/cgroup/qemu.slice/901.scope/io.pressure:full avg10=0.00 avg60=0.00 avg300=0.00 total=116584
/sys/fs/cgroup/qemu.slice/901.scope/memory.pressure:some avg10=0.00 avg60=0.00 avg300=0.00 total=2
/sys/fs/cgroup/qemu.slice/901.scope/memory.pressure:full avg10=0.00 avg60=0.00 avg300=0.00 total=2
/proc/5349/ksm_merging_pages:0
/proc/5349/ksm_stat:ksm_rmap_items 18930050
/proc/5349/ksm_merging_pages:0
/proc/5349/ksm_stat:ksm_rmap_items 18930050
/proc/5349/ksm_merging_pages:0
/proc/5349/ksm_stat:ksm_rmap_items 18930050
/proc/5349/ksm_merging_pages:0
/proc/5349/ksm_stat:ksm_rmap_items 18930050
/proc/5349/ksm_merging_pages:0
/proc/5349/ksm_stat:ksm_rmap_items 18930050


C:\Users\andrey.matveev>ping 172.16.9.242 -t

Pinging 172.16.9.242 with 32 bytes of data:
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=2ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=2ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=2ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=2ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124

Start KSM and wait 30+ minutes
Code:
root@pve-node-03486:~# cat /sys/kernel/mm/ksm/pages_shared
3160
root@pve-node-03486:~# ./ctrace.sh 901
strace: Process 5349 attached
strace: Process 5349 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 98.69   23.099828       40597       569           ppoll
  0.77    0.179780         349       515           recvmsg
  0.49    0.115766          55      2104           write
  0.05    0.010572          19       548           read
  0.00    0.000105          17         6           ioctl
  0.00    0.000073          36         2           accept4
  0.00    0.000068           6        10           sendmsg
  0.00    0.000045          11         4           fcntl
  0.00    0.000044          22         2           getsockname
  0.00    0.000004           2         2           close
  0.00    0.000002           1         2           futex
------ ----------- ----------- --------- --------- ----------------
100.00   23.406287        6218      3764           total
/sys/fs/cgroup/qemu.slice/901.scope/cgroup.pressure:1
/sys/fs/cgroup/qemu.slice/901.scope/cpu.pressure:some avg10=0.00 avg60=0.00 avg300=0.00 total=1169990572
/sys/fs/cgroup/qemu.slice/901.scope/cpu.pressure:full avg10=0.00 avg60=0.00 avg300=0.00 total=1164902601
/sys/fs/cgroup/qemu.slice/901.scope/io.pressure:some avg10=0.00 avg60=0.00 avg300=0.00 total=117153
/sys/fs/cgroup/qemu.slice/901.scope/io.pressure:full avg10=0.00 avg60=0.00 avg300=0.00 total=116621
/sys/fs/cgroup/qemu.slice/901.scope/memory.pressure:some avg10=0.00 avg60=0.00 avg300=0.00 total=2
/sys/fs/cgroup/qemu.slice/901.scope/memory.pressure:full avg10=0.00 avg60=0.00 avg300=0.00 total=2
/proc/5349/ksm_merging_pages:192889
/proc/5349/ksm_stat:ksm_rmap_items 16907873
/proc/5349/ksm_merging_pages:193801
/proc/5349/ksm_stat:ksm_rmap_items 16922772
/proc/5349/ksm_merging_pages:193801
/proc/5349/ksm_stat:ksm_rmap_items 16922772
/proc/5349/ksm_merging_pages:193801
/proc/5349/ksm_stat:ksm_rmap_items 16922772
/proc/5349/ksm_merging_pages:194027
/proc/5349/ksm_stat:ksm_rmap_items 16929741

C:\Users\andrey.matveev>ping 172.16.9.242 -t

Pinging 172.16.9.242 with 32 bytes of data:
Reply from 172.16.9.242: bytes=32 time=3ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=2ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=396ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=2ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=61ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=844ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=2ms TTL=124

After disabling KSM and unmerging all pages
Code:
root@pve-node-03486:~# service ksmtuned stop
root@pve-node-03486:~#  echo 2 > /sys/kernel/mm/ksm/run
root@pve-node-03486:~# cat /sys/kernel/mm/ksm/pages_shared
0

CPU spikes and increased ICMP echo reply times still occur. Only a node reboot (with KSM disabled) helps. The echo reply time increases and the CPU spikes correlate strongly; I guess the increased ICMP echo reply time is a side effect of the CPU spikes.

P.S. Nothing in dmesg or syslog.
 
Thank you for the data!

Start KSM and wait 30+ minutes
Code:
...
/proc/5349/ksm_merging_pages:192889
/proc/5349/ksm_stat:ksm_rmap_items 16907873
/proc/5349/ksm_merging_pages:193801
/proc/5349/ksm_stat:ksm_rmap_items 16922772
/proc/5349/ksm_merging_pages:193801
/proc/5349/ksm_stat:ksm_rmap_items 16922772
/proc/5349/ksm_merging_pages:193801
/proc/5349/ksm_stat:ksm_rmap_items 16922772
/proc/5349/ksm_merging_pages:194027
/proc/5349/ksm_stat:ksm_rmap_items 16929741
The value of ksm_merging_pages does slightly increase during the 5-second window -- so it seems like KSM is indeed active for that particular VM.

A few follow-up questions:
mitigations=off, KSM disabled (booted without KSM at all)
Code:
root@pve-node-03486:~# ./ctrace.sh  901
strace: Process 5349 attached
strace: Process 5349 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.92   67.257657      117995       570           ppoll
...
...
Start KSM and wait 30+ minutes
Code:
...
root@pve-node-03486:~# ./ctrace.sh 901
strace: Process 5349 attached
strace: Process 5349 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 98.69   23.099828       40597       569           ppoll
...
...
Can you double-check that this is indeed the output of timeout 5 strace -c -p $PID? If yes, there is quite some time spent in the ppoll syscalls. Can you please post the VM config of VM 901 and the output of lscpu on the host?

CPU spikes and ICMP echo reply increase still exist. Only node reboot (with disabled KSM) helps.
This is quite unexpected -- if there was a causal connection to KSM, I would expect that freezes stop happening after /sys/kernel/mm/ksm/pages_shared has returned to 0.
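If you want to watch the unmerge progress after echo 2 > /sys/kernel/mm/ksm/run, something like this sketch should do -- pages_shared should drop back to 0:
Code:
# sample the global KSM counters once per second
watch -n1 'grep "" /sys/kernel/mm/ksm/pages_shared /sys/kernel/mm/ksm/pages_sharing'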
 

Yes, this is the correct output of the command you provided earlier
Code:
root@pve-node-03486:~# cat ./ctrace.sh
#!/bin/bash
VMID=$1
PID=$(cat /var/run/qemu-server/$VMID.pid)
timeout 5 strace -c -p $PID
grep '' /sys/fs/cgroup/qemu.slice/$VMID.scope/*.pressure
for _ in {1..5}; do
    grep '' /proc/$PID/ksm*;
    sleep 1
done


root@pve-node-03486:~# cat /etc/pve/qemu-server/901.conf
agent: 1
boot: order=scsi0;ide2
cores: 16
cpu: host,flags=+ssbd
ide2: DS-254-NFS-B:iso/virtio-win-0.1.240.iso,media=cdrom,size=612812K
machine: q35
memory: 131072
name: 009-srv-ts1.eliot.local
net0: virtio=DE:00:3D:1D:DA:4B,bridge=vmbr0,firewall=1,tag=90
numa: 1
onboot: 1
ostype: win10
scsi0: DS-254-NFS-A:901/vm-901-disk-0.raw,iothread=1,size=512G,ssd=1
scsi1: DS-254-NFS-A:901/vm-901-disk-1.raw,iothread=1,size=512G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=06c14319-3a36-4e49-be62-7c366bc0052c
sockets: 2
startup: up=60
tags: id9
unused0: Temp-254:901/vm-901-disk-0.raw
vmgenid: e5df7617-483d-4bff-aa29-ff6c72d4a2e2


root@pve-node-03486:~#
root@pve-node-03486:~# lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  48
  On-line CPU(s) list:   0-47
Vendor ID:               GenuineIntel
  BIOS Vendor ID:        Intel
  Model name:            Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
    BIOS Model name:      Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz        CPU @ 2.7GHz
    BIOS CPU family:     179
    CPU family:          6
    Model:               62
    Thread(s) per core:  2
    Core(s) per socket:  12
    Socket(s):           2
    Stepping:            4
    CPU(s) scaling MHz:  86%
    CPU max MHz:         3500.0000
    CPU min MHz:         1200.0000
    BogoMIPS:            5386.95
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscal
                         l nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq d
                         tes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave av
                         x f16c rdrand lahf_lm cpuid_fault epb intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
                          xsaveopt dtherm ida arat pln pts md_clear flush_l1d
Virtualization features:
  Virtualization:        VT-x
Caches (sum of all):    
  L1d:                   768 KiB (24 instances)
  L1i:                   768 KiB (24 instances)
  L2:                    6 MiB (24 instances)
  L3:                    60 MiB (2 instances)
NUMA:                   
  NUMA node(s):          2
  NUMA node0 CPU(s):     0-11,24-35
  NUMA node1 CPU(s):     12-23,36-47
Vulnerabilities:        
  Gather data sampling:  Not affected
  Itlb multihit:         KVM: Vulnerable
  L1tf:                  Mitigation; PTE Inversion; VMX vulnerable
  Mds:                   Vulnerable; SMT vulnerable
  Meltdown:              Vulnerable
  Mmio stale data:       Unknown: No mitigations
  Retbleed:              Not affected
  Spec rstack overflow:  Not affected
  Spec store bypass:     Vulnerable
  Spectre v1:            Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
  Spectre v2:            Vulnerable, IBPB: disabled, STIBP: disabled, PBRSB-eIBRS: Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected

P.S. I've fixed formatting in my previous message
 
Hi,
could you also install the debugger and debug symbols with apt install pve-qemu-kvm-dbgsym gdb and, while the VM is stuck, run the following script:
Code:
root@pve8a1 ~ # cat stacks-stats.sh
#!/bin/bash
VMID=$1
if [ -z "$VMID" ]
then
    echo "usage: $0 <VMID>" > /dev/stderr
    exit 1
fi
PID=$(cat /var/run/qemu-server/$VMID.pid)
echo "=== BEGIN KERNEL STACK ==="
grep '' /proc/$PID/task/*/stack
echo "=== END KERNEL STACK ==="
echo "=== BEGIN USER STACK ==="
gdb --batch --ex 't a a bt' -p $PID
echo "=== END USER STACK ==="

echo "=== BEGIN vCPU stats I ==="
PERLPROG='
use strict;
use warnings;

use JSON;
use PVE::QemuServer::Monitor qw(mon_cmd);

my $vmid = shift or die "need to specify vmid\n";

my $res = eval { mon_cmd($vmid, "query-stats", target => "vcpu"); };
warn $@ if $@;
print to_json($res, { pretty => 1, canonical => 1 });
'
perl -e "$PERLPROG" $VMID
echo "=== END vCPU stats I ==="
sleep 5
echo "=== BEGIN vCPU stats II ==="
perl -e "$PERLPROG" $VMID
echo "=== END vCPU stats II ==="
The output is rather long, so best to redirect it to a file (replacing 123 with the actual ID of the VM):
Code:
bash stacks-stats.sh 123 > /tmp/stacks-stats-123.log
 
Host reboot required?
No. The debug symbols will even be valid for already running QEMU instances (assuming they were started with the same version).
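A quick way to double-check that (sketch, using the placeholder VMID 123): if the QEMU binary on disk was upgraded after the VM was started, the kernel marks the old executable as deleted:
Code:
PID=$(cat /var/run/qemu-server/123.pid)
# the link target gains a " (deleted)" suffix if the binary changed since the VM started
ls -l /proc/$PID/exe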
 

Here we go

mitigations=off, KSM enabled but not active so far
Code:
root@pve-node-03486:~# ./strace.sh 901
strace: Process 5349 attached
strace: Process 5349 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 97.81    0.978017        1724       567           ppoll
  1.50    0.014967           7      2104           write
  0.40    0.004010           7       548           read
  0.23    0.002290           4       515           recvmsg
  0.06    0.000612          61        10           sendmsg
  0.00    0.000016           2         6           ioctl
  0.00    0.000003           1         2           accept4
  0.00    0.000002           1         2           futex
  0.00    0.000001           0         2           getsockname
  0.00    0.000000           0         2           close
  0.00    0.000000           0         4           fcntl
------ ----------- ----------- --------- --------- ----------------
100.00    0.999918         265      3762           total
/sys/fs/cgroup/qemu.slice/901.scope/cgroup.pressure:1
/sys/fs/cgroup/qemu.slice/901.scope/cpu.pressure:some avg10=0.00 avg60=0.00 avg300=0.00 total=1465836653
/sys/fs/cgroup/qemu.slice/901.scope/cpu.pressure:full avg10=0.00 avg60=0.00 avg300=0.00 total=1459795161
/sys/fs/cgroup/qemu.slice/901.scope/io.pressure:some avg10=0.00 avg60=0.00 avg300=0.00 total=137828
/sys/fs/cgroup/qemu.slice/901.scope/io.pressure:full avg10=0.00 avg60=0.00 avg300=0.00 total=137260
/sys/fs/cgroup/qemu.slice/901.scope/memory.pressure:some avg10=0.00 avg60=0.00 avg300=0.00 total=2
/sys/fs/cgroup/qemu.slice/901.scope/memory.pressure:full avg10=0.00 avg60=0.00 avg300=0.00 total=2
/proc/5349/ksm_merging_pages:0
/proc/5349/ksm_stat:ksm_rmap_items 6310412
/proc/5349/ksm_merging_pages:0
/proc/5349/ksm_stat:ksm_rmap_items 6382412
/proc/5349/ksm_merging_pages:0
/proc/5349/ksm_stat:ksm_rmap_items 6453212
/proc/5349/ksm_merging_pages:0
/proc/5349/ksm_stat:ksm_rmap_items 6525212
/proc/5349/ksm_merging_pages:0
/proc/5349/ksm_stat:ksm_rmap_items 6597212

root@pve-node-03486:~# cat /sys/kernel/mm/ksm/pages_shared
0

C:\Users\andrey.matveev>ping 172.16.9.242 -t

Pinging 172.16.9.242 with 32 bytes of data:
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=2ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124

30+ minutes later
Code:
Reply from 172.16.9.242: bytes=32 time=2ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=43ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=520ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=127ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=8ms TTL=124

root@pve-node-03486:~# ./strace.sh 901
strace: Process 5349 attached
strace: Process 5349 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 91.37   14.761194       26079       566           ppoll
  4.75    0.767996         365      2104           write
  2.78    0.449941         813       553           read
  1.08    0.174901         339       515           recvmsg
  0.01    0.001564         156        10           sendmsg
  0.00    0.000482          80         6           ioctl
  0.00    0.000007           0        12         1 futex
  0.00    0.000006           3         2           close
  0.00    0.000006           3         2           accept4
  0.00    0.000003           0         4           fcntl
  0.00    0.000001           0         2           getsockname
------ ----------- ----------- --------- --------- ----------------
100.00   16.156101        4278      3776         1 total
/sys/fs/cgroup/qemu.slice/901.scope/cgroup.pressure:1
/sys/fs/cgroup/qemu.slice/901.scope/cpu.pressure:some avg10=0.00 avg60=0.00 avg300=0.00 total=1470087002
/sys/fs/cgroup/qemu.slice/901.scope/cpu.pressure:full avg10=0.00 avg60=0.00 avg300=0.00 total=1463974801
/sys/fs/cgroup/qemu.slice/901.scope/io.pressure:some avg10=0.00 avg60=0.00 avg300=0.00 total=139267
/sys/fs/cgroup/qemu.slice/901.scope/io.pressure:full avg10=0.00 avg60=0.00 avg300=0.00 total=138691
/sys/fs/cgroup/qemu.slice/901.scope/memory.pressure:some avg10=0.00 avg60=0.00 avg300=0.00 total=2
/sys/fs/cgroup/qemu.slice/901.scope/memory.pressure:full avg10=0.00 avg60=0.00 avg300=0.00 total=2
/proc/5349/ksm_merging_pages:439958
/proc/5349/ksm_stat:ksm_rmap_items 26423981
/proc/5349/ksm_merging_pages:442011
/proc/5349/ksm_stat:ksm_rmap_items 26323243
/proc/5349/ksm_merging_pages:444215
/proc/5349/ksm_stat:ksm_rmap_items 26211691
/proc/5349/ksm_merging_pages:446575
/proc/5349/ksm_stat:ksm_rmap_items 26100859
/proc/5349/ksm_merging_pages:449311
/proc/5349/ksm_stat:ksm_rmap_items 25979428

root@pve-node-03486:~# ./stacks-stats.sh 901 > ./stacks-stats-901.log 
42      ../sysdeps/unix/sysv/linux/ppoll.c: No such file or directory.

stacks-stats-901.log  attached
 


12 hours later

Code:
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=672ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=2ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=2ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=2ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1035ms TTL=124
Reply from 172.16.9.242: bytes=32 time=2ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=730ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=2ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124


root@pve-node-03486:~# cat /sys/kernel/mm/ksm/pages_shared
369381

root@pve-node-03486:~# ./strace.sh 901
strace: Process 5349 attached
strace: Process 5349 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 95.18   27.108383       66934       405           ppoll
  2.93    0.835847         566      1476           write
  1.21    0.345998         884       391           read
  0.35    0.099728         275       362           recvmsg
  0.31    0.088299       14716         6           sendmsg
  0.00    0.000741          82         9           futex
  0.00    0.000706         706         1           accept4
  0.00    0.000582          64         9           ioctl
  0.00    0.000001           1         1           getsockname
  0.00    0.000000           0         2           fcntl
------ ----------- ----------- --------- --------- ----------------
100.00   28.480285       10698      2662           total
/sys/fs/cgroup/qemu.slice/901.scope/cgroup.pressure:1
/sys/fs/cgroup/qemu.slice/901.scope/cpu.pressure:some avg10=0.00 avg60=0.00 avg300=0.00 total=1546328573
/sys/fs/cgroup/qemu.slice/901.scope/cpu.pressure:full avg10=0.00 avg60=0.00 avg300=0.00 total=1538759343
/sys/fs/cgroup/qemu.slice/901.scope/io.pressure:some avg10=0.00 avg60=0.00 avg300=0.00 total=141361
/sys/fs/cgroup/qemu.slice/901.scope/io.pressure:full avg10=0.00 avg60=0.00 avg300=0.00 total=140781
/sys/fs/cgroup/qemu.slice/901.scope/memory.pressure:some avg10=0.00 avg60=0.00 avg300=0.00 total=2
/sys/fs/cgroup/qemu.slice/901.scope/memory.pressure:full avg10=0.00 avg60=0.00 avg300=0.00 total=2
/proc/5349/ksm_merging_pages:4442572
/proc/5349/ksm_stat:ksm_rmap_items 12989569
/proc/5349/ksm_merging_pages:4442565
/proc/5349/ksm_stat:ksm_rmap_items 12989569
/proc/5349/ksm_merging_pages:4442532
/proc/5349/ksm_stat:ksm_rmap_items 12989569
/proc/5349/ksm_merging_pages:4442528
/proc/5349/ksm_stat:ksm_rmap_items 12989569
/proc/5349/ksm_merging_pages:4442523
/proc/5349/ksm_stat:ksm_rmap_items 12989569
 


@Whatever - out of curiosity, what make and model of server are you running? We have Lenovo RD540.

Seen on different clusters built on different server vendors: HP Gen8/9, Dell R730xd, Supermicro.

Only one thing is common: intel-microcode is installed on every host.
 
Hi
I've been following this issue for the last few days. However, we do not use PVE.

We have KVM-based virtualization controlled by MAAS. Our KVM host and virtual machines run Ubuntu 22.04.3, and we have both 5.19.x and 6.2.x kernels. We see the same problem. We haven't tried any workaround fixes yet.

I'm adding it here for informational purposes.
 

Our largest cluster is built on PVE 7.x with the 5.15 kernel. It works smoothly and well. I just updated one of its nodes to PVE 8 and kernel 6.2 just to check, and got the same issue with KSM and CPU spikes (which briefly froze VMs; I can see it as an increase in ICMP echo reply time). All my debug data is from that node:

Code:
root@pve-node-03486:~# lscpu
(same output as in my earlier post above)

root@pve-node-03486:~# dmidecode -t 1
# dmidecode 3.4
Getting SMBIOS data from sysfs.
SMBIOS 2.8 present.

Handle 0x0100, DMI type 1, 27 bytes
System Information
        Manufacturer: HP
        Product Name: ProLiant DL360p Gen8
        Version: Not Specified
        Serial Number: MXQ33301L2  
        UUID: 39363436-3230-584d-5133-333330314c32
        Wake-up Type: Power Switch
        SKU Number: 646902-001  
        Family: ProLiant


root@pve-node-03486:~# dmidecode -t 17
# dmidecode 3.4
Getting SMBIOS data from sysfs.
SMBIOS 2.8 present.

Handle 0x1100, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x1000
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: 32 GB
        Form Factor: DIMM
        Set: None
        Locator: PROC  1 DIMM  1
        Bank Locator: Not Specified
        Type: DDR3
        Type Detail: Synchronous LRDIMM
        Speed: 1866 MT/s
        Manufacturer: HP 
        Serial Number: Not Specified
        Asset Tag: Not Specified
        Part Number: 712384-081      
        Rank: 4
        Configured Memory Speed: 1866 MT/s
        Minimum Voltage: 1.5 V
        Maximum Voltage: 1.5 V
        Configured Voltage: 1.5 V

Handle 0x1101, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x1000
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: No Module Installed
        Form Factor: DIMM
        Set: 1
        Locator: PROC  1 DIMM  2
        Bank Locator: Not Specified
        Type: DDR3
        Type Detail: Synchronous

Handle 0x1102, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x1000
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: No Module Installed
        Form Factor: DIMM
        Set: 2
        Locator: PROC  1 DIMM  3
        Bank Locator: Not Specified
        Type: DDR3
        Type Detail: Synchronous

Handle 0x1103, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x1000
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: 32 GB
        Form Factor: DIMM
        Set: 3
        Locator: PROC  1 DIMM  4
        Bank Locator: Not Specified
        Type: DDR3
        Type Detail: Synchronous LRDIMM
        Speed: 1866 MT/s
        Manufacturer: HP 
        Serial Number: Not Specified
        Asset Tag: Not Specified
        Part Number: 712384-081      
        Rank: 4
        Configured Memory Speed: 1866 MT/s
        Minimum Voltage: 1.5 V
        Maximum Voltage: 1.5 V
        Configured Voltage: 1.5 V

Handle 0x1104, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x1000
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: No Module Installed
        Form Factor: DIMM
        Set: 4
        Locator: PROC  1 DIMM  5
        Bank Locator: Not Specified
        Type: DDR3
        Type Detail: Synchronous

Handle 0x1105, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x1000
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: No Module Installed
        Form Factor: DIMM
        Set: 5
        Locator: PROC  1 DIMM  6
        Bank Locator: Not Specified
        Type: DDR3
        Type Detail: Synchronous

Handle 0x1106, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x1000
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: No Module Installed
        Form Factor: DIMM
        Set: 6
        Locator: PROC  1 DIMM  7
        Bank Locator: Not Specified
        Type: DDR3
        Type Detail: Synchronous

Handle 0x1107, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x1000
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: No Module Installed
        Form Factor: DIMM
        Set: 7
        Locator: PROC  1 DIMM  8
        Bank Locator: Not Specified
        Type: DDR3
        Type Detail: Synchronous

Handle 0x1108, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x1000
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: 32 GB
        Form Factor: DIMM
        Set: 8
        Locator: PROC  1 DIMM  9
        Bank Locator: Not Specified
        Type: DDR3
        Type Detail: Synchronous LRDIMM
        Speed: 1866 MT/s
        Manufacturer: HP 
        Serial Number: Not Specified
        Asset Tag: Not Specified
        Part Number: 712384-081      
        Rank: 4
        Configured Memory Speed: 1866 MT/s
        Minimum Voltage: 1.5 V
        Maximum Voltage: 1.5 V
        Configured Voltage: 1.5 V

Handle 0x1109, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x1000
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: No Module Installed
        Form Factor: DIMM
        Set: 9
        Locator: PROC  1 DIMM 10
        Bank Locator: Not Specified
        Type: DDR3
        Type Detail: Synchronous

Handle 0x110A, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x1000
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: No Module Installed
        Form Factor: DIMM
        Set: 10
        Locator: PROC  1 DIMM 11
        Bank Locator: Not Specified
        Type: DDR3
        Type Detail: Synchronous

Handle 0x110B, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x1000
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: 32 GB
        Form Factor: DIMM
        Set: 11
        Locator: PROC  1 DIMM 12
        Bank Locator: Not Specified
        Type: DDR3
        Type Detail: Synchronous LRDIMM
        Speed: 1866 MT/s
        Manufacturer: HP 
        Serial Number: Not Specified
        Asset Tag: Not Specified
        Part Number: 712384-081      
        Rank: 4
        Configured Memory Speed: 1866 MT/s
        Minimum Voltage: 1.5 V
        Maximum Voltage: 1.5 V
        Configured Voltage: 1.5 V

Handle 0x110C, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x1001
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: 32 GB
        Form Factor: DIMM
        Set: 12
        Locator: PROC  2 DIMM  1
        Bank Locator: Not Specified
        Type: DDR3
        Type Detail: Synchronous LRDIMM
        Speed: 1866 MT/s
        Manufacturer: HP 
        Serial Number: Not Specified
        Asset Tag: Not Specified
        Part Number: 712384-081      
        Rank: 4
        Configured Memory Speed: 1866 MT/s
        Minimum Voltage: 1.5 V
        Maximum Voltage: 1.5 V
        Configured Voltage: 1.5 V

Handle 0x110D, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x1001
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: No Module Installed
        Form Factor: DIMM
        Set: 13
        Locator: PROC  2 DIMM  2
        Bank Locator: Not Specified
        Type: DDR3
        Type Detail: Synchronous

Handle 0x110E, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x1001
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: No Module Installed
        Form Factor: DIMM
        Set: 14
        Locator: PROC  2 DIMM  3
        Bank Locator: Not Specified
        Type: DDR3
        Type Detail: Synchronous

Handle 0x110F, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x1001
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: 32 GB
        Form Factor: DIMM
        Set: 15
        Locator: PROC  2 DIMM  4
        Bank Locator: Not Specified
        Type: DDR3
        Type Detail: Synchronous LRDIMM
        Speed: 1866 MT/s
        Manufacturer: HP 
        Serial Number: Not Specified
        Asset Tag: Not Specified
        Part Number: 712384-081      
        Rank: 4
        Configured Memory Speed: 1866 MT/s
        Minimum Voltage: 1.5 V
        Maximum Voltage: 1.5 V
        Configured Voltage: 1.5 V

Handle 0x1110, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x1001
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: No Module Installed
        Form Factor: DIMM
        Set: 16
        Locator: PROC  2 DIMM  5
        Bank Locator: Not Specified
        Type: DDR3
        Type Detail: Synchronous

Handle 0x1111, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x1001
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: No Module Installed
        Form Factor: DIMM
        Set: 17
        Locator: PROC  2 DIMM  6
        Bank Locator: Not Specified
        Type: DDR3
        Type Detail: Synchronous

Handle 0x1112, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x1001
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: No Module Installed
        Form Factor: DIMM
        Set: 18
        Locator: PROC  2 DIMM  7
        Bank Locator: Not Specified
        Type: DDR3
        Type Detail: Synchronous

Handle 0x1113, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x1001
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: No Module Installed
        Form Factor: DIMM
        Set: 19
        Locator: PROC  2 DIMM  8
        Bank Locator: Not Specified
        Type: DDR3
        Type Detail: Synchronous

Handle 0x1114, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x1001
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: 32 GB
        Form Factor: DIMM
        Set: 20
        Locator: PROC  2 DIMM  9
        Bank Locator: Not Specified
        Type: DDR3
        Type Detail: Synchronous LRDIMM
        Speed: 1866 MT/s
        Manufacturer: HP 
        Serial Number: Not Specified
        Asset Tag: Not Specified
        Part Number: 712384-081      
        Rank: 4
        Configured Memory Speed: 1866 MT/s
        Minimum Voltage: 1.5 V
        Maximum Voltage: 1.5 V
        Configured Voltage: 1.5 V

Handle 0x1115, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x1001
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: No Module Installed
        Form Factor: DIMM
        Set: 21
        Locator: PROC  2 DIMM 10
        Bank Locator: Not Specified
        Type: DDR3
        Type Detail: Synchronous

Handle 0x1116, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x1001
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: No Module Installed
        Form Factor: DIMM
        Set: 22
        Locator: PROC  2 DIMM 11
        Bank Locator: Not Specified
        Type: DDR3
        Type Detail: Synchronous

Handle 0x1117, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x1001
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: 32 GB
        Form Factor: DIMM
        Set: 23
        Locator: PROC  2 DIMM 12
        Bank Locator: Not Specified
        Type: DDR3
        Type Detail: Synchronous LRDIMM
        Speed: 1866 MT/s
        Manufacturer: HP 
        Serial Number: Not Specified
        Asset Tag: Not Specified
        Part Number: 712384-081      
        Rank: 4
        Configured Memory Speed: 1866 MT/s
        Minimum Voltage: 1.5 V
        Maximum Voltage: 1.5 V
        Configured Voltage: 1.5 V
 
An interesting thing is that the problem occurs more often when >24 vCPUs and more than 96 GB of RAM are assigned to the VM.

VMs with small amounts of vCPUs and memory (like 4 vCPUs + 16 GB vRAM) are not affected.
 
@fweber @fiona
To get the data you requested I had to negotiate with my users to put up with the freezes and degraded performance.
So any feedback would be very much appreciated!
 
@Whatever, thanks a lot for gathering and posting the data!

I managed to set up a (nested) PVE instance with a Windows 2019 VM where I can also see outliers of >1000ms in the ping response time for the Windows VM. However, I am not completely convinced it is the same issue as reported here (see below).

My setup:
  • Nested PVE 8 VM with kernel 6.2.16-15 with 2 sockets x 5 cores and NUMA enabled
  • Start Windows 2019 VM with 1 socket x 4 cores in the nested PVE instance, ping its IP continuously
  • Disable ksmtuned, manually set pages_to_scan to 1250 (the maximum value set by ksmtuned [1]) and enable KSM:
    Code:
    cd /sys/kernel/mm/ksm; echo 1250 > pages_to_scan ; echo 1 > run
  • Wait a few minutes until KSM starts merging pages (/sys/kernel/mm/ksm/pages_shared starts increasing)
  • Then, open a RDP session to the Windows 2019 and click around a bit
  • Note that the ping response times may occasionally jump to >1000ms (or even more, e.g. 3000ms) for a few seconds. However, this does not always happen (though I see it in ~70% of my attempts)
I can reproduce the ping response time spikes also on Ubuntu mainline kernel 6.4.13, which might indicate that it is not directly related to the KSM performance regression on dual-socket machines discussed at [2], as that one is apparently fixed in kernel 6.4.13.

More weirdly, I can also reproduce similar ping response time spikes on kernel 5.15.116, though the spikes are smaller -- for me, the response times jump to ~100-400ms for a few seconds. As kernel 5.15 was reported to be completely unaffected in this thread, I am not completely convinced I'm seeing the same issue as reported in this thread. But it is also possible that response times of ~100-400ms are less noticeable than the >1000ms spikes on the 6.x kernels.

Similarly, the spikes on kernel 6.2.16-15 are much smaller (~600ms maximum) if I change the nested PVE instance to a single-socket machine with 10 cores. So maybe the multi-socket setup has something to do with it -- I'm wondering if anyone has seen this issue on single-socket PVE host machines?

An interesting thing is that the problem occurs more often when >24 vCPUs and more than 96 GB of RAM are assigned to the VM.

VMs with small amounts of vCPUs and memory (like 4 vCPUs + 16 GB vRAM) are not affected
It does seem quite likely that the intermittent freezes are somehow related to KSM. In that case, it seems plausible that their frequency increases with the amount of configured RAM for the VM. Even more so because Windows VMs tend to reserve the whole configured amount of RAM even if it's not yet used in the guest -- I'm not exactly sure if Windows zeroes out all pages initially, but if it does, KSM can do a lot of work merging all those zero pages.
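A rough, hedged way to gauge whether this kind of high-multiplicity merging is happening on an affected host: compare pages_sharing to pages_shared -- if pages_sharing is vastly larger, a few distinct pages (such as the zero page) are being shared very many times:
Code:
# pages_sharing / pages_shared = average number of users per shared page
grep '' /sys/kernel/mm/ksm/pages_shared /sys/kernel/mm/ksm/pages_sharing /sys/kernel/mm/ksm/use_zero_pages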

Hi
I've been following this issue for the last few days. However, we do not use PVE.

We have KVM-based virtualization controlled by MAAS. Our KVM host and virtual machines run Ubuntu 22.04.3, and we have both 5.19.x and 6.2.x kernels. We see the same problem. We haven't tried any workaround fixes yet.

I'm adding it here for informational purposes.
Thanks. However, are you certain you are seeing the same issue as reported here? In particular, do the VMs "freeze" only intermittently (for a few seconds) but continue to run afterwards? If they actually freeze permanently, you might be seeing a different issue [3] that is fixed in the current PVE kernels [4].

[1] https://github.com/ksmtuned/ksmtuned/blob/master/src/ksmtuned.sh.in#L38
[2] https://forum.proxmox.com/threads/131082/post-595600
[3] https://forum.proxmox.com/threads/vms-freeze-with-100-cpu.127459/
[4] https://forum.proxmox.com/threads/vms-freeze-with-100-cpu.127459/post-587633
 
You are right. Our VMs freeze completely until we reboot them.
 
