Proxmox 8.0 / Kernel 6.2.x 100%CPU issue with Windows Server 2019 VMs

@fweber any updates?

p.s. Just to clarify: these ICMP ping response times are not just numbers in a shell.
They are RDP session freezes. Yes, very short, but quite annoying. It's enough for the end user's mouse cursor to jump, making them miss a button (for example), and they start complaining when it becomes a continuous problem. From the RDP user's point of view, work is almost impossible.
 
No significant updates from my side, unfortunately. As mentioned in my last message [1], I'm not entirely convinced the behavior I'm seeing is really the issue reported in this thread, because I'm also seeing the intermittent freezes (to a lesser degree) with kernel 5.15.

If it is in fact the same issue, the cause might indeed be some change in KSM between the 5.15 and 6.2 kernels. I tried tuning some KSM settings (merge_across_nodes, use_zero_pages, max_page_sharing [2]) to see whether they make a difference, but this wasn't the case.
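For reference, those KSM knobs live under /sys/kernel/mm/ksm/ (see [2]); merge_across_nodes in particular can only be changed while no pages are merged, so KSM has to be stopped and unmerged first. A rough example (the values are only illustrative):
Code:
echo 2 > /sys/kernel/mm/ksm/run                  # stop ksmd and unmerge all shared pages
echo 0 > /sys/kernel/mm/ksm/merge_across_nodes   # only merge pages within one NUMA node
echo 0 > /sys/kernel/mm/ksm/use_zero_pages
echo 256 > /sys/kernel/mm/ksm/max_page_sharing   # kernel default
echo 1 > /sys/kernel/mm/ksm/run                  # start ksmd again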

I'll do some more tests with the 5.19 and 6.5 kernels in the next few days, in the hope of pinpointing the issue a bit better, and report back here.

[1] https://forum.proxmox.com/threads/p...th-windows-server-2019-vms.130727/post-598711
[2] https://www.kernel.org/doc/html/latest/admin-guide/mm/ksm.html#ksm-daemon-sysfs-interface
 
I took another look at this. I carried out the steps I described in [1] (starting KSM and an RDP session) with the PVE host running other PVE kernels and some Ubuntu mainline kernels from [2]:
  • On PVE kernel 5.15.116-1-pve and Ubuntu mainline kernels 5.16.20 and 5.17.15, I see some ping spikes of 100-600ms, but so far nothing exceeding 1 second.
  • On Ubuntu mainline kernel 5.18.19 and PVE kernels 5.19.17-2-pve and 6.5.3-1-pve, I see intermittent freezes (= ping spikes and frozen RDP session) of >3 seconds, sometimes even >10 seconds.
I then did some digging with bpftrace [3] in the hope of finding something that might correlate with the intermittent freezes. One interesting thing is that the freezes roughly seemed to match runs of the automatic NUMA balancer [4]. I tested this by monitoring the timestamps of task_numa_work [5] invocations on the PVE host (note that I'm only running one VM on that host, so there is not much activity):
Code:
bpftrace -e 'kprobe:task_numa_work { time("time=%s\n"); }'
... and monitoring ping responses of >100ms (from a different machine) at the same time:
Code:
ping -D VM_IP | egrep -e 'time=[0-9]{3,5}'
When seeing a very long ping response time, I subtracted it from the timestamp reported by ping -D, and every time the result was very close (<=1s) to the timestamp of a task_numa_work invocation on the PVE host.
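To make that comparison easier to repeat, something along these lines should work (VM_IP is a placeholder and the sed expressions are only a sketch): log the task_numa_work timestamps to a file on the PVE host, print an estimated start time for every slow reply on the other machine, and compare the two by hand:
Code:
# on the PVE host: log wall-clock timestamps of task_numa_work invocations
bpftrace -e 'kprobe:task_numa_work { time("%s\n"); }' > /tmp/task_numa_work.log

# on another machine: for every reply slower than 100ms, estimate when the freeze
# started (ping -D timestamp minus the round-trip time) and compare against the log
ping -D VM_IP | grep --line-buffered -E 'time=[0-9]{3,5}' | while read -r line; do
    ts=$(echo "$line"  | sed -E 's/^\[([0-9.]+)\].*/\1/')
    rtt=$(echo "$line" | sed -E 's/.*time=([0-9.]+) ms.*/\1/')
    echo "reply at $ts, RTT ${rtt}ms, freeze started around $(echo "$ts - $rtt/1000" | bc -l)"
done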

As this seems to suggest a connection to NUMA balancing, I tried disabling it via:
Code:
echo 0 > /proc/sys/kernel/numa_balancing
And so far, I haven't seen intermittent freezes of >3 seconds. If someone would be able to check on their setup whether they see the same correlation of NUMA balancing and intermittent freezes, and whether disabling NUMA balancing helps, that would be very helpful.
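A minimal way to do that check might look like this (run the cat/echo lines on the PVE host as root and the ping from another machine; VM_IP is a placeholder):
Code:
cat /proc/sys/kernel/numa_balancing          # current state: 1 = enabled, 0 = disabled
echo 0 > /proc/sys/kernel/numa_balancing     # disable automatic NUMA balancing
ping -D VM_IP | egrep -e 'time=[0-9]{3,5}'   # from another machine: should stay quiet now
echo 1 > /proc/sys/kernel/numa_balancing     # re-enable to return to the default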

Note that I'm still not entirely convinced I'm seeing the same kind of intermittent freezes as reported in this thread, and note that the steps from [1] are not 100% reliable for producing a freeze for me. So even though I haven't seen long freezes on kernels <= 5.17 or after disabling NUMA balancing so far, I'm not entirely convinced (yet) that they don't happen anymore. On the off chance that the kernel version has something to do with the freezes, I took a look at the changes between Ubuntu mainline kernels 5.17.15 [6] and 5.18.19 [7] related to KVM, KSM and NUMA, but nothing stood out so far. I'll take another look next week.

[1] https://forum.proxmox.com/threads/130727/page-6#post-598711
[2] https://kernel.ubuntu.com/mainline/
[3] https://github.com/iovisor/bpftrace/
[4] https://www.kernel.org/doc/html/latest/admin-guide/sysctl/kernel.html#numa-balancing
[5] https://git.kernel.org/pub/scm/linu...f6f76a6a29f36d2f3e4510d0bde5046672f6924#n3192
[6] https://git.launchpad.net/~ubuntu-k...t/mainline-crack/log/?h=cod/mainline/v5.17.15
[7] https://git.launchpad.net/~ubuntu-k...t/mainline-crack/log/?h=cod/mainline/v5.18.19
 
As this seems to suggest a connection to NUMA balancing, I tried disabling it via:
Code:
echo 0 > /proc/sys/kernel/numa_balancing
And so far, I haven't seen intermittent freezes of >3 seconds. If someone would be able to check on their setup whether they see the same correlation of NUMA balancing and intermittent freezes, and whether disabling NUMA balancing helps, that would be very helpful.

I can confirm that setting
echo 0 > /proc/sys/kernel/numa_balancing

resolves the increased ICMP echo reply times and the RDP freezes almost immediately, even with KSM enabled and the 6.2 kernel (mitigations still off)
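Note that this setting does not survive a reboot; one way to make it persistent is a sysctl drop-in, roughly like this (the file name is arbitrary):
Code:
echo 'kernel.numa_balancing = 0' > /etc/sysctl.d/99-numa-balancing.conf
sysctl --system                          # apply without a reboot
cat /proc/sys/kernel/numa_balancing      # should now print 0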

Code:
root@046-pve-04315:~# uname -a
Linux 046-pve-04315 6.2.16-15-pve #1 SMP PREEMPT_DYNAMIC PMX 6.2.16-15 (2023-09-28T13:53Z) x86_64 GNU/Linux

root@046-pve-04315:~# cat /sys/kernel/mm/ksm/pages_shared
114480

root@046-pve-04315:~# lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  48
  On-line CPU(s) list:   0-47
Vendor ID:               GenuineIntel
  BIOS Vendor ID:        Intel(R) Corporation
  Model name:            Intel(R) Xeon(R) Gold 6136 CPU @ 3.00GHz
    BIOS Model name:     Intel(R) Xeon(R) Gold 6136 CPU @ 3.00GHz  CPU @ 3.0GHz
    BIOS CPU family:     179
    CPU family:          6
    Model:               85
    Thread(s) per core:  2
    Core(s) per socket:  12
    Socket(s):           2
    Stepping:            4
    CPU(s) scaling MHz:  99%
    CPU max MHz:         3700.0000
    CPU min MHz:         1200.0000
    BogoMIPS:            6000.00
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pd
                         pe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 ds_c
                         pl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdr
                         and lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpr
                         iority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap cl
                         flushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dt
                         herm ida arat pln pts pku ospke md_clear flush_l1d arch_capabilities
Virtualization features:
  Virtualization:        VT-x
Caches (sum of all): 
  L1d:                   768 KiB (24 instances)
  L1i:                   768 KiB (24 instances)
  L2:                    24 MiB (24 instances)
  L3:                    49.5 MiB (2 instances)
NUMA:                
  NUMA node(s):          2
  NUMA node0 CPU(s):     0-11,24-35
  NUMA node1 CPU(s):     12-23,36-47
Vulnerabilities:     
  Gather data sampling:  Vulnerable
  Itlb multihit:         KVM: Vulnerable
  L1tf:                  Mitigation; PTE Inversion; VMX vulnerable
  Mds:                   Vulnerable; SMT vulnerable
  Meltdown:              Vulnerable
  Mmio stale data:       Vulnerable
  Retbleed:              Vulnerable
  Spec rstack overflow:  Not affected
  Spec store bypass:     Vulnerable
  Spectre v1:            Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
  Spectre v2:            Vulnerable, IBPB: disabled, STIBP: disabled, PBRSB-eIBRS: Not affected
  Srbds:                 Not affected
  Tsx async abort:       Vulnerable
 
I can confirm that setting

Code:
echo 0 > /proc/sys/kernel/numa_balancing

resolves the increased ICMP echo reply times and the RDP freezes almost immediately, even with KSM enabled and the 6.2 kernel (mitigations still off)

Thanks a lot for trying this and reporting back! This strongly hints at some unfortunate interaction between KSM and the NUMA balancer that might have appeared somewhere between kernels 5.15 and 6.2 -- or, if my previous experiments are to be trusted, between 5.17 and 5.18. I'll try to find out more and report back.
 
@fweber
Could you please clarify whether there is any relation between kernel/numa_balancing and "Enable NUMA" in the VM config?
Thanks in advance
 
@fweber
Could you please clarify whether there is any relation between kernel/numa_balancing and "Enable NUMA" in the VM config?
Thanks in advance
Personally I'm not very familiar with NUMA systems and the internals of the automatic NUMA balancer, but in my understanding, "Enable NUMA" (numa: 1) exposes a NUMA topology to the guest [1], whereas automatic NUMA balancing on the host tries to automatically optimize (process) data placement on memory nodes, including e.g. the memory of the QEMU processes, i.e. guest memory [2]. With numa: 1, and if the guest OS can take advantage of the NUMA topology for memory placement, I would expect that the automatic NUMA balancer on the host would not need to migrate a lot of data (of QEMU processes) between memory nodes.
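For a quick look at both settings on a given host and VM, something like this should do (VMID is a placeholder):
Code:
qm config VMID | grep -E '^(numa|sockets|cores|memory):'   # guest-visible NUMA topology (numa: 1)
cat /proc/sys/kernel/numa_balancing                        # host-side automatic NUMA balancing (1 = on)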

I'm not sure though if enabling NUMA for the VMs would help with the intermittent freezes -- to judge that, we would need to know what exactly causes the intermittent freezes. I'm still investigating this and will report back with my results here.

[1] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_numa
[2] https://doc.opensuse.org/documentation/leap/tuning/html/book-tuning/cha-tuning-numactl.html
 
Out of curiosity, is this still an issue in Proxmox 8.1? I noticed it was running the 6.5 Linux kernel. Has anyone tried upgrading yet?
 
Out of curiosity, is this still an issue in Proxmox 8.1? I noticed it was running the 6.5 Linux kernel. Has anyone tried upgrading yet?
I'm still seeing the intermittent freezes (= ping spikes and frozen RDP session) with kernel 6.5.11-5-pve.

I'm also working on a reproducer that doesn't involve Windows VMs -- that should make it easier to isolate the root cause of the freezes.
 
I'm still seeing the intermittent freezes (= ping spikes and frozen RDP session) with kernel 6.5.11-5-pve.

I'm also working on a reproducer that doesn't involve Windows VMs -- that should make it easier to isolate the root cause of the freezes.
Hey,

I've been monitoring some of the forum posts since Saturday, when I upgraded my Proxmox VE 6.4 to 8.1.

I also had massive lag spikes and Windows 10 VM freeze/slow problems.

After finding and applying
Code:
echo 0 > /proc/sys/kernel/numa_balancing

the lag spikes stopped and the Windows 10 VM is running smoothly.

On a side note, I have KSM enabled and mitigations=on.

I haven't had a chance to test without KSM and with mitigations=off, though.
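For reference, disabling KSM for such a test should roughly look like this (on PVE, KSM is driven by the ksmtuned service); mitigations, on the other hand, need a kernel command line change and a reboot:
Code:
systemctl disable --now ksmtuned       # stop the service that drives KSM on PVE
echo 2 > /sys/kernel/mm/ksm/run        # stop ksmd and unmerge all currently shared pages
cat /sys/kernel/mm/ksm/pages_shared    # should drop back to 0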

Code:
CPU(s) 40 x Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz (2 Sockets)
Kernel Version Linux 6.5.11-4-pve (2023-11-20T10:19Z)
Boot Mode Legacy BIOS
Manager Version pve-manager/8.1.3/
 
Hey,

I've been monitoring some of the forum posts since Saturday, when I upgraded my Proxmox VE 6.4 to 8.1.

I also had massive lag spikes and Windows 10 VM freeze/slow problems.

After finding and applying
Code:
echo 0 > /proc/sys/kernel/numa_balancing

the lag spikes stopped and the Windows 10 VM is running smoothly.

On a side note, I have KSM enabled and mitigations=on.

I haven't had a chance to test without KSM and with mitigations=off, though.

Code:
CPU(s) 40 x Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz (2 Sockets)
Kernel Version Linux 6.5.11-4-pve (2023-11-20T10:19Z)
Boot Mode Legacy BIOS
Manager Version pve-manager/8.1.3/

With the new kernel, I'm seeing spikes in CPU usage and in server load. I've been looking at the hourly graphs for some time now. Luckily it does not disrupt work and users aren't feeling it.
[screenshots: hourly CPU usage and server load graphs]
 
I'm still seeing the intermittent freezes (= ping spikes and frozen RDP session) with kernel 6.5.11-5-pve.

I'm also working on a reproducer that doesn't involve Windows VMs -- that should make it easier to isolate the root cause of the freezes.
@fweber any news?
 
I didn't have much time to look further into this issue, but here is my current state:

So far, I've only seen the intermittent freezes of Windows 2019 VMs in a nested PVE setup (see [1]), where the top-level PVE has no actual NUMA hardware, and the nested PVE has only virtual NUMA hardware. In other words, there was no actual NUMA hardware involved. Such a nested setup is quite artificial and somewhat brittle, so I've tried to reproduce the freezes on a PVE host with actual NUMA hardware (with 2 NUMA nodes). But I haven't succeeded so far -- the steps outlined in [1] do not produce freezes there. So far I haven't figured out why, but I'd guess that KSM and/or the NUMA balancer behave in subtly different ways in the two setups, resulting in freezes in one setup but not the other.

One factor that might make a difference is the distribution of the QEMU process memory (= VM memory) across the available NUMA nodes. @Whatever, you mentioned earlier [2] that VMs with >96G memory and >24 vCPUs seem to be most affected by the freezes. I wonder whether 96G exceeds the capacity of a single NUMA node -- this would mean that the process memory needs to be spread across multiple nodes.

Could you (and anyone else who has seen the intermittent freezes) post the output of the following command on the PVE host? It shows the NUMA topology. You might have to install the numactl package first:
Code:
numactl -H

Also, could you post the NUMA statistics of the QEMU process for some VM that was prone to freezing? No need to re-enable KSM/NUMA balancing for this; it would already be interesting to see for a configuration without actual freezes. Please substitute VMID with the ID of a VM that was previously prone to freezes:
Code:
numastat -v $(cat /var/run/qemu-server/VMID.pid)
numastat -v
Please also post the setup you are currently running (which kernel, is KSM enabled, is NUMA balancing enabled).
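To collect all of that in one go, a small helper along these lines might be convenient (VMID is a placeholder; numactl and numastat come with the numactl package):
Code:
VMID=100   # replace with the ID of a VM that was prone to freezes
{
  uname -r
  echo "numa_balancing: $(cat /proc/sys/kernel/numa_balancing)"
  echo "ksm pages_shared: $(cat /sys/kernel/mm/ksm/pages_shared)"
  numactl -H
  numastat -v "$(cat /var/run/qemu-server/$VMID.pid)"
  numastat -v
} > numa-report-$VMID.txt 2>&1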

[1] https://forum.proxmox.com/threads/p...ows-server-2019-vms.130727/page-6#post-598711
[2] https://forum.proxmox.com/threads/p...ows-server-2019-vms.130727/page-6#post-598223
 
Also, could you post the NUMA statistics of the QEMU process for some VM that was prone to freezing? No need to re-enable KSM/NUMA balancing for this; it would already be interesting to see for a configuration without actual freezes. Please substitute VMID with the ID of a VM that was previously prone to freezes:
Code:
numastat -v $(cat /var/run/qemu-server/VMID.pid)
numastat -v
Please also post the setup you are currently running (which kernel, is KSM enabled, is NUMA balancing enabled).

Here we go:

Node:
CPU(s) 48 x Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (2 Sockets)
Kernel Version Linux 6.5.11-6-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.11-6 (2023-11-29T08:32Z)
RAM: 8*32 GB = 256 GB DDR3 (distributed equally, 4*32 GB per socket)
PVE Manager Version pve-manager/8.1.3/b46aac3b42da5d15

10 VMs, total vCPU count = 59, total VM RAM does not exceed 256 GB

Code:
root@pve-node-03486:~# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 24 25 26 27 28 29 30 31 32 33 34 35
node 0 size: 128864 MB
node 0 free: 42309 MB
node 1 cpus: 12 13 14 15 16 17 18 19 20 21 22 23 36 37 38 39 40 41 42 43 44 45 46 47
node 1 size: 129012 MB
node 1 free: 37199 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

root@pve-node-03486:~# cat /etc/pve/qemu-server/901.conf
# Windows 2019 RDS
agent: 1
boot: order=scsi0;ide2
sockets: 2
cores: 16
cpu: host,flags=+ssbd
ide2: DS-254-NFS-B:iso/virtio-win-0.1.240.iso,media=cdrom,size=612812K
machine: q35
memory: 131072
name: 009-srv-ts1.eliot.local
net0: virtio=DE:00:3D:1D:DA:4B,bridge=vmbr0,firewall=1,tag=90
numa: 1
onboot: 1
ostype: win10
scsi0: DS-254-NFS-A:901/vm-901-disk-0.raw,iothread=1,size=512G,ssd=1
scsi1: DS-254-NFS-A:901/vm-901-disk-1.raw,iothread=1,size=512G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=06c14319-3a36-4e49-be62-7c366bc0052c
startup: up=60
tags: id9
unused0: Temp-254:901/vm-901-disk-0.raw
vmgenid: e5df7617-483d-4bff-aa29-ff6c72d4a2e2
 
mitigations=off, numa_balancing=1, KSM active, RDP freezes
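(./strace.sh here is a small local helper that is not part of PVE; a hypothetical sketch producing this kind of output -- an strace -c syscall summary of the VM's QEMU process plus the cgroup pressure and per-process KSM counters -- could look like this:)
Code:
#!/bin/bash
# rough sketch of a ./strace.sh-style helper: syscall summary + PSI + KSM counters for a VMID
PID=$(cat /var/run/qemu-server/$1.pid)
timeout -s INT 16 strace -c -p "$PID"                      # per-syscall time summary for ~16s
grep -H . /sys/fs/cgroup/qemu.slice/$1.scope/*.pressure    # cgroup PSI counters
for i in $(seq 5); do                                      # sample the KSM counters a few times
    grep -H . /proc/$PID/ksm_merging_pages /proc/$PID/ksm_stat
    sleep 2
done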

Code:
root@pve-node-03486:~# cat  /proc/sys/kernel/numa_balancing
1
root@pve-node-03486:~# ./strace.sh 901
strace: Process 2896 attached
strace: Process 2896 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00   16.015390     2669231         6           ppoll
  0.00    0.000156          78         2           read
  0.00    0.000078          13         6           ioctl
  0.00    0.000003           1         2           futex
------ ----------- ----------- --------- --------- ----------------
100.00   16.015627     1000976        16           total
/sys/fs/cgroup/qemu.slice/901.scope/cgroup.pressure:1
/sys/fs/cgroup/qemu.slice/901.scope/cpu.pressure:some avg10=0.00 avg60=0.00 avg300=0.00 total=3016192915
/sys/fs/cgroup/qemu.slice/901.scope/cpu.pressure:full avg10=0.00 avg60=0.00 avg300=0.00 total=3014037603
/sys/fs/cgroup/qemu.slice/901.scope/io.pressure:some avg10=0.00 avg60=0.00 avg300=0.00 total=965208
/sys/fs/cgroup/qemu.slice/901.scope/io.pressure:full avg10=0.00 avg60=0.00 avg300=0.00 total=961892
/sys/fs/cgroup/qemu.slice/901.scope/memory.pressure:some avg10=0.00 avg60=0.00 avg300=0.00 total=0
/sys/fs/cgroup/qemu.slice/901.scope/memory.pressure:full avg10=0.00 avg60=0.00 avg300=0.00 total=0
/proc/2896/ksm_merging_pages:6985325
/proc/2896/ksm_stat:ksm_rmap_items 33556163
/proc/2896/ksm_stat:ksm_merging_pages 6985325
/proc/2896/ksm_stat:ksm_process_profit 26464296768
/proc/2896/ksm_merging_pages:7001185
/proc/2896/ksm_stat:ksm_rmap_items 33556163
/proc/2896/ksm_stat:ksm_merging_pages 7001185
/proc/2896/ksm_stat:ksm_process_profit 26529259328
/proc/2896/ksm_merging_pages:7017178
/proc/2896/ksm_stat:ksm_rmap_items 33556163
/proc/2896/ksm_stat:ksm_merging_pages 7017178
/proc/2896/ksm_stat:ksm_process_profit 26594766656
/proc/2896/ksm_merging_pages:7033131
/proc/2896/ksm_stat:ksm_rmap_items 33556163
/proc/2896/ksm_stat:ksm_merging_pages 7033131
/proc/2896/ksm_stat:ksm_process_profit 26660110144
/proc/2896/ksm_merging_pages:7049463
/proc/2896/ksm_stat:ksm_rmap_items 33556163
/proc/2896/ksm_stat:ksm_merging_pages 7049463
/proc/2896/ksm_stat:ksm_process_profit 26727006016
root@pve-node-03486:~#
root@pve-node-03486:~# numastat -v $(cat /var/run/qemu-server/901.pid)

Per-node process memory usage (in MBs) for PID 2896 (kvm)
                           Node 0          Node 1           Total
                  --------------- --------------- ---------------
Huge                         0.00            0.00            0.00
Heap                         2.54           21.06           23.60
Stack                        0.01            0.94            0.95
Private                  43619.89        87507.04       131126.94
----------------  --------------- --------------- ---------------
Total                    43622.45        87529.04       131151.49
root@pve-node-03486:~#
root@pve-node-03486:~# numastat -v

Per-node numastat info (in MBs):
                          Node 0          Node 1           Total
                 --------------- --------------- ---------------
Numa_Hit             17454749.35      2805875.77     20260625.12
Numa_Miss                   0.00         8734.02         8734.02
Numa_Foreign             8734.02            0.00         8734.02
Interleave_Hit              3.22            3.96            7.18
Local_Node           17454725.11      2805833.40     20260558.50
Other_Node                 24.24         8776.39         8800.63


-----------------------------------

Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=2ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=129ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=2ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=2ms TTL=124
Reply from 172.16.9.242: bytes=32 time=2ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124

Ping statistics for 172.16.9.242:
    Packets: Sent = 870, Received = 870, Lost = 0
    (0% loss)
Approximate round trip times in milliseconds:
    Minimum = 1ms, Maximum = 947ms, Average = 6ms
 
Same node, same load, but with numa_balancing disabled

mitigations=off, numa_balancing=0, KSM active, no RDP freezes

Code:
root@pve-node-03486:~# echo 0 > /proc/sys/kernel/numa_balancing
root@pve-node-03486:~# ./strace.sh 901
strace: Process 2896 attached
strace: Process 2896 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.203288         363       560           ppoll
  0.00    0.000004           0         6           ioctl
  0.00    0.000001           0       543           read
  0.00    0.000000           0      2104           write
  0.00    0.000000           0         2           close
  0.00    0.000000           0        10           sendmsg
  0.00    0.000000           0       515           recvmsg
  0.00    0.000000           0         2           getsockname
  0.00    0.000000           0         4           fcntl
  0.00    0.000000           0         9           futex
  0.00    0.000000           0         2           accept4
------ ----------- ----------- --------- --------- ----------------
100.00    0.203293          54      3757           total
/sys/fs/cgroup/qemu.slice/901.scope/cgroup.pressure:1
/sys/fs/cgroup/qemu.slice/901.scope/cpu.pressure:some avg10=0.00 avg60=0.00 avg300=0.00 total=3016955092
/sys/fs/cgroup/qemu.slice/901.scope/cpu.pressure:full avg10=0.00 avg60=0.00 avg300=0.00 total=3014775485
/sys/fs/cgroup/qemu.slice/901.scope/io.pressure:some avg10=0.00 avg60=0.00 avg300=0.00 total=965752
/sys/fs/cgroup/qemu.slice/901.scope/io.pressure:full avg10=0.00 avg60=0.00 avg300=0.00 total=962435
/sys/fs/cgroup/qemu.slice/901.scope/memory.pressure:some avg10=0.00 avg60=0.00 avg300=0.00 total=0
/sys/fs/cgroup/qemu.slice/901.scope/memory.pressure:full avg10=0.00 avg60=0.00 avg300=0.00 total=0
/proc/2896/ksm_merging_pages:10784446
/proc/2896/ksm_stat:ksm_rmap_items 33556163
/proc/2896/ksm_stat:ksm_merging_pages 10784446
/proc/2896/ksm_stat:ksm_process_profit 42025496384
/proc/2896/ksm_merging_pages:10795308
/proc/2896/ksm_stat:ksm_rmap_items 33556163
/proc/2896/ksm_stat:ksm_merging_pages 10795309
/proc/2896/ksm_stat:ksm_process_profit 42069991232
/proc/2896/ksm_merging_pages:10806605
/proc/2896/ksm_stat:ksm_rmap_items 33556163
/proc/2896/ksm_stat:ksm_merging_pages 10806605
/proc/2896/ksm_stat:ksm_process_profit 42116259648
/proc/2896/ksm_merging_pages:10817534
/proc/2896/ksm_stat:ksm_rmap_items 33556163
/proc/2896/ksm_stat:ksm_merging_pages 10817534
/proc/2896/ksm_stat:ksm_process_profit 42161024832
/proc/2896/ksm_merging_pages:10828739
/proc/2896/ksm_stat:ksm_rmap_items 33556163
/proc/2896/ksm_stat:ksm_merging_pages 10828739
/proc/2896/ksm_stat:ksm_process_profit 42206920512
root@pve-node-03486:~# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 24 25 26 27 28 29 30 31 32 33 34 35
node 0 size: 128864 MB
node 0 free: 35550 MB
node 1 cpus: 12 13 14 15 16 17 18 19 20 21 22 23 36 37 38 39 40 41 42 43 44 45 46 47
node 1 size: 129012 MB
node 1 free: 27910 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10
root@pve-node-03486:~# numastat -v $(cat /var/run/qemu-server/901.pid)

Per-node process memory usage (in MBs) for PID 2896 (kvm)
                           Node 0          Node 1           Total
                  --------------- --------------- ---------------
Huge                         0.00            0.00            0.00
Heap                         2.55           21.05           23.60
Stack                        0.01            0.94            0.95
Private                  43460.95        87665.99       131126.94
----------------  --------------- --------------- ---------------
Total                    43463.50        87687.99       131151.49
root@pve-node-03486:~# numastat -v

Per-node numastat info (in MBs):
                          Node 0          Node 1           Total
                 --------------- --------------- ---------------
Numa_Hit             17460368.10      2807215.96     20267584.06
Numa_Miss                   0.00         8734.02         8734.02
Numa_Foreign             8734.02            0.00         8734.02
Interleave_Hit              3.22            3.96            7.18
Local_Node           17460343.79      2807173.16     20267516.95
Other_Node                 24.30         8776.82         8801.13

-----------------------

Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124

Ping statistics for 172.16.9.242:
    Packets: Sent = 184, Received = 184, Lost = 0
    (0% loss)
Approximate round trip times in milliseconds:
    Minimum = 1ms, Maximum = 7ms, Average = 1ms
Control-C
 
Did you test the 6.5 kernel?

Unfortunately, there is no difference with the 6.5 kernel. Just tested it.
I'm still staying with mitigations=off and numa_balancing disabled.

P.S. I will try to rerun the latest tests with mitigations enabled (but it's going to be very painful if it gets worse).
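For reference, toggling mitigations means changing the kernel command line and rebooting; on a GRUB-booted host that's roughly the following (systemd-boot installs edit /etc/kernel/cmdline and run proxmox-boot-tool refresh instead):
Code:
# edit /etc/default/grub: add or remove mitigations=off in GRUB_CMDLINE_LINUX_DEFAULT, then:
update-grub
reboot
# after the reboot, verify:
cat /proc/cmdline
grep . /sys/devices/system/cpu/vulnerabilities/*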
 
