[SOLVED] Proxmox 8.0 / Kernel 6.2.x 100%CPU issue with Windows Server 2019 VMs

So far so good. It seems the problem is gone: with a Windows Server 2022 VM and around 60 RDP users, no freezes so far. With the 6.5 kernel I had to disable NUMA after 2 or 3 hours.

lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 88
On-line CPU(s) list: 0-87
Vendor ID: GenuineIntel
BIOS Vendor ID: Intel(R) Corporation
Model name: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
BIOS Model name: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz CPU @ 2.2GHz
BIOS CPU family: 179
CPU family: 6
Model: 79
Thread(s) per core: 2
Core(s) per socket: 22
Socket(s): 2
Stepping: 1
CPU(s) scaling MHz: 92%
CPU max MHz: 3600.0000
CPU min MHz: 1200.0000
BogoMIPS: 4394.92
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 intel_ppin ssbd ibrs ibpb stibp tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts vnmi md_clear flush_l1d
Virtualization features:
Virtualization: VT-x
Caches (sum of all):
L1d: 1.4 MiB (44 instances)
L1i: 1.4 MiB (44 instances)
L2: 11 MiB (44 instances)
L3: 110 MiB (2 instances)
NUMA:
NUMA node(s): 2
NUMA node0 CPU(s): 0-21,44-65
NUMA node1 CPU(s): 22-43,66-87
Vulnerabilities:
Gather data sampling: Not affected
Itlb multihit: KVM: Vulnerable
L1tf: Mitigation; PTE Inversion; VMX vulnerable
Mds: Vulnerable; SMT vulnerable
Meltdown: Vulnerable
Mmio stale data: Vulnerable
Reg file data sampling: Not affected
Retbleed: Not affected
Spec rstack overflow: Not affected
Spec store bypass: Vulnerable
Spectre v1: Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
Spectre v2: Vulnerable, IBPB: disabled, STIBP: disabled, PBRSB-eIBRS: Not affected
Srbds: Not affected
Tsx async abort: Vulnerable


uname -a
Linux pve 6.8.1-1-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.1-1 (2024-04-02T16:19Z) x86_64 GNU/Linux


grep "" /proc/sys/kernel/numa_* /sys/kernel/debug/sched/preempt /sys/kernel/mm/ksm/*~
/proc/sys/kernel/numa_balancing:1
/proc/sys/kernel/numa_balancing_promote_rate_limit_MBps:65536
/sys/kernel/debug/sched/preempt:none (voluntary) full
grep: /sys/kernel/mm/ksm/*~: No such file or directory

PS: the VM has 80 GB of memory and 22 cores (NUMA 2x11), and I have KSM and mitigations disabled.
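
(For anyone wanting to replicate that setup, a rough sketch of the host-side commands, assuming a standard PVE install; the sysctl and KSM paths are the usual Linux/Proxmox defaults, and disabling mitigations means editing the kernel command line and rebooting.)

echo 0 > /proc/sys/kernel/numa_balancing                          # disable NUMA balancing until reboot
echo 'kernel.numa_balancing = 0' > /etc/sysctl.d/99-numa.conf     # make it persistent
sysctl --system
systemctl disable --now ksmtuned                                  # stop the Proxmox KSM tuning daemon
echo 2 > /sys/kernel/mm/ksm/run                                   # stop KSM and unmerge shared pages
# mitigations: add "mitigations=off" to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub,
# then run update-grub (or proxmox-boot-tool refresh on ZFS-booted hosts) and reboot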
 
@trey.b @Whatever @thiagotgc @jens-maus @JL17 @emunt6
https://forum.proxmox.com/threads/o...ve-8-available-on-test-no-subscription.144557

Can any of you guys check whether kernel 6.8 helps? It's available, and @t.lamprecht was a lot faster than I expected :)
Thank you very much, @t.lamprecht!!!

I have no issues here, so I'm the last guy to ask whether 6.8 improves anything. I get my two Genoa 9274F CPUs on 15th April, so I can only test after mid-April, not before, sadly...

Cheers
I applied all available updates to PVE 8.1.3 automatically, and I'm now on PVE 8.1.10 with pve-kernel 6.5.13-5.

Has anyone tested it and noticed whether it resolves the freezing problems?

I'm scared to update to 6.8.
 
It should be resolved on 6.8 only.
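
(If you want to try it, installing the opt-in 6.8 kernel on PVE 8 should be roughly the following, assuming the repository from the announcement linked above is enabled; check that thread for the exact package names.)

apt update
apt install proxmox-kernel-6.8
reboot
# to roll back later, boot/pin a previous kernel:
proxmox-boot-tool kernel list
proxmox-boot-tool kernel pin <older-kernel-version>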
 
If disabling numa_balancing doesn't seem to have any impact, does that mean we're experiencing a different problem? I'm new to Proxmox and started with 8.1, updated to the latest on the no-subscription repo.

I have exactly one VM, which runs Windows Server 2022. While the VM is idle at 2-8% CPU, the kvm process is at 40-70% of a core, specifically the main 'kvm' thread.

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2605 root 20 0 17.9g 16.1g 5824 S 68.3 12.8 103:47.93 kvm
2680 root 20 0 17.9g 16.1g 5824 S 11.2 12.8 15:07.53 CPU 1/KVM
2681 root 20 0 17.9g 16.1g 5824 S 9.6 12.8 12:12.65 CPU 2/KVM
2679 root 20 0 17.9g 16.1g 5824 S 9.2 12.8 15:45.43 CPU 0/KVM
2683 root 20 0 17.9g 16.1g 5824 S 7.9 12.8 12:02.87 CPU 4/KVM
2685 root 20 0 17.9g 16.1g 5824 S 7.9 12.8 11:33.76 CPU 6/KVM
2684 root 20 0 17.9g 16.1g 5824 S 7.6 12.8 12:15.73 CPU 5/KVM
2682 root 20 0 17.9g 16.1g 5824 S 7.3 12.8 12:10.74 CPU 3/KVM
2686 root 20 0 17.9g 16.1g 5824 S 7.3 12.8 12:03.53 CPU 7/KVM

An strace shows nothing but ppoll() keeping the CPU high.
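
(For anyone who wants to dig deeper, a rough sketch of commands to see which thread is busy and where the time goes; VMID 100 is a stand-in for the affected VM, and perf requires the linux-perf package.)

pid=$(cat /run/qemu-server/100.pid)   # main QEMU/KVM PID for VMID 100
top -H -p "$pid"                      # per-thread CPU view of that process
timeout 10 strace -c -f -p "$pid"     # 10-second syscall summary instead of a raw trace
perf top -p "$pid"                    # sample where the CPU time is actually spent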

Linux pve2 6.5.13-5-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.13-5 (2024-04-05T11:03Z) x86_64 GNU/Linux

I tried the 6.8 kernel; unfortunately it's not going to be an option for a significant amount of time, as there appear to be some significant ABI changes and every single DKMS module fails to build. It will definitely be some time before all the vendors can put out new releases.
 
I updated 15 PVE nodes, and so far everything has gone well!
 
Yeah, for anyone who doesn't need any out-of-tree drivers it'll be fine. In this case, neither my DKMS NIC driver nor the NVIDIA vGPU drivers will build on 6.8 at present. They'll get to it eventually, I'm sure, but it usually takes some months to update/release.

I think my issue is different though, as I disabled NUMA entirely in the BIOS and I'm still seeing this behavior.

With VirtIO NICs I was able to get full bandwidth downstream in the VM, but the VM was capped at ~1 Mbit/s upstream. Changing to E1000 seems to have solved this. I saw someone else mention that in this thread too, so again I thought it was related. It seems like several different issues may overlap.
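
(For reference, switching the NIC model from the CLI is roughly the following; VMID 100 and bridge vmbr0 are stand-ins, and reusing the existing MAC keeps things like DHCP reservations intact.)

qm config 100 | grep ^net0                            # note the current MAC address and bridge
qm set 100 --net0 e1000=<existing-MAC>,bridge=vmbr0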

The weird high CPU usage persists, though. Is this by any chance a common issue with PCIe passthrough?
 
Look, the issue is simply that there are fixes that are not backportable to 6.5, because as far as I know they are incompatible / have too many dependencies.

I mean, the Proxmox team cannot be held responsible if manufacturers don't make the effort to fix compilation on newer kernels in time.

Some posts ago I wrote some things you can try that help with the issue; disabling mitigations, for example, helps.
Otherwise, if you've tried everything and nothing helps, well, shit happens :-(
 

Well, if the problem is well enough identified that fixes exist going forward, then we should also know how far back we need to go to get to before the problem was introduced.

I'm not suggesting it's Proxmox's responsibility to fix vendor drivers. But it's also well known that running the latest kernel tends to mean compatibility issues.

So, if what you say is true and the problem and fix are clear with 6.8, what is the latest version prior to the problem being introduced that the rest of us need to downgrade to?
 
Thanks to everyone who has reported back with their testing results!

If you have disabled NUMA entirely, it seems likely you are seeing a different issue than the one discussed in this thread. Could you please open a new thread for your issue? Feel free to mention me (@fweber) there so I don't miss it. In that thread, please include the output of pveversion -v, lscpu and qm config VMID --current (replace VMID with the VMID of the affected VM), and please include more information about the symptoms -- besides high CPU utilization by the kvm process, do you have any other issues (does the VM "freeze" occasionally?)
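
(The requested commands collected for convenience, with VMID standing in for the affected VM's ID:)

pveversion -v
lscpu
qm config VMID --current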
 
Our new hardware keeps getting delayed, but I managed to upgrade all of our PVE 8 hosts to the latest version, and I spun up 20 Windows VMs in total on a Dell R7625 with 2x AMD EPYC 9554P. Pre-fix, that setup would freeze about 3 out of 40 VMs every day during high load with 8 cores and 32 GB of RAM each, but not for nearly 2 months with 2 cores and 16 GB.

I figure if it doesn't freeze in 30 days I have 100% confidence it's fixed.
 
1 week in, so far so good.

Also, this update drastically improves disk I/O latency and delay (not so much throughput), I'm assuming because the bug interfered with how ZFS distributes writes across multiple NUMA domains. CPU utilization on the host has decreased a lot; our newer Intel servers with Gen 4 SSDs don't even crack 2% I/O delay with all 25 Windows VMs booting up simultaneously.

Can you spot when I upgraded?
[Attachment: 1715264848594.png]
 
For Windows Server scenarios, do you use write-back caching or not? I ask because the PVE wiki recommends using it... however, it has consumed a lot of swap...
 
No, here's an example VM setting.
[Attachment: 1715266585484.png]
We're not using Windows Server. I've thought about it, but our products deploy to regular (client) Windows, and the Windows NT kernel underlies all Windows OSes these days.


I'm skeptical that write-back caching would help us, because our build VMs are generic for all of R&D, and any single build could produce 400 GB of artifacts across tens if not hundreds of thousands of files. Commonly installed apps are in the template, so that's covered; I just don't see that many files being cached effectively. That said, I haven't even looked into it. We do use a ZFS ARC cache of around 256 GB on servers hosting 25 VMs.
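
(For anyone curious how that cap is applied, a minimal sketch using the standard OpenZFS tunable; 256 GiB = 274877906944 bytes, adjust to your host.)

echo "options zfs zfs_arc_max=274877906944" > /etc/modprobe.d/zfs.conf
update-initramfs -u -k all                                    # persist for the next boot
echo 274877906944 > /sys/module/zfs/parameters/zfs_arc_max    # apply immediately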
 
