I am putting together a new server today and will install 8.1 and see if I can test out this new kernel.
@Whatever can try it, since he also reported very early on these kinds of 100% CPU freezes and suggested solutions like disabling mitigations and kvm, which in fact worked for me -- our 8.1.4 PVE production cluster has been working flawlessly since then.
Hi all, since kernel proxmox-kernel-6.5.13-1-pve with the scheduler patch [1] did not seem to fix the freezes reported in this thread, we decided to revert [2] the scheduler patch in proxmox-kernel-6.5.13-2-pve, which is now available in the pvetest repositories. So, for the time being, PVE kernel 6.5 does not include a fix for the freezes reported in this thread -- disabling the NUMA balancer still seems like the most viable workaround.
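For anyone who wants to try that workaround, this is roughly how the NUMA balancer is usually toggled at runtime (just a sketch, not official Proxmox guidance; run as root):
cat /proc/sys/kernel/numa_balancing   # check current state: 0 = disabled, non-zero = enabled
echo 0 > /proc/sys/kernel/numa_balancing   # disable it until the next reboot
echo "kernel.numa_balancing = 0" > /etc/sysctl.d/90-disable-numa-balancing.conf   # keep it disabled across reboots; the file name is arbitrary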
Fortunately, the "proper" KVM fix that was proposed upstream made it into mainline kernel release 6.8 [3] [4]. Hence, the fix will be part of the PVE kernel 6.8, which will become available in the next weeks to months. I'll let you know in this thread when this kernel is available for testing.
[1] https://git.proxmox.com/?p=pve-kernel.git;a=commit;h=29cb6fcbb78e0d2b0b585783031402cc8d4ca148
[2] https://git.proxmox.com/?p=pve-kernel.git;a=commit;h=46bc78011a4d369a8ea17ea25418af7efcb9ca68
[3] https://git.kernel.org/pub/scm/linu.../?id=d02c357e5bfa7dfd618b7b3015624beb71f58f1f
[4] https://lore.kernel.org/lkml/CAHk-=wiehc0DfPtL6fC2=bFuyzkTnuiuYSQrr6JTQxQao6pq1Q@mail.gmail.com/#t
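In case it helps anyone who wants to grab that kernel from pvetest: it is just an additional APT component, and on a PVE 8 / Debian Bookworm host the sources entry would look roughly like this (a sketch; check the repository documentation for the canonical line):
# /etc/apt/sources.list.d/pvetest.list
deb http://download.proxmox.com/debian/pve bookworm pvetest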
Oh, I wasn't suggesting they work late to do anything.
A 6.8 kernel right now is not as easy, or maybe even not possible right now.
ZFS isn't even ready for the 6.8 kernel at the moment. The 6.8 kernel was only released a few days ago.
Don't expect Lamprecht & others to put hours of work right now into fixing compilation for a testing kernel (such fixes can be extremely risky even if they work), while ZFS will support the 6.8 kernel anyway in probably ~2 weeks.
That's just useless work, or at least not really worth it.
Especially because there are partially working workarounds for the issue, which allow bridging the time gap.
And tbh, I'm only talking about ZFS; they also have to collect/migrate their own Proxmox kernel patches to 6.8 (which I believe they are doing right now), and there are probably some other hurdles that break compilation right now.
So there is no other solution than to wait at least around 2 weeks until ZFS fully supports 6.8 with 2.2.4; after that we can start bugging Lamprecht xD
Cheers
I have a couple of AMD Zen 4 Dell R7625 servers with dual 9554P CPUs (128 physical cores total, base 3.1 GHz) that I reported in a ticket and was referred to the thread mentioned here.
In my case, I would experience freezes on Windows VMs to a non-trivial degree. The servers have capacity for a lot of high-performance VMs, and at one point we had 45 Windows VMs with 8 cores each. We'd see up to 3 freezes per day, but sometimes only 1-2 a week.
I spent a couple of months chasing it, and there definitely seemed to be a correlation with disk IO load. I tweaked a bunch of bandwidth caps; I think 150 MB/s was mostly solid, but I would still see a few freezes per host per week, and the absurd caps would make a 3-hour large Packer qemu-img rebase/merge job take over a day on Linux. I ended up "resolving" it by moving all Windows VMs to Intel hosts, which never hit the issue, although the latest Intel chip we have is a 2019 core design. I have 40-45 Linux VMs (Ubuntu 20.04 LTS) constantly stressing the system and have had them for 3 weeks without issue.
I did disable the NUMA load balancer for a week and torture-tested things, and it didn't freeze, but I've had periods of 1-2 weeks without freezes regardless, so I'm not 100% confident this is the same bug. Disabling NUMA for us isn't a solution because, while it doesn't freeze, it degrades system performance so much that our VMs, which are Azure DevOps Pipelines agents, would micro-disconnect long enough to abort jobs.
Suppose I get the hardware in April and 6.8 isn't out yet, is there a way to set up repos to get the KVM fix build? Or is it inextricable from the kernel?

There is no supported way to get the KVM fix [2] before our 6.8 kernel is available (of course technically nothing prevents you from compiling your own kernel or trying a mainline kernel, but I would generally recommend against doing that on a production system, and these setups are not supported by us). If it had been the case that the KVM fix could be easily applied on a 6.5 kernel, we would have done so to make the fix available as soon as possible -- but unfortunately it has a bunch of dependencies on other KVM changes [3] which are not straightforward to apply on a 6.5 kernel.
This sounds like a very complex issue. Do I understand correctly that
1. you have Windows VMs that did temporarily(!) freeze on AMD hosts, and do not freeze on Intel hosts (both have multiple NUMA nodes, I presume?)
2. and Linux VMs never froze on the AMD hosts?
Especially (1) makes me suspect that the issue discussed in this thread may not be the only factor at play here.
By "disabling NUMA", do you mean "disabling the NUMA balancer"? And if so, do I understand correctly that disabling the NUMA balancer noticeably degrades VM performance? If so, that would be an interesting data point.
If optimal NUMA placement is that critical in your use case, one option would be to use the affinity settings [1] to pin VMs to the cores belonging to one NUMA node -- in that case, I presume the NUMA balancer wouldn't have much to do anyway.
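To illustrate the affinity idea, a minimal sketch (VM ID 100 and the CPU range 0-47 are made-up examples; check lscpu on your host to see which CPUs actually belong to each NUMA node):
qm set 100 --affinity 0-47   # pin the hypothetical VM 100 to the CPUs of one NUMA node
This is equivalent to adding affinity: 0-47 to that VM's config file.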
Thanks for the info. Adding some details below, I'll try some things out once we get new servers in the coming weeks.

@trey.b Thanks for the detailed write-up.
I think the Proxmox team or @Thomas Lamprecht will be able to provide the 6.8 kernel, or at least a testing kernel, around the following dates:
Starting from April 11th; in that timeframe OpenZFS could release OpenZFS 2.2.4, ready for the 6.8 kernel.
- March 28, 2024 (UTC) : kernel feature freeze
- April 1, 2024 (UTC) : beta freeze
- April 11, 2024 (UTC) : kernel freeze
- April 18, 2024 (UTC) : final freeze
- April 25, 2024 (UTC) : final release
But that's only an assumption.
I'm waiting for the 6.8 kernel too, because I will surely have the same issues with my 2x Genoa 9274F servers.
Let's just hope it will get solved then, because if not, it will get much worse for all of us.
Cheers
agent: 1
balloon: 0
boot: order=virtio0
cores: 8
cpu: host
machine: pc-i440fx-7.2
memory: 32768
meta: creation-qemu=7.2.0,ctime=1689404255
name: REDACTED
net0: virtio=5E:0EE:00:1C:05,bridge=vmbr0,firewall=1
onboot: 1
ostype: win10
scsihw: virtio-scsi-single
smbios1: uuid=f4ae0855-3c7a-4584-a9e1-770e903743b6
virtio0: local-zfs:base-128114553-disk-0/vm-22805-disk-0,aio=threads,discard=on,iothread=1,size=1500G
virtio1: local-zfs:vm-22805-disk-1,iothread=1,size=1G
vmgenid: 70c26a4f-4552-4056-a344-b78048f61498
agent: 1
balloon: 0
bios: ovmf
boot: order=virtio0
cores: 8
cpu: host
efidisk0: local-zfs:vm-22814-disk-efi,efitype=2m,pre-enrolled-keys=1,size=1M
machine: pc-q35-7.2
memory: 32768
meta: creation-qemu=8.1.2,ctime=1705600543
name: REDACTED
net0: virtio=5E:0EE:00:1C:0E,bridge=vmbr0,firewall=1
onboot: 1
ostype: win11
scsihw: virtio-scsi-single
smbios1: uuid=b65c7f56-9307-4c32-8a83-b43c824a87ab
tpmstate0: local-zfs:vm-22814-disk-tpm,size=4M,version=v2.0
virtio0: local-zfs:base-128785902-disk-0/vm-22814-disk-0,aio=threads,discard=on,iothread=1,size=1500G
vmgenid: 92e866d1-a463-48e6-82d5-31828df71f56
agent: 1
balloon: 0
bios: ovmf
boot: order=virtio0
cores: 2
cpu: host
efidisk0: local-zfs:vm-23001-disk-efi,efitype=2m,pre-enrolled-keys=1,size=1M
machine: pc-q35-7.2
memory: 16384
meta: creation-qemu=8.1.2,ctime=1704632845
name: REDACTED
net0: virtio=5E:0EE:00:1E:01,bridge=vmbr0,firewall=1
onboot: 1
ostype: win11
scsihw: virtio-scsi-single
smbios1: uuid=96e0edfd-9379-4526-a1a6-2ff325160b04
tpmstate0: local-zfs:vm-23001-disk-tpm,size=4M,version=v2.0
virtio0: local-zfs:base-130785902-disk-0/vm-23001-disk-0,aio=threads,discard=on,iothread=1,size=1000G
vmgenid: cec19d07-9fb6-4ce3-9ca5-85dabca6e07d
agent: 1
balloon: 0
bios: seabios
boot: order=virtio0
cores: 8
cpu: host
machine: pc-i440fx-7.2
memory: 32768
meta: creation-qemu=7.2.0,ctime=1691796451
name: REDACTED
net0: virtio=5E:0EE:00:1D:96,bridge=vmbr0,firewall=1
onboot: 1
ostype: l26
scsihw: virtio-scsi-single
smbios1: uuid=97754d63-c807-4bc4-b1eb-ffbbb95d77cb
virtio0: local-zfs:base-129497837-disk-0/vm-32922-disk-0,aio=threads,discard=on,iothread=1,size=1500G
vmgenid: cdf46e5c-1bde-41ab-8f9b-04a929f4e62e
Yes, indeed, kernel 6.8 is now available for testing. It includes the KVM patch [1] that intends to fix the temporary freezes on hosts with multiple NUMA nodes (in combination with KSM and/or the NUMA balancer). Anyone who has been affected by these freezes: It would be great if you could check out kernel 6.8 (see [2] for instructions) and report back whether it fixes the temporary freezes in your case. You should be able to re-enable KSM and the NUMA balancer.

Can any of you guys check if kernel 6.8 helps? It's available, and @t.lamprecht was a lot faster than I expected.
Thank you a lot @t.lamprecht !!!
Cheers
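If anyone wants to jump on it right away, installing the opt-in kernel should look roughly like this (from memory of how previous opt-in kernels were handled; the package name assumes the usual proxmox-kernel-<version> naming, see [2] in the quoted post for the authoritative instructions):
apt update
apt install proxmox-kernel-6.8
uname -r   # after a reboot, verify the new kernel is running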
lscpu
uname -a
grep "" /proc/sys/kernel/numa_* /sys/kernel/debug/sched/preempt /sys/kernel/mm/ksm/*
I summarized this thread a little, to help keep track.
Notes about disabling NUMA and mitigations:
- Disabling the NUMA balancer comes with a big performance penalty in terms of memory bandwidth/latency on dual-socket systems.
-- On single-socket systems the performance penalty should be minimal, even on high-core-count Genoa CPUs.
- On dual-socket systems this can be mitigated with CPU pinning, or at least by avoiding multiple sockets in the VM's configuration.
You may look into this thread: https://forum.proxmox.com/threads/vms-freeze-with-100-cpu.127459/
The issue there is almost the same / similar to this thread.

Confirming we experience the freeze on Linux, just way less common. We need it back online in production and I'm assuming the investigation is complete, so I'm putting it back, but if there's a list of commands to run next time it happens, let me know and I can run them.

Thank you for your effort! Some remarks:
It really depends on the number of NUMA nodes reported to the host (IIRC some/many AMD processors allow configuring how many NUMA nodes are presented to the host). You can check the number of NUMA nodes in the output of lscpu (a one-liner for this is sketched below). If the host only sees a single NUMA node, it should not be affected by the temporary freezes reported in this thread. If anyone sees temporary freezes on a host with a single NUMA node, please let me know.
Currently I don't think disabling the NUMA balancer always comes with a big performance penalty (but I haven't done any extensive performance testing either). I can imagine it makes a difference for some workloads (e.g. apparently for @trey.b it does). But if you are affected by the temporary freezes reported here and can't test kernel 6.8 yet, disabling the NUMA balancer seems worth a try in my opinion.
Just for posterity, while the two issues both involve VM freezes, they are quite different: [...] 6.2.16-12).
I suspect the freeze issues you're seeing are not only due to the issues reported in this thread, but also due to other factors. Hence, could you please open a new thread for them? Feel free to mention me there. When you do, please attach the output of lscpu from all affected hosts, in particular the one running the Linux VM that froze. Please also check the journal of the Linux VM for any messages during the freezes, and if you find some, please attach them.
[0] https://forum.proxmox.com/threads/vms-freeze-with-100-cpu.127459/
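Regarding checking how many NUMA nodes the host sees, something like this should show the count directly (a sketch; the exact label may vary slightly between lscpu versions):
lscpu | grep 'NUMA node(s)'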