Here is some more interesting stuff. This has been an issue since the 6.x kernel hit the streets for Proxmox.
Host: HP DL 560 Gen10
Storage: iSCSI Alletra 6000 NVMe
I have 2 very active CentOS 7 VMs running httpd/java/tomcat, typically seeing 2k-3k httpd sessions at any given time.
As...
We hit soft lockups again with the VM set to 192 cores and our compiles set to use 128 of those cores (NUMA on and CPU hotplug disabled).
No benchmarks; these are production, we don't have time for that kind of stuff. Hence the reason we pay for enterprise repos.
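For anyone else chasing this, the quickest way I know to confirm a guest is really hitting soft lockups (these are standard kernel watchdog messages, nothing Proxmox-specific) is to grep the guest's kernel log:
dmesg -T | grep -i 'soft lockup'
journalctl -k --since '1 hour ago' | grep -i 'soft lockup'
If it's the same thing we're seeing, you'll get lines like "watchdog: BUG: soft lockup - CPU#N stuck for Xs!".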
Just updated the...
Good questions.
Just checked Supermicro's site and the host is on the latest BIOS. They don't seem to offer any microcode updates as of yet for this model.
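For the record, here's how I verify what microcode revision a host is actually running (standard /proc and kernel log, works on any distro):
grep -m1 microcode /proc/cpuinfo
dmesg | grep -i microcode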
Did some testing with CPU hotplug enabled vs. disabled (toggle commands below) and found the following.
CPU Hotplug Enabled
- VM Boots aok with all the cores from the...
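If anyone wants to reproduce the hotplug toggle, I flip it per-VM with qm; the vmid below is just an example from my lab, adjust to yours:
qm set 154 --hotplug disk,network,usb
qm set 154 --hotplug disk,network,usb,cpu
The first line drops cpu from the hotplug list (CPU hotplug disabled), the second puts it back.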
Running on the 4th Gen Intel with any of the 6.5.x kernels results in the following within a matter of an hour or so.
I can drop core counts, reduce memory, enable NUMA, etc.; none of it matters and we still hit CPU lockups.
Move the VM back to a 5.15.x kernel and the VM is rock solid. Move the VM...
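If anyone wants to do the same rollback without ripping packages out, proxmox-boot-tool can pin the older kernel; the version string below is only an example, list yours first:
proxmox-boot-tool kernel list
proxmox-boot-tool kernel pin 5.15.131-1-pve
reboot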
The use cases are large VMs, from LAMP to very proprietary databases.
NUMA on or off doesn't matter; it just happened to be the option that was left after testing. We typically have NUMA on, but we wanted to try every option.
Like I said, this is an Intel Xeon 4th Gen CPU issue.
I can't...
Also noticing this being repeated in the logs on any host I move to the 6.5.x kernel.
[Fri Nov 3 06:51:50 2023] bpfilter: read fail 0
[Fri Nov 3 06:51:50 2023] bpfilter: Loaded bpfilter_umh pid 72492
[Fri Nov 3 06:51:50 2023] bpfilter: read fail 0
[Fri Nov 3 06:51:50 2023] bpfilter: Loaded...
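To see how often it repeats I just follow the kernel log on the affected hosts:
dmesg -wT | grep bpfilter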
I do appreciate the information, but this has nothing to do with that.
This is an Intel Gen 4 Xeon issue.
We have Gen2 and Gen3 quad-socket setups running the 6.x.x kernel with no performance issues.
I have all kinds of hardware that I test on.
For the KSM issue I have been testing on an HP DL380 Gen9. That is dual socket.
For the performance issues on the latest 4th Gen Intel CPUs I am testing on a newer Supermicro chassis. It's a quad socket...
So far it's also looking like KSM is still completely broken in 6.5.
All of this is such a bummer for real enterprise environments.
EDIT:
Might have spoken too soon. We will see where this goes within the next 45 minutes. KSM went from 0 to 981MB in a matter of a minute or so.
EDIT #2...
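For reference, the number I'm quoting comes straight from the KSM sysfs counters; this is how I turn pages_sharing into MB (assuming the usual 4 KiB page size):
watch -n5 'echo $(( $(cat /sys/kernel/mm/ksm/pages_sharing) * 4 / 1024 )) MB shared'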
Here is the VM config. It's a Debian 12 VM.
root@ccsprogmiscrit1:~# cat /etc/pve/nodes/ccsprogmiscrit1/qemu-server/154.conf
agent: 1
bios: ovmf
boot: order=scsi0;net0
cores: 62
cpu: host,flags=+pdpe1gb
efidisk0: MissionCrit-Alletra:vm-154-disk-0,efitype=4m,pre-enrolled-keys=1,size=528K...
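One note on the +pdpe1gb flag in that config: it only exposes 1 GiB page support to the guest, so it's worth confirming the host CPU actually advertises it first:
grep -m1 -o pdpe1gb /proc/cpuinfo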
Bummer.
Tested 6.5 on one of my new Supermicro front ends with 4x Intel Xeon Gold 6448H. VM locks up under load with CPUs stuck. I do run ZFS on root with 2 Micron 5400 Pros.
Server:
https://www.supermicro.com/en/products/system/mp/2u/sys-241e-tnrttp
VM storage is on HPE Alletra NVMe...
IMO most of us running large real production clusters are having too many issues on any of the 5.15.x and 6.2.x kernels. It's been a mess.
KSM is and always has been a solid tool; I would have known about the KSM issue far sooner if I hadn't moved to 5.15.x and realized that live migration was completely...
What can be done to prevent this in the future? KSM is critical to production for so many environments.
IMO PVE8 isn't production ready with this issue.
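For anyone wanting to check whether KSM is even doing anything on their 6.5 hosts, these are the standard kernel knobs I poke at (plain sysfs, nothing PVE-specific):
cat /sys/kernel/mm/ksm/run
cat /sys/kernel/mm/ksm/pages_sharing
systemctl status ksmtuned
run should be 1 when merging is active, and pages_sharing sitting at 0 under memory pressure is the smoking gun.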