KSM Memory sharing not working as expected on 6.2.x kernel

adamb · Famous Member · joined Mar 1, 2012
Hey all. I have a 7-node cluster with roughly 600 CentOS 7 VMs running on it. The nodes typically average anywhere from 150-500G of KSM memory sharing.

After moving 2 of the 7 front ends to the 6.2.x kernel, those 2 front ends are barely doing any KSM sharing.

Front end with 6.2.x:
[screenshot: 1690195219720.png]

Front end with the older 5.13.x kernel:
[screenshot: 1690195259157.png]

ksmtuned.conf hasn't been changed in years, so I am not really sure what to look at.
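
For anyone wanting to compare numbers outside the GUI: the shared amount can be read straight from sysfs. A small sketch (assuming 4 KiB pages; `ksm_saved_gib` is just an illustrative helper name, not an existing tool):

```shell
# KSM savings, read straight from sysfs: pages_sharing counts the 4 KiB pages
# currently backed by a shared KSM page, so savings ~= pages_sharing * 4 KiB.
ksm_saved_gib() {
    echo $(( $1 * 4096 / 1024 / 1024 / 1024 ))
}

# On a live host:
#   ksm_saved_gib "$(cat /sys/kernel/mm/ksm/pages_sharing)"
ksm_saved_gib 107317077   # a pages_sharing value from a healthy host -> ~409 GiB
```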
 
Here I have reported similar behaviour regarding KSM sharing in a Proxmox 8 + kernel 6.2 cluster environment, where it was the main culprit of performance issues (100% CPU spikes, the Proxmox noVNC console stalling, high ICMP ping times) with VMs under high memory pressure, which we recently solved by either downgrading to kernel 5.15 or by switching off KSM sharing completely:

https://forum.proxmox.com/threads/p...ows-server-2019-vms.130727/page-3#post-574707
https://forum.proxmox.com/threads/p...cpu-issue-with-windows-server-2019-vms.130727

So it really does seem that KSM sharing behavior changed with kernel 6, and that this change even affects the performance of VMs/Proxmox hosts under high memory pressure.
 
Yikes, the 5.15 kernel is a mess too. Live migration between different CPUs, and even between identical CPUs, is pretty much broken altogether.

Looks like we are going back to the 5.13 kernel, as nothing has been stable in 6-12 months.
 
Also worth mentioning: I am having performance issues on the 6.2.x kernels even without KSM enabled. We have lost pretty much 20-30% performance on the 6.2.x kernels.

These are heavy-lifting CentOS 7 LAMP servers that have no issues on the 5.13 and 5.15 kernels (besides live migration on the 5.15.x kernels).

Live migration is important to our environment, and so is KSM. We have no option but to stick with 5.13.x.

I really hope the devs are paying attention here, as stuff like this is going to push away customers like ourselves, and we have been here for over a decade at this point.
 
Also worth mentioning: I am having performance issues on the 6.2.x kernels even without KSM enabled. We have lost pretty much 20-30% performance on the 6.2.x kernels.

Try mitigations=off as a workaround. It could compensate for the performance drop, especially under memory pressure.
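
For reference, on a GRUB-booted host that would look roughly like this (a sketch; note that disabling mitigations is a security trade-off):

```shell
# Sketch for a GRUB-booted Proxmox host; weigh the security impact first,
# since mitigations=off disables the CPU vulnerability mitigations.
# 1) In /etc/default/grub, extend the default command line, e.g.:
#      GRUB_CMDLINE_LINUX_DEFAULT="quiet mitigations=off"
# 2) Then regenerate the GRUB config and reboot:
update-grub
reboot
```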
 
Hmm, @adamb can you please post pveversion -v? I am seeing similar behavior no matter which kernel I test: after some time, the KSM shared memory increases to roughly the same values, and the very full (>95%) memory usage goes down quite a bit.

Tested Kernels were:
  • 5.15.102-1-pve
  • 5.19.17-2-pve
  • 6.1.15-1-pve
  • 6.2.16-4-bpo11-pve
On pve-manager/7.4-16. It is an AMD Epyc Rome system, but I would be surprised if the CPU vendor had an impact on this.
 
We have been on the 5.13.x kernel because the entire 5.15 kernel is a mess, in my opinion, with live migration being issue #1.

I can't do any testing on 5.15.x for that reason.
 
Ah, I missed 5.13. I have now tested that as well, with 5.13.19-6-pve, and it behaved the same as all the other kernels.
I do not have such an extreme situation though.

ksmtuned.conf hasn't been changed in years, so I am not really sure what to look at.
It has been modified in the past, then? Could you please post its contents to verify?

And please the pveversion -v output to check other package versions.
 
On the latest packages and kernel.

root@ccscloud4:~# pveversion
pve-manager/7.4-16/0f39f621 (running kernel: 6.2.16-4-bpo11-pve)

ksmtuned.conf was adjusted for when it kicks in (KSM_THRES_COEF=50); other than that, it's pretty basic.

root@ccscloud4:~# pveversion
pve-manager/7.4-16/0f39f621 (running kernel: 6.2.16-4-bpo11-pve)
root@ccscloud4:~# cat /etc/ksmtuned.conf
# Configuration file for ksmtuned.

# How long ksmtuned should sleep between tuning adjustments
KSM_MONITOR_INTERVAL=60

# Millisecond sleep between ksm scans for 16Gb server.
# Smaller servers sleep more, bigger sleep less.
KSM_SLEEP_MSEC=100

# KSM_NPAGES_BOOST=300
# KSM_NPAGES_DECAY=-50
# KSM_NPAGES_MIN=64
# KSM_NPAGES_MAX=1250

KSM_THRES_COEF=50
# KSM_THRES_CONST=2048

# uncomment the following if you want ksmtuned debug info

# LOGFILE=/var/log/ksmtuned
# DEBUG=1
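
For context on KSM_THRES_COEF: ksmtuned derives a threshold as that percentage of total memory, starts KSM once committed + threshold exceeds total memory, and boosts the scan rate once free memory drops below the threshold. A sketch of that arithmetic (`thres_kb` is just an illustrative helper name; values in kB, as in /proc/meminfo):

```shell
# Sketch of ksmtuned's threshold arithmetic (values in kB, as in /proc/meminfo):
#   thres = MemTotal * KSM_THRES_COEF / 100
#   start KSM        when committed + thres > MemTotal
#   boost scan rate  when MemFree < thres
thres_kb() {
    echo $(( $1 * $2 / 100 ))
}

thres_kb 65729920 50   # 64G-class host with coef 50 -> threshold 32864960 kB
```

Those are the numbers the `start ksm` / `boost` lines in the /var/log/ksmtuned output compare against.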


Even after 6 days of uptime, we don't have much KSM sharing across the 30 or so CentOS 7 VMs running the same software.

[screenshot: 1690457644074.png]

I can almost guarantee that if I go back to 5.13.x this front end would have 100G+ of KSM sharing.

This cluster has been in place since Proxmox4 and ksm sharing has been a game changer for us.
 
I can almost guarantee that if I go back to 5.13.x this front end would have 100G+ of KSM sharing.
I believe you. I just need to get a reproducer working here so we can inspect what the root cause might be, and so far I haven't been able to.

I am trying with the same ksmtuned.conf to see if that has an effect.

And please provide the pveversion -v with the -v ;)
Post the results ideally inside [CODE][/CODE] tags. The editor should also show the code and icode buttons next to the list format one.
 
Got it.

Code:
root@ccscloud4:~# pveversion -v
proxmox-ve: 7.4-1 (running kernel: 6.2.16-4-bpo11-pve)
pve-manager: 7.4-16 (running version: 7.4-16/0f39f621)
pve-kernel-6.2: 7.4-4
pve-kernel-5.15: 7.4-4
pve-kernel-5.13: 7.1-9
pve-kernel-6.2.16-4-bpo11-pve: 6.2.16-4~bpo11+1
pve-kernel-6.2.11-2-pve: 6.2.11-2
pve-kernel-5.15.108-1-pve: 5.15.108-1
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.4.174-2-pve: 5.4.174-2
pve-kernel-4.4.134-1-pve: 4.4.134-112
pve-kernel-4.4.114-1-pve: 4.4.114-108
pve-kernel-4.4.19-1-pve: 4.4.19-66
ceph-fuse: 14.2.21-1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: 0.8.36+pve2
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4.1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-2
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.7
libpve-storage-perl: 7.4-3
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.2-1
proxmox-backup-file-restore: 2.4.2-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.2
proxmox-widget-toolkit: 3.7.3
pve-cluster: 7.3-3
pve-container: 4.4-6
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-4~bpo11+1
pve-firewall: 4.3-4
pve-firmware: 3.6-5
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-2
qemu-server: 7.4-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.11-pve1
 
Did some tests on an E5-2620 v4 machine, the closest thing we have around to the E5-2670 v3 you have.

But that also doesn't make a difference; KSM works as expected there with kernel 6.2.


To get some more info from your systems, we can enable logging for KSM. For that, uncomment the last two lines in /etc/ksmtuned.conf and restart the ksmtuned service.
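
If it helps, uncommenting and restarting can be done in one go (a sketch, assuming the stock file contents shown earlier in this thread):

```shell
# Sketch: uncomment the two logging lines and restart the daemon.
sed -i -e 's|^# \(LOGFILE=/var/log/ksmtuned\)|\1|' \
       -e 's|^# \(DEBUG=1\)|\1|' /etc/ksmtuned.conf
systemctl restart ksmtuned
# a minute or two later:
tail /var/log/ksmtuned
```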

Every minute you should see a few lines in the /var/log/ksmtuned file.
An example from my current test system:
Code:
Tue 01 Aug 2023 04:02:17 PM CEST: committed 857117848 free 13873844
Tue 01 Aug 2023 04:02:17 PM CEST: 889982808 > 65729920, start ksm
Tue 01 Aug 2023 04:02:17 PM CEST: 13873844 < 32864960, boost
Tue 01 Aug 2023 04:02:17 PM CEST: KSMCTL start 1250 25
Tue 01 Aug 2023 04:03:17 PM CEST: committed 857120932 free 14294436
Tue 01 Aug 2023 04:03:17 PM CEST: 889985892 > 65729920, start ksm
Tue 01 Aug 2023 04:03:17 PM CEST: 14294436 < 32864960, boost
Tue 01 Aug 2023 04:03:17 PM CEST: KSMCTL start 1250 25
Tue 01 Aug 2023 04:04:17 PM CEST: committed 857122988 free 14662940
Tue 01 Aug 2023 04:04:17 PM CEST: 889987948 > 65729920, start ksm
Tue 01 Aug 2023 04:04:17 PM CEST: 14662940 < 32864960, boost
Tue 01 Aug 2023 04:04:17 PM CEST: KSMCTL start 1250 25
Tue 01 Aug 2023 04:05:17 PM CEST: committed 857133268 free 15062264
Tue 01 Aug 2023 04:05:17 PM CEST: 889998228 > 65729920, start ksm
Tue 01 Aug 2023 04:05:17 PM CEST: 15062264 < 32864960, boost
Tue 01 Aug 2023 04:05:18 PM CEST: KSMCTL start 1250 25
Tue 01 Aug 2023 04:06:18 PM CEST: committed 857133268 free 15501284
Tue 01 Aug 2023 04:06:18 PM CEST: 889998228 > 65729920, start ksm
Tue 01 Aug 2023 04:06:18 PM CEST: 15501284 < 32864960, boost
Tue 01 Aug 2023 04:06:18 PM CEST: KSMCTL start 1250 25
Tue 01 Aug 2023 04:07:18 PM CEST: committed 857140464 free 15896628
Tue 01 Aug 2023 04:07:18 PM CEST: 890005424 > 65729920, start ksm
Tue 01 Aug 2023 04:07:18 PM CEST: 15896628 < 32864960, boost
Tue 01 Aug 2023 04:07:18 PM CEST: KSMCTL start 1250 25
Tue 01 Aug 2023 04:08:18 PM CEST: committed 906934656 free 13719736
Tue 01 Aug 2023 04:08:18 PM CEST: 939799616 > 65729920, start ksm
Tue 01 Aug 2023 04:08:18 PM CEST: 13719736 < 32864960, boost
Tue 01 Aug 2023 04:08:18 PM CEST: KSMCTL start 1250 25
Tue 01 Aug 2023 04:09:18 PM CEST: committed 1007000468 free 4567776
Tue 01 Aug 2023 04:09:18 PM CEST: 1039865428 > 65729920, start ksm
Tue 01 Aug 2023 04:09:18 PM CEST: 4567776 < 32864960, boost
Tue 01 Aug 2023 04:09:18 PM CEST: KSMCTL start 1250 25

This is while the system is actively merging memory. It would be interesting to see what it looks like for you, both on the system where it doesn't work and on one where it is in a good state.

Additionally, could you run the following one-liner on the problematic and a good system?
for i in /sys/kernel/mm/ksm/*; do echo "$i:"; cat $i; done
 
Here is the system where KSM is broken.

Code:
root@ccscloud4:~# for i in /sys/kernel/mm/ksm/*; do echo "$i:"; cat $i; done
/sys/kernel/mm/ksm/full_scans:
7635
/sys/kernel/mm/ksm/max_page_sharing:
256
/sys/kernel/mm/ksm/merge_across_nodes:
1
/sys/kernel/mm/ksm/pages_shared:
447156
/sys/kernel/mm/ksm/pages_sharing:
3935104
/sys/kernel/mm/ksm/pages_to_scan:
1250
/sys/kernel/mm/ksm/pages_unshared:
675784
/sys/kernel/mm/ksm/pages_volatile:
2870874
/sys/kernel/mm/ksm/run:
1
/sys/kernel/mm/ksm/sleep_millisecs:
10
/sys/kernel/mm/ksm/stable_node_chains:
147
/sys/kernel/mm/ksm/stable_node_chains_prune_millisecs:
2000
/sys/kernel/mm/ksm/stable_node_dups:
6468
/sys/kernel/mm/ksm/use_zero_pages:
0

Code:
root@ccscloud4:~# tail -f /var/log/ksmtuned
Wed 02 Aug 2023 06:45:24 AM EDT: 37401644 < 198050444, boost
Wed 02 Aug 2023 06:45:24 AM EDT: KSMCTL start 1250 10
Wed 02 Aug 2023 06:46:24 AM EDT: committed 454337640 free 37468604
Wed 02 Aug 2023 06:46:24 AM EDT: 652388084 > 396100888, start ksm
Wed 02 Aug 2023 06:46:24 AM EDT: 37468604 < 198050444, boost
Wed 02 Aug 2023 06:46:24 AM EDT: KSMCTL start 1250 10
Wed 02 Aug 2023 06:47:24 AM EDT: committed 454337640 free 37506892
Wed 02 Aug 2023 06:47:24 AM EDT: 652388084 > 396100888, start ksm
Wed 02 Aug 2023 06:47:24 AM EDT: 37506892 < 198050444, boost
Wed 02 Aug 2023 06:47:24 AM EDT: KSMCTL start 1250 10
Wed 02 Aug 2023 06:48:24 AM EDT: committed 454337640 free 37569192
Wed 02 Aug 2023 06:48:24 AM EDT: 652388084 > 396100888, start ksm
Wed 02 Aug 2023 06:48:24 AM EDT: 37569192 < 198050444, boost
Wed 02 Aug 2023 06:48:24 AM EDT: KSMCTL start 1250 10


Here is the system that is OK:
Code:
root@ccscloud6:~# for i in /sys/kernel/mm/ksm/*; do echo "$i:"; cat $i; done
/sys/kernel/mm/ksm/full_scans:
148
/sys/kernel/mm/ksm/max_page_sharing:
256
/sys/kernel/mm/ksm/merge_across_nodes:
1
/sys/kernel/mm/ksm/pages_shared:
7293173
/sys/kernel/mm/ksm/pages_sharing:
107317077
/sys/kernel/mm/ksm/pages_to_scan:
64
/sys/kernel/mm/ksm/pages_unshared:
21330037
/sys/kernel/mm/ksm/pages_volatile:
257315437
/sys/kernel/mm/ksm/run:
1
/sys/kernel/mm/ksm/sleep_millisecs:
10
/sys/kernel/mm/ksm/stable_node_chains:
658
/sys/kernel/mm/ksm/stable_node_chains_prune_millisecs:
2000
/sys/kernel/mm/ksm/stable_node_dups:
26996
/sys/kernel/mm/ksm/use_zero_pages:
0

Code:
root@ccscloud6:~# tail -f /var/log/ksmtuned
Wed 02 Aug 2023 06:46:29 AM EDT: 1662187252 > 1585199616, decay
Wed 02 Aug 2023 06:46:29 AM EDT: KSMCTL start 64 10
Wed 02 Aug 2023 06:47:29 AM EDT: committed 1977560292 free 1662163440
Wed 02 Aug 2023 06:47:29 AM EDT: 3562759908 > 3170399232, start ksm
Wed 02 Aug 2023 06:47:29 AM EDT: 1662163440 > 1585199616, decay
Wed 02 Aug 2023 06:47:29 AM EDT: KSMCTL start 64 10
Wed 02 Aug 2023 06:48:29 AM EDT: committed 1977568488 free 1662157036
Wed 02 Aug 2023 06:48:29 AM EDT: 3562768104 > 3170399232, start ksm
Wed 02 Aug 2023 06:48:29 AM EDT: 1662157036 > 1585199616, decay
Wed 02 Aug 2023 06:48:29 AM EDT: KSMCTL start 64 10
Wed 02 Aug 2023 06:49:29 AM EDT: committed 1977568488 free 1662265664
Wed 02 Aug 2023 06:49:29 AM EDT: 3562768104 > 3170399232, start ksm
Wed 02 Aug 2023 06:49:29 AM EDT: 1662265664 > 1585199616, decay
Wed 02 Aug 2023 06:49:29 AM EDT: KSMCTL start 64 10

It is also worth mentioning that I am having the issue on a newer CPU as well. This front end would typically have 200G+ of KSM sharing on the 5.13.x kernel.

[screenshot: 1690973436328.png]
 
@adamb sorry, but so far I have still not been able to reproduce this.

Just to make sure, the only difference was to reboot the host with an older kernel (5.13) to get great KSM sharing results again, right? No other packages were up- or downgraded?

The other difference to my test system that I can see is that your hosts all use multiple CPU sockets. Do you have a single-socket system around to see how it behaves there?


Maybe also try setting merging across NUMA nodes to off and see if that has an effect:
echo 0 > /sys/kernel/mm/ksm/merge_across_nodes
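
One caveat from the kernel's KSM documentation, in case anyone tries this on a loaded host: merge_across_nodes can only be changed while no pages are currently shared, so something like the following sequence is needed (the unmerge step temporarily gives back all KSM savings):

```shell
# The kernel only accepts a new merge_across_nodes value while no KSM pages
# are shared, so unmerge first:
echo 2 > /sys/kernel/mm/ksm/run                  # unmerge all shared pages
echo 0 > /sys/kernel/mm/ksm/merge_across_nodes   # merge only within a NUMA node
echo 1 > /sys/kernel/mm/ksm/run                  # resume merging
```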
 
I did do some package updates when I moved to the newer kernel. It's worth me testing going back to 5.13.x to see how it runs.

I was able to get it to do a little better by bumping the following settings:

KSM_NPAGES_BOOST=500000000
KSM_NPAGES_MIN=2000000000
KSM_NPAGES_MAX=3000000000
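
For context, ksmtuned only uses these values as step sizes and bounds for pages_to_scan: each interval it adds BOOST (memory tight) or DECAY (memory relaxed) and clamps the result to [MIN, MAX], so huge values like the above effectively pin the scan rate at its ceiling. A sketch of that logic (`adjust_npages` is an illustrative name, not part of ksmtuned):

```shell
# Sketch of ksmtuned's boost/decay logic for pages_to_scan.
adjust_npages() {
    # $1 current  $2 step (NPAGES_BOOST or NPAGES_DECAY)  $3 min  $4 max
    local n=$(( $1 + $2 ))
    if [ "$n" -lt "$3" ]; then n=$3; fi
    if [ "$n" -gt "$4" ]; then n=$4; fi
    echo "$n"
}

# With the defaults (BOOST=300, DECAY=-50, MIN=64, MAX=1250) the scan rate
# ramps 64 -> 364 -> 664 -> ... under pressure and is capped at 1250:
adjust_npages 64 300 64 1250
```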

I should be able to get one of the front ends back on 5.13.x by the end of the week.
 
I did do some package updates when I moved to the newer kernel. It's worth me testing going back to 5.13.x to see how it runs.
Okay, can you compare the versions, especially of the qemu-server package? If the version differs there, this could also be the cause :)
 
