Strange KSM Sharing

Jun 25, 2022
Hi,
I am seeing strange KSM sharing behavior on one of the hosts of our three-node cluster. As I understand it, KSM sharing is activated when RAM consumption goes above 80% of the host's physical memory, but here the total memory consumption is only around 52-54%, yet a big chunk of 21 GB of memory is shown as consumed or shared by KSM. I find this very strange and request you to kindly have a look at this observation of the three-node cluster. I am attaching a screenshot of the problem. Please help.

B P Banerjee
 


Hi,
please share the output of
Code:
pveversion -v
cat /etc/ksmtuned.conf

Can it be that the load was above the threshold before (i.e. KSM started kicking in and the saved memory made it go back below the threshold)?
 
Hi,
"Can it be that the load was above the threshold before (i.e. KSM started kicking in and the saved memory made it go back below the threshold)?" .. Yes, that may be true, but if it goes back below the threshold, then KSM sharing should also go back to neutral, yet it is stuck there. I am attaching the output of both commands you asked for.
 


"Can it be that the load was above the threshold before (i.e. KSM started kicking in and the saved memory made it go back below the threshold)?" .. Yes, that may be true, but if it goes back below the threshold, then KSM sharing should also go back to neutral, yet it is stuck there. I am attaching the output of both commands you asked for.
No, as soon as the host's RAM usage drops below 80%, KSM stops working completely, including the GC that would free up shared RAM.
 
No, as soon as the host's RAM usage drops below 80%, KSM stops working completely, including the GC that would free up shared RAM.
Don't misunderstand.
It depends on your VM configs (ballooning), see my screenshot above.
In this example, all my VMs on this node together consume about 80% real RAM, configured with "max Memory" (100%) and "min Memory" (50%) for ballooning.
So KSM kicks in and stays in. The web GUI shows the values calculated with KSM (here ~40%), but in reality the VMs "reserve/consume" 80% of the host's RAM.
This is why I like KSM ;)
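For illustration, ballooning is configured per VM in /etc/pve/qemu-server/<vmid>.conf: "memory" is the "max Memory" in MiB and "balloon" is the "min Memory" in MiB (0 would disable ballooning for that VM). A setup like the one described above, with hypothetical sizes of 8 GiB max and 4 GiB min, would look roughly like this:
Code:
memory: 8192
balloon: 4096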
 
Here it is unrelated to the VM configs. Ballooning kicks in at a fixed 80% host RAM usage. By default KSM kicks in at 80% too, but you can change that by editing ksmtuned.conf. As far as I understand it, the "KSM_NPAGES_DECAY" option slows down KSM activity when RAM usage is below the threshold and "KSM_NPAGES_BOOST" increases KSM activity when it is above the threshold. So as long as the host isn't getting low on RAM, KSM basically won't try to free up shared RAM, so it won't waste CPU time and memory bandwidth when it isn't really needed.
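For reference, those knobs live in /etc/ksmtuned.conf. The stock file normally ships them commented out; shown uncommented here as a sketch, and the exact defaults may differ between versions:
Code:
KSM_MONITOR_INTERVAL=60   # how often ksmtuned re-checks memory usage, in seconds
KSM_NPAGES_BOOST=300      # scan this many more pages per cycle while above the threshold
KSM_NPAGES_DECAY=-50      # scan this many fewer pages per cycle while below the threshold
KSM_NPAGES_MIN=64         # lower bound for pages scanned per cycle
KSM_NPAGES_MAX=1250       # upper bound for pages scanned per cycle
KSM_THRES_COEF=20         # activate KSM when free memory drops below 20% (i.e. ~80% used)
KSM_THRES_CONST=2048      # absolute free-memory floor that also activates KSM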

You can test that by bringing your host to 75% RAM usage after a reboot with ballooning disabled everywhere. KSM should show that 0 GB RAM is shared. Then start a VM to bring the host's RAM to something like 85%. Now KSM kicks in, deduplicates RAM, and the KSM shared RAM grows. Because of the deduplication, the host's RAM usage should shrink again, and it should stop shrinking at 79%, as KSM then gets disabled again and stops scanning for pages that could be deduplicated. If you now shut down some VMs, the host's RAM usage will drop way below 79%, but the reported RAM shared by KSM won't change anymore until you start some new VMs that bring the host's RAM usage above 80% again.
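While running such a test you can watch KSM directly through the kernel's sysfs counters instead of the GUI; a minimal sketch using the standard Linux KSM interface (run on the host as root):
Code:
# 1 = KSM is running, 0 = stopped (ksmtuned flips this based on memory usage)
cat /sys/kernel/mm/ksm/run
# pages_shared = merged pages kept, pages_sharing = virtual pages deduplicated into them
grep . /sys/kernel/mm/ksm/pages_shared /sys/kernel/mm/ksm/pages_sharing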
 
I think it's also that KSM will stop de-duplicating new pages when it gets below the threshold, but it won't re-duplicate pages it already de-duplicated.

If you really have issues, see here for how to unmerge pages.
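In case the linked post is unavailable: on a stock kernel, writing 2 to the KSM run knob unmerges everything that is currently shared (ksmtuned may re-enable KSM later if the threshold is crossed again). A rough sketch:
Code:
# unmerge all KSM-shared pages; needs enough free RAM to hold the duplicated copies again
echo 2 > /sys/kernel/mm/ksm/run
# the counters should drop back towards 0 afterwards
grep . /sys/kernel/mm/ksm/pages_shared /sys/kernel/mm/ksm/pages_sharing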
 
Yes, I am facing a strange problem of VMs shutting down randomly every day in the early morning. This is a three-node cluster and only one host has this KSM sharing issue; the other two hosts are not facing any problem like that and do not show any KSM sharing (0, zero). Also, the people who are using these VMs say they run slower than the VMs on the other two hosts. Waiting for a solution.
 
Yes, I am facing a strange problem of VMs shutting down randomly every day in the early morning.
Do you have backups or some other task running at that time? Are there any messages in /var/log/syslog?
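If /var/log/syslog really shows nothing, the journal around the time of the shutdowns is worth a look as well; a small sketch (adjust the time window to when the VMs go down):
Code:
# everything the host logged in that window, e.g. OOM killer or shutdown messages
journalctl --since "04:00" --until "07:00"
# check the kernel log specifically for the OOM killer
dmesg -T | grep -i "out of memory"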
This is a three-node cluster and only one host has this KSM sharing issue; the other two hosts are not facing any problem like that and do not show any KSM sharing (0, zero). Also, the people who are using these VMs say they run slower than the VMs on the other two hosts. Waiting for a solution.
 
I cross-checked everywhere: there is no backup task running at that time and syslog does not have any entry either, the VMs just shut down randomly. Today I migrated these VMs to the third host, where no KSM sharing is visible, and they did not shut down in the early morning. From what I tested, the shutdown problem happens whenever KSM sharing shows any value... The VMs on the other hosts never get this problem. Very strange behavior?
 
Without any logs, we cannot say what's going on there.

If KSM bothers you so much, just disable it globally. But as others already pointed out, you ran into over 80% utilization and KSM then deduplicated the space, so without KSM you may run into other problems. Please also check that disk caching is disabled (cache=none, i.e. "No cache") for all VMs, so that those caches do not use extra memory that could trigger a KSM event or even an OOM. With disk caching enabled, the host's memory usage is much harder to predict.
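For completeness, disabling KSM globally on a PVE host usually comes down to stopping the tuning daemon and unmerging what is already shared (assuming the standard ksmtuned service name; some setups also have a separate ksm service):
Code:
# stop and disable the daemon that switches KSM on/off based on memory pressure
systemctl disable --now ksmtuned
# unmerge all currently shared pages, as shown earlier in the thread
echo 2 > /sys/kernel/mm/ksm/run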
 
I shifted all the VMs on this 2nd host to the 1st host, so at present there is no VM running on it, but it still shows KSM sharing of 21.23 GB and it is not coming down. Maybe there is some bug in the GUI, or something else we are not able to figure out.
 
I shifted all the VMs on this 2nd host to the 1st host, so at present there is no VM running on it, but it still shows KSM sharing of 21.23 GB and it is not coming down. Maybe there is some bug in the GUI, or something else we are not able to figure out.
That's a feature, not a bug. See above.
 
Now I restarted the 2nd host, which had KSM sharing of 22 GB; after the restart KSM is "0". I migrated all the VMs back to host 2 from host 1 and waited for the last two days to observe before writing this. Now the VMs are not shutting down in the early morning, they have been running perfectly and have not shut down once in the last two days, and the performance (speed) has also increased a bit, as observed by the users of these VMs. What conclusion should I draw from this experiment? Something is wrong somewhere that we are unable to figure out? I hope the Proxmox team can find a solution.