[SOLVED] Proxmox commits more memory than total of VMs

Thomas15

Member
Oct 3, 2016
Hello everyone,

I have a problem with RAM consumption on my Proxmox nodes. For example, one node has 192GB of RAM in total and runs virtual machines with ballooning devices whose maximum memory adds up to 64GB. Yet the system has about 140GB of RAM in use, the rest is buffered/cached, and there is almost no free RAM left. In htop I can see that nearly all VMs have a VIRT size larger than their maximum memory. The system has even started to use its 8GB of swap, which makes it very unresponsive.
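To illustrate what I mean, the configured limits can be compared with the actual size of the QEMU process. (VM ID 100 and the pidfile path below are the usual Proxmox defaults, shown here only as an example.)

Code:
# configured max memory / balloon target of VM 100
qm config 100 | grep -E '^(memory|balloon):'
# virtual and resident size of the corresponding QEMU process
ps -o pid,vsz,rss,comm -p "$(cat /var/run/qemu-server/100.pid)"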

At first I suspected the NUMA settings were the problem: we use dual-socket nodes and had always assigned two sockets per VM instead of one. Changing that did not fix the issue, though.

Does anybody know what could be wrong here? I'll gladly provide more details if required.

Greetings,
Thomas

PS: The free -m output below was taken after rebooting some VMs that had used up a lot of swap (I used smem to find them). I also kept a few VMs shut down.

Code:
              total        used        free      shared  buff/cache   available
Mem:         192089      147078        7507         368       37503       43216
Swap:          8191        1818        6373
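In case it is useful to anyone, processes can be sorted by swap usage with smem roughly like this (just one possible invocation):

Code:
# top swap users, largest first, human-readable sizes
smem -k -s swap -r | head -n 20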
 

Attachments

  • ram.JPG
  • vm.JPG
Do you by any chance use ZFS? That could account for the higher RAM usage, as ZFS uses otherwise unused RAM as a cache (up to 50% of total RAM by default). You can check the size of the ZFS cache (ARC) with the arcstat command.
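For completeness, on a node that does use ZFS the ARC size can also be read directly from the kernel stats (standard ZFS-on-Linux paths, shown here only as a sketch):

Code:
# one-shot arcstat sample
arcstat 1 1
# current ARC size and configured maximum, in bytes
grep -E '^(size|c_max)' /proc/spl/kstat/zfs/arcstats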
 
Hi, thanks for your answer. No, I do not use ZFS.

Tonight I decided to reboot the hosts and set the swappiness to 0, so that swap is only used as an absolute last resort. So far it looks decent: all VMs are enabled again, the KSM sharing values are higher, and the RAM usage is significantly lower. All VMs and applications also feel a lot faster. I don't know what the actual cause was, and I still suspect the NUMA issue, so I'll keep an eye on how it goes.

Code:
free -m
              total        used        free      shared  buff/cache   available
Mem:         192089       45801       88228          81       58059      144777
Swap:          8191           0        8191
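For reference, swappiness can be changed at runtime and made persistent roughly like this, and the KSM sharing counters can be checked as well (the sysctl.d file name is arbitrary):

Code:
# apply immediately
sysctl -w vm.swappiness=0
# make it persistent across reboots
echo 'vm.swappiness = 0' > /etc/sysctl.d/99-swappiness.conf
sysctl --system
# KSM sharing counters exposed by the kernel
grep . /sys/kernel/mm/ksm/pages_shared /sys/kernel/mm/ksm/pages_sharing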
 
Hello,

I need to reopen this thread, as it turned out that the things I tried so far (lower swappiness, changed CPU config) did not fix the issue. The RAM consumption increased again, slowly but steadily, until the first host node had consumed all of its swap and performance died completely. As these systems are in production I could not really debug, but one interesting thing I could see quickly: when I migrate all VMs away from a node, the memory consumption stays extremely high. About 80% of the 192GB remains in use although no VM is running on the node, and that is far more than the VMs could ever claim as maximum memory. (And yes, no ZFS!)

All that memory does not show up in tools like htop, so it is not consumed by a process but by the kernel. My current suspect is the ballooning kernel module. I have therefore disabled ballooning for all VMs, set static RAM instead, and rebooted all machines. The consumption dropped significantly again; now I have to wait and see whether it rises once more.
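For anyone wanting to check the same thing: the userspace/kernel split and kernel-side allocations can be inspected like this, and ballooning can be disabled per VM on the CLI (VM ID 100 is just a placeholder):

Code:
# system-wide memory split (userspace vs. kernel dynamic memory)
smem -w -k
# kernel-side allocations in detail
grep -E 'Slab|SReclaimable|SUnreclaim|KernelStack|PageTables|VmallocUsed' /proc/meminfo
# disable the ballooning device for a VM so it uses its fixed RAM assignment
qm set 100 --balloon 0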

Has anybody encountered a similar issue or does anybody have an explanation?

Greetings,
Thomas
 
Quick question: are you by any chance using CheckMK to monitor the hosts? There was a memory leak a few versions back, caused by the systemd unit file of the checkmk service.

If so, make sure that the parameter "KillMode=process" is changed to "Type=forking".

See https://checkmk.com/de/werk/10070
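In practice that means adjusting the agent's systemd unit roughly like this (the exact unit name and path can vary between agent versions; the template name below is only an assumption based on the system-check_mk.slice mentioned further down):

Code:
# e.g. in the check_mk@.service template unit
[Service]
# old setting, remove or comment out:
#KillMode=process
# replacement, as described in checkmk werk 10070:
Type=forking

followed by a systemctl daemon-reload.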
 
Yes, I'm using CheckMK to monitor the VMs as well as the nodes. Thanks for that tip! The affected CheckMK version matches, too. I adapted the systemd unit file on all nodes, reloaded systemd and restarted the service, but the RAM has not been freed yet. Maybe one last reboot is needed to fix this problem?

Running systemd-cgls -au system-check_mk.slice still lists a ton of instances.

EDIT: It's enough to just run systemctl stop system-check_mk.slice && systemctl start system-check_mk.slice

It took a while to stop the slice, but after that a massive amount of RAM was freed. Thanks for your help, aaron.
 
Good to hear that the problem is solved. I went ahead and marked the thread as solved.
 
