Getting out-of-memory warnings on host

nicedevil

Good morning guys,

I get OOM warnings from time to time.
I have 32 GB of RAM and only 22 GB of it is assigned to LXCs/VMs on my system. Of course I use ZFS (without deduplication), but I can't figure out what is burning my RAM.

If I take a look at my host's overview (after a restart), I can see this:
[screenshot: host summary shortly after a reboot]

After a few hours it climbs to 28 of 30 GB.

On my PiKVM I can see this on the console (the overlayfs warning has already been answered by others here):
[screenshot: console messages shown on the PiKVM]

I hope someone can help :/
 
If you run top -c on the host and press 'Shift+m' you can sort processes by the percentage of memory used (top's '-c' flag shows the full command).
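If a one-shot list is easier to read than the interactive view, something like this should also work (just a sketch using the standard procps tools):

Code:
# inside top, press Shift+M to sort by %MEM; -c shows full command lines
top -c
# or: one-shot list of the 15 biggest memory consumers
ps aux --sort=-%mem | head -n 15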
 
If you did not limit the amount of memory ZFS is allowed to use, it will try to use up to 50% of your RAM for cache. In my experience, ZFS does not always give the memory back (quickly enough) when it is needed for other programs.
We set this to 8 GB of RAM five days ago; here is my current output (it looks like the example from the Proxmox wiki):

Code:
root@gateway:~# cat /etc/modprobe.d/zfs.conf
options zfs zfs_arc_max=8589934592

And of course we rebooted after this.
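To double-check that the limit was actually applied after the reboot, this should work (a sketch; the sysfs path is the standard OpenZFS module parameter):

Code:
# 8 GiB expressed in bytes, for comparison
echo $((8 * 1024 * 1024 * 1024))                  # -> 8589934592
# the limit the loaded zfs module is actually using
cat /sys/module/zfs/parameters/zfs_arc_max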

I summed up all the RAM assigned to the VMs/LXCs running on this machine.
It is about 20 GB of RAM.
My system has 2x 16 GB of RAM (effectively 30.7 GB available on the Summary page).
So with 8 GB dedicated to ZFS, there should be 2 GB (a bit more) left for the host itself... Isn't that enough?

I have 2x 2 TB SSDs (ZFS mirror) and 1x 512 GB NVMe M.2 SSD (XFS) in it. So 8 GB should be fine, right? :)
 
If you run top -c on the host and press 'Shift+m' you can sort processes by the percentage of memory used (top's '-c' flag shows the full command).

I took a look at this shortly after another OOM and it looks like this:

[screenshot: top output sorted by memory usage]

As you can see there are only two "high" memory consumers, which take about 10 GB of my memory... I would say that's not so much that I have to cry about it, right?

This was the error showing up on my screen:

[screenshot: the OOM error message]
 
As you can see there are only two "high" memory consumers, which take about 10 GB of my memory... I would say that's not so much that I have to cry about it, right?
From this screenshot, it doesn't look like it. /usr/bin/kvm relates to one of your running virtual machines. I don't recognize the command running under it, as it's been cut out of the screenshot. But it may be worth looking into, if it was taking 14% of your system memory.

This was the error showing up on my screen:
Apologies, I misread the error message the first time around. If the warning is regarding cgroups, it could be due to a container maxing out its assigned memory. Could you also monitor your containers to see if this is the case?
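If it helps, here is a rough way to check memory usage inside every running container from the host in one go (just a sketch using pct; adjust to taste):

Code:
# print used/total memory as reported inside each running container
for id in $(pct list | awk 'NR>1 && $2 == "running" {print $1}'); do
    echo "== CT $id =="
    pct exec "$id" -- free -m | awk 'NR==2 {print "used: "$3" MiB of "$2" MiB"}'
done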
 
From this screenshot, it doesn't look like it. /usr/bin/kvm relates to one of your running virtual machines. I don't recognize the command running under it, as it's been cut out of the screenshot. But it may be worth looking into, if it was taking 14% of your system memory.

It is from my Sophos XG, that is absolutely OK; it has 6 GB of memory assigned and is using 3-4 GB (according to this VM's summary).

Apologies, I misread the error message the first time around. If the warning is regarding cgroups, it could be due to a container maxing out its assigned memory. Could you also monitor your containers to see if this is the case?

Hmm, is there a useful command to get a nice overview of all of them at once?
I just clicked through all my LXCs and they are all running at 20-30% (of 512 MB of memory or thereabouts).
 
Hmm, is there a useful command to get a nice overview of all of them at once?
Not a command, but if you go to the "Search" panel of the node or datacentre, you can get a good rundown of stats for each running guest.
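(That said, if a command-line overview is preferred, something along these lines may work as well; this is only a sketch using the pvesh API client:)

Code:
# list all guests (VMs and CTs) with their current and maximum memory
pvesh get /cluster/resources --type vm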
 
Yeah, in all regards it looks fine. Are you still seeing the out-of-memory logs? I would keep an eye on it over the next few days, to see if you notice any gradual increases or memory spikes.
 
Yeah, in all regards it looks fine. Are you still seeing the out-of-memory logs? I would keep an eye on it over the next few days, to see if you notice any gradual increases or memory spikes.
Yeah, I saw this cgroup memory OOM thing yesterday.
And the overview in that Search panel always looked like this after I got the error :/

Maybe it isn't the best idea to run Docker inside an LXC, as some people teach on YouTube and other sites (there's the overlayfs issue with this as well).
 
Yeah, I saw this cgroup memory OOM thing yesterday.
And the overview in that Search panel always looked like this after I got the error :/
The message indicates a process was killed though, which may have been why everything looked normal following the incident. In the top right corner of the summary page, you can set the time resolution of the graphs from hourly to weekly. Could you do this and check if you can identify the spike in memory?

Maybe it isn't the best idea to run Docker inside an LXC, as some people teach on YouTube and other sites (there's the overlayfs issue with this as well).
I don't have any personal experience, but from what I've read, Docker should run okay in LXC as long as it's unprivileged with nesting and keyctl enabled.
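For reference, those two features can be toggled on an existing container roughly like this (a sketch; 101 is just an example container ID, and the container needs a restart afterwards):

Code:
# enable nesting and keyctl on the container (example ID 101)
pct set 101 --features nesting=1,keyctl=1
# verify the resulting config
pct config 101 | grep features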
 
The message indicates a process was killed though, which may have been why everything looked normal following the incident. In the top right corner of the summary page, you can set the time resolution of the graphs from hourly to weekly. Could you do this and check if you can identify the spike in memory?


I don't have any personal experience, but from what I've read, Docker should run okay in LXC as long as it's unprivileged with nesting and keyctl enabled.

First question:
[screenshot: memory graph at weekly resolution]

Second question (the LXC):

[screenshot: the Docker error inside the LXC]

This error could only be removed by installing Docker in a VM or on non-ZFS storage (that was another thread here, where this was explained to me).
 
First question:
Sorry, I meant to check the containers' weekly reports, to see if any spiked in memory. It shouldn't be too tedious, as once you change the period for one, it is set for them all.

This error could only be removed by installing Docker in a VM or on non-ZFS storage (that was another thread here, where this was explained to me).
Ah I hadn't considered it with ZFS, but yes I guess there are still some issues there.
 
[screenshot: weekly memory graph of the LXC with the highest memory usage]

That is the LXC with the highest memory usage... I increased the total myself, to make sure all of them get enough if they need it for testing...
As you can see, not a single spike anywhere... it's the same for my SMB share LXC, Pi-hole LXC, WireGuard LXC and so on... really crazy though :/
 
Maybe check /proc/spl/kstat/zfs/arcstats to see if c_max is equal to your zfs_arc_max? You can also look at c to check the current number of bytes allocated for the ZFS cache.
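Something like this should pull out just those two values (a sketch; the field names are as they appear in the arcstats file, values in bytes):

Code:
# current ARC size (c) and configured maximum (c_max), in bytes
awk '$1 == "c" || $1 == "c_max" {print $1" = "$3}' /proc/spl/kstat/zfs/arcstats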
 
In my experience (I'm only running VMs, not LXC containers), ZFS does take a pretty big amount of RAM for its cache. My server has 128 GB of RAM and shows 74.5% of the total RAM in use, with more than half of that taken by the ZFS ARC, but this info isn't shown by the top or free -h commands. The running VMs have a total of 24 GB assigned.

You may be able to confirm how much RAM the ZFS ARC is taking by running the following command:
Code:
arc_summary -s arc

And look for the following section:
Code:
ARC size (current):                                   100.2 %   63.0 GiB
        Target size (adaptive):                       100.0 %   62.9 GiB
        Min size (hard limit):                          6.2 %    3.9 GiB
        Max size (high water):                           16:1   62.9 GiB
        Most Frequently Used (MFU) cache size:          8.4 %    5.3 GiB
        Most Recently Used (MRU) cache size:           91.6 %   57.5 GiB
        Metadata cache size (hard limit):              75.0 %   47.2 GiB
        Metadata cache size (current):                  0.7 %  320.5 MiB
        Dnode cache size (hard limit):                 10.0 %    4.7 GiB
        Dnode cache size (current):                     0.6 %   27.4 MiB

Edit: I'm also showing below the output of the top and free -h commands:

Code:
top - 09:44:49 up 248 days, 16:24,  2 users,  load average: 3.78, 3.81, 3.76
Tasks: 768 total,   1 running, 767 sleeping,   0 stopped,   0 zombie
%Cpu(s):  6.5 us,  0.1 sy,  0.0 ni, 93.1 id,  0.0 wa,  0.0 hi,  0.2 si,  0.0 st
MiB Mem : 128826.0 total,   2703.7 free,  95415.0 used,  30707.4 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  30980.7 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                 
 9684 root      20   0   17.2g  16.1g  14264 S 209.6  12.8 158751:18 kvm                                                                                                                     
27332 root      20   0 4853608   4.0g  14324 S 106.0   3.2  71383:20 kvm                                                                                                                     
39270 root      20   0 2954100   2.1g  13512 S   4.6   1.6  14212:50 kvm                                                                                                                     
10729 root      20   0 2971308   2.1g  14244 S   0.3   1.6   1413:16 kvm                                                                                                                     
 2613 root      20   0 2677528 828176   5936 S   0.0   0.6   1222:42 glusterfsd                                                                                                               
 2636 root      rt   0  585564 189972  63860 S   1.3   0.1   5149:37 corosync                                                                                                                 
 1432 root      20   0  190672 144608  41864 S   0.0   0.1  27:54.22 systemd-journal                                                                                                         
39508 root      20   0  369772 138196  10396 S   0.0   0.1   0:01.84 pvedaemon worke                                                                                                         
31250 root      20   0  369508 137452   9872 S   0.0   0.1   0:03.91 pvedaemon worke                                                                                                         
12176 www-data  20   0  354644 137068  17624 S   0.0   0.1   1:54.64 pveproxy                                                                                                                 
 2728 www-data  20   0  368412 136760  10612 S   0.0   0.1   0:03.12 pveproxy worker                                                                                                         
12430 root      20   0  369528 136636   9140 S   0.0   0.1   0:00.61 pvedaemon worke                                                                                                         
19016 www-data  20   0  368064 135852  10000 S   0.0   0.1   0:00.56 pveproxy worker                                                                                                         
14891 www-data  20   0  367832 135744  10000 S   0.0   0.1   0:00.69 pveproxy worker                                                                                                         
 2708 root      20   0  353056 120616   2548 S   0.0   0.1   3:37.77 pvedaemon                                                                                                               
 2716 root      20   0  337812 101460   7948 S   0.0   0.1  40:13.31 pve-ha-crm                                                                                                               
 2888 root      20   0  337396 101356   8268 S   0.0   0.1  83:45.84 pve-ha-lrm                                                                                                               
 2661 root      20   0  304248  89760   8936 S   0.0   0.1   1605:08 pvestatd                                                                                                                 
 2657 root      20   0  305916  89556   7176 S   0.0   0.1 615:00.27 pve-firewall                                                                                                             
 2567 root      20   0 2407272  72100  52000 S   0.0   0.1 518:03.05 pmxcfs                                                                                                                   
33189 root      20   0  183896  60348   4524 S   0.0   0.0   0:06.60 chef-client                                                                                                             
 2886 www-data  20   0   70320  56264   7512 S   0.0   0.0   4:25.34 spiceproxy                                                                                                               
 6690 www-data  20   0   70568  52232   3260 S   0.0   0.0   0:00.92 spiceproxy work                                                                                                         
40382 consul    20   0  182892  25204   6956 S   0.3   0.0 396:40.09 consul                                                                                                                   
33452 root      20   0  729808  24900   7624 S   0.0   0.0   1441:59 node_exporter                                                                                                           
 3105 root      20   0  573900  14024   5668 S   0.0   0.0  65:30.70 glusterfs                                                                                                               
 2235 root      20   0  584376  11184   6028 S   0.0   0.0  32:02.57 glusterd                                                                                                                 
 2655 root      20   0   27000  10224   9092 S   0.0   0.0   9:42.22 corosync-qdevic                                                                                                         
 2622 root      20   0  811600  10000   4356 S   0.0   0.0   7:18.26 glusterfs                                                                                                               
    1 root      20   0  171308   8708   5408 S   0.0   0.0 143:01.51 systemd                                                                                                                 
23795 root      20   0   21400   8156   6568 S   0.0   0.0   0:00.03 systemd                                                                                                                 
23789 root      20   0   16896   7280   6172 S   0.0   0.0   0:00.01 sshd                                                                                                                     
40807 postfix   20   0   43832   6556   5692 S   0.0   0.0   0:00.00 pickup                                                                                                                   
33191 systemd+  20   0   93080   6404   5480 S   0.0   0.0   0:06.06 systemd-timesyn

Code:
              total        used        free      shared  buff/cache   available
Mem:          125Gi        93Gi       2.6Gi       1.4Gi        29Gi        30Gi
Swap:            0B          0B          0B
 
OK, that looks like this:
[screenshot: ARC usage output]


while top -c / Shift+M looks like this:

[screenshot: top -c output sorted by memory usage]

So in total I would say about 40% of the memory is taken by the processes on my host and about 25% by the ARC cache, right?
So there should be plenty of space left for the "normal" stuff without running OOM, shouldn't there?
 
AFAIK the ZFS ARC should be freed as soon as something else needs that memory, just like the Linux buff/cache shown by top and by the free -h command: it is temporary memory that the OS and ZFS use as long as it isn't needed by anything else.

I'm not entirely sure this is exactly the way it works; someone with more ZFS knowledge might clarify further. As @avw mentioned earlier in this thread, I could be wrong and ZFS could hold on to its ARC much longer than expected/desired:
If you did not limit the amount of memory ZFS is allowed to use, it will try to use up to 50% of your RAM for cache. In my experience, ZFS does not always give the memory back (quickly enough) when it is needed for other programs.
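If it helps to see how much of the "used" figure is actually ARC at any given moment, here is a rough sketch (the size field in arcstats is the current ARC size in bytes):

Code:
# compare the current ARC size against total "used" memory (both in MiB)
ARC=$(awk '$1 == "size" {print $3}' /proc/spl/kstat/zfs/arcstats)
USED=$(free -b | awk 'NR==2 {print $3}')
echo "ARC: $((ARC / 1024 / 1024)) MiB of $((USED / 1024 / 1024)) MiB used"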
 
My experience so far was that ZFS will use up to 50% of system memory without deduplication turned on (which is the case for me). So 16 GB gone to my ZFS mirror, OK... but the remaining 16 GB should be enough to run everything else, yet it is not, and I can't figure out what is using more memory than I assigned to my VMs/LXCs.

To get more free memory I already reduced the ZFS memory amount to 8 GB instead of the 50% (16 GB)...

Sure, I could try 64 GB of memory now, but I don't want to spend 200€ on a problem that is not solved afterwards.
 
