Why is Proxmox running out of memory?

yswery

We have a PVE node (7.1) with 32GB RAM

We have only 6 LXC containers running on this node, each of which (according to the PVE WebUI) is using approx. 100 MB of RAM (the max allowed per LXC container is 512 MB).

So my question is... Why and where did all the RAM go? Yes, we run ZFS, but it seems that's only using 509.6 MiB.

[Attached screenshot: Screen Shot 2022-04-18 at 9.19.51 AM.png]

proxmox-ve: 7.1-1 (running kernel: 5.13.19-4-pve)
pve-manager: 7.1-12 (running version: 7.1-12/b3c09de3)
pve-kernel-helper: 7.1-14
pve-kernel-5.13: 7.1-9
pve-kernel-5.11: 7.0-10
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-5-pve: 5.13.19-13
pve-kernel-5.13.19-4-pve: 5.13.19-9
pve-kernel-5.13.19-2-pve: 5.13.19-4
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.11.22-4-pve: 5.11.22-9
ceph-fuse: 15.2.14-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.1
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-7
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-5
libpve-guest-common-perl: 4.1-1
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.1-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.3.0-2
proxmox-backup-client: 2.1.5-1
proxmox-backup-file-restore: 2.1.5-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-7
pve-cluster: 7.1-3
pve-container: 4.1-4
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-6
pve-ha-manager: 3.3-3
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.1-2
pve-xtermjs: 4.16.0-1
qemu-server: 7.1-4
smartmontools: 7.2-1
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.4-pve1

Code:
------------------------------------------------------------------------
ZFS Subsystem Report                            Mon Apr 18 16:28:30 2022
Linux 5.13.19-4-pve                                           2.1.2-pve1
Machine: elbrus (x86_64)                                      2.1.2-pve1

ARC status:                                                    THROTTLED
        Memory throttle count:                                     90665

ARC size (current):                                    50.5 %  517.0 MiB
        Target size (adaptive):                        50.0 %  512.0 MiB
        Min size (hard limit):                         50.0 %  512.0 MiB
        Max size (high water):                            2:1    1.0 GiB
        Most Frequently Used (MFU) cache size:         45.6 %  195.8 MiB
        Most Recently Used (MRU) cache size:           54.4 %  233.4 MiB
        Metadata cache size (hard limit):              75.0 %  768.0 MiB
        Metadata cache size (current):                 15.9 %  122.4 MiB
        Dnode cache size (hard limit):                 10.0 %   76.8 MiB
        Dnode cache size (current):                    30.3 %   23.3 MiB

options zfs zfs_arc_min=536870911
options zfs zfs_arc_max=1073741824
options zfs l2arc_noprefetch=0
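
For reference, a rough sketch of how such module options are usually applied (the 1 GiB value just mirrors the zfs_arc_max above; adjust to your own target):
Code:
# take effect immediately for the running kernel (lost on reboot)
echo 1073741824 > /sys/module/zfs/parameters/zfs_arc_max

# persist across reboots: keep the options in /etc/modprobe.d/zfs.conf,
# then rebuild the initramfs so the limit also applies at early boot
update-initramfs -u -k all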


Code:
               total        used        free      shared  buff/cache   available
Mem:            31Gi        29Gi       854Mi        68Mi       909Mi       373Mi
Swap:             0B          0B          0B
 
Check top or htop to see which process consumes the memory (sort by memory).
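
For example, a quick non-interactive equivalent (standard procps tools, nothing PVE-specific):
Code:
# top 15 processes by resident memory, roughly what sorting by %MEM
# in top/htop shows
ps aux --sort=-rss | head -n 15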
 
The bigger the pools are, the more RAM will be used
Generally not true. You have the arc_max setting, which is a hard upper limit.

Counter example: if you have 8 GB for arc_max and an 8 GB pool, you can already run into the 8 GB arc_max limit if you cache your pool.
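
To see that limit in practice, the live ARC size and its bounds can be read straight from the kernel stats (a sketch; the values in that file are bytes):
Code:
# current ARC size vs. its min/max limits
awk '/^size|^c_max|^c_min/ {printf "%-6s %8.1f MiB\n", $1, $3/1048576}' /proc/spl/kstat/zfs/arcstats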
 
Generally not true. You have the arc_max setting, which is a hard upper limit.

Counter example: if you have 8 GB for arc_max and an 8 GB pool, you can already run into the 8 GB arc_max limit if you cache your pool.

@LnxBil,
Be careful with limiting ARC size, as the Proxmox wiki says: "Allocating enough memory for the ARC is crucial for IO performance, so reduce it with caution. As a general rule of thumb, allocate at least 2 GiB Base + 1 GiB/TiB-Storage."

https://pve.proxmox.com/wiki/ZFS_on_Linux#sysadmin_zfs_limit_memory_usage
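
As a worked example of that rule of thumb (the 4 TiB pool size below is purely hypothetical):
Code:
zpool list -o name,size            # check the raw pool size first
# 2 GiB base + 1 GiB per TiB of storage, e.g. for a 4 TiB pool:
#   (2 + 4) GiB = 6 GiB  ->  zfs_arc_max = 6 * 1024^3 bytes
echo $((6 * 1024 * 1024 * 1024))   # 6442450944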
 
Check top or htop to see which process consumes the memory (sort by memory).
That's the thing... it doesn't show anything special. I have attached a screenshot of htop sorted by MEM.
[Attached screenshot: Screen Shot 2022-04-20 at 11.26.31 AM.png]
You can also use smem -tw
I am not too sure which processes are taking this memory?
Code:
root @ elbrus ➜  ~  smem -tw
Area                           Used      Cache   Noncache
firmware/hardware                 0          0          0
kernel image                      0          0          0
kernel dynamic memory      30019992     775008   29244984
userspace memory            1773932     192116    1581816
free memory                 1042492    1042492          0
----------------------------------------------------------
                           32836416    2009616   30826800

That does not correspond to the screenshot you took.
There was a formatting issue; here is the same content, formatted better:
Code:
               total        used        free      shared  buff/cache   available
Mem:            31Gi        29Gi       1.0Gi        71Mi       990Mi       623Mi
Swap:             0B          0B          0B

Does anyone have ideas why this is the case? I did try a reboot and everything went back to what I consider normal, but after a few days it is back to maxed-out memory use again (without any special use whatsoever).
 
I thought it might be tmpfs mounts that are growing, but my local test ruled that out, because that shows (in top) as buff/cache, reducing free but not increasing used.
A search on "kernel dynamic memory" only brings up possible issues with zram and kernel space leaks, nothing concrete.
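
One way to dig further into that "kernel dynamic memory" figure is to look at the kernel slab caches, where ZFS buffers, dentries/inodes and similar kernel allocations end up (a sketch using standard tools, run as root):
Code:
# SUnreclaim is the part the kernel cannot simply drop under pressure
grep -E 'Slab|SReclaimable|SUnreclaim' /proc/meminfo

# one-shot listing of the largest slab caches
slabtop -o -s c | head -n 20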
 
@LnxBil,
Be careful with limiting ARC size, as the Proxmox wiki says: "Allocating enough memory for the ARC is crucial for IO performance, so reduce it with caution. As a general rule of thumb, allocate at least 2 GiB Base + 1 GiB/TiB-Storage."

https://pve.proxmox.com/wiki/ZFS_on_Linux#sysadmin_zfs_limit_memory_usage
The rule is correct, but that was not my point. Your sentence

The bigger the pools are, the more RAM will be used
is technically not correct. The maximum RAM used depends on the available RAM and the arc_max setting (if set), and can be further limited by running processes, but it does not depend on the size of the pool. Of course, more RAM is always better for performance (also for non-ZFS-related issues) than having to live with less.
 
I am not too sure which processes are taking this memory?
smem can also show that to some extent:

Code:
$ smem --columns="vss command" -r | head -10
     VSS Command
39596384 /usr/bin/kvm -id 12000 -nam
22042708 /usr/bin/kvm -id 2023 -name
19746812 /usr/bin/kvm -id 1021 -name

In my case, every process uses 6-7 GB more than configured: 12000 uses 32 GB, 2023 uses 16 GB, and 1021 also uses 16 GB.

The problem with every accounting, IMHO, is that you have to differentiate between shared and non-shared memory. VSS lists all mapped content, so some of the kvm/qemu stuff is mapped more than once and therefore also counted more than once. Another aspect is the caching setting of your VM disks: if you use something other than None, your host will cache stuff in userland memory, because kvm runs in userspace. So you should see a decrease in memory consumption for VMs with None as the VM disk cache setting.
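
To reduce that double counting, the same smem call can be sorted by PSS instead, which splits shared pages between the processes mapping them (a sketch, column names as in smem's documentation):
Code:
# PSS spreads shared pages across their users; RSS/VSS shown for comparison
smem --columns="pid pss rss vss command" -s pss -r | head -n 10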

So please compare the configured memory setting of your machines with the actual memory usage. Also check the VM caching settings.
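
For example (12000 is just the VMID from the smem output above; for containers the pct equivalent applies):
Code:
# configured RAM/balloon and the cache= setting of each virtual disk
qm config 12000 | grep -E '^(memory|balloon)|cache='
# containers: pct config <CTID> | grep -E '^(memory|swap)'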
 
smem --columns="vss command" -r | head -10
This is my output:

Code:
$ smem --columns="vss command" -r | head -10
     VSS Command
  657056 /usr/sbin/corosync -f
  323904 pvescheduler
  329556 pve-ha-crm
  329232 pve-ha-lrm
  270292 pve-firewall
  262108 pvestatd
  366600 pveproxy worker
  360748 pveproxy worker
  838876 /usr/bin/pmxcfs

Considering the top processes don't even use 1 GB, I can't understand where to find the actual (non-cache) memory use. Or is free/htop misreporting?
 
Does not seem to be sorted, is it?

I just read that you only use LXC, so your list contains only PVE processes. If you only have one node, you can maybe shut down corosync, pve-ha-crm and pve-ha-lrm and get a little space back from them.
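
A sketch of how that could look (only sensible on a standalone node that is not, and will not become, part of a cluster):
Code:
systemctl disable --now pve-ha-crm pve-ha-lrm corosync
# undo with: systemctl enable --now pve-ha-crm pve-ha-lrm corosync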
 
