Excessive cluster memory consumption

frankz

Member
Nov 16, 2020
Hi everyone, after the last updates I noticed that the cluster, and in particular the main node, starts to occupy 95% of the memory after about 30 minutes. This never happened before. Where should I look to investigate?


proxmox-ve: 6.3-1 (running kernel: 5.4.98-1-pve)
pve-manager: 6.3-4 (running version: 6.3-4/0a38c56f)
pve-kernel-5.4: 6.3-5
pve-kernel-helper: 6.3-5
pve-kernel-5.3: 6.1-6
pve-kernel-5.0: 6.0-11
pve-kernel-5.4.98-1-pve: 5.4.98-1
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.78-1-pve: 5.4.78-1
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.0-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.0.7
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-4
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.8-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-5
pve-cluster: 6.2-1
pve-container: 3.3-4
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.2-2
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.2.0-2
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-5
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.3-pve1
 

Attachments

  • mem1.png
  • mem2.png
Do you see any processes that take up all that memory?
top -o %MEM

Do you, by any chance, have check_mk installed?
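
If top is awkward to read, a quick non-interactive alternative (plain procps, nothing Proxmox-specific) that lists the biggest memory consumers is for example:

ps aux --sort=-%mem | head -n 15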
 
First of all, thanks for answering me. I'll attach the top output of node1. My report is legitimate, though: the cluster used to run like clockwork, and after the last update, in particular the one with the new ZFS features, this is the result. I can't figure out which process is eating the memory. It is also strange because right after a reboot all the VMs are in line and about 21 GB of memory is occupied; after 30 minutes the result is the one in the attachment.
 

Attachments

  • mem.png
Do you use ZFS? Then it is possible that ZFS is using RAM for its cache (the ARC). If you run arcstat you can see how much it is using. By default it will take up to 50% of the RAM if it is available. If that memory is needed elsewhere, it will be released.
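
For example (a rough sketch, assuming the standard OpenZFS tools shipped with zfsutils-linux), the current ARC size and the configured limit can also be read straight from the kernel:

# current ARC size in bytes
awk '$1 == "size" {print $3}' /proc/spl/kstat/zfs/arcstats
# configured maximum; 0 means the default of roughly half the RAM
cat /sys/module/zfs/parameters/zfs_arc_max

Running arcstat 5 prints a new line every 5 seconds, which makes it easy to watch the cache grow and shrink.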
 
Hi aaron, here's what I get from the arcstat command (see attachment).
Is what I'm seeing the memory used by the cache? If so, how can I set a lower value, and is the current one appropriate for a node with 32 GB?

In any case the whole cluster, i.e. all the nodes, sits at 95% after upgrading to the new ZFS features. How can I roll back or change the ZFS cache size? Thank you.
 

Attachments

  • arcstat.png
  • arcstat_summary.pdf
As you can see, the ARC has a size of 14G. I think that accounts for the higher usage?

If you run into problems, for example a VM that cannot start because not enough RAM is available, you can try to limit the RAM usage of ZFS. But keep in mind that unused RAM is wasted RAM, and ZFS will use it to serve read operations from its cache.
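
As a sketch of the usual approach (pick a value that fits your workload): to cap the ARC at, say, 8 GiB you can set the zfs_arc_max module parameter, both at runtime and persistently:

# apply immediately, no reboot needed (value in bytes, 8 GiB = 8589934592)
echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max
# make it persistent across reboots
echo "options zfs zfs_arc_max=8589934592" > /etc/modprobe.d/zfs.conf
# refresh the initramfs (needed in particular if the root filesystem is on ZFS)
update-initramfs -u -k all

If the ARC is already larger than the new limit, it may only shrink once there is actual memory pressure.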
 
Thanks Aaron for answering me. I got some advice on this and was told to run a test: arcstat showed 16 GB used and only 3 GB available on the main node, and I was told to increase the memory assigned to a VM and see whether the amount reported by arcstat dropped. I took a Windows 2019 VM with 2048 MB and changed it to 8192 MB, then started the machine even though the node reported 95% RAM used. In a shell, arcstat showed the cache shrinking by about 8 GB. So I deduce that the node showing 95% memory usage can be considered normal, even if, as written in my post, it had never happened to me before the update to ZFS 2.x. That's all.
 
No idea why ZFS didn't cache as much before that. I can only recommend setting up a monitoring system and keeping an eye on the ARC as well. From my experience it can grow and shrink quite a bit. It is also very interesting to know how many read requests can be served directly from the ARC without accessing the disks.

For my small production PVE installation that hit ratio is usually in the 99% and up range, with drops when doing backups or installing updates or new software in the VMs.
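
A rough way to check that hit ratio on a node (a sketch based on the counters OpenZFS exposes in arcstats, not a Proxmox-specific tool):

awk '$1 == "hits" {h=$3} $1 == "misses" {m=$3} END {printf "ARC hit ratio: %.2f%%\n", h*100/(h+m)}' /proc/spl/kstat/zfs/arcstats

arc_summary prints the same counters together with a breakdown of the ARC size.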
 
Yes Aaron, as far as I'm concerned the behaviour looks fairly random. In any case I have only been using ZFS on Proxmox for a short time; my knowledge is limited to a few things, such as datasets, quotas on datasets, snapshots and other small things, including ZRAID (5), which I tested to simulate a disk failure. I can see that it is very complex, and I realise its options take a lot of managing. Thanks for your info. I also hope that a bug I posted some time ago about ZFS and replication has been fixed. https://forum.proxmox.com/threads/v...ut-errors-but-existed-on-other-volumes.81528/
 
