Excessive cluster memory consumption

frankz

Member
Nov 16, 2020
Hi everyone, after the last updates I noticed that the cluster, and in particular the main node, starts to occupy 95% of the memory after about 30 minutes. This never happened before. Where should I look to investigate?


proxmox-ve: 6.3-1 (running kernel: 5.4.98-1-pve)
pve-manager: 6.3-4 (running version: 6.3-4/0a38c56f)
pve-kernel-5.4: 6.3-5
pve-kernel-helper: 6.3-5
pve-kernel-5.3: 6.1-6
pve-kernel-5.0: 6.0-11
pve-kernel-5.4.98-1-pve: 5.4.98-1
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.78-1-pve: 5.4.78-1
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.0-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.0.7
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-4
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.8-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-5
pve-cluster: 6.2-1
pve-container: 3.3-4
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.2-2
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.2.0-2
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-5
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.3-pve1
 

Attachments

  • mem1.png
  • mem2.png
Do you see any processes that take up all that memory?
top -o %MEM

Do you, by any chance, have check_mk installed?
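
If top is awkward to read, a quick non-interactive alternative (plain procps, nothing Proxmox-specific) that lists the biggest memory consumers is for example:

ps aux --sort=-%mem | head -n 15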
 
First of all, thanks for answering me. I'll attach the top output of node1. My report is legitimate, though: the cluster used to run like clockwork, and after the last update, in particular the one with the new ZFS features, this is the result. I can't figure out which process is eating the memory. It is also strange because right after a reboot all the VMs are in line and about 21 GB of memory is occupied; after 30 minutes the result is the one in the attachment.
 

Attachments

  • mem.png
Do you use ZFS? Then it is possible that ZFS is using RAM for its cache (the ARC). If you run arcstat you can see how much it is using. By default it will take up to 50% of the RAM if it is available. If that memory is needed elsewhere, it will be released.
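
For example (a rough sketch, assuming the standard OpenZFS tools shipped with zfsutils-linux), the current ARC size and the configured limit can also be read straight from the kernel:

# current ARC size in bytes
awk '$1 == "size" {print $3}' /proc/spl/kstat/zfs/arcstats
# configured maximum; 0 means the default of roughly half the RAM
cat /sys/module/zfs/parameters/zfs_arc_max

Running arcstat 5 prints a new line every 5 seconds, which makes it easy to watch the cache grow and shrink.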
 
Hi aaron, here's what I get from the arcstat command (see attachment).
Is what I'm seeing the memory used by the cache? If so, how can I set a lower value, and is the current one appropriate for a node with 32 GB?

In any case the whole cluster, i.e. all the nodes, sits at 95% after upgrading to the new ZFS features. How can I roll back or change the ZFS cache size? Thank you.
 

Attachments

  • arcstat.png
  • arcstat_summary.pdf
As you can see, the ARC has a size of 14G. I think that accounts for the higher usage?

If you run into problems, for example a VM that cannot start because not enough RAM is available, you can try to limit the RAM usage of ZFS. But keep in mind that unused RAM is wasted RAM, and ZFS will use it to serve read operations from its cache.
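
As a sketch of the usual approach (pick a value that fits your workload): to cap the ARC at, say, 8 GiB you can set the zfs_arc_max module parameter, both at runtime and persistently:

# apply immediately, no reboot needed (value in bytes, 8 GiB = 8589934592)
echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max
# make it persistent across reboots
echo "options zfs zfs_arc_max=8589934592" > /etc/modprobe.d/zfs.conf
# refresh the initramfs (needed in particular if the root filesystem is on ZFS)
update-initramfs -u -k all

If the ARC is already larger than the new limit, it may only shrink once there is actual memory pressure.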
 
Thanks Aaron for answering me. I got some advice on this and was told to run a test: arcstat showed 16 GB used and only 3 GB available on the main node, and I was told to increase the memory assigned to a VM and see whether the amount reported by arcstat dropped. I took a Windows 2019 VM with 2048 MB and changed it to 8192 MB, then started the machine even though the node reported 95% RAM used. In a shell, arcstat showed the cache shrinking by about 8 GB. So I deduce that the node showing 95% memory usage can be considered normal, even if, as written in my post, it had never happened to me before the update to ZFS 2.x. That's all.
 
No idea why ZFS didn't cache as much before that. I can only recommend setting up a monitoring system and keeping an eye on the ARC as well. From my experience it can grow and shrink quite a bit. It is also very interesting to know how many read requests can be served directly from the ARC without accessing the disks.

For my small production PVE installation that hit ratio is usually in the 99% and up range, with drops when doing backups or installing updates or new software in the VMs.
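
A rough way to check that hit ratio on a node (a sketch based on the counters OpenZFS exposes in arcstats, not a Proxmox-specific tool):

awk '$1 == "hits" {h=$3} $1 == "misses" {m=$3} END {printf "ARC hit ratio: %.2f%%\n", h*100/(h+m)}' /proc/spl/kstat/zfs/arcstats

arc_summary prints the same counters together with a breakdown of the ARC size.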
 
Yes Aaron, as far as I'm concerned the behaviour looks fairly random. In any case I have only been using ZFS on Proxmox for a short time; my knowledge is limited to a few things, such as datasets, quotas on datasets, snapshots and other small things, including ZRAID (5), which I tested to simulate a disk failure. I can see that it is very complex, and I realise its options take a lot of managing. Thanks for your info. I also hope that a bug I posted some time ago about ZFS and replication has been fixed. https://forum.proxmox.com/threads/v...ut-errors-but-existed-on-other-volumes.81528/
 
