Cluster nodes use too much RAM

andrea68

Renowned Member
Jun 30, 2010
Hi,

I have a small project with 3 nodes, 64GB of RAM each.
I use GlusterFS on top of ZFS as the network filesystem, on a dedicated 10Gb NIC, with another 10Gb NIC for Corosync (plus a third 10Gb NIC for internet access).

I know that ZFS uses from 4 to 8GB of RAM, but the problem is that much more RAM is missing and I don't know why.
For example: on node one I have 4 VMs (KVM) that use 22GB of RAM in total, but Proxmox says that memory in use is 88% (55GB).
33GB of RAM for Proxmox and ZFS seems too much to me...
On the other 2 nodes the situation is pretty much the same...
Can you help me with this?
 

Attachments

  • Schermata 2021-01-04 alle 10.29.30.jpg
hi,

what do you see if you run free -mh on the pve host?
 
Code:
               total        used        free      shared  buff/cache   available
Mem:            62Gi        55Gi       6.3Gi        63Mi       1.3Gi       6.9Gi
Swap:           31Gi        60Mi        31Gi
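One thing to keep in mind when reading this output: the ZFS ARC is counted under "used" rather than "buff/cache", so "available" understates how much memory could actually be reclaimed. A quick way to see the current ARC size next to it (a sketch, assuming the standard OpenZFS arcstats file at /proc/spl/kstat/zfs/arcstats):

Code:
free -h
# ARC size in GiB (this amount is included in the "used" column above)
awk '/^size / {printf "ARC: %.1f GiB\n", $3 / (1024^3)}' /proc/spl/kstat/zfs/arcstats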

I guess the problem could be ballooning?
Maybe I could allocate some swap on the OS disk? (It's an mdadm RAID1 of 2x 1TB NVMe.)
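If ballooning is the suspect, a quick first check is the VM config (a sketch; <vmid> is a placeholder for a real VM ID):

Code:
# If no "balloon:" line shows up, the VM uses the default ballooning behaviour,
# with the full "memory:" value as its target.
qm config <vmid> | grep -iE 'balloon|memory'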
 
what about ps aux --sort=-%mem | head -n 10 ?

which processes are using this much memory?
 
It seems qemu is the one responsible; these are the first lines:


Code:
USER       PID %CPU %MEM      VSZ     RSS TTY STAT START   TIME COMMAND
root     23934 17.3 13.0 10117580 8576248 ?   Sl   Jan03 203:37 /usr/bin/kvm -id 114
root     27209  4.6  9.4  8048300 6194424 ?   Sl   Jan02 135:02 /usr/bin/kvm -id 104
root     31518  2.5  5.7  5055676 3796468 ?   Sl   Jan02  74:13 /usr/bin/kvm -id 106
root      1775  0.4  0.2  1905748  194776 ?   Ssl  2020   35:40 /usr/sbin/glusterfsd -s stor01 --volfile-id DATASTORE
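The KVM processes shown here add up to roughly 18GB of RSS, so by themselves they don't explain 55GB in use. A quick way to total the resident memory of all running guests (a sketch, assuming the guest processes report the comm name "kvm", as the listing above suggests):

Code:
# Sum the RSS (in KiB) of all kvm processes and print it in GiB.
ps -C kvm -o rss= | awk '{sum += $1} END {printf "KVM RSS total: %.1f GiB\n", sum / (1024^2)}'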
 
can you post: qm config 114 ?
 
It's a very simple conf:


boot: order=scsi0;ide2;net0
cores: 1
ide2: none,media=cdrom
memory: 8192
name: web.domain.name
net0: virtio=7E:B2:83:AE:88:69,bridge=vmbr4001,firewall=1
numa: 0
ostype: l26
scsi0: DATASTORE:114/vm-114-disk-0.qcow2,size=160G
scsihw: virtio-scsi-pci
smbios1: uuid=8dd29b52-d5c9-4cab-a5e8-52d3451cb04e
sockets: 1
vmgenid: 3ded56db-fcae-49a1-a77e-d97aafe92573
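As a side note on the ballooning question above: with no balloon: line in the config, the guest keeps the full memory: 8192 as its target. If you want the host to be able to reclaim memory from the guest under pressure, a lower minimum can be set (a sketch; the value is only an example):

Code:
# Allow VM 114 to balloon down to 4096 MB under host memory pressure,
# while it can still grow back to its configured 8192 MB.
qm set 114 --balloon 4096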
 
Manually tuning ZFS could be risky and tricky.
Is there another approach for a network filesystem?
I originally thought of Ceph, which doesn't need ZFS and handles the disks as individual OSDs.
But I only have one 10Gb link per node dedicated to network storage, and that doesn't seem like enough.

Thanks anyway.
 
[UPDATE] It's all very strange and odd: I upgraded all 3 nodes from 64GB of ECC RAM to 128GB.
Well: with the SAME VMs on top and the same memory usage for KVM, all 3 Proxmox hosts now take another 20GB each...

For example: on this node there are 4 KVM VMs using a total of 16GB (2+4+4+6). Before the RAM upgrade, RAM usage was about 50GB (so about 44GB for Proxmox). Now, after the upgrade, the situation is as pictured in the screenshot: 82G USED (65%) with the same 4 VMs as before! Almost 66GB for Proxmox!!!

It seems very hard to believe that this is only ZFS.
Can anyone explain how this is happening and why?

Thanks in advance.

Attachment: Schermata 2021-01-25 alle 21.48.34.jpg
 
OK, thanks. I know cache is good, but how can I add more VMs if the system eats all the available RAM?
I have 4 VMs with a total of 16GB on a host with 128GB of RAM, and I'm not totally comfortable working with 90% of the RAM occupied...
 
You have to configure ZFS; by default the ARC uses up to 50% of total RAM.

You can add as much RAM as you like, it will still use 50% if not limited.

drop_caches clears the ARC, so it temporarily goes down.

This has nothing to do with Proxmox.
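For completeness, this is roughly how those two knobs look in practice (a sketch; the runtime limit assumes the OpenZFS module parameters under /sys/module/zfs/parameters):

Code:
# Temporarily shrink the ARC by dropping caches; it grows back as data is read again.
echo 3 > /proc/sys/vm/drop_caches

# Lower the ARC ceiling at runtime (here 16 GiB) without a reboot;
# the ARC then shrinks gradually towards the new limit.
echo $((16 * 1024 * 1024 * 1024)) > /sys/module/zfs/parameters/zfs_arc_max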
 
Thank you: do you have an idea of how much ARC it needs?
What would be a good configuration in my scenario?
(Every host has 128GB of RAM and 6x 1TB SSDs in ZFS for data.)
 
How low you can go depends on the pool size and workload.

The minimum would be 1GB per 1TB of pool size, so 6GB to cache only metadata.

To cache frequently read files as well as metadata, I would recommend 16-24GB.

More ARC is always better for performance.

Code:
cat << 'EOF' > /etc/modprobe.d/zfs.conf
# ZFS ARC limits: min 16 GiB, max 24 GiB (value = GiB * 1024^3, in bytes)
options zfs zfs_arc_min=17179869184
options zfs zfs_arc_max=25769803776
EOF

# rebuild the initramfs so the new module options are applied at boot
update-initramfs -u

reboot
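After the reboot it is worth verifying that the limits were actually picked up (a quick check, assuming the module parameters are exposed under /sys/module/zfs/parameters):

Code:
# Should print 17179869184 and 25769803776 (16 and 24 GiB in bytes).
cat /sys/module/zfs/parameters/zfs_arc_min /sys/module/zfs/parameters/zfs_arc_max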
 
