PVE9 Memory management problems

Ballistic

Hello,

Since upgrading to 9, I have big memory management problems on my systems.
Storage is a local NVMe ZFS pool.

1: VM memory reported wrong.
1755853589850.png
The latest QEMU guest agent is installed, of course (IP addresses are detected). Proxmox reports wrong stats for a lot of VMs.
This VM is only using ~3GB.

2: ZFS ARC eating all memory. On PVE8, both my systems were always in the 80% range for memory usage, which was fine. Now ARC eats up to 150GB (of 256GB) of memory.
1755853889953.png
Since PVE9, the machine gets up to ~90% usage pretty quickly. This image shows me starting another 8GB VM, which it happily took from the free memory instead of getting it back from ARC. Total memory usage is now 95%, and the host even started swapping:
1755854008177.png

I really regret updating to PVE9 so quickly. Anyone experiencing the same issues?
 
Was that host installed a while ago? Because, IIRC, since about 8.1, the installer limits the ARC by default. If you installed earlier, you can manually set a limit on the ARC: https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#sysadmin_zfs_limit_memory_usage
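
As a rough sketch of what setting that limit can look like: the ARC cap is the ZFS module parameter zfs_arc_max, given in bytes. The helper name arc_limit_line and the 32 GiB figure below are only illustrative assumptions, not a recommendation for this host.

```shell
#!/bin/sh
# Sketch: build the modprobe option line that caps the ZFS ARC.
# arc_limit_line is a hypothetical helper name, not a Proxmox or ZFS tool.

# Print "options zfs zfs_arc_max=<bytes>" for a limit given in GiB.
arc_limit_line() {
    gib="$1"
    bytes=$((gib * 1024 * 1024 * 1024))
    echo "options zfs zfs_arc_max=${bytes}"
}

# Possible usage as root (32 GiB is only an example value):
#   arc_limit_line 32              # put the printed line into /etc/modprobe.d/zfs.conf
#   update-initramfs -u -k all     # per the linked docs, needed if the root filesystem is on ZFS
#   echo $((32 * 1024 * 1024 * 1024)) > /sys/module/zfs/parameters/zfs_arc_max   # apply without a reboot
```

The modprobe file only takes effect at the next module load (normally a reboot), while writing to /sys changes the running system right away.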

What has changed with the newer ZFS 2.3 is that it will by default claim up to 90% of the memory if available, but it is also easier for the kernel to reclaim that memory if it needs it.
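
To check what the ARC is doing on a given node, the kernel exposes its counters in /proc/spl/kstat/zfs/arcstats, where size is the current ARC size and c_max the ceiling it may grow to. A minimal sketch for reading those two values follows; arcstat_gib is a hypothetical helper name, and the optional second argument exists only so the function can be pointed at a saved copy of the file.

```shell
#!/bin/sh
# Sketch: print one arcstats field (e.g. size, c_max) in whole GiB, rounded down.
# arcstat_gib is a hypothetical helper name.
arcstat_gib() {
    field="$1"
    file="${2:-/proc/spl/kstat/zfs/arcstats}"
    # arcstats lines have three columns: name, type, data (bytes for these fields)
    awk -v f="$field" '$1 == f { printf "%d\n", $3 / (1024 * 1024 * 1024) }' "$file"
}

# Possible usage on the host:
#   arcstat_gib size     # current ARC size in GiB
#   arcstat_gib c_max    # current ARC ceiling in GiB
```

Comparing c_max between two nodes shows quickly whether one of them is running with a different limit.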

With 256G of memory, you might want to disable swap completely. Those 8 G of swap won't help much anyway if the 256 G of mem are full.
To do so, run swapoff -a and then comment out the swap line in /etc/fstab.
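
The fstab step could be scripted roughly like this. It is a sketch that assumes GNU sed and a standard fstab layout with the filesystem type in the third column; disable_swap_entries is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: comment out every active swap entry in an fstab-style file.
# Edits the file in place and keeps the original as <file>.bak.
# disable_swap_entries is a hypothetical helper name.
disable_swap_entries() {
    fstab="$1"
    # Match non-comment lines whose third field is "swap" and prefix them with '#'.
    sed -i.bak -E '/^[[:space:]]*[^#[:space:]][^[:space:]]*[[:space:]]+[^[:space:]]+[[:space:]]+swap([[:space:]]|$)/ s/^/#/' "$fstab"
}

# Possible usage as root:
#   swapoff -a
#   disable_swap_entries /etc/fstab
#   swapon --show        # prints nothing when no swap is active
```

Lines that are already commented out and non-swap mounts are left untouched.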

One more note regarding swap itself: It is used even if plenty of RAM is still available. See https://chrisdown.name/2018/01/02/in-defence-of-swap.html for more details.
 
I think the initial install of the 2 hosts (both show the same problems) was with PVE7, so yeah, it's a few years ago.

Usually I disable swap, but it's enabled again after every update and/or reboot. I have now set it back to 10% as I usually do.

"What has changed with the newer ZFS 2.3 is that it will by default claim up to 90% of the memory if available"
I guess that explains the change in behaviour.

Thanks for the tip _gabriel. I'll think about doing this. Unused memory is wasted memory though :) But capping it to 80% usage might be smart, so my monitoring can still alert me if something goes wrong and usage goes over 80%.
 
I did not change anything and out of the blue this happened:

1755936643865.png


Only on one of my nodes. The other still runs at 90%+ memory usage while this host magically seems to have ARC limited to 27GB now...