Proxmox gradually takes all the RAM

goodtm

New Member
Nov 27, 2023
Hi. My problem is that Proxmox gradually consumes all the RAM and then freezes solid. The hypervisor itself is installed on two 256 GB NVMe disks in a ZFS mirror. This is a fresh installation and I hardly use any containers or virtual machines yet; right now there is only one container with GLPI, which isn't even installed. The server has 32 GB of RAM (it's an ordinary consumer PC). Apart from the two NVMe disks in the ZFS mirror, the other disks are not formatted. I don't understand what's going on with the memory: even accounting for the fact that ZFS can take 50% of the memory, my memory is completely exhausted after a few days.

Code:
root@pvs:~# free -h
               total        used        free      shared  buff/cache   available
Mem:            30Gi        29Gi       1.2Gi        35Mi       387Mi       1.1Gi
Swap:             0B          0B          0B

root@pvs:~# nano /etc/modprobe.d/zfs.conf
options zfs zfs_arc_min=5368709100
options zfs zfs_arc_max=5368709120
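(Note: with the root file system on ZFS, a change in /etc/modprobe.d/zfs.conf only takes effect after the initramfs is regenerated and the host is rebooted; the limit can also be changed at runtime. A sketch of both, assuming a standard PVE install:)

Code:
# apply the modprobe options at the next boot
update-initramfs -u -k all
# or change the limit immediately at runtime (5 GiB in this example)
echo 5368709120 > /sys/module/zfs/parameters/zfs_arc_max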
 
Hi,
could you post the output of top -o %MEM -b -n 1, cat /proc/meminfo and cat /proc/spl/kstat/zfs/arcstats? Those might give us a hint as to what is actually using the memory.
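For example, to capture all three at once (the file names here are just placeholders):

Code:
top -o %MEM -b -n 1 > top.txt
cat /proc/meminfo > meminfo.txt
cat /proc/spl/kstat/zfs/arcstats > arcstats.txt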
 
So far it does look normal, but I assume that when you created those files, you were not experiencing the problem?
Could you send them again, when you are seeing the abnormal memory usage? I'll be watching this thread, so don't worry if it takes a few days like you described in your first post.
 
This has been happening the whole time, ever since I installed the hypervisor. I used the official Proxmox distribution, and during installation I chose a ZFS mirror. Even with no virtual machines or containers, the memory gradually fills up completely within a few days and the hypervisor freezes. The computer has an AMD processor.
At 12:23, 6.58 GB of RAM was in use; by 13:32, it was already 6.9 GB. Right now only the hypervisor is running, and the single container is stopped.
 
Unfortunately, with such relatively small differences it's hard to make out what is using up the RAM. Once it reaches higher numbers, let's say 15-20 GB, it should hopefully be obvious which process is leaking memory, or whether ZFS has gone haywire.
In the meantime, could you check if your installation is up to date?
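For example (assuming the package repositories are already configured correctly):

Code:
apt update && apt full-upgrade
pveversion -v | head -n 2   # shows the proxmox-ve and pve-manager versions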
 
Hi,
have you been able to record a high RAM usage situation?
 
RAM usage 94.19% (28.73 GiB of 30.50 GiB)
 

Attachments

  • Screenshot 2023-12-01 220209.jpg
ZFS in Proxmox does not take 50% of the memory, it takes almost all of it. The options zfs zfs_arc_max entry does not limit memory consumption at all.
 

Attachments

  • Screenshot 2023-12-07 140823.jpg
ZFS in Proxmox does not take 50% of the memory, it takes almost all of it. The options zfs zfs_arc_max entry does not limit memory consumption at all.
Please post the output of arcstat in CODE tags, and also post how you think you have limited the zfs_arc_max parameter.
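For reference, the limits the running kernel actually uses can be read directly (standard module parameter paths on a default installation):

Code:
cat /sys/module/zfs/parameters/zfs_arc_min
cat /sys/module/zfs/parameters/zfs_arc_max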
 
Code:
arcstat
    time  read  ddread  ddh%  dmread  dmh%  pread  ph%   size      c  avail
16:01:29     0       0     0       0     0      0    0   1.3G   4.0G  -749M
 
Code:
nano /etc/modprobe.d/zfs.conf
options zfs zfs_arc_min=4294967296
options zfs zfs_arc_max=6442450944
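(Converted to human-readable units, that is a 4 GiB minimum and a 6 GiB maximum, e.g.:)

Code:
echo $((4294967296 / 2**30)) $((6442450944 / 2**30))   # prints: 4 6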
 
So your ZFS ARC target is about 4G, only 1.3G of that is actually used, and the rest is somewhere else. The limit is working correctly.

Can you post the output of top -o %MEM once more?
 
Code:
top - 21:19:32 up 5 days, 22:33,  3 users,  load average: 2.05, 2.07, 2.08
Tasks: 406 total,   3 running, 403 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  6.9 sy,  0.0 ni, 93.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  31235.8 total,    345.7 free,  30935.5 used,    376.0 buff/cache     
MiB Swap:      0.0 total,      0.0 free,      0.0 used.    300.3 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                             
   1641 www-data  20   0  234636 153480  18176 S   0.0   0.5   0:06.01 pveproxy                                                                             
1498056 www-data  20   0  247836 150920   9856 S   0.0   0.5   0:01.33 pveproxy worker                                                                     
1499819 www-data  20   0  244524 145672   8064 S   0.0   0.5   0:01.03 pveproxy worker                                                                     
1495658 www-data  20   0  243324 145288   8320 S   0.0   0.5   0:01.90 pveproxy worker                                                                     
1460833 www-data  20   0  248212 144900   3328 S   0.0   0.5   0:01.38 pveproxy worker                                                                     
1446318 root      20   0  241712 142156   7040 S   0.0   0.4   0:00.94 pvedaemon worke                                                                     
1449285 root      20   0  241868 142028   6784 S   0.0   0.4   0:00.75 pvedaemon worke                                                                     
1449316 root      20   0  241820 141900   6656 S   0.0   0.4   0:00.66 pvedaemon worke                                                                     
1505492 root      20   0  241712 138184   2944 S   0.0   0.4   0:00.00 task UPID:pvs:0                                                                     
1455219 root      20   0  241212 137880   2944 S   0.0   0.4   0:00.56 task UPID:pvs:0                                                                     
   1629 root      20   0  233304 136776   2688 S   0.0   0.4   0:04.41 pvedaemon                                                                           
 303589 100110    20   0 1412016 123696  16000 S   0.0   0.4   1:30.69 mariadbd                                                                             
   1655 root      20   0  214508 113316   2688 S   0.0   0.4   0:22.81 pvescheduler                                                                         
   1601 root      20   0  157308  98884   4352 S   0.0   0.3  10:25.79 pve-firewall                                                                         
   1603 root      20   0  152232  96116   6784 S   0.0   0.3  14:02.04 pvestatd                                                                             
   1648 www-data  20   0   80788  59648   9344 S   0.0   0.2   0:04.59 spiceproxy                                                                           
 813453 www-data  20   0   81044  53340   2944 S   0.0   0.2   0:04.51 spiceproxy work                                                                     
   1514 root      20   0  581964  52776  40536 S   0.0   0.2   2:20.31 pmxcfs                                                                               
 303426 100000    20   0   64252  34848  33952 S   0.0   0.1   0:02.99 systemd-journal                                                                     
 303571 100000    20   0  263128  26108  18056 S   0.0   0.1   0:10.38 apache2                                                                             
    614 root      20   0   54696  23812  22788 S   0.0   0.1   0:01.32 systemd-journal                                                                     
 303487 100000    20   0   30084  16000   7296 S   0.0   0.1   0:00.02 networkd-dispat                                                                     
1295803 100033    20   0  263468  14168   6016 S   0.0   0.0   0:00.00 apache2                                                                             
1295804 100033    20   0  263468  14168   6016 S   0.0   0.0   0:00.00 apache2                                                                             
1295805 100033    20   0  263468  14168   6016 S   0.0   0.0   0:00.00 apache2                                                                             
1295806 100033    20   0  263468  14168   6016 S   0.0   0.0   0:00.00 apache2                                                                             
1295807 100033    20   0  263468  14168   6016 S   0.0   0.0   0:00.00 apache2                                                                             
 303492 100106    20   0   25528  10976   6784 S   0.0   0.0   0:00.32 systemd-resolve                                                                     
 303295 100000    20   0  182080   9216   6528 S   0.0   0.0   0:02.73 systemd                                                                             
 257217 root      20   0   19208   8832   6912 S   0.0   0.0   0:00.06 systemd                                                                             
      1 root      20   0  168448   8704   5632 S   0.0   0.0   0:02.21 systemd                                                                             
 303472 100105    20   0   16120   6528   5632 S   0.0   0.0   0:00.38 systemd-network                                                                     
   1260 root      20   0   50040   6144   5120 S   0.0   0.0   0:00.45 systemd-logind                                                                       
   1416 root      20   0   15408   5760   4480 S   0.0   0.0   0:00.00 sshd
 
Sorry for the absence, I was unexpectedly out of office.
I think we can rule out ZFS at this point, especially because it would free up its cache if memory got tight.
The same goes for buffers/caches, and top shows no application that is using up the memory.

That leaves us with the kernel itself. My guess at this point would be that some kernel module, e.g. a device driver, has a memory leak, which is not that easy to track down. Could you try booting into an older kernel version and see if the problem persists?
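One way to do that, assuming the system boots via proxmox-boot-tool (otherwise you can simply pick the older kernel from the GRUB advanced boot menu), would roughly be:

Code:
proxmox-boot-tool kernel list                # show the installed kernels
proxmox-boot-tool kernel pin 6.2.16-20-pve   # example version, pick one from the list
reboot
# later, to undo the pin again:
proxmox-boot-tool kernel unpin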
 
Good afternoon. So, I did not install an old version of PVE; instead I installed the latest version 8.2.1 and immediately updated to 8.2.2. As the release notes say, "For new installations starting with Proxmox VE 8.1, the ARC usage limit will be set to 10% of the installed physical memory." I checked my settings and I have options zfs zfs_arc_max=3278897152. But unfortunately, this does not hold up in my case. I have updated the BIOS firmware of my MSI motherboard to the latest version. I have installed a new version of PVE from scratch, and there are no virtual machines or containers, only the PVE installation on a ZFS mirror. Memory is gradually leaking away, and it already takes up more than 10%: RAM usage 13.64% (4.16 GiB of 30.53 GiB). What's going on? How can I make PVE work stably without memory leaks?
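(For what it's worth, the 10% default only limits the ARC itself; the kernel and the PVE services need memory on top of that. A quick way to see how much of the used RAM is actually ARC, assuming the standard arcstats path:)

Code:
awk '$1 == "size" {printf "ARC size: %.2f GiB\n", $3/2^30}' /proc/spl/kstat/zfs/arcstats
free -h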
 
I installed version 7.4-3. On this version, RAM is not leaking as fast as on the latest one. It has been 4 days and I am only at 5.96% RAM usage (1.82 GiB of 30.52 GiB). There are no virtual machines or containers, only a clean install on a ZFS RAID 1 mirror. Memory is still being consumed, but very, very slowly. I will probably need to install virtual machines to check whether there is a memory leak or not. And for some reason, no one is responding to this problem. It looks like Proxmox has problems with high memory consumption; there are many threads about high RAM usage.
 
I wouldn't say there is a general problem. At least for me: I have 10 or more servers now, and none of them is leaking RAM. Some have a lot of uptime, or at least uptime since the last time I needed to reboot for updates.
2 servers are new and started with 8.2; all other servers started at least on PVE 7, and now everything is on 8.2.2.

Not a single one has any sort of leakage; all of them run as expected, with no crashes or anything else.
Some servers, especially those with large arrays of extremely fast enterprise NVMe drives, are set to primarycache=metadata instead of "all". This is because the NVMe array is so fast that memory caching has no benefit, and it saves a ton of memory, since there is almost no ARC in memory.
On the other side, I have a backup server which is set to primarycache=all, where I even increased arc_min and arc_max to consume at least ~400GB of memory just for the ZFS pool. This is also for performance reasons, since the backup ZFS pool has deduplication, zstd compression and a lot of other things enabled, and it uses an array of slow HDDs without a SLOG or special device.
On 2 servers primarycache is the default "all" and arc_min/arc_max are at their defaults, and I see nothing unusual; memory consumption is a little over 50%, which is expected with VMs.
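For reference, that per-dataset tuning looks roughly like this (the dataset name rpool/data is just an example):

Code:
zfs set primarycache=metadata rpool/data   # cache only metadata in the ARC for this dataset
zfs get primarycache rpool/data            # verify the setting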

So I wouldn't generalize and say that there is any problem.
I even have 2x Intel NUCs as PVE hypervisors, which aren't servers at all. One of them is an 8th gen with 8 GB of memory, the other an 11th gen with 16 GB.
I think I even have a 13th gen NUC running PVE; I'm not sure, as I have too many PVE hypervisors, to be honest.
Some of them run stock ZFS settings; on one I think I set primarycache=metadata, probably the one with only 8 GB of memory, but at least it has an NVMe drive.
That consumer-grade hardware also behaves on PVE exactly as it should, and I have zero issues.
I just would never buy Minisforum or any other Chinese brand that builds hardware without real know-how and ships non-existent or buggy BIOS updates. But that's my personal opinion; everyone has their own opinion about that.

However, this write-up is just to tell you that there is no general PVE issue, and that's why no one replies.
Your issue is simply highly specific to your setup or hardware. It could be some sort of misconfiguration, some sort of buggy BIOS or wrong settings, but it could indeed also be that ZFS or the kernel or something else "leaks".

However, you could provide the output of dmesg --level err and dmesg --level warn, plus journalctl -p 3 -xb; those might give us some information. Basically, checking all possible errors/warnings at this point may reveal the issue if you are lucky.
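For example, to collect everything in one go (the file names are just placeholders):

Code:
dmesg --level err > dmesg_err.txt
dmesg --level warn > dmesg_warn.txt
journalctl -p 3 -xb > journal_err.txt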

Cheers
 
