Proxmox fails badly in low memory conditions

Sep 13, 2022
Hi,

I analyzed some strange behavior (including not being able to log in) and I think I found the root cause: a shortage of main memory, which led the OOM killer to kill various tasks such as systemd and sshd. That can cause arbitrarily bad and unpredictable behavior. Fortunately, a reboot fixed everything.

The machine was doing a disk load test (writing pseudo-random data to big files at random offsets, and comparing hashes of these big files). It has 48 GB RAM. There are two VMs with 16 GB each and one container with 4 GB, so 36 GB in total, leaving 12 GB for PVE+ZFS.
The "Memory usage" graph under "Node | Summary" in the web GUI, set to "Day (maximum)", shows 48 Gi total (46.86 Gi) and "RAM usage" peaks of 43.58 Gi at 16:00:00, 36.62 GB at 17:00:00 and 38.63 GB at 17:30:00 (no values in between). So no out-of-memory condition is visible here.

But there was.

journalctl contained:

Code:
root@pve:/var/log# journalctl |grep "out of memory"
Jul 21 17:07:54 pve kernel: Memory cgroup out of memory: Killed process 1395069 ((sd-pam)) total-vm:168576kB, anon-rss:2956kB, file-rss:0kB, shmem-rss:0kB, UID:100000 pgtables:92kB oom_score_adj:100
Jul 21 17:07:54 pve kernel: Memory cgroup out of memory: Killed process 1395068 (systemd) total-vm:18716kB, anon-rss:1408kB, file-rss:128kB, shmem-rss:0kB, UID:100000 pgtables:76kB oom_score_adj:100
Jul 21 17:07:54 pve kernel: Memory cgroup out of memory: Killed process 1119534 (systemd) total-vm:168744kB, anon-rss:3712kB, file-rss:0kB, shmem-rss:0kB, UID:100000 pgtables:92kB oom_score_adj:0
Jul 21 17:07:54 pve kernel: Memory cgroup out of memory: Killed process 1395779 (sshd) total-vm:17956kB, anon-rss:1792kB, file-rss:128kB, shmem-rss:0kB, UID:100000 pgtables:72kB oom_score_adj:0
root@pve:/var/log#

(NB: This is the real, complete output of "journalctl | grep "out of memory""; I did not select processes or shorten the output.)
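
To make it easier to see which processes were killed and under which UID, the kernel lines above can be condensed. This is only a sketch: summarize_oom is a hypothetical helper name (not a Proxmox tool), and it assumes the kernel message layout shown in the output above.

```shell
#!/bin/sh
# Condense kernel OOM-kill lines into: pid, process name, UID, and the
# message prefix the kernel used ("Memory cgroup out of memory" or
# "Out of memory").
# summarize_oom is a hypothetical helper name, not part of Proxmox.
summarize_oom() {
    sed -n -E 's/.*kernel: (Memory cgroup out of memory|Out of memory): Killed process ([0-9]+) \((.*)\) total-vm:.*UID:([0-9]+) .*/pid=\2 comm=\3 uid=\4 kind=\1/p'
}

# Usage (reads the journal from stdin):
#   journalctl | summarize_oom
```

The "kind" field keeps the two message prefixes apart, which may help to tell whether a kill was triggered by a memory cgroup limit or by a system-wide shortage.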

Selecting systemd, sd-pam and sshd to be killed seems close to the worst possible choice.

Is there a way to improve such behavior?

I know that OOM is hard to deal with, but could containers possibly be given some kind of "lower priority" or similar? In my case, the 4 GB container is just a test container, and I would have preferred it to be shut down instead. At the very least, a visible error in the web GUI in such a fatal situation would be good (although in my case, of course, I could not even log in).
 
Note that ZFS can use up to half of the system memory for caching, 24 GB in this case. ZFS will free memory when needed, but it may release it more slowly than memory is being filled, so the system can still run out of memory. You could try limiting ZFS's ARC size; see [1] for an example.
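
As a rough sketch of what limiting the ARC could look like: the zfs_arc_max module parameter takes a size in bytes. The helper name arc_option_line and the 8 GiB figure are assumptions chosen for illustration only; pick a value that fits your own memory budget.

```shell
#!/bin/sh
# Print the modprobe option line that caps the ZFS ARC at a size given in GiB.
# arc_option_line is a hypothetical helper; 8 GiB is only an example value.
arc_option_line() {
    gib=$1
    bytes=$((gib * 1024 * 1024 * 1024))
    printf 'options zfs zfs_arc_max=%s\n' "$bytes"
}

arc_option_line 8
# prints: options zfs zfs_arc_max=8589934592

# To persist (as root), the printed line would go into /etc/modprobe.d/zfs.conf,
# followed by `update-initramfs -u` and a reboot if the root file system is on ZFS.
# To apply at runtime without a reboot (as root):
#   echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max
```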

As was already mentioned, you should have enough swap available too.

[1] https://forum.proxmox.com/threads/disable-zfs-arc-or-limiting-it.77845/
 
You can also set the swappiness to "0". In that case swap won't be used at all unless the server is really running out of RAM; only then will it swap out to prevent OOM.
That helps in case you fear that swapping will hurt the disks' life expectancy or the server's performance.
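
A minimal sketch of checking and setting the swappiness, assuming a standard Linux sysctl setup. The helper name swappiness_line and the file name under /etc/sysctl.d/ are assumptions for illustration.

```shell
#!/bin/sh
# Print the sysctl configuration line for a given swappiness value.
# swappiness_line is a hypothetical helper name.
swappiness_line() {
    printf 'vm.swappiness = %s\n' "$1"
}

# Show the current value (readable without root on Linux).
cat /proc/sys/vm/swappiness 2>/dev/null || echo "swappiness not available here"

swappiness_line 0
# To apply immediately (as root):
#   sysctl -w vm.swappiness=0
# To persist, put the printed line into a file such as
#   /etc/sysctl.d/99-swappiness.conf   (file name is an assumption)
# and load it with:
#   sysctl --system
```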
 
Hi,

thank you all for your replies. I had no swap configured (and sufficient memory). The Proxmox installation does not reserve swap, and I followed the defaults.

Now I added a bit of swap, but I don't think this makes it safe: if ZFS takes too long to release memory, it may still be a problem, because surely ZFS "cache" memory would not be swapped (what would be the sense of caching disk data in memory and then backing that cache up to disk again? Usually disk caches should never be swapped, just released).

So I don't think swap solves the issue. Am I wrong?
 
