Hi,
I analyzed some strange behavior (including not being able to log in) and I think I found the root cause: a shortage of main memory, leading to the OOM killer killing essential tasks such as systemd and sshd, which can lead to arbitrarily bad and unreasonable behavior. Fortunately, a reboot solved it all.
The machine was doing a disk load test (writing pseudo-random data to big files at random offsets and comparing hashes of these files). It has 48 GB RAM. There are two VMs with 16 GB each and one container with 4 GB, so 36 GB in total, leaving 12 GB for PVE+ZFS.
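(Side note: by default the ZFS ARC may grow to roughly half of the RAM, so during the disk load test it could well have eaten into those 12 GB; that it actually did is only my assumption. A rough sketch of how to check it and, if needed, cap it:)
Code:
# current ARC size vs. its configured maximum
grep -E '^(size|c_max) ' /proc/spl/kstat/zfs/arcstats
# 0 here means "kernel default", i.e. roughly half of the RAM
cat /sys/module/zfs/parameters/zfs_arc_max
# example: cap the ARC at 8 GiB (value in bytes; takes effect after
# "update-initramfs -u" and a reboot if root is on ZFS)
echo "options zfs zfs_arc_max=8589934592" >> /etc/modprobe.d/zfs.conf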
The "memory usage" graph in "node | summary" in the web GUI shows for "Day maximum" show 48 Gi total (46,86Gi) and has a peak in "RAM usage" 43.58 Gi at 16:00:00, 36.62 GB at 17:00:00 and 38.63 GB at 17:30:00 (no value in between). So no visible out of memory condition here.
But there was.
journalctl contained:
Code:
root@pve:/var/log# journalctl |grep "out of memory"
Jul 21 17:07:54 pve kernel: Memory cgroup out of memory: Killed process 1395069 ((sd-pam)) total-vm:168576kB, anon-rss:2956kB, file-rss:0kB, shmem-rss:0kB, UID:100000 pgtables:92kB oom_score_adj:100
Jul 21 17:07:54 pve kernel: Memory cgroup out of memory: Killed process 1395068 (systemd) total-vm:18716kB, anon-rss:1408kB, file-rss:128kB, shmem-rss:0kB, UID:100000 pgtables:76kB oom_score_adj:100
Jul 21 17:07:54 pve kernel: Memory cgroup out of memory: Killed process 1119534 (systemd) total-vm:168744kB, anon-rss:3712kB, file-rss:0kB, shmem-rss:0kB, UID:100000 pgtables:92kB oom_score_adj:0
Jul 21 17:07:54 pve kernel: Memory cgroup out of memory: Killed process 1395779 (sshd) total-vm:17956kB, anon-rss:1792kB, file-rss:128kB, shmem-rss:0kB, UID:100000 pgtables:72kB oom_score_adj:0
root@pve:/var/log#
(NB: This is the real, complete output of journalctl | grep "out of memory"; I did not select specific processes or shorten the output.)
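(As I understand it, the kernel picks its victims from a per-process badness score plus the oom_score_adj value visible in the log lines above. A quick sketch to see how candidates currently rank; the process names here are only examples:)
Code:
# print effective OOM score and adjustment for some processes
for p in $(pgrep -d' ' -f 'sshd|systemd'); do
    printf '%6s %-16s score=%-5s adj=%s\n' "$p" "$(cat /proc/$p/comm)" \
        "$(cat /proc/$p/oom_score)" "$(cat /proc/$p/oom_score_adj)"
done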
Selecting systemd, sd-pam and sshd to be killed seems close to the worst possible choice.
Is there a way to improve such behavior?
I know that OOM situations are hard to handle, but could containers perhaps be given a "lower priority" or something similar? In my case, the 4 GB container is just a test container, and I would have preferred if it had been shut down instead. At least a visible error in the web GUI would be good in such a fatal situation (although in my case, of course, I could not even log in).
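For what I have in mind, something like the following might work (a rough sketch, not verified on PVE; the CTID, the paths and whether such settings persist are all assumptions on my part):
Code:
# CTID 100 is just an example for the test container
CTID=100

# make the container's init a preferred OOM victim; killing it effectively
# stops the container (oom_score_adj: -1000 = never kill ... 1000 = kill first)
echo 500 > /proc/"$(lxc-info -n "$CTID" -p -H)"/oom_score_adj

# cgroup v2: on an OOM inside the container's cgroup, kill the whole cgroup
# (i.e. the whole container) instead of picking single processes
echo 1 > /sys/fs/cgroup/lxc/"$CTID"/memory.oom.group
Whether these survive a container restart, or whether PVE offers a supported knob for this, is exactly what I don't know.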