OOM-Killer issue

RudyBzh

Hello,

For the past few days, the OOM killer has been killing VMs at random.
Everything had been running fine for at least a year, so this issue is new. I don't know what has changed (the upgrade to PVE 7? The switch from 9p to virtioFS? ...).
I use ZFS with the default ARC size (up to 50% of RAM, if I understood correctly). Swappiness is at its default of 60, I guess. ...

In the attached capture, I think the OOM killer is about to strike, and I don't understand why the host doesn't reclaim RAM from the ARC, for example.
The host is at the bottom right; VM ID 106 is at the bottom left.

In the Proxmox web UI, RAM usage is 74.86% (23.46G), whereas htop reports 30G.
Is that normal? Could this difference in accounting be the reason the host does not try to free RAM before the OOM killer kicks in, or something like that?

I probably have too much RAM assigned to my VMs but, as I said, it has been working like this for months.

Do not hesitate to ask me for more information.
Thanks for your advice.

Regards.
 

Attachment: Capture d’écran 2021-09-21 102116.jpg
What does 'free' show? (It gives a more detailed breakdown of memory usage.)

How much memory did you assign to the VMs in total?

If you have 32 GiB of memory and ZFS can take up to 50%, you should assign at most ~14 GiB to the VMs in total (leaving 2 GiB for the host) so that you do not overcommit. If you overcommit, the OOM killer can strike when multiple VMs need that memory at the same time.
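
The arithmetic behind that ~14 GiB figure can be sketched in a few lines of Python. This is only an illustration under the assumptions stated above (the ARC may grow to 50% of RAM, 2 GiB is kept for the host); the function name `vm_memory_budget_gib` and its parameters are made up for this example and are not part of any Proxmox tooling.

```python
def vm_memory_budget_gib(total_gib, arc_fraction=0.5, host_reserve_gib=2.0):
    """Return how much RAM (in GiB) can be assigned to VMs without overcommitting.

    Assumptions (taken from the reply above, not measured values):
    - the ZFS ARC may grow up to `arc_fraction` of total RAM
    - `host_reserve_gib` is kept free for the host itself
    """
    arc_max_gib = total_gib * arc_fraction
    budget = total_gib - arc_max_gib - host_reserve_gib
    # A negative budget just means there is nothing left for VMs.
    return max(budget, 0.0)


if __name__ == "__main__":
    # 32 GiB host, ARC allowed up to 50%, 2 GiB reserved for the host
    print(vm_memory_budget_gib(32))  # 14.0
```

If the ARC limit is lowered, the same calculation with a smaller `arc_fraction` leaves correspondingly more room for VMs.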
 
Thanks for the reply.

As I said, I was heavily overcommitting memory (up to 64G assigned for 32G available, with ZFS on top :/ ), but it had worked like this for months without a single OOM kill. My question was "what could have changed recently?".

For now, I have completely redefined my memory allocations so as not to overcommit so much. I now have from 11G to 21G assigned to VMs (min to max, if ballooning does its job, which I'm not really sure about...). I understand the "14G" you mention, but I'll give it a try like this (knowing that KSM allows some optimization).
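
For reference, the figures in this thread can be compared with a rough worst-case check. This is a sketch only: it assumes the ARC can fill up to 50% of the 32G host, treats "G" as GiB, and ignores ballooning, KSM savings and host overhead. The function names are hypothetical.

```python
def worst_case_demand_gib(vm_assigned_gib, total_gib, arc_fraction=0.5):
    """Worst-case RAM demand if the VMs use all assigned memory and the ARC is full."""
    return vm_assigned_gib + total_gib * arc_fraction


def is_overcommitted(vm_assigned_gib, total_gib, arc_fraction=0.5):
    """True if the worst-case demand exceeds the physical RAM of the host."""
    return worst_case_demand_gib(vm_assigned_gib, total_gib, arc_fraction) > total_gib


if __name__ == "__main__":
    total = 32  # host RAM mentioned in the thread
    # Allocations mentioned in the thread: old maximum, new minimum, new maximum
    for label, assigned in [("old max", 64), ("new min", 11), ("new max", 21)]:
        demand = worst_case_demand_gib(assigned, total)
        print(f"{label}: {assigned}G assigned, worst case {demand:.0f}G of {total}G, "
              f"overcommitted: {is_overcommitted(assigned, total)}")
```

Under these assumptions the 11G minimum fits, while the 21G maximum only fits if the ARC shrinks or the balloon and KSM actually give memory back.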
 
Hi everyone,

I just had a case of Proxmox killing a process inside an LXC container via OOM at 3 AM, after it had been running for 7 days (camera monitoring software).
But I don't understand the reason for this or what I should do to avoid it.

Here's the monitoring data at the time:

For the LXC:

- CPU usage around 8% on average
- Memory usage around 350 MiB of 2 GiB configured
- Swap usage around 100 MiB of 512 MiB configured

For the whole Proxmox node:

- CPU usage around 17% on average
- Memory usage around 4.85 GiB of 8 GiB
- Swap usage around 1.15 GiB of 3 GiB

Proxmox 7.1-7

Any help is much appreciated.
Thanks.
 
