Feature Request: OOM Score for VMs

darkpixel

OOM killer is a headache in Proxmox 8.x.
Systems that have been running for years under previous versions are now having VMs OOM-killed several times per week.

Every site we manage that uses Proxmox has the same three VMs: their main Windows Server, a remote access server, and a "vendor" machine that we don't really mind getting killed.

It would be nice to have a config option for each VM so we can prioritize it for OOM killing.

Maybe a pick-list with "high priority", "normal priority" and "low priority".

High priority could set the OOM score to, say, 10; normal could set it to 500; and low could set it to 1,000.

That way if the system strangely runs into a low memory condition when it has 128 GB RAM and 3 VMs that use a total of 32 GB of RAM, it will kill the least important VM first instead of taking down an entire site.
 
Have you looked into why OOM happens?

Does the OOM trace provide any useful information? It sounds like either there is a leak that should be addressed, or perhaps you are using Huge Pages? Are there extra programs running directly on the PVE host that could be leaking?

You may also want to look into Memory Ballooning https://pve.proxmox.com/wiki/Dynamic_Memory_Management#Requirements_for_Linux_VM
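
For reference, ballooning can be enabled per VM from the CLI; a minimal sketch (the VM ID 100 and the memory values here are just placeholders):

Code:
# Let VM 100 use up to 8 GiB, with the balloon driver able to reclaim down to 2 GiB
qm set 100 --memory 8192 --balloon 2048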

Another option is to create a VM hookscript that manually sets the OOM score adjustment (valid range -1000 to 1000; -1000 exempts the process from the OOM killer entirely):
echo -1000 > /proc/<pid>/oom_score_adj


If they don't implement this in the GUI anytime soon, you could do it with a bash script on the host once all the VMs are running.

Until then, I would also recommend adjusting your guest VM memory allocation based on actual weekly usage, increasing in-guest swap, and adding more RAM to the host if at all possible.

If adding RAM isn't possible, it may be time for another host server to handle the load, or to build out your cluster with more nodes.


Example (oom_score_adj takes values from -1000 to 1000; -1000 exempts the process completely):
echo -1000 > /proc/$(pidof squid)/oom_score_adj
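
And if you'd rather protect every running VM at once (the one-shot host script mentioned above), something like this should work; the -500 value is arbitrary:

Code:
#!/bin/bash
# Lower the OOM score of all running QEMU VMs (PVE keeps one pid file per VM)
for pidfile in /run/qemu-server/*.pid; do
    [ -e "$pidfile" ] || continue
    echo -500 > "/proc/$(cat "$pidfile")/oom_score_adj"
done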

EDIT: Also look into:

echo 1 > /proc/sys/vm/overcommit_memory

(a value of 1 tells the kernel to always overcommit; check the kernel's vm overcommit documentation before enabling it)
 
Yeah, we already have a salt state that looks at running VMs and adjusts the score because the "unimportant" VM has the same name and ID at every location.

But I guess the "feature-request" tag is pointless and should be deleted in the forum software? ;)
 
Tags are arbitrary; with enough seniority you can enter anything you want.

To be honest, considering only the information you provided: trying to fix a clearly broken system via GUI/CLI knobs is a Sisyphean task. If the kernel can't kill a few big VMs, what is it going to kill next? systemd?


 
> if the system strangely runs into a low memory condition when it has 128 GB RAM and 3 VMs that use a total of 32 GB of RAM, it will kill the least important VM first

Hold on, the server is only using 1/4th of the RAM for running VMs? Are you using ZFS?
If so, look into limiting ARC usage; something is going on there.
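
For a quick check of how much RAM the ARC is actually holding (the "size" row is in bytes; arc_summary gives a fuller report):

Code:
grep -w size /proc/spl/kstat/zfs/arcstats
arc_summary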

If you have a support subscription I would also recommend opening a ticket with Proxmox support.
 
Add a swap disk on the Proxmox server (you can add a swap file; a swap volume is not required).

Also, check (Proxmox server):
Code:
$> sysctl -a | grep -i swappiness

A value of "vm.swappiness = 20" is recommended.
Code:
/etc/sysctl.conf
vm.swappiness = 20

The value "20" meaning: 100-20 = 80 ( if ram usage reach 80% , then begin to use swap disk ).

The above settings, together with a swap disk, prevent OOM kills if you run out of physical memory.
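
For what it's worth, applying that looks like this (the sysctl command takes effect immediately; the /etc/sysctl.conf entry above makes it persistent):

Code:
sysctl vm.swappiness=20   # apply now
sysctl -p                 # reload /etc/sysctl.conf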
 
> Add a swap disk on the Proxmox server (you can add a swap file; a swap volume is not required).
Swap files are not possible on ZFS (because of holes), and I'm not sure about Btrfs. Proxmox advises against swap on ZFS (but it's fine on other drives).
> A value of "vm.swappiness = 20" is recommended.
Who recommends that, and why? Can you please explain in more detail?
> The value "20" means: 100 - 20 = 80 (when RAM usage reaches 80%, the system begins to use the swap disk).
It's not as simple as that: https://github.com/torvalds/linux/blob/v5.0/Documentation/sysctl/vm.txt#L809
> The above settings, together with a swap disk, prevent OOM kills if you run out of physical memory.
Is that your own experience? How much swap space (relative to physical RAM) do you use?
 
> Swap files are not possible on ZFS (because of holes), and I'm not sure about Btrfs. Proxmox advises against swap on ZFS (but it's fine on other drives).

> Who recommends that, and why? Can you please explain in more detail?

> It's not as simple as that: https://github.com/torvalds/linux/blob/v5.0/Documentation/sysctl/vm.txt#L809

> Is that your own experience? How much swap space (relative to physical RAM) do you use?
Your Proxmox root is on ZFS?

I am using EXT4 for root, where I can add the swap files.

The "vm.swappiness" value is working as i wrote above, from experience.

You can adjust the swap file size to match how far you oversubscribe memory; usually 16 GB or 24 GB is enough, and if you need more you can add it as needed.

Example:
Code:
$> mkdir /swap
$> cd /swap
$> dd if=/dev/zero of=swap0-2G.swap bs=1G count=2
$> mkswap swap0-2G.swap
$> chown root:root swap0-2G.swap
$> chmod 0600 swap0-2G.swap

/etc/fstab
    /swap/swap0-2G.swap none swap sw 0 0

$> swapon /swap/swap0-2G.swap
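
To verify the swap file is active afterwards:

Code:
swapon --show
free -h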
 
> That way if the system strangely runs into a low memory condition when it has 128 GB RAM and 3 VMs that use a total of 32 GB of RAM, it will kill the least important VM first instead of taking down an entire site.
If you are using ZFS, have you set ZFS ARC cache limits? The default setting is effectively unlimited (it can take 80% of memory), and if the ARC grows, VMs can be OOM-killed.
https://pve.proxmox.com/wiki/ZFS_on_Linux#sysadmin_zfs_limit_memory_usage
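
For reference, the wiki's method boils down to the following (the 8 GiB limit is just an example value; with root on ZFS the initramfs also needs refreshing):

Code:
# Apply at runtime (value in bytes; 8589934592 = 8 GiB):
echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max
# Make it persistent across reboots:
echo "options zfs zfs_arc_max=8589934592" > /etc/modprobe.d/zfs.conf
update-initramfs -u -k all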
 
Sorry--the forums didn't alert me to all the replies.

Yes, the VMs (according to top) are using around 1/4 of the memory on the box.
There's plenty of "avail mem" (plenty being 10-20 GB) around the time it happens.
VMs aren't using ballooning or anything--just a fixed amount of memory.

Root is ZFS, VM storage is ZFS.

> trying to fix a clearly broken system via GUI/CLI knobs is a Sisyphean task. If the kernel can't kill a few big VMs, what is it going to kill next? systemd?

Yeah... I thought I was fairly clear about it. If there's a "big production VM" and several smaller, completely unimportant VMs, I'd rather the unimportant ones get killed first. The OOM killer always goes after the "big production VM" first.

Anyways, I have it manually adjusted and now it knifes the useless VM first.
 
Just to reinforce this feature request: in my case, OOM kills get triggered by remote backup systems using a lot of RAM. Of course I can try to fix the RAM usage of those tools, but it would be really nice to be able to tell Proxmox to kill the most important VMs only as a last resort; that would probably be useful regardless of the backup issues.
 
