Feature Request: OOM Score for VMs

darkpixel

OOM killer is a headache in Proxmox 8.x.
Systems that have been running for years under previous versions are now getting OOM killed several times per week.

Every site we manage that uses Proxmox has the same set of VMs: their main Windows Server, a remote access server, and a "vendor" machine that we don't really care about if it gets killed.

It would be nice to have a config option for each VM so we can prioritize it for OOM killing.

Maybe a pick-list with "high priority", "normal priority" and "low priority".

High priority could set the OOM score adjustment to, say, 10; normal could set it to 500; and low could set it to 1,000.

That way if the system strangely runs into a low memory condition when it has 128 GB RAM and 3 VMs that use a total of 32 GB of RAM, it will kill the least important VM first instead of taking down an entire site.
 
Have you looked into why OOM happens?

Does the OOM trace provide any useful information? It sounds like either there is a leak that should be addressed, or perhaps you are using Huge Pages. Are there extra programs running directly on the PVE host that could be leaking?

You may also want to look into Memory Ballooning https://pve.proxmox.com/wiki/Dynamic_Memory_Management#Requirements_for_Linux_VM

Another option is to create a VM hook that manually sets OOM priority:
echo "-1000" > /proc/<pid>/oom_score_adj


If they don't implement this in the GUI anytime soon, you could do it with a bash script on the host once all the VMs are running.

Until then I would also recommend you adjust your guest VM allocation based on actual weekly usage, increase in-guest swap and add more RAM to the host if at all possible.

If adding RAM isn't possible, it may be time for another host server to handle the load, or to build out your cluster with more nodes.


Example:
echo -17 > /proc/$(pidof squid)/oom_score_adj
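
A rough equivalent for VMs (a sketch only; VM ID 999 stands in for the "unimportant" guest, and the PID comes from the pidfile that qemu-server writes):

Code:
# make the throwaway VM the preferred OOM victim
for vmid in 999; do
    pidfile="/var/run/qemu-server/${vmid}.pid"
    [ -f "$pidfile" ] || continue   # skip if that VM isn't running
    echo 1000 > "/proc/$(cat "$pidfile")/oom_score_adj"
done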

EDIT: Also look into:

echo 1 > /proc/sys/vm/overcommit_memory
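
(For reference: 1 means the kernel always overcommits; the default of 0 uses a heuristic, and 2 disables overcommit. Worth reading up on before changing it.)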
 
Yeah, we already have a salt state that looks at running VMs and adjusts the score because the "unimportant" VM has the same name and ID at every location.

But I guess the "feature-request" tag is pointless and should be deleted in the forum software? ;)
 
Tags are arbitrary; with enough seniority you can enter anything you want.

To be honest, considering only the information you provided: trying to fix a clearly broken system via GUI/CLI knobs is a Sisyphean task. If the kernel can't kill a few big VMs, what is it going to kill next? Systemd?


 
> if the system strangely runs into a low memory condition when it has 128 GB RAM and 3 VMs that use a total of 32 GB of RAM, it will kill the least important VM first

Hold on, the server is only using 1/4th of the RAM for running VMs? Are you using ZFS?
If so, look into limiting ARC usage; there's something going on there.
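
A quick way to check (ZFS on Linux exposes ARC statistics under /proc/spl):

Code:
# current ARC size ("size") and its configured ceiling ("c_max"), in bytes
grep -wE '^(size|c_max)' /proc/spl/kstat/zfs/arcstats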

If you have a support subscription I would also recommend opening a ticket with Proxmox support.
 
Add a swap disk on the Proxmox server (you can add a swap file; a dedicated swap volume is not required).

Also, check (Proxmox server):
Code:
$> sysctl -a | grep -i swappiness

A value of "vm.swappiness = 20" is recommended.
Code:
/etc/sysctl.conf
vm.swappiness = 20
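
To apply it without a reboot (standard sysctl usage):

Code:
$> sysctl -w vm.swappiness=20
$> sysctl -p     # or re-read /etc/sysctl.conf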

The value 20 means: 100 - 20 = 80 (if RAM usage reaches 80%, the system begins to use the swap disk).

The above settings with a swap disk prevent OOM kills if you run out of physical memory.
 
> Add a swap disk on the Proxmox server (you can add a swap file; a dedicated swap volume is not required).
Swap files are not possible on ZFS (because of holes) and I'm not sure about Btrfs. Proxmox advises against swap on ZFS (but it's fine on other drives).
> A value of "vm.swappiness = 20" is recommended.
Who recommends that, and why? Can you please explain in more detail?
> The value 20 means: 100 - 20 = 80 (if RAM usage reaches 80%, the system begins to use the swap disk).
It's not as simple as that: https://github.com/torvalds/linux/blob/v5.0/Documentation/sysctl/vm.txt#L809
> The above settings with a swap disk prevent OOM kills if you run out of physical memory.
Is that your own experience? How much swap space (relative to physical RAM) do you use?
 
> Swap files are not possible on ZFS (because of holes) and I'm not sure about Btrfs. Proxmox advises against swap on ZFS (but it's fine on other drives).

> Who recommends that, and why? Can you please explain in more detail?

> It's not as simple as that: https://github.com/torvalds/linux/blob/v5.0/Documentation/sysctl/vm.txt#L809

> Is that your own experience? How much swap space (relative to physical RAM) do you use?
Your Proxmox root is on ZFS?

I am using EXT4 for root, which is where I can add the swap files.

The vm.swappiness value works as I wrote above; that is from experience.

You can adjust the swap file size according to how much you oversubscribe memory; usually 16G or 24G is enough, and if you need more you can add it as needed.

Example:
Code:
$> mkdir /swap
$> cd /swap
$> dd if=/dev/zero of=swap0-2G.swap bs=1G count=2
$> chown root:root swap0-2G.swap
$> chmod 0600 swap0-2G.swap
$> mkswap swap0-2G.swap

/etc/fstab
    /swap/swap0-2G.swap none swap sw 0 0


$> swapon /swap/swap0-2G.swap
 
> OOM killer is a headache in Proxmox 8.x. Systems that have been running for years under previous versions are now getting OOM killed several times per week.
> [...] if the system strangely runs into a low memory condition when it has 128 GB RAM and 3 VMs that use a total of 32 GB of RAM, it will kill the least important VM first instead of taking down an entire site.
If you are using ZFS, do you set ZFS ARC cache limits? The default setting is unlimited (it can take 80% of memory). If the ARC grows, VMs can be killed by the OOM killer.
https://pve.proxmox.com/wiki/ZFS_on_Linux#sysadmin_zfs_limit_memory_usage
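
For reference, that wiki page boils down to roughly this (the 8 GiB value is only an example; size it for your host and workload):

Code:
# /etc/modprobe.d/zfs.conf -- cap the ARC at 8 GiB
options zfs zfs_arc_max=8589934592

# apply immediately without a reboot:
echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max

# if root is on ZFS, refresh the initramfs so the limit is applied at boot:
update-initramfs -u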
 
Sorry--the forums didn't alert me to all the replies.

Yes, the VMs (according to top) are using around 1/4 of the memory on the box.
There's plenty of "avail mem" (plenty being 10-20 GB) around the time it happens.
VMs aren't using ballooning or anything--just a fixed amount of memory.

Root is ZFS, VM storage is ZFS.

> trying to fix a clearly broken system via GUI/CLI knobs is a Sisyphean task. If the kernel can't kill a few big VMs, what is it going to kill next? Systemd?

Yeah...I thought I was fairly clear about it. If there's a "big production VM" and several smaller, completely unimportant VMs, I'd rather kill those first. The OOM killer always goes after the "big production VM" first.

Anyways, I have it manually adjusted and now it knifes the useless VM first.
 
