OOM-Killer killing only VM, but why?

I am running a single VM running Debian 11 on a host with 40gb RAM, 4 cores, and a total of 4.25tb storage in a single-drive zpool of 256gb and a mirror zpool of 2x2tb. I allocated 4 cores to the VM and varying amounts of RAM as I've tried to troubleshoot this issue. So far OOM-Killer keeps killing it even during simple data transfer tasks. The same set of applications/services used to run on a Raspberry Pi 4 4gb no problem, but I'm trying to upgrade my hardware for speed and data reliability (using a zpool mirror). Here is a history of my issue:

1. Naively allocated 32gb RAM to the VM. OOM-Killer killed it after about 30 minutes of trying to transfer data over my LAN. Did some reading, learned that ZFS (its ARC cache) can use up to 50% of the host's physical memory by default and that I shouldn't allocate too much to the VM, and also learned about min/max VM RAM.

2. Allocated min 8gb, up to 20gb RAM to the VM. OOM-Killer killed it again after about the same amount of time. Searched some more, found this post: https://forum.proxmox.com/threads/vms-crashing-with-out-of-memory-oom-on-zfs.121757/.

3. Reduced `zfs_arc_max` to 8gb and tried again. Seemed much more stable, lasting many hours, but my network connection between machines kept getting interrupted so I tried transferring the data by directly plugging in an external drive to the host and passing it through to the VM. After about 1-2 hours of copying data from the external, OOM-Killer went to work again.
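
(For reference, my understanding is the usual way to cap the ARC on Proxmox is a zfs module option plus an initramfs refresh; the value below is the 8gb I mentioned, in bytes:)

```
echo "options zfs zfs_arc_max=8589934592" > /etc/modprobe.d/zfs.conf   # 8 GB in bytes (8 * 1024^3)
update-initramfs -u -k all                                             # refresh the initramfs, then reboot
```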

On the last attempt, based purely on me looking at the summary page usage charts, VM RAM usage was about 18/20gb and host RAM usage was about 32-33/40gb. I don't get it. Why is it just killing this VM? Why isn't it reducing the VM down to the min of 8gb before killing it? Even if the VM were using 20gb, ZFS ARC were using 8gb and the zfs_dirty_data_max were using 4gb, there should still be ~8gb left over for other host processes. Per https://forum.proxmox.com/threads/vm-down-because-of-oom-killer-finding-actual-reason.124819/ I checked `cat /etc/fstab` *on the host* and did not see any lines about swap, so I don't believe it is enabled on the host. (Was I supposed to check the guest?)
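
If it's useful, I believe these commands would show the swap state straight from the kernel rather than from fstab:

```
swapon --show   # prints nothing if no swap device is active
free -h         # the Swap: row should show 0B everywhere if swap is disabled
```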

The same set of services (and then some!) used to run fine on a much less powerful computer with a tenth of the RAM, albeit more slowly. I don't mind if the host throttles the VM significantly during periods of high load, but I can't have it just killing it, or I will have to find a different solution. (Btw, long story on why I'm running a single VM on this host, but it made system configuration much simpler and more straightforward.)

Please let me know if any specific logs would be helpful. Any assistance would be much appreciated!
 
It looks like Proxmox is running out of memory (not sure why, sorry) and therefore invokes the OOM killer. The OOM killer usually kills the process that uses the most memory (so as to kill the fewest processes), which is almost always a VM.
 
Totally get that, but how can I troubleshoot the issue or change a setting so it stops killing my VM?
 
Why isn't it reducing the VM down to the min of 8gb before killing it?
I did not investigate the actual algorithm, so this is my personal impression only: reclaiming that much memory (60%!?) from a VM is not quickly done. Ballooning can only request that the guest give the memory back; if that takes a while, it may be too slow to help the host when it is already under pressure.

Anyone with more detailed knowledge may correct me.

My personal rule is: do not over-commit memory. If you assign RAM to a VM, consider it gone. If I nevertheless need to over-commit, I keep it to a low percentage per VM. RAM over-commitment is just not as forgiving as CPU over-commitment, which merely slows everything down...

To find the cause: look into the system log files like /var/log/kern.log or run journalctl -b and look for OOM-related messages with a timestamp near the event.
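
For example, something along these lines should pull the relevant messages out of the current boot (adjust the pattern as needed):

```
journalctl -b -k | grep -i -E "out of memory|oom-kill|killed process"
grep -i -E "out of memory|oom" /var/log/kern.log
```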

Can you trigger that behavior ad hoc? Start htop on the host, sort by memory consumption and watch carefully.
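
If watching htop live is inconvenient, a simple loop like the sketch below (the log path is just an example) records the top memory consumers so you can review them after the next kill:

```
# log overall memory plus the top 5 consumers every 10 seconds
while true; do
    date
    free -h
    ps aux --sort=-rss | head -n 6
    echo
    sleep 10
done >> /root/mem-watch.log
```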

Good luck!
 
Ok, thank you, I'll give it a look! Do you have any thoughts on how much memory I can safely commit to the VM if it is the only one I am running and the only other memory users are Proxmox and ZFS? The only reason I switched to ballooning was to try to give the host more flexibility in assigning the RAM, but I'd have preferred to be on the upper end of my range to begin with.
 
Do you have any thoughts on how much memory I can safely commit to the VM

You have 40 GiB of RAM, correct?
The OS needs some RAM. For me this means 4 GiB; other people may recommend other values.
You have limited the size of the ZFS ARC? Re-verify it with arc_summary | grep "ARC size (current)".
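
To double-check that the limit really took effect you could look at both the current size and the configured maximum, for example:

```
arc_summary | grep -E "ARC size \(current\)|Max size"
cat /sys/module/zfs/parameters/zfs_arc_max   # 0 means the built-in default (about 50% of RAM)
```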

The main question is: how much RAM does your only VM require to do its job well? I cannot tell. If your 20 GiB assignment did work I would configure 16 GiB/20 GiB. (As mentioned: 8/20 is just too wide a range in my opinion.)
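
For completeness, assuming your VM had ID 100 (substitute your own), that would look roughly like this on the CLI:

```
qm set 100 --memory 20480 --balloon 16384   # 20 GiB maximum, 16 GiB balloon minimum (values in MiB)
```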

In this (possibly oversimplified) budget (40 GiB total, minus 4 GiB for the OS, 8 GiB for the ARC and the 20 GiB VM maximum) there are an additional 8 GiB left untouched, so RAM should be available if required by any process.

If it crashes nevertheless then something is wrong. See my recommendation regarding kern.log etc.

If there are crashes without OOM messages in the log I would recommend running memtest86+ for a day or two.

Best regards
 
You have limited the size of the ZFS ARC? Re-verify it with arc_summary | grep "ARC size (current)".
Thank you so much! This comment led me here after I'd been struggling to figure out why 24 GB of VMs without ballooning kept getting killed on a Proxmox node with 32 GB of RAM. It turns out ZFS was using 10-12 GB, and of course one of my VMs got killed at that point. I don't have much storage, so I was able to limit the cache to 3 GB and have plenty of room for everything to play nicely again.
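
In case it helps anyone else: as far as I know the cap can also be applied to a running system without a reboot, in addition to putting it in /etc/modprobe.d/zfs.conf; 3 GB in bytes below:

```
echo 3221225472 > /sys/module/zfs/parameters/zfs_arc_max   # run as root; the ARC may take a while to shrink
```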
 
