I am running a single Debian 11 VM on a host with 40 GB RAM, 4 cores, and about 4.25 TB of raw storage, split between a single-drive zpool of 256 GB and a mirrored zpool of 2x2 TB. I allocated all 4 cores to the VM and have tried varying amounts of RAM while troubleshooting this issue. So far, OOM-Killer keeps killing the VM, even during simple data transfer tasks. The same set of applications/services used to run on a Raspberry Pi 4 (4 GB) with no problem, but I'm trying to upgrade my hardware for speed and data reliability (using a zpool mirror). Here is the history of the issue so far:
1. Naively allocated 32 GB RAM to the VM. OOM-Killer killed it after about 30 minutes of transferring data over my LAN. Did some reading and learned that the ZFS ARC can use up to 50% of the host's physical memory by default, so I shouldn't allocate too much to the VM. Also learned about min/max VM RAM.
2. Allocated a minimum of 8 GB and a maximum of 20 GB RAM to the VM. OOM-Killer killed it again after about the same amount of time. Searched some more and found this post: https://forum.proxmox.com/threads/vms-crashing-with-out-of-memory-oom-on-zfs.121757/.
3. Reduced `zfs_arc_max` to 8 GB and tried again. This seemed much more stable and lasted many hours, but the network connection between the machines kept getting interrupted, so I plugged an external drive directly into the host and passed it through to the VM instead. After about 1-2 hours of copying data from the external drive, OOM-Killer went to work again.
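For reference, `zfs_arc_max` is specified in bytes, so the 8 GB cap in step 3 has to be converted first. Below is a minimal sketch of that conversion, assuming "8 GB" means 8 GiB and that the cap is set through a modprobe options line (commonly placed in `/etc/modprobe.d/zfs.conf`). The helper name `arc_max_option` is hypothetical and only illustrates the arithmetic.

```python
GIB = 1024 ** 3  # bytes per GiB; zfs_arc_max takes a byte count


def arc_max_option(gib: int) -> str:
    """Build a modprobe options line capping the ZFS ARC at `gib` GiB.

    Hypothetical helper: it only formats the line and does not touch the system.
    """
    if gib <= 0:
        raise ValueError("ARC cap must be a positive number of GiB")
    return f"options zfs zfs_arc_max={gib * GIB}"


# Assumption: the 8 GB figure from step 3 is read as 8 GiB.
print(arc_max_option(8))  # options zfs zfs_arc_max=8589934592
```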
On the last attempt, judging purely from the usage charts on the summary page, VM RAM usage was about 18 of 20 GB and host RAM usage was about 32-33 of 40 GB. I don't get it. Why is it killing just this VM? Why isn't it shrinking the VM down to its 8 GB minimum before killing it? Even if the VM were using its full 20 GB, the ZFS ARC its full 8 GB, and dirty data the full `zfs_dirty_data_max` of 4 GB, there should still be ~8 GB left over for other host processes. Per https://forum.proxmox.com/threads/vm-down-because-of-oom-killer-finding-actual-reason.124819/ I ran `cat /etc/fstab` *on the host* and did not see any lines about swap, so I don't believe swap is enabled on the host. (Was I supposed to check the guest?)
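The estimate above, together with a more direct swap check than reading `/etc/fstab`, can be sketched as follows. This is only an illustration under stated assumptions: the budget counts just the three figures from the post (VM maximum, ARC cap, dirty-data cap) and nothing else the host may use, and the swap check parses the `SwapTotal` line that Linux exposes in `/proc/meminfo`. The function names and the sample text are hypothetical.

```python
def host_headroom_gb(host_total: int, vm_max: int, arc_max: int, dirty_max: int) -> int:
    """GB left on the host after the three worst-case consumers named in the post."""
    return host_total - (vm_max + arc_max + dirty_max)


def swap_total_kb(meminfo_text: str) -> int:
    """Return the SwapTotal value (in kB) from /proc/meminfo-style text, or 0 if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            return int(line.split()[1])
    return 0


# Figures from the post: 40 GB host, 20 GB VM max, 8 GB ARC, 4 GB dirty data.
print(host_headroom_gb(40, 20, 8, 4))  # 8

# Hypothetical sample; on a real host, pass open("/proc/meminfo").read() instead.
sample = "MemTotal:       41000000 kB\nSwapTotal:             0 kB\n"
print(swap_total_kb(sample))  # 0
```

A `SwapTotal` of 0 means no swap is active, whatever `/etc/fstab` contains. The same check can be run inside the guest.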
The same set of services (and then some!) used to run fine, albeit more slowly, on a much less powerful computer with a tenth of the RAM. I don't mind if the host throttles the VM significantly during periods of high load, but I can't have it simply killing the VM, or I will have to find a different solution. (By the way, it's a long story why I'm running a single VM on this host, but it made system configuration much simpler and more straightforward.)
Please let me know if any specific logs would be helpful. Any assistance would be much appreciated!