VMs crashing with Out of Memory (OOM) on ZFS

philippt

Hi,

I have a single PVE 7.3-4 machine that runs on the (default) ZFS setup. It is equipped with 32 GB of RAM and 512 GB + 1 TB disk space.
On this machine, I have three VMs:
- Windows #1 with 4 GB
- Windows #2 with 4 GB
- Linux with 256 MB
Ballooning is enabled, but I did set fixed RAM limits (min/max) for all VMs.

So even at their maximum, all VMs together should use only a little more than 25 % of the available RAM.

Nevertheless, it has already happened several times that one of the VMs crashed because the system ran out of memory - i.e. an OOM kill of the kvm process, usually for Windows #1. This is obviously really annoying and is the reason why I already reduced the VMs' RAM; the Windows guests would actually need more than 4 GB.

Could this behavior be related to ZFS' caching functionality? If so, is it safe to limit the usage to, let's say, 4 GB, so that I can assign more RAM to my VMs without taking the risk that they crash because of OOM?
If I understand https://pve.proxmox.com/wiki/ZFS_on_Linux#sysadmin_zfs_limit_memory_usage correctly (2 GiB base + 1 GiB per TiB of storage), 2 GB + 1.5 GB = 3.5 GB should be sufficient for ZFS in my setup.

Thanks
Philipp


Code:
root@XXX:~# arc_summary

------------------------------------------------------------------------
ZFS Subsystem Report                            Sat Jan 28 20:07:49 2023
Linux 5.15.83-1-pve                                           2.1.7-pve1
Machine: XXX (x86_64)                              2.1.7-pve1

ARC status:                                                      HEALTHY
        Memory throttle count:                                         0

ARC size (current):                                    99.7 %   15.5 GiB
        Target size (adaptive):                       100.0 %   15.6 GiB
        Min size (hard limit):                          6.2 %  995.5 MiB
        Max size (high water):                           16:1   15.6 GiB
        Most Frequently Used (MFU) cache size:         50.4 %    7.3 GiB
        Most Recently Used (MRU) cache size:           49.6 %    7.2 GiB
        Metadata cache size (hard limit):              75.0 %   11.7 GiB
        Metadata cache size (current):                 11.4 %    1.3 GiB
        Dnode cache size (hard limit):                 10.0 %    1.2 GiB
        Dnode cache size (current):                     0.7 %    8.4 MiB

Code:
root@XXX:~# pveversion -v
proxmox-ve: 7.3-1 (running kernel: 5.15.83-1-pve)
pve-manager: 7.3-4 (running version: 7.3-4/d69b70d4)
pve-kernel-helper: 7.3-2
pve-kernel-5.15: 7.3-1
pve-kernel-5.15.83-1-pve: 5.15.83-1
pve-kernel-5.15.60-2-pve: 5.15.60-2
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph-fuse: 15.2.16-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.3
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.3-1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-1
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-5
libpve-storage-perl: 7.3-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.3.2-1
proxmox-backup-file-restore: 2.3.2-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.0-1
proxmox-widget-toolkit: 3.5.3
pve-cluster: 7.3-2
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-7
pve-firmware: 3.6-2
pve-ha-manager: 3.5.1
pve-i18n: 2.8-1
pve-qemu-kvm: 7.1.0-4
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-2
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+2
vncterm: 1.7-1
zfsutils-linux: 2.1.7-pve3

Code:
root@XXX:~# cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content backup,iso,vztmpl

zfspool: local-zfs
        pool rpool/data
        content images,rootdir
        sparse 1

zfspool: hdpool
        pool hdpool
        content images,rootdir
        mountpoint /hdpool
        nodes XXX
        sparse 1

nfs: nas-XXX
        export /volume1/srv-XXX
        path /mnt/pve/nas-XXX
        server 192.168.XXX.XXX
        content backup

pbs: XXX
        datastore XXX
        server XXX
        content backup
        encryption-key XXX
        fingerprint XXX
        prune-backups keep-all=1
        username XXX@pbs
 
I mean, besides adding more RAM (if that is possible at all), there are not many options here.
So yes, I would try a reduced ARC max of 4 GB in this case.
But even then, I would not assign more than about 8 GB* of additional RAM in total to the VMs (even though around 11.5 GB more becomes available); otherwise you might end up in a similar OOM situation again.
* You will have to test the exact amount yourself, I guess.
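For reference, this is roughly how that looks in practice - a minimal sketch of what the wiki page linked above describes (4294967296 bytes = 4 GiB; adjust the value to whatever you end up testing):
Code:
# apply immediately (does not survive a reboot)
echo $((4 * 1024 * 1024 * 1024)) > /sys/module/zfs/parameters/zfs_arc_max

# make it persistent across reboots
echo "options zfs zfs_arc_max=4294967296" >> /etc/modprobe.d/zfs.conf
# with root on ZFS (the default ZFS setup), refresh the initramfs afterwards
update-initramfs -u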
 
Options

1 - Disable transparent huge pages (add 'transparent_hugepage=never' to the kernel cmdline).
2 - Reduce the ZFS dirty cache. This is independent of the ARC; it defaults to either 4 GiB or 10 % of RAM (the larger of the two), so on systems with less than 40 GiB of RAM the dirty cache takes a comparatively large portion of memory.
https://openzfs.github.io/openzfs-docs/Performance and Tuning/Module Parameters.html#zfs-dirty-data-max
The documentation states 10 % for both values, but if you have less than 40 GiB, the 'zfs_dirty_data_max' parameter is set to 4 GiB, which overrides the 'zfs_dirty_data_max_percent' parameter.
3 - Throttle your guest writes to limit how much of the host's dirty cache they can fill.
4 - Add swap; even a very small swap device can prevent OOM kills.

I found that the combination of setting it to 2 GiB and disabling transparent huge pages was enough to allow unthrottled ZFS writes on a 32 GiB system; a rough sketch of applying these options follows below.
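To make options 1 - 3 a bit more concrete, a rough sketch (assuming a stock PVE install; which boot-loader commands apply depends on how the host boots, and the VM ID, disk name and limits below are only placeholders):
Code:
# 1 - transparent huge pages (takes effect after a reboot)
#     GRUB boot: append transparent_hugepage=never to GRUB_CMDLINE_LINUX_DEFAULT
#     in /etc/default/grub, then:
update-grub
#     ZFS root booted via proxmox-boot-tool: append it to /etc/kernel/cmdline instead, then:
proxmox-boot-tool refresh
#     verify after rebooting:
cat /sys/kernel/mm/transparent_hugepage/enabled

# 2 - ZFS dirty cache, e.g. 2 GiB: apply at runtime ...
echo $((2 * 1024 * 1024 * 1024)) > /sys/module/zfs/parameters/zfs_dirty_data_max
# ... and keep it across reboots
echo "options zfs zfs_dirty_data_max=2147483648" >> /etc/modprobe.d/zfs.conf

# 3 - throttle guest writes, e.g. cap a VM's disk at 100 MB/s of writes
#     (keep the existing volume spec exactly as shown by 'qm config <vmid>')
qm set 100 --scsi0 local-zfs:vm-100-disk-0,mbps_wr=100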
 
Thanks for the quick responses! For now, I changed the ARC cache size to 4 GB
Code:
root@XXX:~# cat /sys/module/zfs/parameters/zfs_arc_max
4294967296
and increased the Windows VMs' RAM to 8 GB each. I am now at 70 % RAM consumption on the host, so I am happy for the moment ;-)

Will definitely look into chrcoluk's points as well and obviously observe the system further!

Thanks again!
 
I know this is an old thread, but I am having similar issues and hoping for some additional information. The host has 40 GB of RAM and a total of 4.25 TB of storage in two pools: a single-disk 256 GB OS pool and a 2 TB/2 TB mirror pool. I'm trying to run a single VM (I know that's unusual, but there are reasons, not worth getting into here, why I did it this way) and would like it to have access to at least 8 GB of RAM, preferably as much as possible.

1. I just reduced zfs_arc_max to 8 GB. Can I reduce zfs_dirty_data_max to less than 4 GB, and if so, what is a good number to start with?

2. I'm not sure how to "disable transparent huge pages (add 'transparent_hugepage=never' to the kernel cmdline)". Would someone be willing to explain that in more detail? This will likely be my first kernel cmdline edit...

3. How can I throttle the guest writes?

4. If I set the guest VM's minimum RAM to 8 GB, how high can I safely set the maximum and just let the hypervisor manage it dynamically via ballooning?

Thank you!
 
