VMs crashing with Out of Memory (OOM) on ZFS

philippt

Active Member
Hi,

I have a single PVE 7.3-4 machine that runs on the (default) ZFS setup. It is equipped with 32 GB of RAM and 512 GB + 1 TB disk space.
On this machine, I have three VMs:
- Windows #1 with 4 GB
- Windows #2 with 4 GB
- Linux with 256 MB
Ballooning is enabled, but I did set fixed RAM limits (min/max) for all VMs.

So all VMs together should be using at most a little more than 25 % of the available RAM.

Nevertheless, it has already happened several times that one of the VMs crashed because the system ran out of memory, i.e. an OOM kill of the kvm process, usually for Windows #1. This is obviously really annoying and is the reason why I already reduced the RAM assigned to the VMs; the Windows VMs would actually need more than 4 GB.

Could this behavior be related to ZFS' caching functionality? If so, is it safe to limit the usage to, let's say, 4 GB, so that I can assign more RAM to my VMs without taking the risk that they crash because of OOM?
If I understand https://pve.proxmox.com/wiki/ZFS_on_Linux#sysadmin_zfs_limit_memory_usage correctly (2 GiB base + 1 GiB per TiB of storage), 2 GB + 1.5 GB = 3.5 GB should be sufficient for ZFS in my setup.

Thanks
Philipp


Code:
root@XXX:~# arc_summary

------------------------------------------------------------------------
ZFS Subsystem Report                            Sat Jan 28 20:07:49 2023
Linux 5.15.83-1-pve                                           2.1.7-pve1
Machine: XXX (x86_64)                              2.1.7-pve1

ARC status:                                                      HEALTHY
        Memory throttle count:                                         0

ARC size (current):                                    99.7 %   15.5 GiB
        Target size (adaptive):                       100.0 %   15.6 GiB
        Min size (hard limit):                          6.2 %  995.5 MiB
        Max size (high water):                           16:1   15.6 GiB
        Most Frequently Used (MFU) cache size:         50.4 %    7.3 GiB
        Most Recently Used (MRU) cache size:           49.6 %    7.2 GiB
        Metadata cache size (hard limit):              75.0 %   11.7 GiB
        Metadata cache size (current):                 11.4 %    1.3 GiB
        Dnode cache size (hard limit):                 10.0 %    1.2 GiB
        Dnode cache size (current):                     0.7 %    8.4 MiB

Code:
root@XXX:~# pveversion -v
proxmox-ve: 7.3-1 (running kernel: 5.15.83-1-pve)
pve-manager: 7.3-4 (running version: 7.3-4/d69b70d4)
pve-kernel-helper: 7.3-2
pve-kernel-5.15: 7.3-1
pve-kernel-5.15.83-1-pve: 5.15.83-1
pve-kernel-5.15.60-2-pve: 5.15.60-2
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph-fuse: 15.2.16-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.3
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.3-1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-1
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-5
libpve-storage-perl: 7.3-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.3.2-1
proxmox-backup-file-restore: 2.3.2-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.0-1
proxmox-widget-toolkit: 3.5.3
pve-cluster: 7.3-2
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-7
pve-firmware: 3.6-2
pve-ha-manager: 3.5.1
pve-i18n: 2.8-1
pve-qemu-kvm: 7.1.0-4
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-2
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+2
vncterm: 1.7-1
zfsutils-linux: 2.1.7-pve3

Code:
root@XXX:~# cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content backup,iso,vztmpl

zfspool: local-zfs
        pool rpool/data
        content images,rootdir
        sparse 1

zfspool: hdpool
        pool hdpool
        content images,rootdir
        mountpoint /hdpool
        nodes XXX
        sparse 1

nfs: nas-XXX
        export /volume1/srv-XXX
        path /mnt/pve/nas-XXX
        server 192.168.XXX.XXX
        content backup

pbs: XXX
        datastore XXX
        server XXX
        content backup
        encryption-key XXX
        fingerprint XXX
        prune-backups keep-all=1
        username XXX@pbs
 
I mean, besides adding more RAM (if that is possible at all), there are not many options here.
So yes, I would try it with a reduced ARC max of 4 GB in this case.
But even with this, I would not assign more than an additional 8 GB* in total to the VMs (even though you gain around 11.5 GB of available memory); otherwise you might end up in a similar OOM situation again.
* The exact amount has to be tested by you, I guess.
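If you want to try that, here is a minimal sketch following the ZFS_on_Linux wiki page linked above (4 GiB is just the value discussed here, adjust to taste):
Code:
# apply the new limit at runtime (not persistent across reboots)
echo "$((4 * 1024*1024*1024))" > /sys/module/zfs/parameters/zfs_arc_max

# make it persistent: add the option to /etc/modprobe.d/zfs.conf ...
echo "options zfs zfs_arc_max=4294967296" >> /etc/modprobe.d/zfs.conf

# ... and, because the root filesystem is on ZFS, refresh the initramfs
update-initramfs -u -k all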
 
Options

1 - Disable transparent huge pages (add 'transparent_hugepage=never' to the kernel command line).
2 - Reduce the ZFS dirty data cache. This is independent of the ARC; it defaults to 10 % of RAM, capped at 4 GiB (whichever is smaller), so even at the default a few gigabytes of RAM can be tied up by in-flight writes alone.
https://openzfs.github.io/openzfs-docs/Performance and Tuning/Module Parameters.html#zfs-dirty-data-max
The documentation states 10 % for 'zfs_dirty_data_max_percent', but once 10 % of your RAM would exceed 4 gigs (i.e. with more than roughly 40 gigs of RAM), the 4 gig cap from 'zfs_dirty_data_max_max' takes over.
3 - Throttle your guest writes to limit how much of the host's dirty cache they can fill.
4 - Add swap; even a very small swap can prevent OOMs.

I found that a combination of setting zfs_dirty_data_max to 2 gigs and disabling huge pages was enough to allow unthrottled ZFS writes on a 32 gig system; a rough sketch of both changes is below.
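To be clear, the following is only a sketch assuming a stock PVE install; check whether your host boots via GRUB or systemd-boot before touching the kernel command line.
Code:
# 1) transparent huge pages: disable at runtime to test the effect (lost on reboot)
echo never > /sys/kernel/mm/transparent_hugepage/enabled

# to make it permanent, append transparent_hugepage=never to the kernel command line:
#   GRUB:         add it to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then run: update-grub
#   systemd-boot: add it to the single line in /etc/kernel/cmdline, then run: proxmox-boot-tool refresh

# 2) dirty data cache: limit it to 2 GiB at runtime
echo "$((2 * 1024*1024*1024))" > /sys/module/zfs/parameters/zfs_dirty_data_max

# and persist it across reboots (refresh the initramfs, since root is on ZFS)
echo "options zfs zfs_dirty_data_max=2147483648" >> /etc/modprobe.d/zfs.conf
update-initramfs -u -k all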
 
Thanks for the quick responses! For now, I changed the ARC cache size to 4 GB:
Code:
root@XXX:~# cat /sys/module/zfs/parameters/zfs_arc_max
4294967296
and increased the Windows VMs' RAM to 8 GB. I am now at 70 % RAM consumption on the host, so I am happy for the moment ;-)

Will definitely look into chrcoluk's points as well and obviously observe the system further!

Thanks again!
 
I know this is an old thread, but I am having similar issues and am hoping for some additional information. The host has 40 GB of RAM and a total of 4.25 TB of raw storage in two pools: a single 256 GB OS disk and a 2 TB + 2 TB mirror pool. I'm trying to run a single VM (I know that's unusual, but there are reasons, not worth getting into here, why I did it this way) and would like it to have access to at least 8 GB of RAM, and preferably as much as possible.

1. I just reduced zfs_arc_max to 8 GB. Can I reduce zfs_dirty_data_max to less than 4 GB, and if so, what is a good number to start with?

2. I'm not sure how to do "1 - Disable transparent huge pages (add 'transparent_hugepage=never' to the kernel command line)". Would someone be willing to explain that in more detail? This will likely be my first kernel edit...

3. How can I throttle the guest writes?

4. If I set the guest VM's minimum RAM to 8 GB, how high can I safely set the maximum and just let the hypervisor manage it dynamically?

Thank you!
 