OOM

Nhoague

I am being plagued by OOM. I know, I know, there are a million posts about this ... I just want to run something by the forums.

I use a small mini appliance to run our firewall product.

VM: Kerio Control with 1 GB RAM.
HOST: 4 GB RAM, 2 x 128 GB SSD on ZFS RAID 1.

Question: ZFS says it wants 2 GB + 1 GB per TB of storage. Since I am only running 128 GB disks, can I run ZFS with the ARC set to 500 MB min and 1 GB max?

Thank you in advance for any insight!

Here is some output:

Top:
Tasks: 210 total, 1 running, 209 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.6 us, 0.3 sy, 0.0 ni, 98.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 3818.6 total, 169.9 free, 3556.1 used, 92.6 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 72.6 avail Mem

(I know 169 MB free is NOT much at all, but with this appliance only running one VM, I would like to keep memory utilization as close to max as possible for performance.)

PVEVERSION: pve-manager/7.2-3/c743d6c1 (running kernel: 5.15.30-2-pve)

OOM (proof): Dec 3 11:11:54 pve systemd[1]: 100.scope: A process of this unit has been killed by the OOM killer.
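For completeness, the full kernel-side OOM report (which process was picked and the memory state at the time) can be pulled from the log with something like:

journalctl -k | grep -i -B 1 -A 15 oom-killer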

nano /etc/modprobe.d/zfs.conf:

# ARC limits in bytes: 1 GiB min, 2 GiB max
options zfs zfs_arc_min=1073741824
options zfs zfs_arc_max=2147483648
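A note in case anyone copies this: since root is on ZFS here, changes to /etc/modprobe.d/zfs.conf only take effect after rebuilding the initramfs and rebooting; the max can also be changed on the fly by writing the byte value to the module parameter:

update-initramfs -u
# or apply a new max immediately, without a reboot:
echo 2147483648 > /sys/module/zfs/parameters/zfs_arc_max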

ARC_SUMMARY:
------------------------------------------------------------------------
ZFS Subsystem Report Sat Dec 03 11:37:57 2022
Linux 5.15.30-2-pve 2.1.4-pve1
Machine: pve (x86_64) 2.1.4-pve1

ARC status: THROTTLED
Memory throttle count: 13

ARC size (current): 49.8 % 1019.1 MiB
Target size (adaptive): 50.0 % 1.0 GiB
Min size (hard limit): 50.0 % 1.0 GiB
Max size (high water): 2:1 2.0 GiB
Most Frequently Used (MFU) cache size: 54.8 % 535.7 MiB
Most Recently Used (MRU) cache size: 45.2 % 442.7 MiB
Metadata cache size (hard limit): 75.0 % 1.5 GiB
Metadata cache size (current): 5.5 % 85.1 MiB
Dnode cache size (hard limit): 10.0 % 153.6 MiB
Dnode cache size (current): 1.5 % 2.3 MiB

ARC hash breakdown:
Elements max: 117.6k
Elements current: 100.0 % 117.6k
Collisions: 27.1k
Chain max: 5
Chains: 11.5k

ARC misc:
Deleted: 21
Mutex misses: 0
Eviction skips: 1
Eviction skips due to L2 writes: 0
L2 cached evictions: 0 Bytes
L2 eligible evictions: 2.3 MiB
L2 eligible MFU evictions: 84.3 % 1.9 MiB
L2 eligible MRU evictions: 15.7 % 364.0 KiB
L2 ineligible evictions: 4.0 KiB

ARC total accesses (hits + misses): 4.8M
Cache hit ratio: 98.4 % 4.8M
Cache miss ratio: 1.6 % 78.2k
Actual hit ratio (MFU + MRU hits): 98.3 % 4.8M
Data demand efficiency: 97.7 % 2.5M
Data prefetch efficiency: 6.1 % 20.5k

Cache hits by cache type:
Most frequently used (MFU): 92.6 % 4.4M
Most recently used (MRU): 7.3 % 349.5k
Most frequently used (MFU) ghost: < 0.1 % 3
Most recently used (MRU) ghost: 0.0 % 0
Anonymously used: 0.1 % 3.9k

Cache hits by data type:
Demand data: 51.0 % 2.4M
Demand prefetch data: < 0.1 % 1.3k
Demand metadata: 48.9 % 2.3M
Demand prefetch metadata: 0.1 % 5.3k

Cache misses by data type:
Demand data: 72.4 % 56.6k
Demand prefetch data: 24.6 % 19.2k
Demand metadata: 1.9 % 1.5k
Demand prefetch metadata: 1.1 % 870
 
Yes, ZFS can run with less memory. I would guess that you can set it even lower (and at a fixed size), since filesystem performance is probably not that important for your single firewall VM. Proxmox wants between 1 and 2 GB, your VM wants 1 GB (plus some overhead), which would leave 1 to 2 GB for ZFS. I'd expect that even 256 MB would probably be fine for your system. But please test for yourself what works well enough for your use case.
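For example, pinning the ARC to a fixed 256 MiB would look something like this in /etc/modprobe.d/zfs.conf (untested at that size, so treat it as a starting point and keep an eye on arc_summary):

# pin the ARC: fixed size, min = max, values in bytes
options zfs zfs_arc_min=268435456
options zfs zfs_arc_max=268435456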
 
Also keep in mind that PVE won't create any swap partition when installing on top of a ZFS mirror. Swap might help with your OOM problem, but I still don't know how to do redundant swap, as neither mdadm nor ZFS should be used to mirror the swap partition. And using an unmirrored swap partition would cause the host to crash if one of the mirrored system disks fails. So not great if you care about downtime.
 
Or maybe you could just run the firewall on the hardware and get rid of a whole bunch of useless complexity.

Well, useless complexity to me equals automated hourly snapshots and nightly image backups of the VM. We have a 3-node cluster running CEPH with our firewall, for true HA failover in the event of hardware failure; that box has much more RAM and doesn't get this problem. Besides, that's boring. :)
 
Or I would recommend sticking to mdadm instead of ZFS on low-memory machines.
I will investigate this. I wanted ZFS due to the ease of failover and replacing disks. It's not IF they fail, it's WHEN. I have experience replacing disks in ZFS, so I figured that since it's in the installer it would be more straightforward. I hadn't tried ZFS on a box with such low RAM, so I'm still learning this side of it.
 
Also keep in mind that PVE won't create any swap partition when installing on top of a ZFS mirror. Swap might help with your OOM problem, but I still don't know how to do redundant swap, as neither mdadm nor ZFS should be used to mirror the swap partition. And using an unmirrored swap partition would cause the host to crash if one of the mirrored system disks fails. So not great if you care about downtime.
Thanks for the reply! Good point, and yes, you wouldn't want swap on a mirror. Downtime is not acceptable, at least as much as we can control it.
 
Yes, ZFS can run with less memory. I would guess that you can set it even lower (and at a fixed size), since filesystem performance is probably not that important for your single firewall VM. Proxmox wants between 1 and 2 GB, your VM wants 1 GB (plus some overhead), which would leave 1 to 2 GB for ZFS. I'd expect that even 256 MB would probably be fine for your system. But please test for yourself what works well enough for your use case.
Correct, the filesystem is not important. 99% of the operations run in RAM. The filesystem is locked and read-only, except when making changes via the web GUI, which doesn't happen often. Thanks for your reply, exactly what I was asking. I'll try to see how low I can go with ZFS on a test server first. I was mainly wondering if it would even run. :)
 
Well, useless complexity to me equals automated hourly snapshots and nightly image backups of the VM. We have a 3-node cluster running CEPH with our firewall, for true HA failover in the event of hardware failure; that box has much more RAM and doesn't get this problem. Besides, that's boring. :)
Uh, ok, but it kind of raises the question of why you need that for a firewall. Having precious data on there is almost the definition of "doing it wrong".

You can get quick recovery from hardware failure plenty of other ways, say by net-booting it, taking periodic image backups, or copying to a dedicated container on your cluster with rsync. Then swapping in new hardware is a ten-minute job at most, and you don't have to fool around tuning ZFS to run on what is really a ridiculously small box in ZFS terms.

Anyhow, this just doesn't seem like a good application of PVE. If you're doing it "for fun" I guess it's fine, but other people reading this should not take it as a good thing to do in general.
 
I'm curious, doing it wrong? What do you mean? The cluster we're running is on CEPH: 3 x Intel Xeon with 32 GB RAM just for our firewall. Different application; this is our data center application.

The question at hand was about ZFS.

Isn't this what PVE is all about? Clustering virtual machines for HA when uptime is crucial? People reading this should most definitely understand what PVE is supposed to do and its real-world applications. Are you saying you don't run any firewalls virtually?
 
The cluster we're running is on CEPH: 3 x Intel Xeon with 32 GB RAM just for our firewall. Different application; this is our data center application.

The question at hand was about ZFS.
But this sounded like you are running PVE on very limited hardware, where it would be reasonable to achieve HA without all that overhead and complexity of a PVE cluster:
VM: Kerio Control with 1 GB RAM.
HOST: 4 GB RAM, 2 x 128 GB SSD on ZFS RAID 1.
 
Ah yes, fair enough. To be clear, the PVE in question is running on limited hardware. I was hoping to run ZFS with only 1 VM on a 4 GB RAM, 128 GB SSD RAID 1 small server appliance. My apologies if I confused you with the CEPH reference! I may have it stabilized now with a 512 MB min and 1 GB max ARC, and I turned the VM RAM down to 512 MB as well. No clustering on this appliance. Thanks for your feedback!
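For anyone landing here later, those limits translate to this in /etc/modprobe.d/zfs.conf:

# 512 MiB min, 1 GiB max, values in bytes
options zfs zfs_arc_min=536870912
options zfs zfs_arc_max=1073741824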
 
I was about to post a new thread but this one seems to be related to my issues which I encountered last week.

I have a machine with 48 GB RAM and Proxmox 7.3.3.
There are 3 VMs in total, with memory allocation 20 GB, 4 GB, 1 GB.
Storage-wise there are:
- 2 x 500 GB NVMe SSDs - ZFS RAID 1 for Proxmox and the OS partitions of the VMs
- 2 x 2 TB HDDs - ZFS RAID 1 for the storage partitions of the VMs

By a rough calculation Proxmox and ZFS should consume around 3-5 GB of memory.

I still encounter oom-killer situations. A brief look at the memory allocation showed that the ZFS ARC was indeed the culprit (consuming close to 50% of the memory, as expected). After I manually limited the ARC size to ~10 GB, the OOM stopped.
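For reference, that runtime limit is just a write to the module parameter (as root), here for 10 GiB:

echo $((10 * 1024 * 1024 * 1024)) > /sys/module/zfs/parameters/zfs_arc_max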

While this server is low-duty, so limiting the ARC size is no issue, the question that comes to mind is how to approach memory management in general on new installations. I have read in previous threads that, when VM + Proxmox + ZFS memory usage nears the total memory of the host, the ARC might not be able to vacate memory fast enough, thus leading to the OOM situation. But in this case only about 45-50% of memory was used, and I still saw VMs crashing.

Also, if this question would be better asked in a separate thread, let me know.

Thanks!
 
ZFS will by default use up to 24 GB of that 48 GB of RAM. ZFS's ARC should shrink when another process needs that RAM, but sometimes it isn't fast enough and the OOM killer kicks in. In such a case it might be useful to limit the ARC size. 5-9 GB should be fine for the ARC. See here for how to limit it: https://pve.proxmox.com/wiki/ZFS_on_Linux#sysadmin_zfs_limit_memory_usage
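After changing the limit, you can check that the new ceiling actually took effect, e.g.:

grep -w c_max /proc/spl/kstat/zfs/arcstats
# or, on recent OpenZFS:
arc_summary -s arc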
 
Thank you, Danuin. I know all these things, even if that was not clear from my original post.

The idea of the question was whether one should either:
1) overprovision the hardware in such a way that all VMs + Proxmox never use more than 50% of the host memory (because otherwise it might lead to OOM due to the ARC not shrinking fast enough), or
2) if such overprovisioning is not financially or physically possible, always limit the ARC size after installation.

Or, as a corollary: should "a general Proxmox user" (emphasis on the quotes) be expected to allocate that much memory to the ZFS caching mechanism in the first place? (This relates to the official HW specs for Proxmox.)
 
Usually it's no problem that the ARC uses so much RAM; it really depends on the workload you are running. Unused RAM is wasted performance, so every OS will try to use all available RAM for caching. But if you still encounter OOM, it's best to limit the ARC to a reasonable size. You will find different rules of thumb for how big your ARC should be. Something like 2 GB + 0.5 GB per 1 TB of raw storage should be the minimum, and 4 GB + 1 GB per 1 TB of raw storage would be good to have. And when using deduplication you would need to add 5 GB of RAM per 1 TB of deduplicated data. So your 24 GB ARC is way more than needed for your hardware setup. But the more ARC you have, the faster your storage will be, so before wasting that RAM I would use it for the ARC.
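To make that concrete for the pools above (assuming 2 x 0.5 TB + 2 x 2 TB = 5 TB of raw storage): the minimum works out to 2 GB + 0.5 GB x 5 = 4.5 GB, and the comfortable value to 4 GB + 1 GB x 5 = 9 GB, which is where the 5-9 GB suggestion above comes from.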
 
