Hi guys, I've decided to make this thread as there are threads popping up related to the ARC.
In my experience the ZFS ARC is a lot more polite than the old host page cache, and it is also much more configurable. However, its defaults are not necessarily suited to Proxmox, because ARC usage isn't counted as available memory by the Linux kernel.
If you use ext4 on the host, the operating system uses the page cache. This cache can consume almost all of your free RAM, has a dumb eviction mechanism, is barely configurable, and if a swap file is enabled it will usually cause the system to start swapping out (like Windows, the Linux page cache and swap behaviour is dumb). You can drop it via a command, but that only has a temporary effect until it fills up again. It is used for both read caching and dirty write caching. The write caching is a little bit configurable, but for the read caching it is a case of "the devs know better" and you let it do its thing.
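As a quick reference, the page-cache knobs mentioned above look like this (standard Linux commands, nothing Proxmox specific):
Code:
# drop the clean page cache (temporary, it fills up again)
sync; echo 3 > /proc/sys/vm/drop_caches
# the 'little bit configurable' write side, plus swappiness
sysctl vm.dirty_background_ratio vm.dirty_ratio vm.swappiness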
For VM data, the behaviour is controlled by the per virtual storage device cache settings as documented here. This applies to both ext4 and ZFS.
https://pve.proxmox.com/wiki/Performance_Tweaks
If you use the default 'none' setting and are using ZFS, then VM storage will not use the page cache at all; instead it will use the ARC (ZFS's read cache) and a dedicated ZFS dirty write cache.
The ARC has a min size and a max size, both of which are configurable. The min size is very low and the max much higher, but actually reaching the max depends on there being available memory as well as enough data to cache. As a rule of thumb the ARC shrinks much more readily than the old Linux page cache, and it doesn't usually cause OOMs or swapping.
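If you want to see where the ARC currently sits between those two limits, the kstats show it directly (arc_summary, if installed, presents the same numbers more nicely):
Code:
# current ARC size (size), adaptive target (c) and the min/max limits, in bytes
awk '$1 ~ /^(size|c|c_min|c_max)$/ {print $1, $3}' /proc/spl/kstat/zfs/arcstats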
The ZFS dirty cache is not quite as polite and is less well known; most discussions are about the ARC, but ZFS write caching does not live inside the ARC. By default async writes will start to be flushed within 5 seconds, and how quickly this dirty cache grows depends on how write heavy the workload is and how fast the disks are. However, like the ARC, it is configurable and has a cap.
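The 5 seconds and the cap correspond to two module parameters you can simply read out:
Code:
# transaction group timeout in seconds - how long async writes sit in RAM before a flush starts
cat /sys/module/zfs/parameters/zfs_txg_timeout
# cap on dirty (unwritten) data, in bytes
cat /sys/module/zfs/parameters/zfs_dirty_data_max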
If you use writethrough, writeback, or writeback (unsafe), then even with ZFS you will use the page cache system and will see much higher cache usage; the kernel won't think twice about filling all free RAM with cached data and then starting to swap out, even with vm.swappiness set to 1. With ZFS this also causes double caching, which is of course a bad idea.
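For completeness, checking and pinning swappiness looks like this (the sysctl.d file name is just an example):
Code:
sysctl vm.swappiness                                          # current value
sysctl -w vm.swappiness=1                                     # until next reboot
echo 'vm.swappiness = 1' > /etc/sysctl.d/99-swappiness.conf   # persist (example file name)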
The main difference between directsync and 'none' for ZFS is that 'none' still allows the drives to buffer async writes, whilst directsync forces storage flushes for everything, and as such is much slower.
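The cache mode is set per virtual disk, either in the GUI or with qm; the VM ID, bus and volume name below are placeholders for your own setup:
Code:
# show the current disk line (cache defaults to 'none' when it isn't listed)
qm config 100 | grep scsi0
# explicitly set cache=none on that disk
qm set 100 --scsi0 local-zfs:vm-100-disk-0,cache=none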
Let's say you have 64 GiB of RAM and you allocate a low amount of it to VMs, e.g. 4 GiB each for 3 VMs, so 12 GiB allocated, all with the cache mode set to none. With the ARC capped at e.g. 50% of RAM, and a bit more on top for the ZFS dirty cache, this is a totally safe configuration. The memory screen in Proxmox will show the ARC usage, which makes it feel worse than the host page cache, but it isn't; the host page cache is invisible usage that doesn't show on the graph.
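Rough worst-case arithmetic for that example (the 10% dirty cap is an assumption, see the defaults further down):
Code:
# 12 GiB VM RAM + 32 GiB ARC cap (50%) + ~6 GiB dirty cap (10%)
echo "$(( 12 + 64 / 2 + 64 / 10 )) GiB committed in the worst case, out of 64 GiB"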
So when should the ZFS caches be reconfigured?
If you have 32 GiB of RAM or less, it might be an idea to reduce the ZFS dirty cache limit: because that is unwritten data, the cache cannot be quickly shrunk when memory is needed, and as such it is far less polite than the ARC. This especially applies with only 16 GiB.
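As a sketch, lowering the dirty cap to 1 GiB at runtime (the value is only an example, pick one that suits your workload):
Code:
# 1 GiB dirty-data cap, applied immediately
echo $((1 * 1024 * 1024 * 1024)) > /sys/module/zfs/parameters/zfs_dirty_data_max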
Otherwise, only be concerned if you are actually getting OOMs or swap usage. When there is a choice between swap and cache, having no swap utilisation should always be the priority.
For those who don't know, these are the relevant tunables (see after the list for how to set them):
/sys/module/zfs/parameters/zfs_arc_max - max ARC size in bytes (set this to 10% of your total RAM if you want to match the new Proxmox defaults)
/sys/module/zfs/parameters/zfs_arc_min - min ARC size in bytes
/sys/module/zfs/parameters/zfs_dirty_data_max - max ZFS dirty cache in bytes (tune this before the ARC if you have less than 40 GiB of total RAM)
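Setting them follows the usual ZFS module parameter pattern; the 8 GiB figure below is purely an example value:
Code:
# apply immediately (8 GiB ARC cap as an example)
echo $((8 * 1024 * 1024 * 1024)) > /sys/module/zfs/parameters/zfs_arc_max
# make it survive reboots (this overwrites an existing zfs.conf, append instead if you already have one)
echo "options zfs zfs_arc_max=$((8 * 1024 * 1024 * 1024))" > /etc/modprobe.d/zfs.conf
update-initramfs -u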
I don't know the current defaults in Proxmox, but I know it used to be the case that the ZFS dirty cache defaulted to either a percentage of RAM or a fixed value, whichever is higher, and if you don't have much RAM that meant a higher percentage of RAM was used by default. So this is far less likely to be an issue with 64 GiB of RAM and higher.
--edit: it looks like the default is either 10% of RAM for the dirty cache or 4 GiB, whichever is higher, so with 40 GiB of RAM or more it will default to 10%; below that it will use 4 GiB, which is a higher percentage of RAM. With 16 GiB of RAM, 25% of RAM can be used for dirty data.
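Rather than relying on remembered defaults, you can check what your own host actually ended up with and what fraction of RAM that is:
Code:
# current dirty cap as a percentage of total RAM
awk -v d="$(cat /sys/module/zfs/parameters/zfs_dirty_data_max)" \
    '/^MemTotal/ {printf "%.1f%% of RAM\n", d * 100 / ($2 * 1024)}' /proc/meminfo
# hard ceiling the cap is clamped to at module load
cat /sys/module/zfs/parameters/zfs_dirty_data_max_max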
From 8.1 onwards, on new installs the ARC is capped to 10% of RAM or 16 GiB, whichever is lower.
In both of these cases, the values can be overridden. --edit
So remember: ARC usage shows on the graph; the ZFS dirty cache and the host page cache do not. This is because the ARC counts as used memory.
Example below of nasty host page cache.
Code:
# free -m
               total        used        free      shared  buff/cache   available
Mem:           64194       42510        2354          54       20666       21684
Swap:           2300        2156         144
The memory graph showed the 20666 MiB of buff/cache as not utilised. Note the swap usage; the server was sluggish to use.
Same machine with only ZFS caching for VMs (before I force-flushed the swap):
Code:
# free -m
               total        used        free      shared  buff/cache   available
Mem:           64194       30411       34250          50         369       33783
Swap:           2300        1434         866
And currently, now with only zram configured for swap as well:
Code:
# free -m
               total        used        free      shared  buff/cache   available
Mem:           64194       43202       19627          55        2522       20992
Swap:            255           0         255
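If you want to confirm what is backing swap after a change like that, both of these are standard util-linux tools:
Code:
swapon --show   # active swap devices and their sizes
zramctl         # zram devices, compression and usage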
I do agree with the new 10% default in Proxmox 8.1, as things can be awkward if you want large amounts of available memory for firing up new VMs. But on the flip side, if you are like me and allocate less than 50% of RAM to VMs, then a higher ARC cap should be fine and helps a ton to speed up HDD reads. It will shrink when it detects that available memory is low.