Test the ARC easily, e.g. with: cd /usr; tar cf /ext4-or-xfs-mount/testfile.tar * /etc; time dd if=/ext4-or-xfs-mount/testfile.tar of=/dev/null bs=32k
and: cd /usr; tar cf /zfs-mount/testfile.tar * /etc; time dd if=/zfs-mount/testfile.tar of=/dev/null bs=32k
That's an interesting way to test it, but keep the following things in mind:
- If there is any other I/O activity on your system, it might affect how both caches behave.
- The best way to ensure that the ARC is definitely cleared is to export your pool first, unload the ZFS kernel module, then load the module again and import the pool (see the sketch after this list). This is obviously not possible (or a wise thing to do) if you're using ZFS on root.
- Note that you won't necessarily "benchmark" the ARC that way: As you yourself know, ZFS does a lot of extra things behind the curtains, e.g. (de-)compression etc. This will greatly affect your results.
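For reference, here's a minimal sketch of how I would reset both caches between runs. "tank" is a placeholder pool name, and the module reload only works if no other pool is currently imported:

# ext4/XFS side: flush dirty data and drop the page cache
sync
echo 3 > /proc/sys/vm/drop_caches

# ZFS side: export the pool and reload the module to throw away the ARC
# (do NOT do this with a pool you boot from)
zpool export tank
modprobe -r zfs
modprobe zfs
zpool import tank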
It is overall notoriously hard to accurately benchmark filesystem caches, for the reasons above. I could elaborate much more, but this isn't really the point I want to make. Becaaaause ...
We had a production fileserver running clamav each day, and if you don't echo a limit into zfs_arc_max before it starts, clamav endlessly gets allocation errors, so we have to make sure that 64x 1.3GB of the installed RAM stays free.
Likewise, that production fileserver is notoriously slow, as all files go through the ARC daily, so it's the best example of bad ARC file handling, where most files end up uncached.
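Side note for anyone reading along: the ARC size limit is controlled by the zfs_arc_max module parameter (the value is in bytes). A minimal sketch, assuming you want to cap it at 16 GiB:

# at runtime
echo 17179869184 > /sys/module/zfs/parameters/zfs_arc_max

# persistent across reboots
echo "options zfs zfs_arc_max=17179869184" > /etc/modprobe.d/zfs.conf
# if ZFS sits in your initramfs (e.g. root on ZFS), also run: update-initramfs -u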
I think the quote above is where some of the misunderstandings are coming from. The ARC exists for the same purpose as LRU caches in other filesystems: it's there to make reads faster (obviously). The difference between the ARC and an LRU cache is that the ARC has a higher cache hit rate, because it does additional tracking. In fact, the ARC itself consists of four LRU lists -- to quote Wikipedia again:
- T1, for recent cache entries.
- T2, for frequent entries, referenced at least twice.
- B1, ghost entries recently evicted from the T1 cache, but still tracked.
- B2, similar ghost entries, but evicted from T2.
Without going into too much detail, these four internal LRUs let the ARC achieve a higher hit rate than a simple LRU cache, because it isn't as vulnerable to cache flushes.
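You can actually watch this split on a live system: OpenZFS exposes the ARC's internal counters in /proc/spl/kstat/zfs/arcstats (the exact field names can differ a bit between versions), for example:

# per-list hits plus the ghost-list hits
grep -E '^(hits|misses|mru_hits|mfu_hits|mru_ghost_hits|mfu_ghost_hits) ' /proc/spl/kstat/zfs/arcstats

# overall hit rate
awk '/^hits /{h=$3} /^misses /{m=$3} END{printf "ARC hit rate: %.1f%%\n", 100*h/(h+m)}' /proc/spl/kstat/zfs/arcstats

The arc_summary tool shipped with OpenZFS prints the same information in a friendlier format.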
For example, let's say you have two directories A and B, each with a lot of files. At first, you work mostly with the files from directory A, then you have to work with the files from directory B for a short amount of time, and then you switch back to directory A again.
If your filesystem has an LRU cache, it's possible that all files of directory A get evicted from the cache while you work in directory B for a short moment, which means you'll need to read from disk again when switching back.
If your filesystem has an ARC, it's much more likely that the files of directory A remain in the cache, because the ARC also keeps track of files that were recently evicted.
This is the main benefit of the ARC.
Of course, if you have mostly random reads across your entire fileserver, a cache brings little benefit -- neither an ARC nor an LRU cache makes a difference here, nor does any other kind of cache.
So, I believe the actual issue with your setup is something else entirely. In an earlier post, you revealed a little more about your zpool:
We support a fileserver with 24x 16TB drives in 4 raidz2 vdevs, running clamav each day, and it cannot saturate the cores but gives us sky-high I/O waits [...]
This is not an issue with the ARC, but rather with IOPS, which is why you're getting I/O waits. Regarding IOPS, there are a couple things to keep in mind:
- RAID-Z vdevs will each be limited to the IOPS of the slowest drive in the vdev.
- Mirror vdevs are not limited in this way -- read IOPS scale with the number of drives in the mirror vdev (writes still have to go to every drive, so write IOPS stay at roughly one drive's worth per vdev).
Because you have four RAID-Z2 vdevs, you essentially have the random-read IOPS of only four disks. This has nothing to do with the ARC, you see -- the ARC can only help you if it can actually cache things, otherwise there will be no performance gain -- and if it cannot cache things, you most likely won't see any performance loss at all.
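You can watch this happening while clamav runs: the following shows operations per second broken down per vdev and per disk ("tank" is a placeholder for your pool name, and 5 is the sampling interval in seconds):

zpool iostat -v tank 5

Each RAID-Z2 vdev will deliver roughly the random-read operations of a single member disk, which is exactly where the I/O waits come from.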
My personal recommendation would be to revise your pool's geometry. I personally like this site a lot for planning things (but unfortunately it doesn't show any information on IOPS):
https://jro.io/capacity/
In summary, if you want to increase your IOPS, you need more vdevs with fewer disks per vdev. This means that a "classic" RAID-10 setup (a bunch of striped 2-way mirror vdevs) will give you the most IOPS while still providing redundancy. (Just striping over all disks would give you the highest IOPS, but... your pool will die as soon as a single disk dies.)
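To make that concrete for your 24 disks: four 6-disk RAID-Z2 vdevs behave like roughly 4 disks for random I/O, while twelve 2-way mirrors behave like roughly 12 disks for writes and up to 24 for reads -- at the cost of usable capacity and only single-disk redundancy per mirror. A minimal sketch of such a layout; the pool name and disk IDs are placeholders:

zpool create tank \
  mirror /dev/disk/by-id/DISK01 /dev/disk/by-id/DISK02 \
  mirror /dev/disk/by-id/DISK03 /dev/disk/by-id/DISK04 \
  mirror /dev/disk/by-id/DISK05 /dev/disk/by-id/DISK06
# ...and so on for the remaining pairs; you can also grow an existing pool later with:
zpool add tank mirror /dev/disk/by-id/DISK07 /dev/disk/by-id/DISK08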
I think this rather shows that it's hard to plan ahead with ZFS when you have larger storage setups -- it requires a lot of knowledge to get things right. But at the same time, you have the power to optimise your storage as much as possible, so I personally will still remain a big fan of ZFS.
Yes, enterprise SSDs/NVMe drives are what ZFS really needs, but as you can see, there are so many Proxmox home and small-budget users who run consumer SSDs/NVMe drives, which get chewed up by ZFS, and nobody warns them about it enough beforehand. Yes, it's said in endless threads, but most people just try it first and only later wonder about failing drives - buy cheap, buy twice.
Yeah, I agree with you, there are unfortunately some really bad SSDs out there, but there's only so much that we can do. Chasing after terrible SSDs is a cat-and-mouse game: even if you think you know all the bad products, more cheap stuff keeps coming. I personally always recommend used enterprise/datacenter SSDs to homelabbers if they can buy them somewhere, because even with a bit of wearout they will survive quite a long time in the average homelab environment.
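If anyone wants to check how far along their drives already are, smartctl shows the wear indicators (the attribute names differ between vendors and between SATA and NVMe -- for NVMe look at "Percentage Used", for SATA SSDs at attributes like Wear_Leveling_Count or Media_Wearout_Indicator):

smartctl -a /dev/nvme0
smartctl -a /dev/sda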
Either way, that's beside the main topic here -- I hope my tips above may help you with your pool. To me it looks like you'll have to rebuild it if you want more IOPS, unfortunately. Or, you can always add more vdevs, but I don't know if that's an option in your case -- you already have quite a lot of disks. Good luck! And let me know if I can help with anything else regarding your pool.