ZFS install recommendations

MimCom

Active Member
https://pve.proxmox.com/wiki/Storage:_ZFS seems a bit outdated at this point, so I'm asking about a new 4.2 install here. This will be a small system with four 2 TB 4Kn drives and one 128 GB NVMe drive. I plan to build a ZFS RAID 10 (striped mirrors) from the 2 TB drives, but I'm new to ZFS and not clear on the best use of the SSD -- presumably as a log (SLOG) and/or cache (L2ARC) device. Due to BIOS limitations, /boot will be located on a separate 120 GB mSATA drive, which will likely host another partition for as yet undetermined purposes.
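
For reference, my understanding is that the pool I have in mind would be created roughly like this (the pool name and the /dev/disk/by-id paths are placeholders, and the Proxmox installer would normally do this itself):

Code:
# ZFS "RAID 10" = a stripe of two mirrors; ashift=12 for the 4Kn drives.
# Pool name and disk paths are examples only -- substitute your own.
zpool create -o ashift=12 tank \
    mirror /dev/disk/by-id/ata-DISK1 /dev/disk/by-id/ata-DISK2 \
    mirror /dev/disk/by-id/ata-DISK3 /dev/disk/by-id/ata-DISK4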

Suggestions appreciated -- thank you.
 
It's a 2012 Intel desktop board with first-generation UEFI support -- I ran the BIOS issue to ground weeks ago. The SSD is a Samsung SM951 which I have yet to benchmark. The Proxmox installer happily found and installed to it, but the BIOS can't boot from it.

Assuming I decide to partition it for both L2ARC and ZIL use, is there any guidance on what percentage to allocate for each?

There will not be a lot of RAM in this machine; the guests do not use very much. https://en.wikipedia.org/wiki/ZFS#ZFS_cache:_ARC_.28L1.29.2C_L2ARC.2C_ZIL seems to indicate that ZFS is fairly smart about this sort of thing.
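
If I do end up splitting the NVMe drive, my rough plan would look something like this (device name, partition sizes, and pool name are all assumptions at this point):

Code:
# Small partition for the log (SLOG), remainder for cache (L2ARC) -- sizes are guesses.
sgdisk -n 1:0:+8G /dev/nvme0n1     # log partition
sgdisk -n 2:0:0   /dev/nvme0n1     # cache partition
zpool add tank log   /dev/nvme0n1p1
zpool add tank cache /dev/nvme0n1p2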
 
The log is flushed every 5 seconds, so you only need enough space on your SSD to store 5 seconds of SYNCHRONOUS writes. That is usually less than 1 GB for most drives if you compare the numbers in the link I provided. Your SSD is a consumer SSD, so I would not raise my expectations for synchronous writes too high, but I'd still like to see the benchmarks (please run the ones from the aforementioned webpage).
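
As a rough worked example (the throughput figure is an assumption, not a measurement of your SSD): a device that sustains about 200 MB/s of synchronous writes can only accumulate 5 s × 200 MB/s ≈ 1 GB between flushes, so a log partition of a few GB is already more than it can ever use.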

ZFS with little RAM plus an L2ARC will not be fast, because the L2ARC's index has to be held in the ARC, which reduces the usable RAM even further.
 
Thanks, good to know. This is a very lightly loaded system -- ZFS is primarily for reliability and survivability. The overwhelming majority of its disk is for a mailserver that only has a few hundred gigs of spool -- everything else is really small and (other than a VoIP system) 99% idle.
It's currently running on an ancient Core 2 Quad Penryn with 8 GB of RAM and a 3Ware RAID 10 without any issues. The new platform (a test, really) will have 16 GB initially, but I can push that to 32 if it proves necessary. I've seen guidelines of 1 GB of ARC per TB of storage, which would not be an issue.

If the ZIL really needs such a tiny amount of flash, I can put it on a small partition. Does wear leveling work across drive partitions?
 
The problem is that a VoIP system does not generate a lot of I/O, but a mail system does. Your rule says you'll use exactly 8 GB of RAM for your four 2 TB drives -- which is also the default ARC size (half the RAM) on a 16 GB Linux system. If you use L2ARC, the memory consumption in the ARC increases, because the L2ARC has to be indexed in the ARC, so you have less ARC left for actual caching.
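
If you want to check or cap the ARC yourself, something like this works on a Proxmox/Linux system (the 8 GB cap below is only an example value):

Code:
# Current ARC size and upper limit, in bytes:
grep -E '^(size|c_max)' /proc/spl/kstat/zfs/arcstats
# Cap the ARC at 8 GB (8589934592 bytes), persistent across reboots:
echo "options zfs zfs_arc_max=8589934592" >> /etc/modprobe.d/zfs.conf
update-initramfs -u    # needed if the root filesystem is on ZFS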

Again, you are using a consumer-grade SSD, and those are very, very slow at synchronous writes. That's one reason enterprise SSDs are so expensive: they are fast at synchronous writes. The ZIL collects at most 5 seconds of data before the entries are flushed to the pool; that flush interval defaults to 5 seconds, so 5 seconds' worth of synchronous writes at the speed of your (sync-fast) SSD defines the maximum useful ZIL size.
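
A generic way to get a feel for the synchronous write speed is fio (this is only a sketch, not the specific benchmark mentioned above; point it at a scratch partition or file, because it overwrites the target):

Code:
# 4k writes with an fsync after every write -- roughly the pattern a SLOG sees.
fio --name=slogtest --filename=/dev/nvme0n1p1 --rw=write --bs=4k \
    --ioengine=sync --fsync=1 --size=1G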

Wear leveling does not know about partitions; the SSD's controller remaps data across the whole device, so data is not physically ordered the way the partition table suggests.
 
Thanks. Do mirrored drives each consume 1 GB of ARC per TB, or is the mirroring done below that level?
 
You always see a pool of all available space, so 8 TB in your case. Used and free space is more complicated in ZFS than in ext4.
 
Am I reading this correctly -- that an empty drive still uses the same amount of ARC? That seems rather wasteful from an architectural standpoint. I've read about a dozen different articles and whitepapers on ZFS architecture and tuning and did not come across this. If I set the ARC lower because my drives are only half full, do I suffer a penalty?
 
ARC is a block and metadata cache, so more is always better. 1 GB for each 1 TB of storage is perhaps a minimum for decent performance. You always want more real memory with ZFS (rather than L2ARC). During normal operation you have to check whether you often get cache misses; arcstat is good for that.
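
For example (arcstat ships with the ZFS utilities, and arc_summary gives a one-shot overview):

Code:
# ARC hit/miss statistics every 5 seconds -- watch the miss% column:
arcstat 5
# One-shot summary of ARC size, hit rates and tunables:
arc_summary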

If you do not have anything stored, you of course need no memory. While the pool is not very full the system "feels" fast; at about 80% full it gets slower, at roughly 90% it crawls, and then it suddenly stops responding at all.
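
You can keep an eye on that with zpool itself; the CAP column in the default output shows how full the pool is:

Code:
zpool list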
 
Thanks, that does make sense. The mailserver is mostly maildir, so that's going to burn a lot of metadata space. Guess I need to fire it up and see how things go. I do appreciate the information.
 
