ZFS best practices: 36x 6TB HDD + enterprise NVMe

encore

May 4, 2018
Hello everyone,

there are a lot of posts here about ZFS, which makes it a bit difficult to distill the information I need.

Therefore a simple "best practice" question to the experienced ZFS admins:
I have two servers with Adaptec 6805 RAID controllers and 36x 6TB HDDs each, and I would like to enable HBA passthrough so that the HDDs are recognized directly by the operating system (no RAID).
I want to set up a ZFS RAID-Z2 as backup storage. Does it make sense to use an enterprise NVMe (3.84 TB Kingston DC1500M) as cache for this backup application? Does it speed up write rates significantly?
 
Hi there!

[...] and would like to enable HBA passthrough so that the HDDs are recognized directly by the operating system (no RAID).
That's already a very good start, as ZFS and hardware RAID are fundamentally incompatible.

You might already know this, but I still want to give you a quick overview of RAID-Z levels from the OpenZFS docs:
A raidz group can have single, double, or triple parity, meaning that the raidz group can sustain one, two, or three failures, respectively, without losing any data. The raidz1 vdev type specifies a single-parity raidz group; the raidz2 vdev type specifies a double-parity raidz group; and the raidz3 vdev type specifies a triple-parity raidz group. The raidz vdev type is an alias for raidz1.

So, putting all of your 36 drives (on each server) into a single RAID-Z2 vdev would be a pretty bad idea, because the chance that multiple drives fail at once increases the more drives you have in a single vdev. If 3 out of your 36 drives fail, your vdev fails. And if a vdev fails, your entire pool is gone. The docs also recommend not putting more than 16 disks in RAID-Z.

It is safer to create multiple smaller vdevs instead, each with the redundancy that you require. Since you have 36 disks, I'm guessing that you have three 12-drive bays per server, perhaps? If that's the case, you could put the drives of each bay into a separate RAID-Z2 vdev, for example. If you want to be really safe, you could put each bay into a RAID-Z3 vdev with a spare. In the latter case you would have 8 drives for data, 3 for parity, and 1 spare per bay.
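
To put rough numbers on those two layouts, here is a small back-of-the-envelope sketch. It assumes three 12-drive bays with 6 TB drives, as guessed above, and it ignores RAID-Z padding, metadata, and reserved space, so real usable capacity will be lower. The function name usable_tb is made up for this example.

```bash
# Rough usable capacity of a pool built from identical RAID-Z vdevs.
# Simplification: counts only whole data drives, no padding or metadata.

# usable_tb VDEVS DRIVES_PER_BAY PARITY SPARES_PER_BAY DRIVE_TB
usable_tb() {
    local vdevs=$1 bay=$2 parity=$3 spares=$4 drive_tb=$5
    echo $(( vdevs * (bay - parity - spares) * drive_tb ))
}

echo "3 bays, RAID-Z2 per bay:           $(usable_tb 3 12 2 0 6) TB"
echo "3 bays, RAID-Z3 + 1 spare per bay: $(usable_tb 3 12 3 1 6) TB"
```

So the extra safety of RAID-Z3 plus a spare costs two more drives per bay compared to plain RAID-Z2.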

In either of the above two scenarios you end up with 3 vdevs - the data going to your pool will be dynamically distributed between those, increasing your maximum write speed, depending on how ZFS chooses to distribute the data.
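
As a sketch of what the three-vdev RAID-Z2 layout could look like on the command line: the helper below only prints a zpool create command, it does not run anything. The pool name "backup" and the /dev/disk/by-id/hdd-NN device names are placeholders, so substitute your real by-id paths. The printed command includes -n, which makes zpool create show the resulting configuration without actually creating the pool.

```bash
# Print (not execute) a zpool create command with one RAID-Z2 vdev per bay.
# Pool name and device paths are placeholders for illustration only.

# build_create_cmd POOL VDEVS DRIVES_PER_VDEV
build_create_cmd() {
    local pool=$1 vdevs=$2 width=$3
    local cmd="zpool create -n $pool"
    local n=0 v=0 d
    while [ "$v" -lt "$vdevs" ]; do
        cmd="$cmd raidz2"              # start a new RAID-Z2 vdev
        d=0
        while [ "$d" -lt "$width" ]; do
            n=$((n + 1))
            cmd="$cmd $(printf '/dev/disk/by-id/hdd-%02d' "$n")"
            d=$((d + 1))
        done
        v=$((v + 1))
    done
    echo "$cmd"
}

build_create_cmd backup 3 12
```

Each "raidz2" keyword in the output starts a new vdev, and ZFS stripes writes across all of them.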

Also, since you have such a large number of disks, you might want to consider dRAID instead of RAID-Z. dRAID would increase the resiliency of your pool even more, as resilver times are much, much faster.
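
For reference, a dRAID vdev is described by a spec string of the form draid[parity][:Nd][:Nc][:Ns] (data drives per group, total children, distributed spares). The sketch below just assembles such a string and does a basic sanity check. The values used (double parity, 8 data drives per group, 36 children, 2 distributed spares) are purely illustrative and not a recommendation, and the helper name draid_spec is made up.

```bash
# Assemble a dRAID vdev spec string such as "draid2:8d:36c:2s".
# Sanity check (my reading of the docs): one redundancy group, i.e.
# data + parity, has to fit into the children that are not spares.

# draid_spec PARITY DATA CHILDREN SPARES
draid_spec() {
    local parity=$1 data=$2 children=$3 spares=$4
    if [ $(( data + parity )) -gt $(( children - spares )) ]; then
        echo "data + parity does not fit into children - spares" >&2
        return 1
    fi
    echo "draid${parity}:${data}d:${children}c:${spares}s"
}

draid_spec 2 8 36 2
```

The resulting string takes the place of the raidz2 keyword in a zpool create command, followed by all 36 devices.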

In either case, the level of redundancy, number of spares, number of drives per vdev, etc. depends on your needs, so if you'd like to elaborate on those, I could give you some more hints.

Maybe some other ZFS veterans could chime in here, too.

Does it make sense to use an enterprise NVMe (3.84 TB Kingston DC1500M) as cache for this backup application? Does it speed up write rates significantly?
If by cache you mean the L2ARC, then no, it won't. The L2ARC can increase read speeds in certain scenarios, e.g. if an application relies a lot on filesystem-based caching.

However, if you have a lot of synchronous writes, you can use a SLOG device. That doesn't need 3.84 TB at all, though; a couple of GB should be more than enough (as mentioned in the docs).

Write speeds can be increased by using multiple vdevs, as the data will be dynamically distributed between them, as mentioned above. The actual read and write speeds depend on how your pool is set up. For example, if your pool is almost full and you add another vdev, nearly all subsequent writes will go to the new vdev, so write speeds won't improve in that case. If you instead create the pool with multiple vdevs from the very beginning, you'll achieve much higher read and write speeds, as all data should be (more or less) balanced between the vdevs.
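
To get a feeling for how small a SLOG can be, here is a rough estimate under a commonly quoted rule of thumb (an assumption on my part, not something from the docs quoted above): the SLOG only has to hold the sync writes of a few transaction groups, and a transaction group is flushed every 5 seconds by default (zfs_txg_timeout). The link speeds and the factor of 3 transaction groups are assumptions as well.

```bash
# Estimate the amount of in-flight sync writes a SLOG has to absorb.
# Assumes the network link is the bottleneck and is fully saturated.

# slog_mb LINK_MBIT TXG_SECONDS TXG_COUNT -> size in MB
slog_mb() {
    local link_mbit=$1 txg_seconds=$2 txg_count=$3
    echo $(( link_mbit / 8 * txg_seconds * txg_count ))
}

echo "10 GbE: $(slog_mb 10000 5 3) MB"
echo " 1 GbE: $(slog_mb 1000 5 3) MB"
```

Even with a saturated 10 GbE link that is a tiny fraction of a 3.84 TB drive.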


In any case, you should have enough RAM for ZFS to effectively support that much data. A common rule of thumb is 1 GB of RAM per 1 TB of disk space in your pool.
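
Applied to your hardware, the rule of thumb works out as follows. This is a sketch that counts raw disk space (36 drives of 6 TB per server) and treats the 1 GB per TB figure as a rough guideline, not a hard requirement. The function name is made up.

```bash
# RAM suggested by the "1 GB per 1 TB" rule of thumb, based on raw capacity.

# ram_gb_for_pool DRIVES DRIVE_TB [GB_PER_TB]
ram_gb_for_pool() {
    local drives=$1 drive_tb=$2 gb_per_tb=${3:-1}
    echo $(( drives * drive_tb * gb_per_tb ))
}

echo "Suggested RAM per server: $(ram_gb_for_pool 36 6) GB"
```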

Also, a lot of information can be found in the manual pages. I can highly recommend reading through those, too:
Bash:
man zfsconcepts
man zpoolconcepts

man zfs
man zpool

man zfsprops
man zpoolprops

One last thing: Never ever enable deduplication unless you really, really, really know what you're doing.
 
