Hi there!
[...] and would like to enable HBA passthrough here so that the HDDs are recognized directly by the operating system (no RAID).
That's already a very good start, as ZFS and hardware RAID are fundamentally incompatible.
You might already know this, but I still want to give you a quick overview of RAID-Z levels
from the OpenZFS docs:
A raidz group can have single, double, or triple parity, meaning that the raidz group can sustain one, two, or three failures, respectively, without losing any data. The raidz1 vdev type specifies a single-parity raidz group; the raidz2 vdev type specifies a double-parity raidz group; and the raidz3 vdev type specifies a triple-parity raidz group. The raidz vdev type is an alias for raidz1.
So, putting all of your 36 drives (on each server) into a single RAID-Z2 vdev would be a pretty bad idea, because the chance that multiple drives fail at once increases the more drives you have in a single vdev. If any 3 drives in a RAID-Z2 vdev fail, that vdev fails, and if a vdev fails, your entire pool is gone. The docs also recommend not putting more than 16 disks in a RAID-Z vdev.
It is instead safer to create multiple smaller vdevs, each with the redundancy that you require. Since you have 36 disks, I'm guessing you might have three 12-drive bays per server? If that's the case, you could, for example, put the drives of each bay into a separate RAID-Z2 vdev. If you want to be extra safe, you could instead put each bay into a RAID-Z3 vdev with a hot spare; in that case each bay would have 8 drives for data, 3 for parity, and 1 spare.
In either of those two scenarios you end up with 3 vdevs; the data going into your pool will be dynamically distributed between them, increasing your maximum write speed, depending on how ZFS chooses to distribute the data.
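To make that a bit more concrete, here is a rough sketch of what creating such a pool could look like. The pool name "tank" and the diskN names are only placeholders; in practice you'd use the stable /dev/disk/by-id/ paths of your actual drives:
Bash:
# Sketch only: "tank" and diskN are placeholders for your real pool name and
# /dev/disk/by-id/ device paths.
# Three 12-disk RAID-Z2 vdevs in a single pool:
zpool create tank \
    raidz2 disk{1..12} \
    raidz2 disk{13..24} \
    raidz2 disk{25..36}

# RAID-Z3 variant with hot spares. Note that hot spares belong to the pool as
# a whole, not to a single vdev, so the three spares are shared by all vdevs:
# zpool create tank \
#     raidz3 disk{1..11} \
#     raidz3 disk{13..23} \
#     raidz3 disk{25..35} \
#     spare disk12 disk24 disk36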
Also, since you have such a large number of disks, you might want to consider dRAID instead of RAID-Z. dRAID would increase the resiliency of your pool even more, as resilver times are much, much faster.
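Purely as a hedged sketch of what that could look like (same placeholder names as above), here is a draid2 layout with 8 data disks per redundancy group and 2 distributed spares spread across all 36 drives:
Bash:
# Sketch only: draid2 = double parity, 8d = 8 data disks per redundancy group,
# 36c = 36 drives (children) in total, 2s = 2 distributed spares.
# The number of devices listed has to match the children count (36 here).
zpool create tank draid2:8d:36c:2s disk{1..36}
The distributed spares are a big part of why rebuilds are so fast: all remaining drives take part in rebuilding the data instead of everything funnelling into a single spare disk.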
In either case, the level of redundancy, number of spares, number of drives per vdev, etc. depends on your needs, so if you'd like to elaborate on those, I could give you some more hints.
Maybe some other ZFS veterans could chime in here, too.
Does it make sense to use an enterprise NVMe (3.84 TB Kingston DC1500M) as a cache for this backup application? Does it significantly speed up write rates?
If by cache you mean the L2ARC, then no, it won't. The L2ARC can increase read speeds in certain scenarios, e.g. if an application relies heavily on filesystem-based caching.
However, if you have a lot of synchronous writes, you can use a SLOG device. It doesn't need anywhere near 3.84 TB, though; a couple of GB should be more than enough (as mentioned in the docs). Write speeds can be increased by using multiple vdevs, as the data will be dynamically distributed between them, as mentioned above. Read and write speeds can vary a bit depending on how your pool is set up. For example, if your pool is almost full and you add another vdev, all subsequent writes will go to the new vdev, so write speeds won't change at all in that case. On the other hand, if you create a pool with multiple vdevs from the very beginning, you'll achieve much higher read and write speeds, as the data should be (more or less) balanced between the vdevs.
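If it turns out that you really do have a lot of synchronous writes (e.g. backups arriving over NFS, or an application that issues syncs itself), adding that NVMe as a SLOG would look roughly like this; again, "tank" and the device paths are placeholders:
Bash:
# Sketch only: add the NVMe as a separate log (SLOG) device.
zpool add tank log /dev/disk/by-id/nvme-KINGSTON-DC1500M-example

# Preferably mirrored, so that a crash combined with a failing log device
# can't cost you the most recent sync writes:
# zpool add tank log mirror /dev/disk/by-id/nvme-A /dev/disk/by-id/nvme-B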
In any case, you should have enough RAM for ZFS to handle that much data effectively. A common rule of thumb is 1 GB of RAM per 1 TB of disk space in your pool.
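If you want to see how much of that RAM the ARC is actually using on a running system, the standard OpenZFS tooling can show you (sketch, assuming a Linux host with the ZFS utilities installed):
Bash:
# Current ARC size and configured maximum, in bytes:
awk '$1 == "size" || $1 == "c_max" { print $1": "$3 }' /proc/spl/kstat/zfs/arcstats

# Or the more readable summary tool that ships with OpenZFS:
arc_summary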
Also, a lot of information can be found in the manual pages; I highly recommend reading through those, too:
Bash:
man zfsconcepts
man zpoolconcepts
man zfs
man zpool
man zfsprops
man zpoolprops
One last thing:
Never ever enable deduplication unless you really, really, really know what you're doing.
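And if you ever want to double-check that dedup really is off (it is by default), for a hypothetical pool named "tank":
Bash:
# Should report "off" for the pool and every dataset in it:
zfs get -r dedup tank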