WOWZA!!!! I wish there was a tool that would offer recommended ZFS pool options, e.g. get user input (Do you prefer performance or storage? How many drives do you have? Are they SSDs or spindles? Will you be running databases or just hosting files? Are you hosting VMs or not?) and then offer 2-3 options with a recommended one based on what the user selected. Would you be interested in building that tool and putting it online with me? I am a software engineer and can help with the code and share it with experts such as TechnoTim, Linus etc. to fine-tune it?
Now coming back to what I am thinking of doing: build based on your middle option in the above table, and use the 5th SSD to host Postgres only? These SSDs are 12Gbps, so that should be good? I could take regular backups of this SSD to my TrueNAS in case of disk failure. Do you think it's a good idea?
The problem is that there are so many factors you need to take into account, because everyone has a different workload, that such a tool would be hard to use correctly. And all the numbers I calculated above are just theoretical, based on formulas for how ZFS should behave. There are a lot of other things that will change the real capacity/performance:
- what blocksize the SSDs are internally working with (which no manufacturer will tell you in any datasheet)
- what type of compression algorithm you use
- if you use deduplication or not
- how well your data deduplicates
- how compressible your data is
- ratio of datasets and zvols used
- ...
So it's always a good idea to also benchmark different setups to see what works best for you in reality. But that again is hard, because you would need to create your own individual benchmark that matches your real-world workload (read/write ratio, sequential/random ratio, the different blocksizes used, whether your workload is highly parallelizable or not, whether you are mostly writing on block level to zvols or on file level to datasets, different caching modes, different protocols, encryption or not, how compressible and deduplicatable the data is, checking that the CPU, the HBA or the PCIe link between HBA and CPU isn't bottlenecking, sync/async writes, ...).
So that is a super complex topic and you basically need to understand how ZFS works in detail to be able to make good decisions.
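If you want a rough starting point for such a benchmark, something like the following fio run could approximate a database-like random 8K workload on a test dataset. The path, size and read/write mix here are just placeholder assumptions; adjust them to your actual workload and repeat with the blocksizes, sync settings and protocols you really use:

```sh
# Rough sketch of a DB-like benchmark -- all values are placeholders, tune them to your workload.
# Use --ioengine=posixaio instead of libaio on a FreeBSD-based TrueNAS CORE system.
fio --name=db-sim \
    --directory=/mnt/testpool/bench \
    --ioengine=libaio \
    --rw=randrw --rwmixread=70 \
    --bs=8k --iodepth=16 --numjobs=4 \
    --size=10G --runtime=120 --time_based \
    --group_reporting
```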
This is bad news! I am using both iSCSI (mounted on Ubuntu to host Nextcloud data, Postgres and MySQL data etc.) and datasets (for file storage and backups). I am passing 11x SSDs (1.8TB per disk) to TrueNAS and built two pools: one with 6 disks in raidz2 and one with 5 disks in a raidz config. Now I am thinking of moving Postgres and MySQL over to my newly created raid10 pool with 4 SSDs. What's the best config for these 11 SSDs over in TrueNAS?
8K volblocksize for your Postgres is only possible with:
- a single disk
- 2 disk mirror
- 4 disk striped mirror
For everything else you would need to increase the volblocksize above 8K to not lose too much capacity to padding (there is a rough padding calculation after the next list if you want to check other layouts).
16K volblocksize for your MySQL would be possible with:
- a single disk
- 2 disk mirror
- 4, 6 or 8 disk striped mirror (6 disk would not be ideal)
- 3 disk raidz1
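Here is that back-of-the-envelope padding check. It is a simplified model, assuming ashift=12 (4K sectors); the real allocator has a few more corner cases, but it is good enough to see where capacity goes:

```sh
# Simplified raidz padding estimate -- not an official ZFS tool, just the rule of thumb
# that allocations are rounded up to a multiple of (parity + 1) sectors.
sector=4096     # ashift=12
width=11        # disks in the raidz vdev
parity=2        # 1 = raidz1, 2 = raidz2, 3 = raidz3
volblock=8192   # zvol volblocksize in bytes

data=$(( (volblock + sector - 1) / sector ))                 # data sectors per block
rows=$(( (data + width - parity - 1) / (width - parity) ))   # stripes needed for that block
total=$(( data + rows * parity ))                            # data + parity sectors
pad=$(( parity + 1 ))
alloc=$(( (total + pad - 1) / pad * pad ))                   # rounded up to a multiple of parity+1

echo "a ${volblock}B block occupies $(( alloc * sector ))B on disk ($data of $alloc sectors are data)"
```

For an 8K zvol block on an 11 disk raidz2 this gives 2 data sectors out of 6 allocated, so only a third of the raw space ends up as usable zvol capacity, which is where the zvol numbers in the table below come from.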
And iSCSI adds another layer of overhead, and the network stack will add additional latency, so I would also recommend creating a local pool for your DBs. An 8K pool should be fine for all kinds of DBs if you want to run them all from the same pool. If you want even better performance you could create a dedicated pool for each DB that matches its blocksize for sync writes (so an 8K volblocksize for Postgres, 16K for MySQL, 64K for MSSQL and so on). That way ZFS can write the data with the blocksize the DB is natively using, and each DB would get its own pool, so the workload is split between pools and each pool gets less IO to handle.
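As a sketch of what that per-DB layout could look like (pool and zvol names and sizes are made up for the example; note that volblocksize can only be set when the zvol is created):

```sh
# Hypothetical pool "fast" for the DBs; one zvol per DB with a matching volblocksize.
zfs create -V 100G -o volblocksize=8K  fast/postgres   # Postgres uses 8K pages
zfs create -V 100G -o volblocksize=16K fast/mysql      # InnoDB uses 16K pages
# If a DB lives on a dataset instead of a zvol, match the recordsize instead:
zfs create -o recordsize=16K fast/mysql-data
```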
Running 11x 1.8 TB SSDs you've got 19.8 TB of raw storage.
| pool layout | usable space @ 100% datasets | usable space @ 100% zvols | 8K random write IOPS | 8K random read IOPS | big seq. write throughput | big seq. read throughput | disks may fail |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 11 disk raidz1 @ 8K | 14.4 TB | 7.92 TB | 1x | 1x | 10x | 10x | 1 |
| 11 disk raidz2 @ 8K | 12.96 TB | 5.28 TB | 1x | 1x | 9x | 9x | 2 |
| 11 disk raidz3 @ 8K | 11.52 TB | 3.96 TB | 1x | 1x | 8x | 8x | 3 |
| 11 disk raidz1 @ 64K | 14.4 TB | 14.4 TB | 0.125x | 0.125x | 10x | 10x | 1 |
| 11 disk raidz2 @ 64K | 12.96 TB | 12.04 TB | 0.125x | 0.125x | 9x | 9x | 2 |
| 11 disk raidz3 @ 32/64K | 11.52 TB | 10.56 TB | 0.25x / 0.125x | 0.25x / 0.125x | 8x | 8x | 3 |
| 8 disk str. mirror @ 16K | 5.76 TB | 5.76 TB | 2x | 4x | 4x | 8x | 1-4 |
| 3 disk raidz1 @ 16K | 2.88 TB | 2.88 TB | 0.5x | 0.5x | 2x | 2x | 1 |
| 4 disk str. mirror @ 8K | 2.88 TB | 2.88 TB | 2x | 4x | 2x | 4x | 1-2 |
| 7 disk raidz1 @ 32/64K | 10.8 TB | 8.06 TB | 0.25x / 0.125x | 0.25x / 0.125x | 6x | 6x | 1 |
| 2 disk mirror @ 4/8K | 1.44 TB | 1.44 TB | 1x | 2x | 1x | 2x | 1 |
| 9 disk raidz1 @ 64K | 11.53 TB | 11.53 TB | 0.125x | 0.125x | 8x | 8x | 1 |
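If you want to sanity-check the usable-space columns: they roughly follow raw capacity * data fraction * 0.8 (i.e. keeping 20% of the pool free), with the zvol column additionally reduced by the padding overhead calculated above. For example:

```sh
# 11 disk raidz2 for datasets: 19.8 TB raw, 9 of 11 disks hold data, keep 20% free
echo "19.8 * 9 / 11 * 0.8" | bc -l    # ~12.96 TB, matching the table row above
```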
So for a file server for big files (best capacity and sequential reads/writes) I would go with "11 disk raidz1 @ 64K" (but bad IOPS, so bad for DBs and small files, and bad reliability). Reliability could be increased by using "11 disk raidz2 @ 64K" instead.
If you primarily want fast but small storage I would go with an "8 disk str. mirror @ 16K" for decent IOPS (for 16K-and-above IOPS, like a MySQL DB wants, it would even be 4x write and 8x read, and sequential reads are great too) + a "3 disk raidz1 @ 16K", so you get a small second pool that could even be used for MySQL and you don't waste those drives. Both together would be 8.64 TB of usable space.
The best all-rounder would be a "4 disk str. mirror @ 8K" (a small but fast pool that could even run Postgres from your zvols) + a "7 disk raidz1 @ 32/64K" (a big pool, great for datasets with medium to big files). That would result in 13.68 TB (10.8 TB for datasets + 2.88 TB for zvols/datasets) of usable storage, which isn't that far below the optimum usable capacity (14.4 TB) that you would get with an "11 disk raidz1".
If you don't need that much space for zvols you can also use a "2 disk mirror @ 4/8K" + a "9 disk raidz1 @ 64K".
And fusion pools might also be an option. In that case you only need to create a single pool instead of two pools.
You could basically use the same two options as above, but instead of creating an additional mirror / striped mirror pool you add the mirror / striped mirror to the pool as "special devices".
An example:
You create a normal pool with 7 disks as raidz1, but then you add another 4 disks to that pool as "special devices" in a striped mirror. Now you can tell TrueNAS to store everything that uses a blocksize of, for example, 4K to 64K on the striped mirror and everything that is 128K or bigger on the normal raidz1. That way you only have one pool, but all files bigger than 128K will be stored on that 7 disk raidz1 and all files smaller than 128K (and also all zvols, as long as they use a volblocksize of 64K or lower) will be stored on the 4 disk striped mirror instead. And all metadata will be stored on that striped mirror too, which even speeds up the 7 disk raidz1 because it won't be hit by all the IOPS caused by metadata.
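On the command line such a fusion pool could be created like this (disk names are placeholders for your devices; the TrueNAS pool manager should let you do the same from the GUI by adding a special/metadata vdev):

```sh
# Sketch of the described fusion pool: 7 disk raidz1 for bulk data
# plus 4 disks as a striped mirror "special" vdev.
zpool create tank \
    raidz1 sda sdb sdc sdd sde sdf sdg \
    special mirror sdh sdi mirror sdj sdk
```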
You can choose for every dataset which files should be stored on the special devices by changing the dataset's "special_small_blocks" attribute.
You can, for example, set "special_small_blocks" to 16K. In that case all metadata, all 8K volblocksize zvols and all files smaller than 16K will be stored on that striped mirror, and all files 16K or bigger will be stored on the raidz1. And you can set that per dataset: if you don't want a dataset to use the striped mirror you can set its special_small_blocks to 0, and if you want all of its data to be written to the striped mirror you can set it to 1024K.
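In command form that per-dataset control could look like this (dataset names are just examples):

```sh
zfs set special_small_blocks=16K tank/files      # metadata + files smaller than 16K go to the special vdev
zfs set special_small_blocks=0   tank/backups    # only metadata goes to the special vdev
zfs set special_small_blocks=1M  tank/fastdata   # (nearly) everything goes to the special vdev
```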
That way, stuff that needs good IOPS (so zvols, small files and metadata) will automatically be stored on the fast striped mirror, and all big files that just need good throughput and space efficiency will be stored on the raidz. Data is sent to the storage that works best for that workload. As soon as your special devices are full, all new small data (metadata, small files, zvols) will spill over to the raidz (so it gets slower but at least keeps working).
So that might also be an interesting choice for your setup.
And if you have a lot of sync writes (from all your DBs) it might also be useful to get a very durable NVMe SSD with as low a latency as possible (Intel Optane would be best, but also expensive) and add it as a SLOG. That way you would get a very fast log device for sync writes, boosting your pool's performance even more. But it's useless for async writes, so you would really need to check first whether it would make sense.
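Adding a SLOG later is simple; a sketch (pool name and device paths are placeholders):

```sh
# Add a low-latency NVMe SSD as a SLOG to an existing pool:
zpool add tank log /dev/nvme0n1
# Or mirror the SLOG so a single failing log device can't cost you in-flight sync writes:
zpool add tank log mirror /dev/nvme0n1 /dev/nvme1n1
```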