Best raid configuration for my setup (HDD/SSD)

Suertzz

Member
Jan 4, 2021
Hello,

I would like your recommendations on which type of RAID would be best for my PBS install; I plan to use ZFS.

Server hardware:

- 64 GB RAM
- 24x 8 TB HDD (7.2k rpm)
- 2x 1 TB SATA SSD, for the OS.

For the OS, I guess I will use raid-1 on both SSDs (2x 1 TB).

Regarding the storage of backups, this is where it gets complicated:

- 21x HDD in raidz3, plus 3 hot-spare HDDs (fault tolerance of only 3 disks at the same time)
- I don't know if it's possible with ZFS, but 4x [raidz2 with 6 disks], 0 hot spares?

I need approximately 140 TB.

Since these are spinning disks, I understand it's possible to significantly increase performance by adding a special device. Is it possible to use the unused space on the OS disks (2x 1 TB SSD)?

Thank you for your recommendations.
 
Last edited:
Hello,

I would like your recommendations on which type of RAID would be best for my PBS install; I plan to use ZFS.

Server hardware:

- 64 GB RAM
- 24x 8 TB HDD (7.2k rpm)
- 2x 1 TB SATA SSD, for the OS.
That's a lot of storage for so little RAM. The rule of thumb would be 4 GB + 1 GB of RAM per 1 TB of raw storage for the ARC, so that would be about 198 GB of RAM just for ZFS (24x 8 TB + 2x 1 TB = 194 TB raw, plus the 4 GB base). It will run with less RAM, but the more RAM you allow ZFS to use, the faster your pool will be. With so many HDDs you should also consider using some mirrored SSDs as a special device so metadata can be stored on the SSDs, which will speed up things like listing folders and offload a lot of IOPS from the HDDs.
For the OS, I guess I will use raidz-1 on both SSDs (2x 1 TB).
Raidz1 (which is like RAID 5) needs at least 3 drives, so I think you mean a normal RAID 1, which is called a "mirror" in ZFS terminology.
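Just for illustration, a plain two-disk ZFS mirror is created like this (hypothetical pool/device names; for the OS disks the PBS installer can set up ZFS RAID1 for you, including partitioning and the bootloader, so you normally don't run this by hand):

    # two-disk mirror, the ZFS equivalent of raid1 (placeholder device names)
    zpool create examplepool mirror /dev/disk/by-id/ata-SSD1 /dev/disk/by-id/ata-SSD2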
Regarding the storage of backups, this is where it gets complicated:

- 21x HDD in raidz3, plus 3 hot-spare HDDs (fault tolerance of only 3 disks at the same time)
- I don't know if it's possible with ZFS, but 4x [raidz2 with 6 disks], 0 hot spares?

I need approximately 140 TB.
You can stripe multiple smaller raidz2s, and I think that would be a good idea.
First, PBS needs a lot of IOPS, and 21 HDDs in raidz3 can't handle more IOPS than a single HDD; a single HDD might even be faster. With 24 drives as 4x raidz2 striped together you at least get the IOPS performance of 4 HDDs. By the way, it is recommended to only use SSDs, because HDDs are so slow at random reads/writes. Your backups will be split into millions of small chunks and then deduplicated, so when you start tasks like garbage collection or a verify job, PBS needs to check and open millions of small files, and HDDs are really bad at that. So if you have a 400 GB image of a virtual disk, that isn't a single big file that can be read sequentially (which HDDs would be good at); instead you get 100,000x 4 MB chunks that are more or less random reads (which HDDs are really bad at) because they are spread across all disks. ZFS also doesn't support defrag, so reads will get more random over time.
The second thing is that it might take weeks to resilver your pool if a single HDD fails. While the resilvering is running, your pool will be unusably slow and your HDDs are tortured 24/7, which makes it more likely that another disk will fail, which would force you to start all over again, which makes it more likely that yet another disk fails, and so on. Resilvering smaller raidzs that are striped together should be way faster, so you get less downtime and drives are less likely to fail at the same time. You should do some resilvering tests with test data before using that pool in production. It isn't unusual for resilvering to be as slow as 5 MB/s. If you need to resilver a 21x 8 TB raidz3, that is 168 TB (but only 115-130 TB usable, because ZFS in general needs 10-20% free space to work properly since it is copy-on-write, or it gets slow until it finally switches into panic mode where it gets terribly slow). 168 TB / 5 MB/s = 407 days to resilver. If you stripe multiple smaller raidz2s, that is way faster because you only need to resilver 48 TB. And resilvering a raidz2 should be faster than a raidz3 because the parity calculations are less complex.
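As a rough sketch, 4x 6-disk raidz2 striped into one pool could be created like this (the pool name "backup" and the disk names are placeholders, use your real /dev/disk/by-id paths):

    # one pool made of four raidz2 vdevs; ZFS stripes writes across all vdevs
    zpool create -o ashift=12 backup \
        raidz2 disk1  disk2  disk3  disk4  disk5  disk6 \
        raidz2 disk7  disk8  disk9  disk10 disk11 disk12 \
        raidz2 disk13 disk14 disk15 disk16 disk17 disk18 \
        raidz2 disk19 disk20 disk21 disk22 disk23 disk24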
Since these are spinning disks, I understand it's possible to significantly increase performance by adding a special device. Is it possible to use the unused space on the OS disks (2x 1 TB SSD)?
Keep in mind that all data on all 24 HDDs is lost as soon as both SSDs die. So you should mirror at least 3 or better 4 SSDs so that this matches the redundancy of your raidz2 or raidz3. Also keep in mind that a special device can't be removed: once you add it, it can't be removed without destroying the complete pool with all data on it. It should be possible to use a partition as a special device, so you could use the same SSDs that you use for your OS, but I don't think that is a good idea. It would make it harder to replace a failing special-device SSD, because you would always need to manually partition it, write the bootloader to it, and so on.
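For example (again placeholder pool/device names), a three-way mirrored special device would be added like this:

    # metadata (and optionally small blocks) gets stored on these SSDs from now on
    zpool add backup special mirror nvme1 nvme2 nvme3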
 
Last edited:
Hi,

Thank you for all these details :D

That's a lot of storage for so little RAM. The rule of thumb would be 4 GB + 1 GB of RAM per 1 TB of raw storage for the ARC, so that would be about 198 GB of RAM just for ZFS. It will run with less RAM, but the more RAM you allow ZFS to use, the faster your pool will be.

Will the performance really be impacted, or will it just not be optimal?

Raidz1 (which is like RAID 5) needs at least 3 drives, so I think you mean a normal RAID 1, which is called a "mirror" in ZFS terminology.

I meant raid-1, so yeah, mirror with ZFS!

You can stripe multiple smaller raidz2s, and I think that would be a good idea.
First, PBS needs a lot of IOPS, and 21 HDDs in raidz3 can't handle more IOPS than a single HDD; a single HDD might even be faster. With 24 drives as 4x raidz2 striped together you at least get the IOPS performance of 4 HDDs.

If I understand correctly, if I simply did 8x raidz1 (3 disks per vdev) with the 24 disks, I would get at least 8 times the IOPS performance of a single disk? (Does it work the same way as a classic RAID?)

Keep in mind that all data on all 24 HDDs is lost as soon as both SSDs die. So you should mirror at least 3 or better 4 SSDs so that this matches the redundancy of your raidz2 or raidz3. Also keep in mind that a special device can't be removed: once you add it, it can't be removed without destroying the complete pool with all data on it. It should be possible to use a partition as a special device, so you could use the same SSDs that you use for your OS, but I don't think that is a good idea. It would make it harder to replace a failing special-device SSD, because you would always need to manually partition it, write the bootloader to it, and so on.

OK, I guess I will add 4 NVMe drives in raidz2. What is the recommended size for the special device with ~160 TB of storage? Also, I saw in the documentation that the special device can store very small files (~4k), which could help a lot, right?

Thank you!
 
Will the performance really be impacted, or will it just not be optimal?
You can run arc_summary to watch the data cache hit rates and see how much of the metadata cache and dnode cache is available. Performance can get really terrible if most of the stuff needs to be read from the slow HDDs instead of RAM. By default ZFS's ARC will use 50% of your host's total RAM, so that will only be 32 GB if you don't manually change it. So use arc_summary, and if the values are too bad, buy some additional RAM.
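If you want to change the ARC limit by hand, one common way on Proxmox/Debian is a modprobe option (value in bytes; the 48 GiB below is just an example, not a recommendation for your box):

    # /etc/modprobe.d/zfs.conf
    options zfs zfs_arc_max=51539607552
    # then run "update-initramfs -u" and reboot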
If I understand correctly, if I simply did 8x raidz1 (3 disks per vdev) with the 24 disks, I would get at least 8 times the IOPS performance of a single disk? (Does it work the same way as a classic RAID?)
Yes, it should be at most the IOPS performance of 8 drives. Raidz only increases sequential throughput and capacity, not IOPS. If you want IOPS you need to stripe, so more drives can do stuff in parallel. In theory it should be:
Layout | Write throughput | Read throughput | IOPS | Latency | Usable capacity | Drives that may fail | Time to resilver
12x striped 2-disk mirrors | 12x | 24x | 12x | 1x | 86.4 TB | 1 to 12 | fast
8x striped 3-disk raidz1s | 16x | 16x | 8x | 1x | 115.2 TB | 1 to 8 | slow
4x striped 6-disk raidz2s | 16x | 16x | 4x | 1x | 115.2 TB | 2 to 8 | very slow
1x 21-disk raidz3 | 18x | 18x | 1x | 1x | 129.6 TB | 3 | abysmal
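To sanity-check the usable-capacity column (assuming 8 TB drives and the ~10% free space mentioned above):

    4x striped 6-disk raidz2s: 4 x (6 - 2) x 8 TB x 0.9 = 115.2 TB
    1x 21-disk raidz3:         (21 - 3) x 8 TB x 0.9    = 129.6 TB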
OK, I guess I will add 4 NVMe drives in raidz2. What is the recommended size for the special device with ~160 TB of storage? Also, I saw in the documentation that the special device can store very small files (~4k), which could help a lot, right?
If you only want to store metadata and no small files, the rule of thumb is 0.3% of your capacity (not sure if that was raw or usable capacity). But that really depends on the number of files you try to store. The more small files and the fewer big files you store, the bigger your special device has to be.
So for example 4x 250 GB SSDs in a raidz2. But if you buy NVMe drives because of the IOPS performance, I would choose 3x 500 GB SSDs as a three-way mirror instead.
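Regarding the small files: ZFS has a special_small_blocks dataset property that also stores all blocks up to that size on the special device (sketch with the hypothetical pool name "backup"; if you enable it, size the special device accordingly):

    # store metadata plus every block of 4K or smaller on the special SSDs
    zfs set special_small_blocks=4K backup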
 
Last edited:
