Need help with ZFS special device

Nov 23, 2021
France
Hi,

This is my first ZFS pool and I have some questions.

I have a 100 TB ZFS pool (SAS HDD drives) and I would like to try a ZFS special device, but I'm not sure how it works.

I'm aware that the redundancy of the special device should match that of the pool, and it seems there is a 0.2 ratio rule for sizing.

Question 1
How much total space do I need? 2 TB?

Question 2
How many mirrored devices do I need for the different scenarios?
  • A : raidz-1 : 2 ?
  • B : raidz-2 : 3 ?
  • C : raidz-3 : ?
Question 3
If I need 2 TB of total space for the special device, and raidz-2 requires 3 drives, do I need 3 x 2 TB drives?

Thanks.
 

Dunuin

Famous Member
Jun 30, 2020
Germany
Hi,

This is my first ZFS pool and I have some questions.

I have a 100 TB ZFS pool (SAS HDD drives) and I would like to try a ZFS special device, but I'm not sure how it works.

I'm aware that the redundancy of the special device should match that of the pool, and it seems there is a 0.2 ratio rule for sizing.

Question 1
How much total space do I need? 2 TB?
If I remember right, the rule of thumb is 0.3 percent of your usable capacity if you just want to store metadata. But how big your special device needs to be depends on the number of records, so the more small files you store, the bigger your special device needs to be. There was also a one-liner to sum up all your records so you can calculate the size you really need for your actual data.
Question 2
How many mirrored devices do I need for the different scenarios?
  • A : raidz-1 : 2 ?
  • B : raidz-2 : 3 ?
  • C : raidz-3 : ?
Yep, and raidz3 would be 4 drives.
Question 3
If I need 2 TB of total space for the special device, and raidz-2 requires 3 drives, do I need 3 x 2 TB drives?

Thanks.
Yes, because special devices only support mirrors. So 3x 2 TB (though I guess 3x 1 TB would be fine too) for a raidz2.
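A minimal sketch of what that layout could look like on the command line, assuming a hypothetical pool name (tank) and placeholder device names (sda..sdh for the raidz2 data disks, nvme0..nvme2 for the special mirror); in practice you'd use /dev/disk/by-id/ paths:

```shell
# Hypothetical example: 8 SAS disks in raidz2, plus a 3-way SSD mirror
# as the special vdev so it matches the pool's 2-disk fault tolerance.
zpool create tank \
    raidz2 sda sdb sdc sdd sde sdf sdg sdh \
    special mirror nvme0 nvme1 nvme2

# Verify that the special vdev shows up in the pool layout:
zpool status tank
```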
 
Nov 23, 2021
France
I see, thank you Dunuin.
If I remember right, the rule of thumb is 0.3 percent of your usable capacity if you just want to store metadata. But how big your special device needs to be depends on the number of records, so the more small files you store, the bigger your special device needs to be. There was also a one-liner to sum up all your records so you can calculate the size you really need for your actual data.
Hmm, so for 100 TB it would be far less than I expected (300 GB). I'm curious about that one-liner, do you have any link?

Yep, and raidz3 would be 4 drives.
ok
Yes, because special devices only support mirrors. So 3x 2 TB (though I guess 3x 1 TB would be fine too) for a raidz2.

In case it helps some people, to summarize (generic scenario for 100 TB):
  1. define your usable pool storage space (for example 100 TB)
  2. apply the 0.3% rule (that would be 300 GB for 100 TB)
  3. use the same redundancy level for your special device as for your pool storage
    1. for raidz-1 : a 2-way mirror (each disk should be 300 GB or more, so 2 x 300 GB)
    2. for raidz-2 : a 3-way mirror (each disk should be 300 GB or more, so 3 x 300 GB)
    3. for raidz-3 : a 4-way mirror (each disk should be 300 GB or more, so 4 x 300 GB)
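The arithmetic above can be sketched in a few lines of shell. This is my own worked example for the 100 TB case, doubling the result for headroom as suggested elsewhere in the thread:

```shell
# 0.3% rule of thumb, then double it so the special device stays
# well under the ~75% fill threshold.
usable_gb=$((100 * 1000))           # 100 TB of usable pool space, in GB
meta_gb=$((usable_gb * 3 / 1000))   # 0.3% -> 300 GB of estimated metadata
special_gb=$((meta_gb * 2))         # 2x headroom -> 600 GB per mirror disk
echo "${meta_gb} ${special_gb}"
```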
@Dunuin : do you know if a cache/log device is also necessary on top of a special device?

Any other tips are welcome.

Thanks.
 

Dunuin

Famous Member
Jun 30, 2020
Germany
I'll need to look for that one-liner.

L2ARC and SLOG won't make much sense in most scenarios. A SLOG is only used for sync writes, so if you have a lot of sync writes (running databases and so on) it would be great; if you only have async writes (SMB/NFS shares and so on) it would be totally useless.
And for L2ARC and DDT devices you should first max out your supported RAM before considering one of these (they only try to lower the performance hit of running ZFS with too little RAM).

Edit:
https://www.reddit.com/r/zfs/comments/elg7fe/allocation_class_can_i_get_a_quick_gut_check/:
run "zdb -Lbbbs POOLNAME" for block statistics. Level 0 ZFS Plain File and Level 0 zvol objects do NOT go to the metadata SSD, but everything else would. This may take a while, but would give the most accurate answer possible. Oh, and use the ASIZE column for your measurements.
And if I understand it right, metadata will spill over to your HDDs again as soon as the special device is more than 75% full. So you might want the special device to be about 200% of the size you calculate using zdb -Lbbbs POOLNAME, to be on the safe side, with plenty of headroom beyond what you actually need.
And keep in mind that ZFS won't move existing metadata from the HDDs to the special device; only new metadata will be stored there. So you would need to move all your data around so the old metadata actually gets rewritten to the special device.
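A rough sketch of that measurement, assuming a pool named tank. The exact table layout varies between zdb versions, so treat the grep as a convenience for finding the relevant rows rather than a finished one-liner, and read the ASIZE column yourself:

```shell
# Print per-blocktype statistics for the whole pool. The rows for
# "L0 ZFS plain file" and "L0 zvol object" are file/zvol data and stay
# on the HDDs; the ASIZE of everything else would land on the special vdev.
zdb -Lbbbs tank > /tmp/tank-blockstats.txt

# Narrow the output down for manual inspection:
grep -E 'ASIZE|plain file|zvol object' /tmp/tank-blockstats.txt
```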
 
Nov 23, 2021
France
I'll need to look for that one-liner.

L2ARC and SLOG won't make much sense in most scenarios. A SLOG is only used for sync writes, so if you have a lot of sync writes (running databases and so on) it would be great; if you only have async writes (SMB/NFS shares and so on) it would be totally useless.
And for L2ARC and DDT devices you should first max out your supported RAM before considering one of these (they only try to lower the performance hit of running ZFS with too little RAM).
I see. Well, in my case the special device is only for a PBS server full of HDDs; on the other side, the PVE servers are full of NVMe/SSD. I just want to minimize backup time (a mix of VMs and millions of small files from a "file server").
Edit:
https://www.reddit.com/r/zfs/comments/elg7fe/allocation_class_can_i_get_a_quick_gut_check/:

And if I understand it right, metadata will spill over to your HDDs again as soon as the special device is more than 75% full. So you might want the special device to be about 200% of the size you calculate using zdb -Lbbbs POOLNAME, to be on the safe side, with plenty of headroom beyond what you actually need.
Ok, will try that.

So if the special device should never be more than 75% full, and since you can't remove a special device once it's added to a pool, it's very important to find the right size before doing anything, or you'll have to start from scratch again.
And keep in mind that ZFS won't move existing metadata from the HDDs to the special device; only new metadata will be stored there. So you would need to move all your data around so the old metadata actually gets rewritten to the special device.
Ok.

Very interesting, thanks.
 

Dunuin

Famous Member
Jun 30, 2020
Germany
I see. Well, in my case the special device is only for a PBS server full of HDDs; on the other side, the PVE servers are full of NVMe/SSD. I just want to minimize backup time (a mix of VMs and millions of small files from a "file server").

Ok, will try that.

So if the special device should never be more than 75% full, and since you can't remove a special device once it's added to a pool, it's very important to find the right size before doing anything, or you'll have to start from scratch again.
Yes. But special devices can be striped too, so it should be possible to add another mirror later to increase the capacity. But again, ZFS won't actively rebalance data, so adding another mirror to the stripe will only increase capacity, not performance, at first, until the data passively spreads out evenly.
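Growing the special vdev that way could look like this (hypothetical pool and device names; since zpool add is permanent, previewing with -n first is a sensible habit):

```shell
# Dry run: show what the new pool layout would be, without changing anything.
zpool add -n tank special mirror nvme3 nvme4 nvme5

# If the preview looks right, add the second 3-way special mirror for real.
# The two mirrors are then striped; new metadata spreads across both.
zpool add tank special mirror nvme3 nvme4 nvme5
```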

I also read about some people just using an L2ARC to cache metadata. So if you don't reboot your PBS often, that should work too. In both cases the metadata will be read from the SSDs. The disadvantage of the L2ARC is that after a reboot the metadata needs to be read from the HDDs once again. But the advantage is that you can't lose your backups if the SSD dies, because it is just a read cache of data stored on the HDDs. And an L2ARC can be removed or replaced at any time. So with an L2ARC you don't need to care as much about reliability, and you can maybe save some money on SSDs you don't need to buy.
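A sketch of that metadata-only L2ARC setup, assuming a pool named tank and a placeholder cache device name:

```shell
# Add a single SSD as L2ARC; no redundancy needed, it is only a read cache.
zpool add tank cache nvme0

# Restrict the L2ARC to caching metadata only, so file data
# doesn't push the metadata back out of the cache.
zfs set secondarycache=metadata tank

# A cache device can be removed again at any time:
# zpool remove tank nvme0
```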
 

Dunuin

Famous Member
Jun 30, 2020
Germany
There is a persistent L2ARC option if I'm not mistaken.
Yes, but that is super new and it looks like there are still some problems; for example, on TrueNAS it is disabled by default because of that. So it's more of an experimental feature.
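For reference, persistent L2ARC in OpenZFS 2.x is governed by the l2arc_rebuild_enabled module parameter (checking it is harmless; the file path below assumes a Linux system with the zfs module loaded):

```shell
# 1 = rebuild L2ARC contents from the cache device after a reboot/import.
cat /sys/module/zfs/parameters/l2arc_rebuild_enabled

# To persist a setting across reboots on Debian/Proxmox-style systems:
echo "options zfs l2arc_rebuild_enabled=1" >> /etc/modprobe.d/zfs.conf
```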
 
