zfs raidz# vs draid#

sking1984

Nov 28, 2022
Hi all,

I am not a storage noob but clearly I am missing something.

In a standard RAID, as I understand it, IOPS are calculated based on the number of spindles in the array (in most cases per disk).
Ex: a 10-15k RPM enterprise SAS disk is good for approx. 200 IOPS. The exact number isn't important here; what matters is understanding how this changes with raidz and draid.

The documentation for raidz/draid suggests that the IOPS don't change, but the amount of bandwidth per I/O does! I don't understand how this can be. Perhaps it would make sense if IOPS were calculated per HBA (i.e. 200 IOPS for the HBA) and each disk added extra bandwidth, i.e. more data that can be read or written per I/O?

In my specific scenario I want a balance of performance vs. parity (data protection).

I initially configured a raidz2 (2-disk parity) with 0 hot spares. Shortly after, I read that draid was available and configured a draid2 as follows:
I used 2 data devs and 1 spare (600 GB 12 Gb/s SAS disks). I understand the spare actually gets used, and the equivalent of 1 drive's worth of blocks is set aside across all disks.
This equates to: draid2:2d:14c:1s-0

https://openzfs.github.io/openzfs-docs/Basic Concepts/dRAID Howto.html

Based on the above link it should be 2 groups, but it's specifically this part that I'm having a hard time picturing or understanding:

data - The number of data devices per redundancy group. In general a smaller value of D will increase IOPS, improve the compression ratio, and speed up resilvering at the expense of total usable capacity. Defaults to 8, unless N-P-S is less than 8.

Does this mean I split my 14 disks into 2 groups of 7, or that I split my 14 disks into 7 groups of 2? How do I calculate the IOPS, and how am I losing space at an even greater rate with a lower D value? My usable space calculation shows 1 spare removed but doesn't show the parity space removed from the total available space. Also, what does N-P-S mean or stand for? My guess: N number of disks, minus parity, minus spares?

Ex: 14 x 600 GB disks = approx. 8.4 TB, with 2 parity and 1 spare. Apparently the parity overhead doesn't show as removed (I think it should), but it does show that the equivalent of 1 disk of space was removed for the spare, i.e. it's showing 7.8 TB usable.
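Here is the back-of-envelope math I'm doing, as a rough sketch only; it assumes the usable fraction is data/(data+parity) of the non-spare space, which is just my reading of the docs rather than anything I've verified:

Code:
# Back-of-envelope capacity math for my layout (rough numbers only).
disk_tb = 0.6                      # 600 GB disks
children, spares = 14, 1
data, parity = 2, 2                # per redundancy group

raw = children * disk_tb                       # 8.4 TB
minus_spare = (children - spares) * disk_tb    # 7.8 TB -- what the pool shows me
# If parity really eats parity/(data+parity) of the remaining space,
# this is what I'd expect to actually be able to fill:
usable = minus_spare * data / (data + parity)  # 3.9 TB

print(f"raw {raw:.1f} TB, minus spare {minus_spare:.1f} TB, after parity {usable:.1f} TB")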

Regarding the IOPS: if it were 7 groups of 2, then based on their explanation that would be 7 x 200 IOPS. In a standard RAID that would be 13 x 200 IOPS (14 disks with 1 hot spare).

Which is true?
 
Does this mean I split my 14 disks into 2 groups of 7, or that I split my 14 disks into 7 groups of 2? How do I calculate the IOPS, and how am I losing space at an even greater rate with a lower D value? My usable space calculation shows 1 spare removed but doesn't show the parity space removed from the total available space. Also, what does N-P-S mean or stand for? My guess: N number of disks, minus parity, minus spares?
"draid2:2d:14c:1s-0" should mean there are 3 sets of 2data+2parity, 1 spare and 1 disk not used. So something similar to three 4 disk raidz2 striped together using a single spare. Didn't used the new draid yet, but that is how I understand the documentation.
The documentation for raidz/draid suggests that the IOPS don't change, but the amount of bandwidth per I/O does! I don't understand how this can be. Perhaps it would make sense if IOPS were calculated per HBA (i.e. 200 IOPS for the HBA) and each disk added extra bandwidth, i.e. more data that can be read or written per I/O?
Number of data disks only increases throughput performance. Number of striped vdevs increases IOPS performance. Let us use the good old raidz1 as an example.
24 disks in a single raidz1 will give you 23 times the throughput of a single disk (because 23 data disks + 1 parity disk) but still only the IOPS of a single disk, as you've only got one vdev.
8x 3-disk raidz1 striped together will give you 8 times the IOPS of a single disk (as you've got 8 striped vdevs) and 16 times the throughput of a single disk (as you've got 16 data disks).
So for IOPS you want as few disks as possible in a vdev, so you can create more vdevs. Best case, you don't use raidz at all and use striped mirrors instead. That gives you the best IOPS, the easiest expandability and the fastest resilvering time. But you will of course also lose the most capacity to redundancy.
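A rough back-of-envelope version of that rule; the 200 IOPS figure is the one from earlier in the thread and the MB/s per disk is just a made-up placeholder, so treat it as a sketch, not a benchmark:

Code:
# Rule of thumb: random IOPS scale with the number of vdevs,
# sequential throughput scales with the number of data disks.
DISK_IOPS = 200    # rough random IOPS of one 10-15k SAS disk
DISK_MBPS = 150    # assumed sequential MB/s of one disk (placeholder)

def pool_estimate(vdevs, data_disks_per_vdev):
    iops = vdevs * DISK_IOPS
    mbps = vdevs * data_disks_per_vdev * DISK_MBPS
    return iops, mbps

# 24 disks as one raidz1 (1 vdev, 23 data disks):
print(pool_estimate(1, 23))   # (200, 3450)  -> lots of throughput, single-disk IOPS
# 24 disks as 8x 3-disk raidz1 (8 vdevs, 2 data disks each):
print(pool_estimate(8, 2))    # (1600, 2400) -> 8x the IOPS, less throughput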
 
So ZFS has significantly less performance than a standard RAID 5, where we count each individual disk (or rather spindle) towards IOPS? Or do I have that wrong too?

I don't understand how draid2:2d:14c:1s-0 converts to three 4-disk raidz2 vdevs. What math is used there?

4 x 3 = 12, not 14... 12 + 1 spare = 13, + 1 not used? How is 1 not used, because of the math? Remainder 1?
So 14 - 1 spare = 13, and 13 / 4 disks = 3 remainder 1?

So if I turned this into a draid1:2d:14c:1s-0 that would be:
14 - 1 = 13, and 13 / 3 = 4 vdevs, remainder 1? So I could increase the spare count.

14 - 1 = 13, and 13 / 5 = 2 vdevs, remainder 3?

There's no great way to say this, but this performance calculation sucks!

Does BTRFS work the same way? It allows for snapshots too, if I recall; I have a Synology and use that file system there.
 
Last edited:
Code:
draid2:2d:14c:1s
     |  |  |  |
     |  |  |  L-> one spare
     |  |  |
     |  |  L-> total number of disks to use
     |  |
     |  L-> number of data disks per disk group
     |
     L-> number of parity disks per disk group
You tell it to use 14 disks. One is used as a spare, so 13 are left for parity or data. You tell it to use 2 data disks and 2 parity disks per disk group, so groups of 4 disks. The best you can do with the remaining 13 disks is 3 groups of 4 disks (12 disks), because 4 groups (16 disks) would be too many.
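The same arithmetic in a few lines of Python, just as a sketch of how I read the docs, not something verified against a real pool:

Code:
# How I read draid2:2d:14c:1s (not verified against a real pool).
parity, data, children, spares = 2, 2, 14, 1

group_size = data + parity              # 4 disks per redundancy group
available = children - spares           # 13 disks left for data + parity
groups, leftover = divmod(available, group_size)

print(f"{groups} groups of {group_size} disks, {leftover} disk left over")
# -> 3 groups of 4 disks, 1 disk left over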
 
So ZFS has significantly less performance than a standard RAID 5, where we count each individual disk (or rather spindle) towards IOPS? Or do I have that wrong too?
Imho: yes
You also need to take into consideration what kind of I/O comes in (read or write), as well as which block size is used.

Storage performance and the calculations behind it are very tricky and never exact.
There is a range of what to expect from a given setup, but a slight variation in the workload can make a huge difference.
 
The documentation for raidz/draid suggests that the IOPS don't change, but the amount of bandwidth per I/O does! I don't understand how this can be.

So ZFS has significantly less performance than a standard RAID 5, where we count each individual disk (or rather spindle) towards IOPS? Or do I have that wrong too?

For both ZFS and traditional RAID, each vdev/stripeset gets its own queue for operations. A full write or read goes to all members of a given vdev. If you arrange your vdevs in a parity arrangement (e.g. raidz/draid/RAID 5/6), each transaction requires all drives to complete.

This is great if your application does few, large transactions, i.e. low IOPS and high bandwidth, like streaming video.

When your application has a lot of disparate transactions, like virtualization, you want to have AS MANY VDEVS as you can, so each vdev can handle a transaction request while the other vdevs are servicing other transactions. This is why you most commonly see RAID 10/simple replication techniques for hypervisor storage.
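As a rough sketch with the 14 disks discussed above and the ~200 IOPS per-disk placeholder from earlier in the thread (not a benchmark, just the rule of thumb applied):

Code:
# Rough comparison for 14 disks using a ~200 IOPS per-disk placeholder.
DISK_IOPS = 200

one_wide_raidz2 = 1 * DISK_IOPS   # a single vdev -> roughly single-disk random IOPS
seven_mirrors = 7 * DISK_IOPS     # 7x 2-way mirrors -> 7 vdevs serving requests in parallel

print(one_wide_raidz2, seven_mirrors)   # 200 vs 1400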

Lastly, don't use single parity (e.g. raidz1/RAID 5) basically ever, and certainly not for any data that you wish to survive.
 
Lastly, don't use single parity (e.g. raidz1/RAID 5) basically ever, and certainly not for any data that you wish to survive.

Why? If I have a hot spare? That covers a dying disk? Also... backups are a thing. Especially when I have a Synology array!

Mount an iSCSI or NFS LUN and use it to run Synology backup or Proxmox Backup!
6x 4TB WD RED PRO (CMR) and 2x 8TB Seagate Ironwolf (CMR) in my Synology.
 
Why? If I have a hot spare? That covers a dying disk?
Apologies. As storage is my profession, I tend to look at all use of storage through the lens of operation in production. By all means, use whatever storage arrangement you desire for non-production (read: home) use. It may sound counterintuitive, but I don't use any of this stuff at home.

In production, there are multiple layers of consideration. Backups are NOT a remedy for downtime.
 
Yeah this is definitely for home.

At work we spare no expense. Full hardware RAID, and we prioritize performance and safety over disk size. So more disks vs. storage.
 
Why? If I have a hot spare?
A raidz1 with a hot spare is still worse than a raidz2. Keep in mind that you lose bit rot protection when one disk fails. With one degraded disk, the smallest error (even a single bit flip) will cause data corruption. Your data will be vulnerable while the pool is resilvering, and resilvering can take weeks or months when using a lot of HDDs in a very big vdev.
 
Why? If I have a hot spare? That covers a dying disk?
If you have a hot spare anyway, why not use it by default? A spare is just sitting around, as already indicated by others. A participating disk helps in various scenarios.
 
Again, speaking from a "for realz" perspective: you would never do single parity under any circumstances. Degraded mode on a single-parity volume set has virtually no safety net, either from physical disk or logical (memory, system, etc.) faults.

The rule of thumb for draid is 20+ disks; it doesn't make much sense for a smaller deployment. A typical deployment of 8:2:1 or 8:2:2 will need about 24 disks to get 2x vdev performance. But you have to realize draid is not a solution FOR performance; it's a solution to minimize degradation time. A small volume set wouldn't benefit much since resilver times aren't that long.
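Roughly, the disk-count arithmetic behind that rule of thumb, sketched out (two redundancy groups is where the "2x vdev performance" figure comes from):

Code:
# Disk-count arithmetic behind the "20+ disks" rule of thumb (sketch only).
def draid_disks_needed(data, parity, spares, groups):
    """Children needed for `groups` redundancy groups plus distributed spares."""
    return groups * (data + parity) + spares

print(draid_disks_needed(data=8, parity=2, spares=1, groups=2))  # 21
print(draid_disks_needed(data=8, parity=2, spares=2, groups=2))  # 22
# i.e. low 20s of disks before a draid gives you roughly 2 vdevs worth of IOPS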
 
So, in your opinion, what FS would you use with 14 disks?

Well, 16, but I used 2 in a ZFS mirror for the boot drive.

I was primarily looking for the additional features of ZFS, but I could get those with BTRFS, yes?

Thanks
 
So I didn't know you could do that in a ZFS pool. I'll have to look into that. Thanks, folks.
 
