ZFS 2.1 roadmap

alexskysilk

I've had a rather catastrophic fault (3 drives in a raidz2 vdev) that I've been trying to recover from for the last month or so. I finally got a scrub to complete without disk errors, but there are permanent errors in the pool which I've been unable to clear, so I'm now resigned to redeploying the whole thing.
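For context, the cycle I've been going through looks roughly like this (a sketch only; "tank" stands in for the real pool name):
Bash:
zpool status -v tank    # shows the permanent errors and the affected files
zpool scrub tank        # re-verify the pool after the drive replacements
zpool clear tank        # attempt to clear the error counters afterwards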

I wouldn't normally ask, but in this case: what is the prospective ETA for inclusion of ZFS 2.1? If it's imminent I'll wait to deploy, as dRAID is a killer feature for me; if it's months out I'll just have to live without it.

@t.lamprecht @martin please advise :)
 
Thanks Thomas.

What I'm going to do is remove Proxmox from this box and install a clean Debian. That way I SHOULD be able to reintroduce Proxmox once ZFS 2.1 is incorporated. Not ideal, but OK for this particular deployment.
 
I've had a rather catastrophic fault (3 drives in a raidz2 vdev)
This happens to be a >300TB pool.
300 TB implies dozens of drives. Why raidz2 for such a huge pool? Why not raidz3 to start with?

Correct me if I'm wrong, but it is my understanding that dRAID is for those who want even better redundancy, beyond raidz3... Otherwise, why should somebody choose a 6-HDD dRAID (let's say 4 data, 1 parity + 1 spare) over a 6-HDD raidz2 (4 data + 2 parity)? It is the same number of disks. There is no advantage in electricity consumption or HDD wear (the spare is not dedicated in dRAID, so it doesn't spin down). Theoretically dRAID is marginally faster, yet raidz2 is much more robust: there are 2 redundant disks at all times, instead of 1!
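To make the comparison concrete, here is roughly how the two 6-disk layouts would be created (device names are placeholders; dRAID vdev syntax as documented for OpenZFS 2.1):
Bash:
# 6 disks as classic raidz2: 4 data + 2 parity, no spare
zpool create tank raidz2 sda sdb sdc sdd sde sdf

# 6 disks as single-parity dRAID with one distributed spare: 4 data + 1 parity + 1 spare
zpool create tank draid1:4d:6c:1s sda sdb sdc sdd sde sdf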
 
raidz2 vdevs. With a 36-disk arrangement, your options are 2x 18-disk RAIDZ3 or 3x 12-disk RAIDZ2. I'd rather have more vdevs, or performance would be total shit. Even more vdevs would be great, but the parity overhead would be murder on usable capacity.
 
raidz2 vdevs. With a 36-disk arrangement, your options are 2x 18-disk RAIDZ3 or 3x 12-disk RAIDZ2. I'd rather have more vdevs, or performance would be total shit. Even more vdevs would be great, but the parity overhead would be murder on usable capacity.
1) You have had a terrible month, but it was NOT due to poor pool performance.
2) Your debacle is not a coincidence. For modern installations, raidz2 is not enough. Here is a publication from back in 2010 predicting that cases like yours would happen more and more often by 2019: https://www.zdnet.com/article/why-raid-6-stops-working-in-2019/. We are in 2021 now. The author was 100% correct.
3) I suggest you read https://arstechnica.com/gadgets/202...-1s-new-distributed-raid-topology/?comments=1, especially the ending, about dRAID's real usable capacity and fault tolerance.
 
3) I suggest you read https://arstechnica.com/gadgets/202...-1s-new-distributed-raid-topology/?comments=1, especially the ending, about dRAID's real usable capacity and fault tolerance.
... which has become available in ZFS 2.1, and we've come full circle to the first post ;)

Incidentally, I run about 20 filers in this configuration. None of them has ever come close to a real fault. This particular unit shipped with a whole batch of faulty drives and survived for years (with a lot of care and feeding). While your articles are not wrong, it's also about scope. The sky isn't falling.
 
... which has become available in ZFS 2.1, and we've come full circle to the first post ;)

Incidentally, I run about 20 filers in this configuration. None of them has ever come close to a real fault. This particular unit shipped with a whole batch of faulty drives and survived for years (with a lot of care and feeding). While your articles are not wrong, it's also about scope. The sky isn't falling.
 
I was hinting at the following:
You were unfortunate to get a 3-disk failure within the same vdev (3 out of 12 disks). If those had been 3 disks from different vdevs, you would be fine. It was a very unlikely failure, but it happened.
In the case of dRAID, a failure of ANY 3 disks out of 36 will result in pool loss.
 
Use mirrors ;)

I have never understood the drive to risky raidZ. Usually these setups have huge amounts spent on them, with large numbers of disks in the pool; if money isn't an object, then a 50% redundancy cost shouldn't matter.
 
I have never understood the drive to risky raidZ. Usually these setups have huge amounts spent on them, with large numbers of disks in the pool; if money isn't an object, then a 50% redundancy cost shouldn't matter.
That is a whole bunch of assumptions, based on your admitted lack of understanding. I'd suggest you understand the use case before you begin offering suggestions.

Just to humor you: all ZFS pools are vulnerable to complete failure due to vdev loss. A mirror is a 2-disk vdev. A striped (raidz) pool can sustain the LOSS of two disks per vdev with raidz2, or even 3 with raidz3. If you're banking on your luck that you'd lose two disks in different vdevs, that's your call. It's true that mirrors resilver at a much greater rate than raidz vdevs, which is a case that can be made; but when you start taking physical space, power, and cooling requirements (and, yes, cost) into account, this isn't always a workable approach for all storage requirements.
 
In the case of dRAID, a failure of ANY 3 disks out of 36 will result in pool loss.
Not so. That is the WHOLE POINT of dRAID. You're NOT necessarily changing your stripe arrangement, and it's up to you HOW MANY virtual spares you define. In my case, I WOULD probably make 2x 18-disk stripesets (draid2, the RAIDZ2 equivalent) with 2 distributed spares each. A disk failure will RESILVER very rapidly to a draid spare, which means that even when a disk is out you still have full dual parity. Even a second disk failure still has a spare to rebuild to. Assuming you did not replace either failed disk, you can sustain ANOTHER two disks failing and still be operational, but I'd be real nervous by then; and if you really haven't replaced your failures by that point, you deserve to lose your pool ;)

Also, the above is true for EACH VDEV. In theory, this arrangement will provide usable capacity similar (6 parity disks vs 8 parity+spare disks) to the 3x12 I operate now, but with the ability to sustain up to 8 disk failures (not all at the same time, but you get the point) without data loss.
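For the curious, one way the layout described above could be spelled out in OpenZFS 2.1's dRAID notation (device paths are placeholders, and the data-group width is just one possible choice):
Bash:
# two draid2 vdevs: 18 children each = 14 data + 2 parity + 2 distributed spares
zpool create tank \
    draid2:14d:18c:2s /dev/disk/by-id/ata-DISK{01..18} \
    draid2:14d:18c:2s /dev/disk/by-id/ata-DISK{19..36}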
 
That is a whole bunch of assumptions, based on your admitted lack of understanding. I'd suggest you understand the use case before you begin offering suggestions.

Just to humor you: all ZFS pools are vulnerable to complete failure due to vdev loss. A mirror is a 2-disk vdev. A striped (raidz) pool can sustain the LOSS of two disks per vdev with raidz2, or even 3 with raidz3. If you're banking on your luck that you'd lose two disks in different vdevs, that's your call. It's true that mirrors resilver at a much greater rate than raidz vdevs, which is a case that can be made; but when you start taking physical space, power, and cooling requirements (and, yes, cost) into account, this isn't always a workable approach for all storage requirements.

I am well aware there are advantages of raidz2/raidz3 over mirrors, in that "any" 2/3 disks can fail, whilst if you have multiple mirror vdevs and 2 disks fail in the same vdev then say bye bye to your pool. Personally I think raidz3 is overall safer than mirrored vdevs, but I don't have that same feeling with raidz2, especially on very large pools. My point was more that if you are able to budget for a very large pool, then the cost of redundancy shouldn't be an issue.

I have looked at the dRAID documentation and I do consider it a huge step forward, as the window of risk is significantly reduced; I feel that dRAID makes parity-based setups much more viable now.

I would like to leave it here on an agree-to-disagree note. I probably made wrong assumptions; very large pools perhaps do have budgeting limitations, and my comment wasn't productive for this specific discussion (it wasn't referring to dRAID, which I think is great). I hope you don't reply again saying I don't understand things.
 
We uploaded a bunch of kernels with newer ZFS versions yesterday:
  • Proxmox VE 6.4 (oldstable):
    • pve-kernel-5.4.143-1-pve (5.4.143-1) with ZFS 2.0.6
    • pve-kernel-5.11.22-5-pve (5.11.22-10~bpo10+1) with ZFS 2.0.6
  • Proxmox VE 7.0 (stable):
    • pve-kernel-5.11.22-5-pve (5.11.22-10) with ZFS 2.0.6
    • pve-kernel-5.13.14-1-pve (5.13.14-1) with ZFS 2.1.1
The 5.13-based kernel is still opt-in only; it will be the one we default to in Proxmox VE 7.1 (planned for 2021/Q4).
That means that 5.11 is slowly on its way out, and that's why we did not bother with updating the ZFS module there to the 2.1 series.

To test ZFS 2.1 and the 5.13-based kernel, add the pvetest repository and do:
Bash:
apt update
apt full-upgrade
apt install pve-kernel-5.13
# -> reboot
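To double-check after the reboot that the new kernel and the ZFS 2.1 module are actually in use:
Bash:
uname -r       # should report the 5.13 based kernel
zfs version    # should report zfs-2.1.x and a matching zfs-kmod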
 
