Options to make RaidZ1/Mirroring more robust

Diggo

New Member
Feb 26, 2025
Hi guys,

Due to a power outage, it seems I lost my RAID1/mirror pool.
It is in state FAULTED.
The metadata got corrupted and it can't be imported, no matter how hard you try, unless... I came up with this one here:
Bash:
~# zpool import -f -FXn Pool-A-Toshi18                                                                                                                                                                                     
Would be able to return Pool-A-Toshi18 to its state as of Sun Aug 11 20:47:02 2024.
Would discard approximately 257972 minutes of transactions.

Both disks seem to run fine; a long SMART test returned no errors.
If I try to import it with a normal -f, or with -d for the disks themselves, I get this "advice":

Bash:
~# zpool import -d /dev/sda1 -d /dev/sdb1 -f Pool-A-Toshi18
cannot import 'Pool-A-Toshi18': I/O error
        Destroy and re-create the pool from
        a backup source.

So my primary question is not how to get the pool back (although comments are welcome).
I am more interested in how I can prevent this from happening. I mean, is it really possible that both disks scramble their metadata at the same time, and that so much information is lost that I practically have to choose a state from last year?
I thought I was pretty safe with a mirror and thought metadata etc. is mirrored, too?
Therefore I am quite unhappy with the fact that it tells me to go back to last year.

Is there a way to make a mirror more robust against failures like this?
 
I am more interested in how I can prevent this from happening. I mean, is it really possible that both disks scramble their metadata at the same time, and that so much information is lost that I practically have to choose a state from last year?
Consumer SSDs handle unexpected power loss really badly sometimes, since they might be shuffling data (trimming or rearranging) that would be perfectly safe on an old (rotating) HDD.

I thought I was pretty safe with a mirror and thought metadata etc. is mirrored, too?
Yes, and metadata is stored multiple times per drive as well. But maybe both drives were trimming or erasing blocks, or maybe you just got very unlucky.
Maybe you can recover by importing an older version of the metadata (I don't remember the details; search the forum)?
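If it helps, the usual escalation I have seen described looks roughly like this. This is untested and only a sketch based on the pool name and device paths from your post, and -X should stay the last resort, since it actually discards transactions:

Bash:
# Inspect the labels and uberblocks on each member (read-only, changes nothing)
zdb -l /dev/sda1
zdb -ul /dev/sda1

# Try a read-only rewind import first
zpool import -o readonly=on -f -F Pool-A-Toshi18

# Dry run of the extreme rewind you already found (-n only reports what it would do)
zpool import -f -FXn Pool-A-Toshi18

# Last resort: actually perform the extreme rewind (discards the reported transactions)
zpool import -f -FX Pool-A-Toshi18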

Therefore I am quite unhappy with the fact that it tells me to go back to last year.
I'm also surprised that it suggests going back that far. Maybe research ZFS or ask experts whether you can restore a newer version of the metadata. ZFS is not specific or limited to Proxmox.

Is there a way to make a mirror more robust against failures like this?
Use enterprise SSDs with PLP (which also give much better sync write performance, as they can safely cache those writes) or set up a UPS to prevent power problems for your system.
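For the UPS part, a rough sketch with NUT (Network UPS Tools) and a USB UPS could look like this. The names "myups", "monuser" and "secret" are placeholders, and a real setup also needs matching entries in nut.conf and upsd.users:

Bash:
# /etc/nut/ups.conf -- describe the attached UPS (generic USB HID driver)
cat <<'EOF' >> /etc/nut/ups.conf
[myups]
    driver = usbhid-ups
    port = auto
EOF

# /etc/nut/upsmon.conf -- shut the host down cleanly before the battery runs out
cat <<'EOF' >> /etc/nut/upsmon.conf
MONITOR myups@localhost 1 monuser secret master
SHUTDOWNCMD "/sbin/shutdown -h +0"
EOF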
 
Consumer SSDs handle unexpected power loss really badly sometimes, since they might be shuffling data (trimming or rearranging) that would be perfectly safe on an old (rotating) HDD.
...
Use enterprise SSDs with PLP (which also give much better sync write performance, as they can safely cache those writes) or set up a UPS to prevent power problems for your system.
Thanks, I'll go and check out what PLP is, and maybe also other ZFS forums.
My drives are TOSHIBA MG09ACA18TE; I thought they were 24/7 models and not so consumer-like.

*edit* I'm thinking that I could try separating the drives and putting one in another computer, like TrueNAS, and let each disk be scanned on its own, roughly as sketched below. Perhaps one will have different metadata. Kind of a last hope :)
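Something like this is what I have in mind; untested, and the by-id paths are just placeholders for my two disks:

Bash:
# With only the first mirror member connected, try a read-only rewind import
zpool import -o readonly=on -f -F -d /dev/disk/by-id/<disk1>-part1 Pool-A-Toshi18

# If that fails, connect only the other disk and repeat
zpool import -o readonly=on -f -F -d /dev/disk/by-id/<disk2>-part1 Pool-A-Toshi18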
 
Thanks, I'll go and check out what PLP is, and maybe also other ZFS forums.
It stands for Power Loss Protection, using a capacitor or battery on the SSD drive. It's not something that you'll find on HDDs. Maybe invest in a UPS for the whole system.

My drives are TOSHIBA MG09ACA18TE; I thought they were 24/7 models and not so consumer-like.
I did not realize that you used NAS-ready (rotating, helium-filled) HDDs. My previous post applies only to SSDs, not (your) HDDs. I don't know why they would handle unexpected power loss so badly.
 
would be perfectly safe on an old (rotating) HDD.
you used NAS-ready (rotating, helium-filled) HDDs. My previous post applies only to SSDs, not (your) HDDs. I don't know why they would handle unexpected power loss so badly.
That's not a special case, and it's not SSD- or HDD-related either. We had a (brand-name) fileserver with a raidz2 pool of several hundred TB (vdevs of 6) that also had to be restored from a DR server via send/recv after a power outage, which took quite long, while the department wasn't able to work for a few days. This is the most terrible side of ZFS, and the internet is full of reports of corrupted pools and metadata errors after such unexpected events. That's life; ZFS can mostly only deliver on this in theory and in the documentation, and that doesn't help at all in those cases.
 
That's not a special case, and it's not SSD- or HDD-related either. We had a (brand-name) fileserver with a raidz2 pool of several hundred TB (vdevs of 6) that also had to be restored from a DR server via send/recv after a power outage, which took quite long, while the department wasn't able to work for a few days. This is the most terrible side of ZFS, and the internet is full of reports of corrupted pools and metadata errors after such unexpected events. That's life; ZFS can mostly only deliver on this in theory and in the documentation, and that doesn't help at all in those cases.
So in summary you say "that's life" and there is no way to make ZFS more robust in that regard?
I'll order new disks then: one set for a zfs send/recv backup server (I already have one, but it does not feel right to only have one right now ;-) ), and another slightly bigger disk to do rsnapshot backups on with ext4.
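Roughly what I have in mind for the send/recv part; "backup-host" and "backup-pool" are placeholders, and the snapshot names are just examples:

Bash:
# Take a recursive snapshot of the pool and replicate it to the backup box
zfs snapshot -r Pool-A-Toshi18@backup-2025-02-26
zfs send -R Pool-A-Toshi18@backup-2025-02-26 | ssh backup-host zfs receive -Fdu backup-pool

# Later runs only send the delta between the previous and the new snapshot
zfs snapshot -r Pool-A-Toshi18@backup-2025-03-05
zfs send -R -i Pool-A-Toshi18@backup-2025-02-26 Pool-A-Toshi18@backup-2025-03-05 | ssh backup-host zfs receive -Fdu backup-pool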