ZFS: Failed to replace failing disk

jens.kuespert · Apr 12, 2021

Hi,

one of my disks reported a SMART error, so I set out to replace it - but failed.

Here's what I did, as reported by zpool history, with my remarks below each command:

2021-04-12.09:48:43 zpool offline rpool sdc2
I shut down the system, removed the failed disk, and physically replaced it with a new disk of the same model.

2021-04-12.10:13:21 zpool import -N rpool
I guess this was done by the reboot.

2021-04-12.10:17:20 zpool add rpool /dev/sdc -f
I tried to add the new disk. As there was some data on it, I figured -f would be a good idea...

As it turns out, this was not a good idea. I ended up with this layout, as reported by zpool status:

Code:

pool: rpool
state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
scan: scrub repaired 0B in 01:19:00 with 0 errors on Sun Mar 14 01:43:03 2021
config:

NAME        STATE     READ WRITE CKSUM
rpool       DEGRADED     0     0     0
  raidz1-0  DEGRADED     0     0     0
    sda2    ONLINE       0     0     0
    sdb2    ONLINE       0     0     0
    sdc2    OFFLINE      0     0     0
    sdd2    ONLINE       0     0     0
  sdc       DEGRADED     0     0     0  too many errors

To remove the new disk, I set it offline.
2021-04-12.10:21:35 zpool offline rpool sdc -f
Now zpool reports sdc as degraded, but I cannot remove it...

OK, so here I am. Stuck. Luckily the system is part of a cluster, so I moved all VMs off to other servers.

What options do I have, apart from re-installing the failed node? Any ideas are greatly appreciated.

leesteken · Apr 12, 2021

I'm not sure how you can tell ZFS that you have replaced /dev/sdc2 with another disk (with partitions?). However, I don't think it matters anymore: you have created a RAID0 (stripe) of a RAID-Z1 and a single disk (sdc), which cannot be undone because it is the rpool. This has happened to other people before.
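
For future reference, swapping a member of a raidz vdev is normally done with zpool replace rather than zpool add, since add attaches the disk as a new top-level vdev next to raidz1-0. Below is a minimal dry-run sketch of that sequence. It only prints the commands. The names are assumptions based on the output above: /dev/sda is taken as a healthy disk to copy the partition layout from, and /dev/sdc2 as the ZFS partition on the new disk. Check them against your own system before running anything for real. Bootloader setup on the new disk is not covered here.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of executing them.
# All device names below are assumptions taken from the thread.
POOL=rpool
OLD=sdc2            # offlined member as shown by zpool status
HEALTHY=/dev/sda    # assumed healthy disk with the wanted partition layout
TARGET=/dev/sdc     # the new, empty disk
NEW=/dev/sdc2       # assumed ZFS partition on the new disk after copying

# Print a command prefixed with "+ "; swap the echo for "$@" to execute it.
run() { echo "+ $*"; }

replace_plan() {
    # Copy the partition table from the healthy disk to the new disk.
    run sgdisk "$HEALTHY" -R "$TARGET"
    # Give the copied table new random GUIDs so the disks do not clash.
    run sgdisk -G "$TARGET"
    # Replace the old raidz member in place; this starts a resilver.
    run zpool replace "$POOL" "$OLD" "$NEW"
    # Check resilver progress and the resulting layout.
    run zpool status "$POOL"
}

replace_plan
```

If in doubt about what a zpool add would do, zpool add -n prints the resulting layout without changing the pool.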

jens.kuespert · Apr 15, 2021

Thanks for the advice - I reinstalled the node without any problems (with the replaced disk ;-).
