ZFS: Failed to replace failing disk

Hi,

One of my disks reported a SMART error, so I set out to replace the failing disk, but the replacement went wrong.

Here's what I did, as reported by zpool history, with my remarks indented:

2021-04-12.09:48:43 zpool offline rpool sdc2
    I removed the failed disk and shut down the system, physically replacing the drive with a new disk of the same model.

2021-04-12.10:13:21 zpool import -N rpool
    I guess this was done by the reboot.

2021-04-12.10:17:20 zpool add rpool /dev/sdc -f
    I tried to add the new disk. As there was some data on it, I figured -f would be a good idea...

As it turns out, this was not a good idea. I ended up with this layout, as reported by zpool status:
Code:
  pool: rpool
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub repaired 0B in 01:19:00 with 0 errors on Sun Mar 14 01:43:03 2021
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       DEGRADED     0     0     0
          raidz1-0  DEGRADED     0     0     0
            sda2    ONLINE       0     0     0
            sdb2    ONLINE       0     0     0
            sdc2    OFFLINE      0     0     0
            sdd2    ONLINE       0     0     0
          sdc       DEGRADED     0     0     0  too many errors
To remove the new disk, I set it offline:
2021-04-12.10:21:35 zpool offline rpool sdc -f
Now zpool reports sdc as degraded, but I cannot remove it...
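
For reference, a minimal sketch of the difference between the two commands involved here. `zpool add` attaches a new top-level vdev to the pool, while `zpool replace` swaps a device inside an existing vdev, and `zpool add -n` only prints the configuration that would result without changing anything. The helper names `preview_add` and `replace_member` are hypothetical, and `/dev/sdc2` in the usage comments is an assumption: the new disk would first need a matching partition layout, which is not shown here.

```shell
#!/bin/sh
# Hypothetical helper: dry run of 'zpool add'.
# With -n, zpool prints the layout that WOULD result and leaves the pool untouched.
preview_add() {
    pool=$1
    shift
    zpool add -n "$pool" "$@"
}

# Hypothetical helper: swap a failed member for a new device inside its existing vdev.
replace_member() {
    pool=$1
    old=$2
    new=$3
    zpool replace "$pool" "$old" "$new"
}

# Usage (not executed here; device names are assumptions):
#   preview_add rpool /dev/sdc               # inspect the resulting layout first
#   replace_member rpool sdc2 /dev/sdc2      # replace the offlined raidz1 member
```

A dry run before any forced `zpool add` would have shown sdc as a separate top-level entry next to raidz1-0 rather than as a member of it.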

OK, so here I am. Stuck. Luckily the system is part of a cluster, so I moved all VMs off to other servers.

What options do I have, besides reinstalling the failed node? Any ideas are greatly appreciated.
 

leesteken

I'm not sure how you can tell ZFS that you have replaced /dev/sdc2 with another disk (with partitions?). However, I don't think it matters anymore, because you have created a RAID0 (stripe) of the RAID-Z1 and a single disk (sdc). That cannot be undone because it is the rpool, as has happened to other people before.
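
A rough sketch for spotting this kind of layout: in the config table of `zpool status`, top-level vdevs are indented two columns under the pool name, and data is striped across all of them. The function name `list_top_level_vdevs` is hypothetical, and the parsing assumes the usual two-column indentation steps; it does not treat sections such as logs, cache, or spares specially.

```shell
#!/bin/sh
# Hypothetical helper: read 'zpool status' output on stdin and print
# the name and state of every top-level vdev.
list_top_level_vdevs() {
    awk '
        # Header of the config table: remember how far it is indented.
        $1 == "NAME" && $2 == "STATE" {
            match($0, /^[ \t]*/)
            base = RLENGTH
            in_cfg = 1
            next
        }
        # A blank line ends the config table.
        in_cfg && NF == 0 { in_cfg = 0 }
        # Entries indented exactly two columns deeper than the header
        # are top-level vdevs (the pool itself sits at the header indent).
        in_cfg {
            match($0, /^[ \t]*/)
            if (RLENGTH - base == 2) print $1, $2
        }
    '
}

# Usage (not executed here):
#   zpool status rpool | list_top_level_vdevs
```

If a single disk shows up in that list next to a raidz vdev, the pool is striped across both, so losing that one disk would take the whole pool with it.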
 
