Zpool replace with removed dead drive doesn't work

MikeC
Hello, all.

One of the drives in my zpool has failed, and so I removed it and ordered a replacement drive.
Now that it's here, I am having problems replacing it.

OS: Debian 11
Pve: 7.3-4

I've installed the replacement drive and it shows up under both lsblk and in the gui.

Zpool status:

Code:
root@proxmox:/home/mikec# zpool status
  pool: rpool
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
    invalid.  Sufficient replicas exist for the pool to continue
    functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
  scan: scrub in progress since Wed Jan 18 21:45:57 2023
    143G scanned at 238M/s, 5.12G issued at 8.53M/s, 143G total
    0B repaired, 3.59% done, no estimated completion time
config:

    NAME                      STATE     READ WRITE CKSUM
    rpool                     DEGRADED     0     0     0
      mirror-0                DEGRADED     0     0     0
        10342417906894345042  FAULTED      0     0     0  was /dev/sda2
        sda2                  ONLINE       0     0     0

errors: No known data errors

Code:
root@proxmox:/home/mikec# lsblk
NAME     MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda        8:0    0  1.8T  0 disk 
├─sda1     8:1    0 1007K  0 part 
├─sda2     8:2    0  1.8T  0 part 
└─sda9     8:9    0    8M  0 part 
sdb        8:16   0  1.8T  0 disk 
├─sdb1     8:17   0  1.8T  0 part 
└─sdb9     8:25   0    8M  0 part 
zd0      230:0    0   24G  0 disk 
├─zd0p1  230:1    0   50M  0 part 
├─zd0p2  230:2    0 23.5G  0 part 
└─zd0p3  230:3    0  509M  0 part 
zd16     230:16   0   24G  0 disk 
├─zd16p1 230:17   0  500M  0 part 
└─zd16p2 230:18   0 23.5G  0 part 
zd32     230:32   0    7G  0 disk [SWAP]

The previous disk in the pool has this weird numeric name, and the system notes that it was formerly on /dev/sda2 (the remaining disk in the pool has since been renamed sda). Also, its status is FAULTED and not REMOVED.
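
That numeric name is the vdev GUID that ZFS falls back to once it can no longer resolve the original device path; zpool-status(8) documents a -g flag that lists these GUIDs directly, which is handy when one of them has to be copied into a later command:

Code:
zpool status -g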

I first tried a straight replace..

Code:
zpool replace rpool 1034241706894345042 /dev/sdb
cannot replace 1034241706894345042 with /dev/sdb: no such device in pool

The former drive IS shown in /dev/disk/by-id...

Code:
root@proxmox:/home/mikec# ls /dev/disk/by-id/
ata-ST2000DM008-2FR102_ZFL63932        ata-ST32000542AS_5XW2JWV2-part1    wwn-0x5000c5002f9110e4-part1  wwn-0x5000c500e437d5cb-part1
ata-ST2000DM008-2FR102_ZFL63932-part1  ata-ST32000542AS_5XW2JWV2-part2    wwn-0x5000c5002f9110e4-part2  wwn-0x5000c500e437d5cb-part9
ata-ST2000DM008-2FR102_ZFL63932-part9  ata-ST32000542AS_5XW2JWV2-part9    wwn-0x5000c5002f9110e4-part9
ata-ST32000542AS_5XW2JWV2           wwn-0x5000c5002f9110e4        wwn-0x5000c500e437d5cb

The new disk s/n is ZFL63932

I tried zpool replace with the disk-id values (wwn-0x5000c5002f9110e4 and wwn-0x5000c500e437d5cb), but got the same 'no such device in pool' error.

What do I have to do to replace the old disk?
Do I have to format the new disk, or will zpool set it up properly? It currently has part1 & part9, but not part2 like the current disk in the pool.

Thanks.
 
How did you install PVE in the first place? You wrote Debian 11, but the Debian installer doesn't support ZFS as far as I know. And as far as I know you also didn't use the PVE installer, since your rpool is on the 2nd partition while the PVE installer puts ZFS on the 3rd partition, with partition 2 for the bootloader and partition 1 for MBR compatibility.

And your ata-ST2000DM008-2FR102_ZFL63932 looks like it was partitioned by ZFS itself, without first partitioning it to hold a bootloader. So you won't be able to boot from that new disk, which would leave the server unbootable if the remaining disk also fails.
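
A quick way to compare the two disks is to print their GPT layouts; on installs that boot via proxmox-boot-tool, its status subcommand also shows which boot partitions are set up. A sketch only, with the device names taken from the lsblk output above:

Code:
sgdisk -p /dev/sda          # partition layout of the disk still in the pool
sgdisk -p /dev/sdb          # partition layout of the new disk
proxmox-boot-tool status    # only meaningful if this install boots via proxmox-boot-tool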
 
Hey, Dunuin. Incorrect terminology on my part, then. I did install this server using a Proxmox installer image. I may have clicked Initialize in the GUI for this new disk, but I don't recall. It doesn't have any data on it at all, so reformatting and repartitioning it is no problem. Is there a guide for doing so on the Proxmox site?

I'm still looking at how to either replace or add the new drive, and then remove the ghost drive from the pool.
 
I've used sgdisk to set up the new disk. Now the partitions match the disk in the pool.
I'm thinking of adding it to the mirror first and letting it resilver before figuring out how to remove the dead/removed disk.
Do I use 'zpool replace', 'zpool attach' or 'zpool add'? Do I use 'sdb' or the partition 'sdb2'?
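
For reference, the usual sgdisk step here is to replicate the partition table from the healthy disk and then randomize the GUIDs so the copies don't clash. A sketch, assuming sda is the healthy disk and sdb the new one, as in the lsblk output above:

Code:
sgdisk /dev/sda -R /dev/sdb   # copy sda's partition table onto sdb
sgdisk -G /dev/sdb            # give sdb's partitions new random GUIDs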
 
add creates a stripe/RAID0, which you don't want. replace should work, but it caused you issues and you don't want to remove the old one yet.
So please use attach, which also requires you to name the (existing, working) vdev that you want to mirror. Don't use /dev/sdb2 (or 3? make sure to select the right/largest one); use /dev/disk/by-id/ata-__YOUR_NEW_DRIVE__-part2. You should end up with a 3-way mirror of which one member is missing.
 
Sorry to ask more, but I'm really nervous about the possibility of blowing up the raid set...

The syntax of attach is: zpool attach [-fsw] [-o property=value] pool device new_device
Given my pool 'rpool' has 'sda2' as what I'm assuming is the "device", would the proper command be:

zpool attach rpool sda2 /dev/disk/by-id/ata-ST2000DM008-2FR102_ZFL63932-part2 ?

I'm wondering why the existing drive is just 'sda2' and not the full path to that partition. Maybe zpool abbreviates the device name once it's attached?
 
Just try it and if it tells you that it does not have a sda2, then use /dev/sda2.
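
On the naming question: zpool status prints only the last component of each device path by default; zpool-status(8) documents a -P flag that shows the full paths instead:

Code:
zpool status -P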
 
Thanks again. I've added the new drive using its by-id value, and it's showing as part of the pool; resilvering has begun.
Once it's done, I shall try again to remove the faulted drive.

Code:
root@proxmox:~# zpool attach rpool sda2 /dev/disk/by-id/ata-ST2000DM008-2FR102_ZFL63932-part2
root@proxmox:~# zpool status
  pool: rpool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
    continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Jan 22 10:22:56 2023
    582M scanned at 58.2M/s, 188K issued at 18.8K/s, 143G total
    0B resilvered, 0.00% done, no estimated completion time
config:


    NAME                                       STATE     READ WRITE CKSUM
    rpool                                      DEGRADED     0     0     0
      mirror-0                                 DEGRADED     0     0     0
        10342417906894345042                   FAULTED      0     0     0  was /dev/sda2
        sda2                                   ONLINE       0     0     0
        ata-ST2000DM008-2FR102_ZFL63932-part2  ONLINE       0     0     0


errors: No known data errors
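
Once the resilver finishes, the faulted member can usually be dropped from the mirror with zpool detach, giving the numeric GUID exactly as zpool status prints it. A sketch based on the status output above:

Code:
zpool detach rpool 10342417906894345042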
 
I am having a similar issue to the post above.
I have two nodes, exactly the same, and each of them had a failed drive. There are only two drive bays and the ZFS pool is set up as a mirror.

The difference is that I replaced the failed drive with the new one, and I cannot get any command to work.

It seems like Proxmox is recognizing the replacement drive as sda as well.

Code:
root@ktpve32:~# zpool status -x
  pool: rpool
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
    invalid.  Sufficient replicas exist for the pool to continue
    functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
  scan: scrub repaired 0B in 00:00:25 with 0 errors on Sun Jul 10 00:24:26 2022
config:

    NAME                                              STATE     READ WRITE CKSUM
    rpool                                             DEGRADED     0     0     0
      mirror-0                                        DEGRADED     0     0     0
        16938304517973068656                          UNAVAIL      0     0     0  was /dev/disk/by-id/scsi-36d4ae520a6226f00293f990107118dba-part3
        scsi-36d4ae520a6226f00293f990c07bd03b6-part3  ONLINE       0     0     0

errors: No known data errors

Code:
root@ktpve32:~# ls /dev/disk/by-id
scsi-36d4ae520a6226f00293f990c07bd03b6        scsi-36d4ae520a6226f00293f990c07bd03b6-part3  wwn-0x6d4ae520a6226f00293f990c07bd03b6-part2
scsi-36d4ae520a6226f00293f990c07bd03b6-part1  wwn-0x6d4ae520a6226f00293f990c07bd03b6        wwn-0x6d4ae520a6226f00293f990c07bd03b6-part3
scsi-36d4ae520a6226f00293f990c07bd03b6-part2  wwn-0x6d4ae520a6226f00293f990c07bd03b6-part1

Code:
root@ktpve32:~# ls -ahlp /dev/disk/by-id/
total 0
drwxr-xr-x 2 root root 200 May 12 15:25 ./
drwxr-xr-x 7 root root 140 May 12 15:25 ../
lrwxrwxrwx 1 root root   9 May 12 15:25 scsi-36d4ae520a6226f00293f990c07bd03b6 -> ../../sda
lrwxrwxrwx 1 root root  10 May 12 15:25 scsi-36d4ae520a6226f00293f990c07bd03b6-part1 -> ../../sda1
lrwxrwxrwx 1 root root  10 May 12 15:25 scsi-36d4ae520a6226f00293f990c07bd03b6-part2 -> ../../sda2
lrwxrwxrwx 1 root root  10 May 12 15:25 scsi-36d4ae520a6226f00293f990c07bd03b6-part3 -> ../../sda3
lrwxrwxrwx 1 root root   9 May 12 15:25 wwn-0x6d4ae520a6226f00293f990c07bd03b6 -> ../../sda
lrwxrwxrwx 1 root root  10 May 12 15:25 wwn-0x6d4ae520a6226f00293f990c07bd03b6-part1 -> ../../sda1
lrwxrwxrwx 1 root root  10 May 12 15:25 wwn-0x6d4ae520a6226f00293f990c07bd03b6-part2 -> ../../sda2
lrwxrwxrwx 1 root root  10 May 12 15:25 wwn-0x6d4ae520a6226f00293f990c07bd03b6-part3 -> ../../sda3

Working node:

[screenshot attached: 1683923071125.png]


And then in one of the non-working nodes with the replacement disk:

[screenshot attached: 2023-05-12_16-24-49.jpg]
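
For reference, with the standard PVE layout shown above (ZFS on -part3), the documented replacement sequence, once the new disk is visible under /dev/disk/by-id, is roughly the following. This is only a sketch: scsi-<NEW_DISK> is a placeholder for whatever by-id name the replacement gets, and the proxmox-boot-tool steps apply only if the node boots via proxmox-boot-tool (check with 'proxmox-boot-tool status'):

Code:
# copy the partition table from the healthy disk to the new one, then randomize GUIDs
sgdisk /dev/disk/by-id/scsi-36d4ae520a6226f00293f990c07bd03b6 -R /dev/disk/by-id/scsi-<NEW_DISK>
sgdisk -G /dev/disk/by-id/scsi-<NEW_DISK>
# re-create and register the bootloader partition on the new disk
proxmox-boot-tool format /dev/disk/by-id/scsi-<NEW_DISK>-part2
proxmox-boot-tool init /dev/disk/by-id/scsi-<NEW_DISK>-part2
# bring the new disk into the mirror, naming the missing member by its GUID from zpool status
zpool replace rpool 16938304517973068656 /dev/disk/by-id/scsi-<NEW_DISK>-part3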
 

