[SOLVED] Need help replacing disk in ZFS

cglmicro

Member
Oct 12, 2020
Hi guys.

I run PBS with a ZFS pool named rpool that contains four 4 TB drives. /dev/sdb was failing and reporting lots of errors.

I ran "ls -a /dev/disk/by-id/", and here is the part of the output for the serial number K4KJ220L:
Code:
ata-HGST_HUS726040ALA610_K4KJ220L        lvm-pv-uuid-hQy3ob-CysI-64G2-4oF9-1Awg-KHJz-Vk44Ko                            wwn-0x5000cca244c909cf-part1
ata-HGST_HUS726040ALA610_K4KJ220L-part1  nvme-eui.e8238fa6bf530001001b448b46ae5183                                     wwn-0x5000cca244c909cf-part9
ata-HGST_HUS726040ALA610_K4KJ220L-part9  nvme-eui.e8238fa6bf530001001b448b46ae5183-part1                               wwn-0x5000cca25ccb252d
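Reading a wide listing like this by eye is error-prone, so one option is to resolve the symlinks and print only the by-id entries that point at a given device node. This is a minimal sketch: the function name `byid_for` and its optional directory argument are made up for illustration, and it assumes a `readlink` that supports `-f` (GNU coreutils or BusyBox).

```shell
#!/bin/sh
# byid_for DEV [DIR]: print every symlink in DIR (default /dev/disk/by-id)
# that resolves to the same device node as DEV.
# The name and the DIR argument are hypothetical, not part of any ZFS tool.
byid_for() {
    dev=$(readlink -f "$1")
    dir=${2:-/dev/disk/by-id}
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (also covers an empty directory).
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$dev" ]; then
            printf '%s\n' "$link"
        fi
    done
}

# On the host, for example:
#   byid_for /dev/sdb
```

Running this before pulling a disk gives the serial number to look for on the drive label.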

I ran "zpool offline rpool /dev/sdb", had the disk swapped in the datacenter, and am now trying to replace the old disk with the new one in the pool.
The first step was to find the new disk's name (its serial number is K3H88TRL):
Code:
root@pbs104:~# ls -a /dev/disk/by-id/
.                                        ata-HGST_HUS726040ALA610_N8GMWBLY-part9                                       nvme-WDC_CL_SN720_SDAQNTW-512G-2000_2008B7800452-part3
..                                       dm-name-pbs-root                                                              wwn-0x5000cca244c7153e
ata-HGST_HUS726040ALA610_K3GTJ1RL        dm-name-pbs-swap                                                              wwn-0x5000cca244c7153e-part1
ata-HGST_HUS726040ALA610_K3GTJ1RL-part1  dm-uuid-LVM-JBgr5pORNQBEXbR0ngPr9lvGBJz93hYl1to1krFai7RxqZDcgqh0MNRPzak47KQX  wwn-0x5000cca244c7153e-part9
ata-HGST_HUS726040ALA610_K3GTJ1RL-part9  dm-uuid-LVM-JBgr5pORNQBEXbR0ngPr9lvGBJz93hYlqNbyzKSP7YZlkFI00Ln3TjcHlY6tHeWG  wwn-0x5000cca244c909cf
ata-HGST_HUS726040ALA610_K3H88TRL        lvm-pv-uuid-hQy3ob-CysI-64G2-4oF9-1Awg-KHJz-Vk44Ko                            wwn-0x5000cca244c909cf-part1
ata-HGST_HUS726040ALA610_K3H88TRL-part1  nvme-eui.e8238fa6bf530001001b448b46ae5183                                     wwn-0x5000cca244c909cf-part9
ata-HGST_HUS726040ALA610_K3H88TRL-part9  nvme-eui.e8238fa6bf530001001b448b46ae5183-part1                               wwn-0x5000cca25ccb252d
ata-HGST_HUS726040ALA610_N8GHL0WY        nvme-eui.e8238fa6bf530001001b448b46ae5183-part2                               wwn-0x5000cca25ccb252d-part1
ata-HGST_HUS726040ALA610_N8GHL0WY-part1  nvme-eui.e8238fa6bf530001001b448b46ae5183-part3                               wwn-0x5000cca25ccb252d-part9
ata-HGST_HUS726040ALA610_N8GHL0WY-part9  nvme-WDC_CL_SN720_SDAQNTW-512G-2000_2008B7800452                              wwn-0x5000cca25cd1db7f
ata-HGST_HUS726040ALA610_N8GMWBLY        nvme-WDC_CL_SN720_SDAQNTW-512G-2000_2008B7800452-part1                        wwn-0x5000cca25cd1db7f-part1
ata-HGST_HUS726040ALA610_N8GMWBLY-part1  nvme-WDC_CL_SN720_SDAQNTW-512G-2000_2008B7800452-part2                        wwn-0x5000cca25cd1db7f-part9

But when I try the REPLACE command I receive this error about a device not existing:
Code:
root@pbs104:~# zpool replace -f rpool /dev/disk/by-id/ata-HGST_HUS726040ALA610_K4KJ220L /dev/disk/by-id/ata-HGST_HUS726040ALA610_K3H88TRL
cannot replace /dev/disk/by-id/ata-HGST_HUS726040ALA610_K4KJ220L with /dev/disk/by-id/ata-HGST_HUS726040ALA610_K3H88TRL: no such device in pool
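A likely reading of this error: the pool recorded the failed disk under the name `sdb`, not under its by-id path, and `zpool replace` has to be given the old device the way the pool knows it (or by its GUID). The sketch below checks whether a name appears in the config table of `zpool status` output. The function name `pool_has_vdev` is made up for illustration; it only parses text from stdin and does not touch the pool.

```shell
#!/bin/sh
# pool_has_vdev NAME: read `zpool status` output on stdin and succeed if
# NAME appears in the first column of the config table.
# Hypothetical helper, not part of the zpool CLI.
pool_has_vdev() {
    awk -v name="$1" '
        $1 == "NAME"  { in_config = 1; next }   # header row of the config table
        /^errors:/    { in_config = 0 }         # config table has ended
        in_config && $1 == name { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# On the host (not run here):
#   zpool status rpool | pool_has_vdev sdb && echo "pool knows this disk as sdb"
#   zpool status -g rpool   # lists vdev GUIDs instead of device names
```

If the name is ambiguous or the old device node is gone, the GUID shown by `zpool status -g` can be passed to `zpool replace` as the old device instead.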

So I tried this, and it started resilvering:
Code:
zpool replace -f rpool /dev/sdb /dev/disk/by-id/ata-HGST_HUS726040ALA610_K3H88TRL

Now if I ask for a status I get:
Code:
root@pbs104:~# zpool status
  pool: rpool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Feb 14 14:30:02 2023
        825G scanned at 5.09G/s, 36.3G issued at 229M/s, 10.1T total
        8.08G resilvered, 0.35% done, 12:50:02 to go
config:

        NAME                                     STATE     READ WRITE CKSUM
        rpool                                    DEGRADED     0     0     0
          raidz1-0                               DEGRADED     0     0     0
            sda                                  ONLINE       0     0     0
            replacing-1                          DEGRADED     0     0     0
              sdb                                FAULTED      0     0     0  corrupted data
              ata-HGST_HUS726040ALA610_K3H88TRL  ONLINE       0     0     0  (resilvering)
            sdc                                  ONLINE       0     0     0
            sdd                                  ONLINE       0     0     0

errors: No known data errors

Is that normal? Was that the right command to use? Will the FAULTED drive disappear at the end of the (very long) resilvering?
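For anyone following along during a long resilver, the progress figure can be pulled out of the status text instead of rereading the whole report. A minimal sketch, assuming the scan section is worded as in the output above; the function name `resilver_pct` is made up for illustration.

```shell
#!/bin/sh
# resilver_pct: read `zpool status` output on stdin and print the
# percentage from the "... resilvered, N% done ..." line, if there is one.
# Hypothetical helper; prints nothing when no resilver line is present.
resilver_pct() {
    sed -n 's/.*resilvered, \([0-9.]*%\) done.*/\1/p'
}

# On the host (not run here):
#   zpool status rpool | resilver_pct
#   zpool wait -t resilver rpool   # OpenZFS 2.0 or later: block until it finishes
```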

Thank you.
My question is
 
Hi Tmanok.

Thank you for your reply.

The resilvering ended, and ZFS removed the faulty drive by itself!
The only downside is that the device name isn't pretty, but I don't really care:
1676469926291.png
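If the mixed naming (short `sdX` names next to one long by-id name) ever becomes a nuisance, a commonly used approach is to export the pool and import it again while pointing ZFS at `/dev/disk/by-id`. The sketch below only prints the two commands so they can be reviewed first. It assumes the pool can be taken offline briefly (nothing, such as a PBS datastore, is using it) and that it is not the root filesystem; the function name is made up for illustration.

```shell
#!/bin/sh
# reimport_by_id_cmds POOL: print (do not run) the commands that would
# re-import POOL with /dev/disk/by-id device names.
# Hypothetical helper; assumes POOL is idle and is not the root pool.
reimport_by_id_cmds() {
    pool=$1
    printf 'zpool export %s\n' "$pool"
    printf 'zpool import -d /dev/disk/by-id %s\n' "$pool"
}

# Review the output, then run the two commands by hand:
#   reimport_by_id_cmds rpool
```

Doing this only after the resilver has finished and the pool is back to ONLINE avoids interrupting the rebuild.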
 
