Question about ZFS replace

Matteo Calorio

Well-Known Member
Jun 30, 2017
Hi, I set up a physical test machine with 4 x 2TB HDDs (recovered hardware) and installed PBS on ZFS (RAIDZ1) during setup, but one HDD died after a day.

I replaced the disk and did a "zpool replace": the result is that "zpool status" now looks fine, but the new disk has different partitions (two instead of three). What could have happened? Can it be a problem?

Bash:
root@pbs:~# zpool status
  pool: rpool
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
config:

        NAME                                                  STATE     READ WRITE CKSUM
        rpool                                                 DEGRADED     0     0     0
          raidz1-0                                            DEGRADED     0     0     0
            ata-WDC_WD20EARS-00MVWB0_WD-WMAZA1617141-part3    ONLINE       0     0     0
            ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M6ADS190-part3    ONLINE       0     0     0
            ata-WDC_WD20EARX-00PASB0_WD-WCAZAD731218-part3    ONLINE       0     0     0
            3608573773871883845                               UNAVAIL      0     0     0  was /dev/disk/by-id/ata-WDC_WD20EARS-00MVWB0_WD-WCAZA9181578-part3


root@pbs:~# ls -l /dev/disk/by-id/
lrwxrwxrwx 1 root root  9 May  9 20:17 ata-WDC_WD20EARS-00MVWB0_WD-WCAZA9168820 -> ../../sdd
lrwxrwxrwx 1 root root  9 May  9 20:17 ata-WDC_WD20EARS-00MVWB0_WD-WMAZA1617141 -> ../../sda
lrwxrwxrwx 1 root root 10 May  9 20:17 ata-WDC_WD20EARS-00MVWB0_WD-WMAZA1617141-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 May  9 20:17 ata-WDC_WD20EARS-00MVWB0_WD-WMAZA1617141-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 May  9 20:17 ata-WDC_WD20EARS-00MVWB0_WD-WMAZA1617141-part3 -> ../../sda3
lrwxrwxrwx 1 root root  9 May  9 20:17 ata-WDC_WD20EARX-00PASB0_WD-WCAZAD731218 -> ../../sdc
lrwxrwxrwx 1 root root 10 May  9 20:17 ata-WDC_WD20EARX-00PASB0_WD-WCAZAD731218-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 May  9 20:17 ata-WDC_WD20EARX-00PASB0_WD-WCAZAD731218-part2 -> ../../sdc2
lrwxrwxrwx 1 root root 10 May  9 20:17 ata-WDC_WD20EARX-00PASB0_WD-WCAZAD731218-part3 -> ../../sdc3
lrwxrwxrwx 1 root root  9 May  9 20:17 ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M6ADS190 -> ../../sdb
lrwxrwxrwx 1 root root 10 May  9 20:17 ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M6ADS190-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 May  9 20:17 ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M6ADS190-part2 -> ../../sdb2
lrwxrwxrwx 1 root root 10 May  9 20:17 ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M6ADS190-part3 -> ../../sdb3


root@pbs:~# zpool replace rpool /dev/disk/by-id/ata-WDC_WD20EARS-00MVWB0_WD-WCAZA9181578-part3 /dev/disk/by-id/ata-WDC_WD20EARS-00MVWB0_WD-WCAZA9168820


root@pbs:~# zpool status -v
  pool: rpool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon May  9 21:02:49 2022
        808G scanned at 297M/s, 387G issued at 142M/s, 808G total
        93.4G resilvered, 47.93% done, 00:50:23 to go
config:

        NAME                                                  STATE     READ WRITE CKSUM
        rpool                                                 DEGRADED     0     0     0
          raidz1-0                                            DEGRADED     0     0     0
            ata-WDC_WD20EARS-00MVWB0_WD-WMAZA1617141-part3    ONLINE       0     0     0
            ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M6ADS190-part3    ONLINE       0     0     0
            ata-WDC_WD20EARX-00PASB0_WD-WCAZAD731218-part3    ONLINE       0     0     0
            replacing-3                                       DEGRADED     0     0     0
              3608573773871883845                             UNAVAIL      0     0     0  was /dev/disk/by-id/ata-WDC_WD20EARS-00MVWB0_WD-WCAZA9181578-part3
              ata-WDC_WD20EARS-00MVWB0_WD-WCAZA9168820        ONLINE       0     0     0  (resilvering)

errors: No known data errors


root@pbs:~# zpool status -v
  pool: rpool
 state: ONLINE
  scan: resilvered 194G in 01:25:24 with 0 errors on Mon May  9 22:28:13 2022
config:

        NAME                                                  STATE     READ WRITE CKSUM
        rpool                                                 ONLINE       0     0     0
          raidz1-0                                            ONLINE       0     0     0
            ata-WDC_WD20EARS-00MVWB0_WD-WMAZA1617141-part3    ONLINE       0     0     0
            ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M6ADS190-part3    ONLINE       0     0     0
            ata-WDC_WD20EARX-00PASB0_WD-WCAZAD731218-part3    ONLINE       0     0     0
            ata-WDC_WD20EARS-00MVWB0_WD-WCAZA9168820          ONLINE       0     0     0

errors: No known data errors


root@pbs:~# ll /dev/disk/by-id/
lrwxrwxrwx 1 root root  9 May  9 21:02 ata-WDC_WD20EARS-00MVWB0_WD-WCAZA9168820 -> ../../sdd
lrwxrwxrwx 1 root root 10 May  9 21:02 ata-WDC_WD20EARS-00MVWB0_WD-WCAZA9168820-part1 -> ../../sdd1
lrwxrwxrwx 1 root root 10 May  9 21:02 ata-WDC_WD20EARS-00MVWB0_WD-WCAZA9168820-part9 -> ../../sdd9
lrwxrwxrwx 1 root root  9 May  9 20:17 ata-WDC_WD20EARS-00MVWB0_WD-WMAZA1617141 -> ../../sda
lrwxrwxrwx 1 root root 10 May  9 20:17 ata-WDC_WD20EARS-00MVWB0_WD-WMAZA1617141-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 May  9 20:17 ata-WDC_WD20EARS-00MVWB0_WD-WMAZA1617141-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 May  9 20:17 ata-WDC_WD20EARS-00MVWB0_WD-WMAZA1617141-part3 -> ../../sda3
lrwxrwxrwx 1 root root  9 May  9 20:17 ata-WDC_WD20EARX-00PASB0_WD-WCAZAD731218 -> ../../sdc
lrwxrwxrwx 1 root root 10 May  9 20:17 ata-WDC_WD20EARX-00PASB0_WD-WCAZAD731218-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 May  9 20:17 ata-WDC_WD20EARX-00PASB0_WD-WCAZAD731218-part2 -> ../../sdc2
lrwxrwxrwx 1 root root 10 May  9 20:17 ata-WDC_WD20EARX-00PASB0_WD-WCAZAD731218-part3 -> ../../sdc3
lrwxrwxrwx 1 root root  9 May  9 20:17 ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M6ADS190 -> ../../sdb
lrwxrwxrwx 1 root root 10 May  9 20:17 ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M6ADS190-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 May  9 20:17 ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M6ADS190-part2 -> ../../sdb2
lrwxrwxrwx 1 root root 10 May  9 20:17 ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M6ADS190-part3 -> ../../sdb3


As you can see, instead of having partitions 1, 2 and 3 on the new disk I have 1 and 9 (9?). Why? Did I do something wrong? What can I do now?

 
If you boot from that pool, you also have a GRUB or ESP partition besides the partition used for ZFS. So you should not have replaced the whole disk (in which case ZFS creates two partitions, sdX1 and sdX9), but first cloned the partition table from one of the healthy disks, then synced the bootloader, and then used only partition 3 of the new disk as the replacement in the pool.

Should be similar to PVE. See paragraph "Changing a failed bootable device" in the PVE documentation for reference: https://pve.proxmox.com/wiki/ZFS_on_Linux#_zfs_administration

So your pool might work now, but you won't be able to boot from that new disk any longer.
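
For reference, a rough sketch of what the replacement would have looked like, assuming /dev/sdd is the new disk and /dev/sda one of the healthy ones (names taken from the listings above), and assuming the PVE procedure applies to PBS as well; double-check the device names and the wiki article before running anything:

Bash:
# Sketch only, following the PVE wiki linked above - verify device names first!
sgdisk /dev/sda -R /dev/sdd          # copy the partition table of a healthy disk onto the new disk
sgdisk -G /dev/sdd                   # randomize the GUIDs so they don't clash with /dev/sda
proxmox-boot-tool format /dev/sdd2   # format the new ESP (partition 2 on a default install)
proxmox-boot-tool init /dev/sdd2     # install the bootloader onto the new ESP
# then replace only the ZFS partition, not the whole disk:
zpool replace -f rpool /dev/disk/by-id/ata-WDC_WD20EARS-00MVWB0_WD-WCAZA9181578-part3 /dev/disk/by-id/ata-WDC_WD20EARS-00MVWB0_WD-WCAZA9168820-part3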
 
Very clear, thanks!

But what do you suggest now? Is it better to leave it as it is, given that it is drive number 4 in the array, or to remove it from the array, re-partition it and add it back? All drives are operational now. Can I simply do a:

zpool offline -f rpool ata-WDC_WD20EARS-00MVWB0_WD-WCAZA9168820-part1 ?

And then follow the procedure you posted?
 
It shouldn't be a problem to continue as it is now. If more than one disk dies, your pool is lost anyway. And if only one disk dies, you still have 2 or 3 drives with a working bootloader left, so booting should work anyway.

If you want to replace it again to do it right, it would be good if one of the staff could first confirm whether that wiki article for PVE also works for PBS.
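
In the meantime, one way to see which disks actually have a registered bootloader and how their partition layouts compare (assuming proxmox-boot-tool is shipped with PBS the same way it is with PVE):

Bash:
# List the ESPs that are registered for booting (command taken from PVE,
# assumed to be available on a ZFS-root PBS install as well)
proxmox-boot-tool status
# Compare the partition layout of the new disk with the old ones
lsblk -o NAME,SIZE,FSTYPE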
 