How to replace a boot drive (rpool) in a ZFS 0 Mirror

oldfart

I do not understand the sequence to replace a failing boot disk.

I have read a lot of posts and am still unsure how to proceed.

Proxmox VE 8.0.3
I have 2 x Samsung EVO SSDs in a mirror as boot disks; one of them keeps failing.

Code:
proxmox-boot-tool status
System currently booted with legacy bios
1D14-304E is configured with: grub (versions: 6.2.16-3-pve)
1D14-FB50 is configured with: grub (versions: 6.2.16-3-pve)

Code:
zpool status rpool
  pool: rpool
 state: ONLINE
  scan: resilvered 333M in 00:00:01 with 0 errors on Sat Nov 25 10:09:01 2023
config:
        NAME                                                     STATE     READ WRITE CKSUM
        rpool                                                    ONLINE       0     0     0
          mirror-0                                               ONLINE       0     0     0
            ata-Samsung_SSD_870_EVO_250GB_S6PENU0TB10437J-part3  ONLINE       0     0     0
            ata-Samsung_SSD_870_EVO_250GB_S6PENU0TB10430F-part3  ONLINE       0     0     0
The disk that fails is Samsung_SSD_870_EVO_250GB_S6PENU0TB10430F.

Code:
ls -alh /dev/disk/by-id
lrwxrwxrwx 1 root root   9 Nov 23 12:08 ata-Samsung_SSD_870_EVO_250GB_S6PENU0TB10430F -> ../../sdc
lrwxrwxrwx 1 root root  10 Nov 23 12:08 ata-Samsung_SSD_870_EVO_250GB_S6PENU0TB10430F-part1 -> ../../sdc1
lrwxrwxrwx 1 root root  10 Nov 23 12:08 ata-Samsung_SSD_870_EVO_250GB_S6PENU0TB10430F-part2 -> ../../sdc2
lrwxrwxrwx 1 root root  10 Nov 23 12:08 ata-Samsung_SSD_870_EVO_250GB_S6PENU0TB10430F-part3 -> ../../sdc3
lrwxrwxrwx 1 root root   9 Nov 23 12:08 ata-Samsung_SSD_870_EVO_250GB_S6PENU0TB10437J -> ../../sda
lrwxrwxrwx 1 root root  10 Nov 23 12:08 ata-Samsung_SSD_870_EVO_250GB_S6PENU0TB10437J-part1 -> ../../sda1
lrwxrwxrwx 1 root root  10 Nov 23 12:08 ata-Samsung_SSD_870_EVO_250GB_S6PENU0TB10437J-part2 -> ../../sda2
lrwxrwxrwx 1 root root  10 Nov 23 12:08 ata-Samsung_SSD_870_EVO_250GB_S6PENU0TB10437J-part3 -> ../../sda3

Code:
Disk /dev/sda: 232.89 GiB, 250059350016 bytes, 488397168 sectors
Disk model: Samsung SSD 870
/dev/sda1       34      2047      2014  1007K BIOS boot
/dev/sda2     2048   2099199   2097152     1G EFI System
/dev/sda3  2099200 488397134 486297935 231.9G Solaris /usr & Apple ZFS
Disk /dev/sdc: 232.89 GiB, 250059350016 bytes, 488397168 sectors
Disk model: Samsung SSD 870
/dev/sdc1       34      2047      2014  1007K BIOS boot
/dev/sdc2     2048   2099199   2097152     1G EFI System
/dev/sdc3  2099200 488397134 486297935 231.9G Solaris /usr & Apple ZFS

My question is how to proceed. Is this sequence correct?
1 shut down
2 remove faulty disk
3 insert new disk (in my case /dev/sdc)
4 power on
5 sgdisk /dev/sda -R /dev/sdc
6 sgdisk -G /dev/sdc
6.5 do I need to do ls -alh /dev/disk/by-id to get the new disk id?
6.6 new disk id something like: ata-Samsung_SSD_870_EVO_250GB_S6PENU0TB10xxxxx
6.7 in the next command, do I replace by-id with the id above?
7 zpool replace -f rpool sdc-part3 /dev/disk/by-id/sda-part3
8 zpool status -v rpool
9 proxmox-boot-tool format /dev/sdc2
10 proxmox-boot-tool init /dev/sdc2
11 proxmox-boot-tool refresh

Or have I mixed up everything I read?
I wish a guide for dummies existed.
 
oldfart said:
I have 2 x Samsung EVO ssd in mirror as boot disks, one is continually failing.

It is strongly recommended NOT to use consumer SSDs like those EVOs with ZFS.

oldfart said:
10 proxmox-boot-tool init /dev/sdc2

You are booting with grub, so the "grub" argument is missing. It should be "proxmox-boot-tool init /dev/sdc2 grub".
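
As a rough illustration of the bootloader steps with the "grub" argument added, here is a minimal dry-run sketch. It only prints the commands and runs nothing against a disk. The function name boot_cmds and the NEW_ESP variable are made up for this example, and /dev/sdc2 is assumed to be the ESP of the new disk, as in the question.

```shell
#!/bin/sh
# Dry-run sketch: prints the bootloader commands instead of executing them.
# NEW_ESP is an assumption: partition 2 of the NEW disk (sdc in this thread).
NEW_ESP="${NEW_ESP:-/dev/sdc2}"

# boot_cmds is a hypothetical helper, not part of proxmox-boot-tool.
boot_cmds() {
    # $1 = ESP partition on the new disk
    echo "proxmox-boot-tool format $1"
    # "grub" is appended because the status output shows the ESPs configured with grub
    echo "proxmox-boot-tool init $1 grub"
    echo "proxmox-boot-tool refresh"
    # status should afterwards list the new ESP alongside the existing one
    echo "proxmox-boot-tool status"
}

boot_cmds "$NEW_ESP"
```

Check the printed lines against your own device names before running any of them by hand as root.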

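oldfart said:
zpool replace -f rpool sdc-part3 /dev/disk/by-id/sda-part3

You mixed those up. The syntax is "# zpool replace -f <pool> <old zfs partition> <new zfs partition>", so the old (failed) partition comes first and the new one second. You said sdc is the new disk, but as written the working disk's partition (sda-part3) sits in the position of the new device, so you would be replacing with the working disk... losing all the data.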
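
To make the argument order concrete, here is a minimal dry-run sketch that looks up a by-id name and prints the replace command without executing it. The helper names byid_for and replace_cmd are invented for this example. The new disk's serial is the placeholder from the question (the real one is not known here), and the "ata-" filter assumes a SATA disk.

```shell
#!/bin/sh
# Dry-run sketch: nothing here touches the pool; commands are only printed.
POOL="rpool"
# Old (failing) member, exactly as shown by "zpool status rpool" in the question.
OLD_PART="ata-Samsung_SSD_870_EVO_250GB_S6PENU0TB10430F-part3"
# Placeholder serial from the question; substitute the real by-id name of the new disk.
NEW_PART="/dev/disk/by-id/ata-Samsung_SSD_870_EVO_250GB_S6PENU0TB10xxxxx-part3"

# Hypothetical helper: reads "ls -l /dev/disk/by-id" output on stdin and prints
# the ata-* link names whose target is ../../<device>, e.g. sdc3.
byid_for() {
    awk -v dev="../../$1" '$NF == dev && $(NF-2) ~ /^ata-/ { print $(NF-2) }'
}

# Hypothetical helper: builds the command in the order <pool> <old> <new>.
replace_cmd() {
    echo "zpool replace -f $1 $2 $3"
}

replace_cmd "$POOL" "$OLD_PART" "$NEW_PART"
echo "zpool status -v $POOL"
```

On the real system, piping "ls -l /dev/disk/by-id" into "byid_for sdc3" should print the by-id name of the new disk's third partition, which answers steps 6.5 to 6.7: yes, look up the id and use that full path as the last argument.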
 