Failing SSD in ZFS RAID1 - How to replace SSD

bearhntr

Sep 9, 2022
I rebooted one of my HA boxes over the weekend, and a message popped up at the BIOS level saying that one of the 2 SSDs is 'near life end'. The system has 2x 240GB SSDs in a ZFS RAID 1 mirror (created during install by Proxmox). The manufacturer is sending me a new drive (actually 2), as they appear to be misrepresenting the "usage".

How do I replace them without reloading the entire thing? I have a PBS which is backing up the 1x VM & 1x LXC, but how do I replace the SSDs themselves? I know that with typical hardware RAID you replace 1 drive, allow it to rebuild, then do the other. ZFS is not hardware RAID, and this is my first experience with ZFS.
 
I actually found someone who provided these steps in a posting on another forum, and they were most helpful. I swapped in new_Drive_2 first and let it mirror from old_Drive_1, then repeated the process for new_Drive_1, which mirrored from new_Drive_2.


Zpool replace disk
==================

Get disk IDs:
ls -l /dev/disk/by-id/*
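
On a disk that is already partitioned, the by-id listing also shows one entry per partition, which makes the whole-disk names harder to spot. A minimal sketch of a filter that hides the partition entries. The name `whole_disks_only` and the sample listing are made up for illustration and are not part of any tool:

```shell
# Hypothetical helper: drop the "-partN" symlinks from an `ls -l` listing
# so that only whole-disk IDs remain. Reads the listing on stdin.
whole_disks_only() {
    grep -Ev -- '-part[0-9]+ -> '
}

# Made-up sample of what `ls -l /dev/disk/by-id/` prints.
sample_listing() {
cat <<'EOF'
lrwxrwxrwx 1 root root  9 Jan  1 00:00 ata-EXAMPLE_SSD_SERIAL1 -> ../../sda
lrwxrwxrwx 1 root root 10 Jan  1 00:00 ata-EXAMPLE_SSD_SERIAL1-part3 -> ../../sda3
lrwxrwxrwx 1 root root  9 Jan  1 00:00 ata-EXAMPLE_SSD_SERIAL2 -> ../../sdb
EOF
}

# Prints the two whole-disk lines and skips the -part3 line.
sample_listing | whole_disks_only

# Real use: ls -l /dev/disk/by-id/ | whole_disks_only
```

The part after `->` tells you which /dev/sdX (or nvme) device each ID currently points to.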

Get zpool status:
zpool status

This assumes the following disk layout (check the working disk first):
Part 1: BIOS Boot
Part 2: EFI
Part 3: ZFS

Copy Partitions from working to new disk, without copying label and UUIDs:
sfdisk -d /dev/WORKING | sed 's/, uuid.*//; /label-id/d;' | sfdisk /dev/REPLACEMENT
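
If you want to see what the sed filter does before pointing it at a real disk, you can run it on a text dump first. A small sketch: `strip_ids` is just a hypothetical wrapper around the same sed expression, and the dump below uses made-up IDs, not output from a real system:

```shell
# Hypothetical wrapper around the sed filter from the command above.
# Removes per-partition UUIDs and the disk label-id so that sfdisk
# generates fresh ones on the replacement disk.
strip_ids() {
    sed 's/, uuid.*//; /label-id/d;'
}

# Made-up example of an `sfdisk -d` dump (IDs are placeholders).
sample_dump() {
cat <<'EOF'
label: gpt
label-id: 11111111-2222-3333-4444-555555555555
device: /dev/sda
unit: sectors

/dev/sda1 : start=34, size=2014, type=21686148-6449-6E6F-744E-656564454649, uuid=AAAAAAAA-0000-0000-0000-000000000001
EOF
}

# Prints the dump without the label-id line and without the uuid= field.
sample_dump | strip_ids

# Real dry run (read-only): sfdisk -d /dev/WORKING | strip_ids
```

Note that the substitution cuts everything after `, uuid`, so any fields that follow the UUID on the same line (such as a partition name) are dropped as well.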

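Replace the disk, giving the ZFS partition (part3) of the old and the new disk. zp_pve is the pool name in this example; use the pool name and device IDs that zpool status and the by-id listing show on your system:
zpool replace zp_pve /dev/disk/by-id/nvme-OLD-part3 /dev/disk/by-id/nvme-REPLACEMENT-part3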

Check status, should resilver:
zpool status
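
Since the second disk should only be swapped once the first resilver has finished, it can help to check the scan line of the status output rather than eyeballing it. A rough sketch, assuming the usual "resilver in progress" / "resilvered ..." wording of the scan line. `resilver_state` is a made-up helper name:

```shell
# Hypothetical helper: classify `zpool status` output read from stdin.
# Prints "in-progress", "done" or "none" depending on the scan line.
resilver_state() {
    status=$(cat)
    case "$status" in
        *"resilver in progress"*) echo "in-progress" ;;
        *"scan: resilvered"*)     echo "done" ;;
        *)                        echo "none" ;;
    esac
}

# Real use (pool name zp_pve taken from the example above):
#   zpool status zp_pve | resilver_state
```

Newer OpenZFS releases also have `zpool wait -t resilver <pool>`, which blocks until the resilver is done, if your version includes it.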

Rewrite Bootloader:
proxmox-boot-tool format /dev/disk/by-id/nvme-REPLACEMENT-part2
proxmox-boot-tool init /dev/disk/by-id/nvme-REPLACEMENT-part2 grub
proxmox-boot-tool status

Clean /etc/kernel/proxmox-boot-uuids of old entries:
proxmox-boot-tool status
proxmox-boot-tool refresh
proxmox-boot-tool clean
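
To see which entries are stale before running clean, you can compare the UUID list against the partitions that actually exist. This is only an illustration of the idea, assuming the file holds one ESP UUID per line. `stale_uuids` is a made-up helper, not part of proxmox-boot-tool, and it only reads:

```shell
# Hypothetical helper: print UUIDs from a list file that have no matching
# entry in the by-uuid directory.
#   $1: file with one UUID per line
#   $2: directory to check (defaults to /dev/disk/by-uuid)
stale_uuids() {
    uuid_file=$1
    by_uuid_dir=${2:-/dev/disk/by-uuid}
    while IFS= read -r uuid; do
        [ -n "$uuid" ] || continue            # skip empty lines
        [ -e "$by_uuid_dir/$uuid" ] || echo "$uuid"
    done < "$uuid_file"
}

# Real use (read-only): stale_uuids /etc/kernel/proxmox-boot-uuids
```

Any UUID it prints should belong to the ESP of a disk that has been removed.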