PBS Boot Drive replacement

Robkd4

Hi all,

I have my first ZFS boot mirror drive replacement and am struggling a little bit. It's a PBS on bare metal. I missed the info or alert in the dashboard because no alert was sent via email. Yes, these are cheap consumer SSDs, but they are only boot drives.

I found a thread here in the forum and followed it:

#lsblk -d -o name && lsblk -d -o serial
.
.
1642312006007002 "(new replaced drive)"
1642312006001618 "(old working drive from Mirror)"
.
.

#zpool list -v

NAME                                        SIZE  ALLOC  FREE  CKPOINT  EXPANDSZ  FRAG    CAP  DEDUP  HEALTH    ALTROOT
rpool                                       117G  2.47G  115G  -        -         4%      2%   1.00x  DEGRADED  -
  mirror-0                                  117G  2.47G  115G  -        -         4%    2.11%  -      DEGRADED
    6383316981088746747                     -     -      -     -        -         -      -     -      UNAVAIL
    ata-INTENSO_SSD_1642312006001618-part3  118G  -      -     -        -         -      -     -      ONLINE

#ls -l /dev/disk/by-id/

lrwxrwxrwx 1 root root 9 Apr 30 12:21 ata-INTENSO_SSD_1642312006001618 -> ../../sdc
lrwxrwxrwx 1 root root 10 Apr 30 12:21 ata-INTENSO_SSD_1642312006001618-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Apr 30 12:21 ata-INTENSO_SSD_1642312006001618-part2 -> ../../sdc2
lrwxrwxrwx 1 root root 10 Apr 30 12:21 ata-INTENSO_SSD_1642312006001618-part3 -> ../../sdc3
lrwxrwxrwx 1 root root 9 Apr 30 12:21 ata-INTENSO_SSD_1642312006007002 -> ../../sdb
.
.
.
#proxmox-boot-tool status

Re-executing '/usr/sbin/proxmox-boot-tool' in new private mount namespace..
System currently booted with uefi
WARN: /dev/disk/by-uuid/DF9A-23CD does not exist - clean '/etc/kernel/proxmox-boot-uuids'! - skipping
DF9A-A251 is configured with: uefi (versions: 6.8.12-10-pve, 6.8.12-5-pve)

And now I have problems following the "Procedure". What are the next steps?

Thx a lot for helping here.

Cheers
 
Hi Magic,

sure:

Code:
# zpool status
  pool: rpool
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
  scan: scrub repaired 0B in 00:00:07 with 0 errors on Sun Apr 13 00:24:08 2025
config:

        NAME                                        STATE     READ WRITE CKSUM
        rpool                                       DEGRADED     0     0     0
          mirror-0                                  DEGRADED     0     0     0
            6383316981088746747                     UNAVAIL      0     0     0  was /dev/disk/by-id/ata-INTENSO_SSD_1642312006005304-part3
            ata-INTENSO_SSD_1642312006001618-part3  ONLINE       0     0     0

errors: No known data errors
 
Thank you. Before doing anything else, just a friendly reminder to back up your (backup server) data ;)
While looking for information about this I found this older thread, which specifically mentions that the boot partitions have to be recreated on the new disk. Such partitions are required to boot but are not created by zpool replace by default.
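If you want to see what that layout looks like before touching anything, printing the partition table of the healthy disk should show the three partitions a standard Proxmox/PBS install typically creates (BIOS boot, EFI system partition, ZFS). This is only a read-only sketch using the healthy Intenso disk from your listing above; adjust the path to your own system:

Code:
# sgdisk -p /dev/disk/by-id/ata-INTENSO_SSD_1642312006001618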

Someone else also wrote a full blog post about this:
https://jordanelver.co.uk/blog/2018/11/26/how-to-replace-a-failed-disk-in-a-zfs-mirror/

Again, make sure you have a backup before replacing the disk, and read each step very carefully before running it. I go a bit further to identify devices: I check the serial number and brand of each disk before proceeding (see the example below), just an extra tip if you want to do that.
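For example, something like this should list the model and serial next to each device node, so you can match it against the sticker on the physical drive. The target device in the second command is only an example, and smartctl comes from the smartmontools package:

Code:
# lsblk -d -o NAME,MODEL,SERIAL,SIZE
# smartctl -i /dev/sdb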

By all means please report back here. Not sure which PBS version you're using, but although the posts are a bit old and mention PVE 7 in one case, all of this should still apply to current ZFS versions.
 
Hi Magic,

thanks for the link to Bjorn's blog.

With this procedure I was able to replace the faulted drive easily.

Thx a lot.
 
It's great to have answers wherever you find them.
The vendor also has some advice on this that is going to lead to a slightly different config than the blog article results in.

https://pve.proxmox.com/wiki/ZFS_on_Linux#_zfs_administration
...

Your ZFS pool members were (formerly) labeled like this:
ata-INTENSO_SSD_1642312006005304-part3 (the bad one)
ata-INTENSO_SSD_1642312006001618-part3

If you do zpool status again, you'll see that the new pool member is not going to have that -part3 bit at the end of it.

...

The blog does not describe the process of prepping the disk before adding it to the pool.
He just tells you to add the whole disk. You can see in his screenshots that that's what he's doing, too. Bad.

The vendor says to do this.
(For non-Proxmox situations, do this using parted. But let's keep it simple.)
# sgdisk <healthy bootable device> -R <new device>
# sgdisk -G <new device>

If you do, the new disk gets the same partition setup, and you can add its -part3 partition instead of the whole disk:
# zpool replace -f <pool> <old zfs partition> <new zfs partition>
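To make that concrete with the devices from this thread, a rough sketch could look like the lines below. It assumes ata-INTENSO_SSD_1642312006007002 is the new, empty disk, ata-INTENSO_SSD_1642312006001618 is the healthy one, the pool name and failed-device GUID are taken from the zpool status output above, and the ESP is partition 2 as on a standard Proxmox/PBS install. Double-check every path on your own box before running any of it:

Code:
# copy the partition table from the healthy disk to the new one, then randomize its GUIDs
sgdisk /dev/disk/by-id/ata-INTENSO_SSD_1642312006001618 -R /dev/disk/by-id/ata-INTENSO_SSD_1642312006007002
sgdisk -G /dev/disk/by-id/ata-INTENSO_SSD_1642312006007002

# replace the failed vdev (referenced by its GUID from zpool status) with the new disk's ZFS partition
zpool replace -f rpool 6383316981088746747 /dev/disk/by-id/ata-INTENSO_SSD_1642312006007002-part3

# make the new disk bootable again (UEFI here) and drop the stale boot UUID from the earlier WARN
proxmox-boot-tool format /dev/disk/by-id/ata-INTENSO_SSD_1642312006007002-part2
proxmox-boot-tool init /dev/disk/by-id/ata-INTENSO_SSD_1642312006007002-part2
proxmox-boot-tool clean

The sgdisk copy has to come first so that the -part2/-part3 by-id links for the new disk actually exist before the zpool and proxmox-boot-tool steps.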

...

I'll admit that I've never been burned by _not_ doing this.
There's no boot partition on the disk you just added. If the first disk dies (semi-likely to happen soon if you bought them together), the second disk with no boot partition is going to be pretty darn hard to use.
I understand this protects you from hard drive size irregularities between disk vendors.
(I'm interested in learning whether there are other reasons it's recommended.)

These blog directions are not as bad as directions that tell you to mount /dev/sdc into your zpool. Really, this is pretty good.
But it's missing a step.
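If you want to convince yourself afterwards that both disks are bootable again and the mirror is healthy, the two status commands already used earlier in the thread are enough; after the format/init steps the boot tool should list two ESP UUIDs, and the pool should show both -part3 members ONLINE once resilvering finishes:

Code:
# proxmox-boot-tool status
# zpool status rpool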
 