Advice Needed for Replacing HDDs in Mirrored ZFS Pool on Proxmox

ana_mera

New Member
Jun 17, 2024
Hello Proxmox Community,

I’m currently running a Proxmox VE setup on a Hetzner AX161 server and am planning to upgrade my storage. Here’s an overview of my current configuration and what I intend to do:

Current Setup:

  • Boot Disk: Mirrored SSDs (rpool):
    • Purpose: Hosting the Proxmox system, root filesystem, and subvolumes for containers (CTs).
    • Configuration: Mirrored ZFS pool (rpool), identified as SSDs.
    • Datasets: rpool/ROOT/pve-1 is mounted as /, and several container subvolumes are under rpool/data.
  • Data Disk: Mirrored HDDs (backup):
    • Purpose: Used primarily for storing VZDump backups.
    • Configuration: Mirrored ZFS pool (backup) with two 2 TB HDDs.

Planned Upgrade:

I want to replace the existing 2 TB HDDs in the backup pool with new 8 TB HDDs. I’m considering the following approach:

  1. Add the First 8 TB HDD:
    • Confirm there is a free drive slot, or temporarily remove one 2 TB HDD, so the new 8 TB HDD can be installed.
    • Add the new 8 TB HDD to the server.
  2. Replace One of the 2 TB HDDs:
    • Use the ZFS replace command to replace the first 2 TB HDD with the new 8 TB HDD. (example: zpool replace backup /dev/sdd /dev/sdc)
    • Monitor and complete the resilvering process. (via zpool status backup)
  3. Remove the Replaced 2 TB HDD:
    • Once resilvering is done, remove the old 2 TB HDD.
  4. Add the Second 8 TB HDD:
    • Add the second 8 TB HDD to the server.
  5. Replace the Remaining 2 TB HDD:
    • Use the ZFS replace command to replace the remaining 2 TB HDD with the second 8 TB HDD.
    • Monitor and complete the resilvering process.
  6. Expand the ZFS Pool:
    • After both drives are replaced and resilvering is complete, expand the backup pool to utilize the full 8 TB capacity using the ZFS autoexpand feature (example: zpool set autoexpand=on backup)
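
For reference, the full sequence I have in mind would look roughly like the following; the by-id device paths are placeholders, not my actual disks:

# step 2: replace the first 2 TB HDD with the first 8 TB HDD
zpool replace backup /dev/disk/by-id/<old-2tb-1> /dev/disk/by-id/<new-8tb-1>
zpool status backup    # wait until resilvering is finished

# steps 4-5: replace the second 2 TB HDD with the second 8 TB HDD
zpool replace backup /dev/disk/by-id/<old-2tb-2> /dev/disk/by-id/<new-8tb-2>
zpool status backup    # wait until resilvering is finished

# step 6: enable autoexpand (whether this has to happen before the replacements is my question 3 below)
zpool set autoexpand=on backup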

Questions:

  1. Is this step-by-step approach the best way to replace my existing 2 TB HDDs with 8 TB HDDs while ensuring data integrity and minimal downtime?
  2. Are there any risks or potential issues I should be aware of during this process?
  3. Do I need to manually set autoexpand=on before or after the drive replacement, or is it automatic?
  4. Given my current setup, is there anything else I should consider or prepare for before starting the upgrade?
Thank you for your advice and support!

Ana
 
Use the ZFS replace command to replace the first 2 TB HDD with the new 8 TB HDD. (example: zpool replace backup /dev/sdd /dev/sdc)
I recommend using the disk-id (/dev/disk/by-id/*) instead of the label. (The label could change, the disk-id doesn't.)
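
For example, to see which by-id path a given /dev/sdX name maps to (sdd here is just an example):

ls -l /dev/disk/by-id/ | grep sdd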

Is this step-by-step approach the best way to replace my existing 2 TB HDDs with 8 TB HDDs while ensuring data integrity and minimal downtime?
AFAIK yes.

Are there any risks or potential issues I should be aware of during this process?
There shouldn't be.

Do I need to manually set autoexpand=on before or after the drive replacement, or is it automatic?
You can set autoexpand=on before replacing the devices or you could execute zpool online -e pool device to manually expand the zpool (after replacing the devices obviously).
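
For example (pool and device names are placeholders):

zpool set autoexpand=on backup     # set before replacing the devices
# ...or, after the devices have been replaced:
zpool online -e backup /dev/disk/by-id/<new-device-id>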

Otherwise this approach looks good!
 
My recipe:



Zpool replace disk
==================

Get disk IDs
ls -l /dev/disk/by-id/*


Get zpool status:

zpool status

this assumes the following disk layout:

Part 1: BIOS Boot
Part 2: EFI
Part 3: ZFS

Copy the partitions from the working disk to the new disk, without copying the label and UUIDs:

sfdisk -d /dev/WORKING | sed 's/, uuid.*//; /label-id/d;' | sfdisk /dev/REPLACEMENT

Replace the disk, giving ZFS the new disk's ZFS partition:
zpool replace zp_pve /dev/disk/by-id/nvme-OLD-part3 /dev/disk/by-id/nvme-REPLACEMENT-part3


Check status, should resilver:
zpool status


Rewrite Bootloader:
proxmox-boot-tool format /dev/disk/by-id/nvme-REPLACEMENT-part2
proxmox-boot-tool init /dev/disk/by-id/nvme-REPLACEMENT-part2
proxmox-boot-tool status

Clean /etc/kernel/proxmox-boot-uuids of old entries

proxmox-boot-tool status
proxmox-boot-tool refresh
proxmox-boot-tool clean
 
Thank you for the quick response!

I actually have two more questions:

1. Should I detach the disk from the pool before I remove it? (via the zpool detach command)

2. Which approach is recommended:

-Remove one of the 2 TB HDDs first, add the new 8 TB HDD, and then mirror the data from the remaining old 2 TB HDD?
-Add the 8 TB HDD first, mirror it with one of the old 2 TB HDDs, and then remove the old 2 TB HDD?


Thank you in advance!

Ana
 
Thank you for your recipe :)

I think I do not need all of that, since my current HDDs are not bootable at all; only my SSDs are.

So the only command that I actually need is zpool replace
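
Something like this, I assume (the device IDs are placeholders):

zpool replace backup /dev/disk/by-id/<old-2tb-id> /dev/disk/by-id/<new-8tb-id>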
 
I would:

Remove one of the 2 TB HDDs first, add the new 8 TB HDD, and then mirror the data from the remaining old 2 TB HDD

but both should work if you have a free slot.
 
Hi Gabriel,

Thank you for answering

I’ve just received confirmation from Hetzner that there are no additional slots available for new drives. This means I need to remove one of the existing 2 TB HDDs before I can add the new 8 TB HDD.

My Questions:

  1. Is it safe for Hetzner to simply remove the old 2 TB HDD directly?
    • Should I perform any commands on Proxmox or ZFS before they physically remove the drive to ensure the system handles the change correctly?
  2. Hot Swapping Concerns:
    • Hetzner has indicated that hot swapping is supported on their end. However, I’m unsure if Proxmox fully supports hot swapping for ZFS pools and if it is advisable in my scenario.
    • Do you recommend using hot swapping for this replacement, or should I consider scheduling a downtime for the replacement to minimize risks?
Thank you for your advice and support!

Ana
 
Hi!
In this case I'd use zpool detach and zpool attach. Before removing the old drive, detach it from the pool like this: zpool detach <pool> <device>; then you can physically remove it. After that, insert the new one and execute zpool attach <pool> <other-device-in-mirror> <new-device>. The pool will automatically resilver and should work again.
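
A rough sketch with placeholder names (assuming your pool is still called backup and you use the by-id paths):

zpool detach backup /dev/disk/by-id/<old-2tb-id>
# physically swap the drives, then:
zpool attach backup /dev/disk/by-id/<remaining-2tb-id> /dev/disk/by-id/<new-8tb-id>
zpool status backup    # watch the resilver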

Hetzner has indicated that hot swapping is supported on their end. However, I’m unsure if Proxmox fully supports hot swapping for ZFS pools and if it is advisable in my scenario.
You are not really hot swapping, though: if you go with the zpool attach/detach route, you are manually removing the disk from the zpool and adding another one.
Do you recommend using hot swapping for this replacement, or should I consider scheduling a downtime for the replacement to minimize risks?
Technically this should work without any downtime, but it is up to you if you feel comfortable doing it or if you need to schedule a downtime.
 
Hi Gabriel!

Thanks for your answer!

OK, I understand. I will then use zpool detach and zpool attach. The swap will be done offline to ensure safety.

My Questions:

  1. How long does the zpool detach command typically take to complete?
    • It’s crucial for me to know the duration because the physical removal of the HDD by Hetzner is scheduled shortly after I perform the detach operation.
  2. Here’s the plan I’ve prepared for tonight’s operation. Can you please review and confirm if it looks correct?

# Enable autoexpand for the pool
zpool set autoexpand=on <zpool>

# Detach the old disk from the pool
zpool detach <zpool> /dev/disk/by-id/<old-device-id>

# Shutdown the server safely
shutdown -h now

# (After swapping the disks and booting up the server)
# Attach the new disk to the pool
zpool attach <zpool> /dev/disk/by-id/<existing-device-id> /dev/disk/by-id/<new-device-id>

Thank you for your assistance!
Ana
 
Hi!
How long does the zpool detach command typically take to complete?
I can't tell you that exactly, but the detach/attach should be quite fast. The resilvering after the attach will take a while though.
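
If you want to keep an eye on the resilver, something simple like this is enough (the interval is arbitrary):

watch -n 60 zpool status <zpool>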

The plan looks fine to me!
 
Since AI may take you here (as it did with me), keep in mind that the full official documentation at https://pve.proxmox.com/wiki/ZFS_on_Linux has all the commands available.

Today we replaced a faulty disk. We requested the replacement before detaching the disk from ZFS, which is maybe not ideal, but we did it anyway. Here is the pool state afterwards:


Code:
$ zpool status
  pool: rpool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
    continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Jun 30 13:56:30 2024
    2.48T / 2.62T scanned at 2.27G/s, 0B / 1.34T issued
    0B resilvered, 0.00% done, no estimated completion time
config:

    NAME                                        STATE     READ WRITE CKSUM
    rpool                                       DEGRADED     0     0     0
      mirror-0                                  DEGRADED     0     0     0
        ata-ST4000NM0024-1HT178_Z4F0MW6M-part3  ONLINE       0     0     0
        1934952624090884359                     UNAVAIL      0     0     0  was /dev/disk/by-id/ata-ST4000NM0024-1HT178_Z4F0R46X-part3
      mirror-1                                  ONLINE       0     0     0
        ata-ST4000NM0024-1HT178_Z4F0RGLL-part3  ONLINE       0     0     0
        ata-ST4000NM0024-1HT178_Z4F0RJ5L-part3  ONLINE       0     0     0

root@proxmox:~# proxmox-boot-tool status
Re-executing '/usr/sbin/proxmox-boot-tool' in new private mount namespace..
System currently booted with uefi
B0CB-DBA4 is configured with: uefi (versions: 6.2.16-20-pve, 6.5.11-8-pve, 6.5.13-1-pve)
WARN: /dev/disk/by-uuid/B0CD-78EB does not exist - clean '/etc/kernel/proxmox-boot-uuids'! - skipping
B0CE-D08D is configured with: uefi (versions: 6.2.16-20-pve, 6.5.11-8-pve, 6.5.13-1-pve)
B0D1-182E is configured with: uefi (versions: 6.2.16-20-pve, 6.5.11-8-pve, 6.5.13-1-pve)


where the new disk is /dev/sdb (with no partitions at all, of course); then, as described in https://pve.proxmox.com/wiki/ZFS_on_Linux#sysadmin_zfs_change_failed_dev

Changing a failed bootable device



Code:
# sgdisk <healthy bootable device> -R <new device>
$ sgdisk /dev/sda -R /dev/sdb
The operation has completed successfully.

# reissuing GUIDs
$ sgdisk -G  /dev/sdb
The operation has completed successfully.

# now I can see partition "*-part3" in the new disk, so I can replace it
# replacing the ZFS partition
$ zpool replace -f rpool /dev/disk/by-id/ata-ST4000NM0024-1HT178_Z4F0R46X-part3 /dev/disk/by-id/ata-ST4000NM0245-1Z2107_ZC112AAW-part3


and now the resilvering starts


Code:
zpool status
  pool: rpool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
    continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Jun 30 13:56:30 2024
    2.62T / 2.62T scanned, 2.88G / 1.20T issued at 5.13M/s
    333M resilvered, 0.23% done, no estimated completion time
config:

    NAME                                          STATE     READ WRITE CKSUM
    rpool                                         DEGRADED     0     0     0
      mirror-0                                    DEGRADED     0     0     0
        ata-ST4000NM0024-1HT178_Z4F0MW6M-part3    ONLINE       0     0     0
        replacing-1                               DEGRADED     0     0     0
          1934952624090884359                     UNAVAIL      0     0     0  was /dev/disk/by-id/ata-ST4000NM0024-1HT178_Z4F0R46X-part3
          ata-ST4000NM0245-1Z2107_ZC112AAW-part3  ONLINE       0     0     0  (resilvering)
      mirror-1                                    ONLINE       0     0     0
        ata-ST4000NM0024-1HT178_Z4F0RGLL-part3    ONLINE       0     0     0
        ata-ST4000NM0024-1HT178_Z4F0RJ5L-part3    ONLINE       0     0     0

errors: No known data errors

Then, since we boot with UEFI:


Code:
$ proxmox-boot-tool format /dev/disk/by-id/ata-ST4000NM0245-1Z2107_ZC112AAW-part2
UUID="" SIZE="1073741824" FSTYPE="" PARTTYPE="c12a7328-f81f-11d2-ba4b-00a0c93ec93b" PKNAME="sdb" MOUNTPOINT=""
Formatting '/dev/disk/by-id/ata-ST4000NM0245-1Z2107_ZC112AAW-part2' as vfat..
mkfs.fat 4.2 (2021-01-31)
Done.

$ proxmox-boot-tool init /dev/disk/by-id/ata-ST4000NM0245-1Z2107_ZC112AAW-part2
Re-executing '/usr/sbin/proxmox-boot-tool' in new private mount namespace..
UUID="0AA7-FE3B" SIZE="1073741824" FSTYPE="vfat" PARTTYPE="c12a7328-f81f-11d2-ba4b-00a0c93ec93b" PKNAME="sdb" MOUNTPOINT=""
Mounting '/dev/disk/by-id/ata-ST4000NM0245-1Z2107_ZC112AAW-part2' on '/var/tmp/espmounts/0AA7-FE3B'.
Installing systemd-boot..
Created "/var/tmp/espmounts/0AA7-FE3B/EFI/systemd".
Created "/var/tmp/espmounts/0AA7-FE3B/EFI/BOOT".
Created "/var/tmp/espmounts/0AA7-FE3B/loader".
Created "/var/tmp/espmounts/0AA7-FE3B/loader/entries".
Created "/var/tmp/espmounts/0AA7-FE3B/EFI/Linux".
Copied "/usr/lib/systemd/boot/efi/systemd-bootx64.efi" to "/var/tmp/espmounts/0AA7-FE3B/EFI/systemd/systemd-bootx64.efi".
Copied "/usr/lib/systemd/boot/efi/systemd-bootx64.efi" to "/var/tmp/espmounts/0AA7-FE3B/EFI/BOOT/BOOTX64.EFI".
Random seed file /var/tmp/espmounts/0AA7-FE3B/loader/random-seed successfully written (32 bytes).
Unable to write 'LoaderSystemToken' EFI variable (firmware problem?), ignoring: Invalid argument
Created EFI boot entry "Linux Boot Manager".
Configuring systemd-boot..
Unmounting '/dev/disk/by-id/ata-ST4000NM0245-1Z2107_ZC112AAW-part2'.
Adding '/dev/disk/by-id/ata-ST4000NM0245-1Z2107_ZC112AAW-part2' to list of synced ESPs..
Refreshing kernels and initrds..
Running hook script 'proxmox-auto-removal'..
Running hook script 'zz-proxmox-boot'..
Copying and configuring kernels on /dev/disk/by-uuid/0AA7-FE3B
    Copying kernel and creating boot-entry for 6.2.16-20-pve
    Copying kernel and creating boot-entry for 6.5.11-8-pve
    Copying kernel and creating boot-entry for 6.5.13-1-pve
Copying and configuring kernels on /dev/disk/by-uuid/B0CB-DBA4
    Copying kernel and creating boot-entry for 6.2.16-20-pve
    Copying kernel and creating boot-entry for 6.5.11-8-pve
    Copying kernel and creating boot-entry for 6.5.13-1-pve
WARN: /dev/disk/by-uuid/B0CD-78EB does not exist - clean '/etc/kernel/proxmox-boot-uuids'! - skipping
Copying and configuring kernels on /dev/disk/by-uuid/B0CE-D08D
    Copying kernel and creating boot-entry for 6.2.16-20-pve
    Copying kernel and creating boot-entry for 6.5.11-8-pve
    Copying kernel and creating boot-entry for 6.5.13-1-pve
Copying and configuring kernels on /dev/disk/by-uuid/B0D1-182E
    Copying kernel and creating boot-entry for 6.2.16-20-pve
    Copying kernel and creating boot-entry for 6.5.11-8-pve
    Copying kernel and creating boot-entry for 6.5.13-1-pve


$ proxmox-boot-tool status
Re-executing '/usr/sbin/proxmox-boot-tool' in new private mount namespace..
System currently booted with uefi
0AA7-FE3B is configured with: uefi (versions: 6.2.16-20-pve, 6.5.11-8-pve, 6.5.13-1-pve)
B0CB-DBA4 is configured with: uefi (versions: 6.2.16-20-pve, 6.5.11-8-pve, 6.5.13-1-pve)
WARN: /dev/disk/by-uuid/B0CD-78EB does not exist - clean '/etc/kernel/proxmox-boot-uuids'! - skipping
B0CE-D08D is configured with: uefi (versions: 6.2.16-20-pve, 6.5.11-8-pve, 6.5.13-1-pve)
B0D1-182E is configured with: uefi (versions: 6.2.16-20-pve, 6.5.11-8-pve, 6.5.13-1-pve)


$ proxmox-boot-tool clean
Checking whether ESP '0AA7-FE3B' exists.. Found!
Checking whether ESP 'B0CB-DBA4' exists.. Found!
Checking whether ESP 'B0CD-78EB' exists.. Not found!
Checking whether ESP 'B0CE-D08D' exists.. Found!
Checking whether ESP 'B0D1-182E' exists.. Found!
Sorting and removing duplicate ESPs..

$ proxmox-boot-tool status
Re-executing '/usr/sbin/proxmox-boot-tool' in new private mount namespace..
System currently booted with uefi
0AA7-FE3B is configured with: uefi (versions: 6.2.16-20-pve, 6.5.11-8-pve, 6.5.13-1-pve)
B0CB-DBA4 is configured with: uefi (versions: 6.2.16-20-pve, 6.5.11-8-pve, 6.5.13-1-pve)
B0CE-D08D is configured with: uefi (versions: 6.2.16-20-pve, 6.5.11-8-pve, 6.5.13-1-pve)
B0D1-182E is configured with: uefi (versions: 6.2.16-20-pve, 6.5.11-8-pve, 6.5.13-1-pve)
 