Replacing/Upgrading drives in a 3 node replica cluster / shared pool

Sep 24, 2022
Hey folks,
I've got 3 nodes in a replica cluster running Proxmox 7.4-17 (planning to upgrade to 8.1 in a couple of months).

I've got a ZFS Pool across all 3 nodes using an Intel DC SATA HDD in each. I'd like to upgrade all of the drives, but preferably not have any downtime.

Is there good documentation someone can point me to as to how to accomplish this?

I know that I'd want to move any active VMs using that storage to another node, and then likely remove that node's participation in the pool from the Datacenter view. But from that point, I'm not sure whether I can just replace the drive, and what the next steps might be.

Any help is appreciated. :)

-Bear
 
using an Intel DC SATA HDD in each. I'd like to upgrade all of the drives,
A single disk without redundancy?

Without any prearrangement you could
  1. attach the new drive to this vdev, temporarily creating a mirror
  2. wait for it to sync (resilver)
  3. detach/remove the old drive
If you can connect the new drive without first turning the computer off, this would work without any downtime for the otherwise affected VMs :)
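The three steps above can be sketched as follows. The pool name "tank" and the /dev/disk/by-id paths are placeholders (assumptions) — substitute your own from `zpool status`:

```shell
# 1. Attach the new drive next to the old one, forming a temporary mirror
zpool attach tank /dev/disk/by-id/ata-OLDDISK /dev/disk/by-id/ata-NEWDISK

# 2. Watch the resilver until it reports completion
zpool status tank

# 3. Once resilvered, drop the old drive from the (now) mirror
zpool detach tank /dev/disk/by-id/ata-OLDDISK

# If the new drive is larger, expand the vdev to use the extra space
zpool online -e tank /dev/disk/by-id/ata-NEWDISK
```

Alternatively, setting `zpool set autoexpand=on tank` before the attach makes the pool grow automatically once the old, smaller drive is detached.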

Please read man zpool-attach and man zpool-detach to verify my advice on your actual system.

Disclaimer: I run all my pools with redundancy...

Edit: if this is your BOOT-disk, you need some additional adjustments to the partition table and the boot mechanism!
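For the boot-disk case, a hedged sketch following the Proxmox ZFS boot-device replacement procedure, adapted to attach/detach. /dev/sdb (old disk) and /dev/sdc (new disk) are assumptions, as is the default Proxmox layout (partition 2 = ESP, partition 3 = ZFS):

```shell
# Copy the partition table from the old disk and give the new disk fresh GUIDs
sgdisk /dev/sdb -R /dev/sdc
sgdisk -G /dev/sdc

# Mirror the ZFS partition onto the new disk
zpool attach rpool /dev/sdb3 /dev/sdc3

# Set up the boot partition (ESP) on the new disk
proxmox-boot-tool format /dev/sdc2
proxmox-boot-tool init /dev/sdc2

# After `zpool status rpool` shows the resilver is done:
zpool detach rpool /dev/sdb3
```

Note that `sgdisk -R` copies the old partition sizes, so on a larger disk you would still need to enlarge the ZFS partition (and `zpool online -e`) afterwards to use the extra capacity.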
 
As a side note, you can create a VM, install Proxmox in it, and recreate your host's storage configuration, so you can practice the procedure before applying it to your production servers.
 
A single disk without redundancy?
These are mini PCs. The boot volume is on NVMe in a ZFS RAID-1 mirror. The shared cluster pool is a single enterprise SATA SSD per node. The replicated VMs/containers live in the shared pool. There's no disk-level redundancy for the shared pool; however, replicas sync every 15 minutes across nodes, heartbeats are running, and backups to a RAID-6 volume (and subsequently LTO-6 tape) are performed every 3 days.

I can’t install a larger SSD in a node without removing the existing SSD given these only have one SATA port.

It looks like I'll have to move the running VMs/containers off the node, remove that node's drive from the pool, power the node off, replace the drive, format it, and re-add it to the shared pool before moving the VMs/containers back over to re-establish quorum and active/passive replication, if I'm reading this correctly.
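That sequence might look roughly like the sketch below. The guest IDs, target node name, pool name "tank", and device path are all placeholders (assumptions); the pool must be recreated with the same name so the Datacenter storage definition still matches across nodes:

```shell
# 1. Migrate guests off the node being serviced
qm migrate 100 node2 --online
pct migrate 200 node2 --restart

# 2. Remove this node's copy of the pool (after disabling replication jobs
#    targeting it), then shut down and swap the SSD
zpool destroy tank
poweroff

# 3. After booting with the new SSD, recreate the pool under the same name
zpool create -o ashift=12 tank /dev/disk/by-id/ata-NEWDISK

# 4. Re-enable replication and migrate the guests back; replication
#    re-seeds the new pool on its next run
```

With only one SATA port per node, the attach/detach trick above isn't available for the shared pool, so this destroy-and-recreate route (leaning on the 15-minute replicas) seems like the realistic path.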