Correct procedure for replacing a zpool on specific nodes?

Pyromancer
Member
Jan 25, 2021
Summary: I need to know how to remove a zpool from one machine in a cluster while leaving it on the others.

I have a three-node cluster: two matching main servers (M1 and M2) and a third, smaller machine (S3) that provides the third vote for quorum and hosts a limited range of VMs. Each of the two main machines has two ZFS pools, /tank1 and /tank2, both of which fully replicate between the two machines, with HA enabled on the VMs for failover (in a group of just M1 and M2, as S3 doesn't have the capacity).

But the /tank2 pools on M1 and M2 are tiny and need to be entirely replaced. I know that under ZFS I could swap the disks out one by one and the pool would auto-grow to the new sizes, but we also want to change the whole layout of the pool, replacing it with an all-new pool of a different design.

Currently the /tank2 pools are seven 500 GB SSDs: five in RAIDZ1 plus two hot spares. The layout is historic: originally there were only five disks, but after a few failures we added the spares in empty slots as insurance.

We want to abolish this and replace it with a new pool, seven 4 TB SSDs in RAIDZ2, with cold spares stored in the rack for rapid deployment. At the same time I'm upgrading the servers' memory, doubling it on each, so each machine will be fully shut down in turn and the disk changes can be done while it's offline.

In preparation for upgrading M2, which we're doing first, all the VMs have been migrated to M1, and replication has been turned off for the few that used the old /tank2 pool. Now I want to remove /tank2 from M2, but leave it in place on M1.

What is the correct procedure for removing /tank2 from M2, without affecting it on M1?

From reading other threads, especially https://forum.proxmox.com/threads/correct-procedure-for-zpool-removal.106203/, I gather the procedure is "deactivate", then "remove", then destroy the zpool, but I can't see those options when I look at the menus for the storage on the specific hosts, and in Datacentre there's just the one entry for each pool.

All the VMs are backed up to a separate backup host so we have protection but I don't want to accidentally nuke /tank2 on M1 until M2 is back up and running with its new pool and new memory.

I realise I could just nuke the zpool from the command line on the host in question to remove it, but would rather not mess up the Proxmox config.

Aside: Should I bring the new pool up as a new /tank2, or, since it's all new, would it be better to bring it up as /tank3? If the latter, how do I then get the /tank2 VMs on M1 moved over to it on M2, so I can replace /tank2 on M1 when its turn to be upgraded comes?
 
For anyone interested in how this went:

First, using the GUI, in Datacentre > Storage, tank2 was previously set to be available on M1 and M2. I edited this and removed it from M2. This removed it from the left-hand column on M2, while it remained on M1.
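For reference, the same restriction can be made from the shell with pvesm, which edits the node list on the storage entry. A dry-run sketch, assuming the storage entry is named tank2 and the remaining node is M1 (the names from this post; substitute your own):

```shell
# Dry run: the command is echoed rather than executed; drop 'echo' to apply.
# "tank2" and "M1" are the storage and node names used in this post.
echo pvesm set tank2 --nodes M1
```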

Next, logged in to M2 over SSH, I ran 'zpool destroy tank2', which removed the zpool from M2.
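A hedged sketch of this step, with a couple of safety checks first. Every command is echoed as a dry run; remove the 'echo' to execute for real, and bear in mind that 'zpool destroy' is irreversible:

```shell
POOL="tank2"                 # pool name from this post; substitute your own
# Safety checks before destroying: make sure nothing on the pool is still in use.
echo zpool status "$POOL"    # no resilver or scrub should be in progress
echo zfs list -r "$POOL"     # confirm no datasets on it are still needed
echo zpool destroy "$POOL"   # irreversible once run without 'echo'
```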

I then removed the seven 500 GB SSDs and replaced them with the new 4 TB ones.

As I wanted to build the new zpool based on disk unique identifiers rather than /dev/sdX, I used 'ls /dev/disk/by-id |grep <maker-id>' to get the list of newly installed disks. In this case the serial numbers all started with a common identifier so it was easy to get just these seven using grep.

I then built the new tank2 using 'zpool create tank2 raidz2 <disk-ids-from-grep-command-above>'.
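To make the disk-selection step concrete, here is a sketch that uses a synthetic listing in place of the real 'ls /dev/disk/by-id' output; the 'ata-FOO' prefix is a made-up stand-in for the common maker/serial prefix, and the create command is echoed rather than run. Note the filtering of '-partN' entries, since by-id lists partitions as well as whole disks; the 'ashift=12' option is a commonly recommended setting for SSDs, not something from the original post:

```shell
# Synthetic stand-in for 'ls /dev/disk/by-id' output (one name per line).
LISTING="ata-FOO_SSD_A1
ata-FOO_SSD_A1-part1
ata-FOO_SSD_A2
ata-FOO_SSD_A3"
# Keep whole-disk entries matching the maker prefix; drop partition entries.
DISKS=$(printf '%s\n' "$LISTING" | grep '^ata-FOO' | grep -v -- '-part')
# Echoed dry run of the pool creation; drop 'echo' to build the pool for real.
echo zpool create -o ashift=12 tank2 raidz2 $DISKS
```

Building the pool from /dev/disk/by-id names rather than /dev/sdX means the pool survives device renumbering across reboots and controller changes.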

And finally I went back into Datacentre > Storage and re-enabled M2 as one of the hosts for tank2.

As yet we've only replicated to it rather than running VMs off it, but all appears to be working correctly.