Replace Mirrored ZFS Boot Pool (rpool) with Smaller Devices

mattlach

Now I know what you are all thinking.

IT CAN'T BE DONE. ZFS only allows growing pools, not shrinking them.

But bear with me here.

Background:

I boot this server off of two mirrored NVMe drives. I got a fantastic deal on a set of 500GB Samsung 980 Pro drives when I was rebuilding this server last time, and while I knew the usual cautions about consumer drives, I gave in to temptation and went ahead anyway. It's a "PRO" drive, right? :p

Well, it is not working out.

One of the 500GB Samsung 980 Pro drives keeps going non-responsive, roughly every 90-120 days. It's like the firmware locks up or something. The drive is still listed in lspci and the device node in /dev/disk/by-id is still there, but it cannot be accessed at all; smartctl won't even return anything from it.

The last time this happened, a power cycle brought the drive back and it continued working as normal.

This time, since I have a few critical things running that I don't want to interrupt for a couple of days, I tried removing the device from the command line and rescanning the PCIe bus with the system running, hoping it would come back up. The rescan detects the device, but dmesg says "Device not ready" and doesn't re-attach it.
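For the record, the remove/rescan attempt looked roughly like this (the 0000:3b:00.0 address below is just an example; the real address comes from lspci or sysfs):
Code:
# find the PCI address behind the stuck NVMe namespace (address shows up in the path)
readlink -f /sys/block/nvme15n1/device

# remove the unresponsive device from the PCI tree
echo 1 > /sys/bus/pci/devices/0000:3b:00.0/remove

# rescan the bus and hope the drive re-enumerates cleanly
echo 1 > /sys/bus/pci/rescan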

It was worth a try.

Anyway, so I happen to have this set of enterprise Optane drives I am not using, and I figured I'd swap them in, but unfortunately while the Samsung drives are 500GB, the Optane drives are only 375GB, and ZFS does not support shrinking pools, only growing them.

This leads me to my crazy little plan:

Looking at the one remaining member of rpool in the system, the disk layout looks like this:
Code:
root@proxmox:~# fdisk -l /dev/nvme15n1
Disk /dev/nvme15n1: 465.76 GiB, 500107862016 bytes, 976773168 sectors
Disk model: Samsung SSD 980 PRO 500GB              
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 3D6C5911-E7F6-42A3-80C8-106CE10A0603

Device            Start       End   Sectors   Size Type
/dev/nvme15n1p1      34      2047      2014  1007K BIOS boot
/dev/nvme15n1p2    2048   2099199   2097152     1G EFI System
/dev/nvme15n1p3 2099200 976773134 974673935 464.8G Solaris /usr & Apple ZFS


______________________________________________________________________________________________________________

What if I powered down the server, removed these two drives, installed them in another system, did a "zpool import", and followed that with a full backup of the pool using zfs send/recv?

Then I'd copy (using dd or something like that) the BIOS boot and EFI partitions to my smaller Optane drives, create a ZFS partition in the remaining free space on each, create a new pool named rpool, and restore my backup to these new smaller drives using zfs send/recv.
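To make the idea a bit more concrete, something like this is what I have in mind (a rough, untested sketch; "backup" is just a placeholder scratch pool on the other machine, and the device names are examples):
Code:
# on the other system: import the old pool under a temporary name
zpool import -f -R /mnt rpool oldrpool

# replicate everything to a scratch pool
zfs snapshot -r oldrpool@migrate
zfs send -R oldrpool@migrate | zfs recv -F backup/rpool-copy

# partition the first Optane drive like the old disk, just with a smaller p3
# (repeat for the second Optane drive, /dev/nvme1n1)
sgdisk -n1:34:2047  -t1:EF02 /dev/nvme0n1   # BIOS boot
sgdisk -n2:2048:+1G -t2:EF00 /dev/nvme0n1   # EFI system
sgdisk -n3:0:0      -t3:BF01 /dev/nvme0n1   # ZFS, rest of the disk

# copy the BIOS boot and EFI partitions over from the healthy old disk
dd if=/dev/nvme15n1p1 of=/dev/nvme0n1p1 bs=1M
dd if=/dev/nvme15n1p2 of=/dev/nvme0n1p2 bs=1M

# create the new, smaller rpool and restore the replicated data into it
zpool create -f -o ashift=12 -R /mnt2 rpool mirror /dev/nvme0n1p3 /dev/nvme1n1p3
zfs send -R backup/rpool-copy@migrate | zfs recv -F rpool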

Would this work?

I would certainly have disks with the same BIOS Boot and EFI system partitions.

I'd also have a new rpool with the same content as before (just smaller).

I'm guessing there is likely more to it than this though.

How does proxmox-boot-tool identify the disk to boot from? Does it go by pool GUID?

I imagine I'd at least need to make sure that both the individual disks and the new rpool have the same GUIDs as the old ones, so they match what proxmox-boot-tool expects to see?

Could I use "zfs set guid=" to set the guid of the new rpool to the same value as the old rpool and have it properly boot from it?

______________________________________________________________________________________________________________

Am I crazy for even considering this, or should I just do a clean install of Proxmox onto the new drives?

After doing so, and having the installer properly configure the new drives, maybe I could even zfs send/recv the content of the old rpool to the new rpool, so the boot drive data ends up identical and I don't have to do any manual migration/restore?

______________________________________________________________________________________________________________

Maybe the easiest solution would just be to buy a couple of enterprise grade 500GB+ NVMe drives and swap them in using zpool replace and be done with it?
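If I go that route, I assume it would just be the standard Proxmox boot-mirror replacement procedure, roughly like this (disk paths are placeholders):
Code:
# copy the partition layout from the healthy old disk to the new disk,
# then randomize the new disk's GUIDs
sgdisk /dev/disk/by-id/healthy-old-disk -R /dev/disk/by-id/new-disk
sgdisk -G /dev/disk/by-id/new-disk

# resilver onto the new disk's ZFS partition
zpool replace -f rpool /dev/disk/by-id/failed-old-disk-part3 /dev/disk/by-id/new-disk-part3

# make the new disk bootable again
proxmox-boot-tool format /dev/disk/by-id/new-disk-part2
proxmox-boot-tool init /dev/disk/by-id/new-disk-part2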



I'd appreciate any suggestions.

Hindsight being 20/20, I kind of wish I had created the boot pool using only a small portion of the 500GB drives' space to make this kind of replacement easier.

I use these two drives just to boot and for the Debian/Proxmox system, no VM storage. The install only uses ~12GB on the drive... Lesson learned for next time I guess.
 
Maybe the easiest solution would just be to buy a couple of enterprise grade 500GB+ NVMe drives and swap them in using zpool replace and be done with it?

This would probably be the easiest solution. However, since you say that your install only uses around 12 GB, I would back up the operating system with the backup tool of your choice (e.g. rsync or restic), reinstall the OS to the Optane drives, and restore the old files afterwards.
Maybe somebody else can say which files are needed to restore the configuration; I'm not sure whether /etc/ is enough or whether more files are needed.
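As a rough sketch of the backup step (backuphost and the target paths are placeholders, and as said, I'm not sure this covers everything):
Code:
# back up the host configuration before reinstalling; /etc alone may not be enough
rsync -aAX /etc/ backuphost:/backup/proxmox-etc/

# the cluster configuration database is probably worth grabbing too (assumption)
rsync -aAX /var/lib/pve-cluster/ backuphost:/backup/pve-cluster/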
 
I was in a similar situation a few years ago; that's where that guide came into existence. My advice is to test it in a VM first, where you recreate the situation. Then you know the procedure and can adapt it to your needs before you do it on the actual system.
 

Yeah, I am debating back and forth between this and just doing a clean Proxmox install to my new smaller drives and copying over the content of /etc and /var/lib/pve-cluster/config.db.
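If I go that route, I'd expect the restore itself to be something like this (just a sketch; the backup path is a placeholder, and I haven't verified that nothing else is needed):
Code:
# on the freshly installed system: stop the cluster filesystem before
# swapping in the old database, then bring it back up
systemctl stop pve-cluster
cp /path/to/backup/config.db /var/lib/pve-cluster/config.db
systemctl start pve-cluster

# host-level config (e.g. /etc/network/interfaces) would be copied over separately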

This was surprisingly easy the last time I upgraded my server and did a clean install, so I wonder if I am just over-complicating things by trying to duplicate and shrink my boot drives.
 
