Help expanding root ZFS rpool partition

Urbz

Member
Nov 24, 2020
One of the drives in my mirrored root ZFS rpool died on me. I have swapped both of the original drives with new, larger ones. Everything went smoothly and things are running great; however, I am struggling with expanding the partition to utilize the space of the larger drives.

I used "zpool set autoexpand=on rpool" but even after a reboot nothing has changed. I saw another thread where it was recommended to boot into gparted on a live CD and resize the partision manually from there however ZFS is not supported by gparted or any other GUI partitioning software.

I've been googling for ages and I'm stuck. Any help would be greatly appreciated.
 
I'm running into the same issue, but in my case I (think I) cannot use the solution mentioned by @ph0x since I only have a single SSD which contains everything (default Proxmox installation on single disk (RAID0) ZFS).

So I cloned the disk using Clonezilla, which actually worked pretty well (as in, system boots and PVE runs fine). Except for sda3 still being the old smaller size (new disk is 480GB):
Code:
root@node1:~# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0 447,1G  0 disk 
├─sda1   8:1    0  1007K  0 part 
├─sda2   8:2    0   512M  0 part 
└─sda3   8:3    0 232,4G  0 part

So the instructions on growing the rpool are pretty straightforward, but that will only work if the *partition* has the required size beforehand, right?

Is there any way to enlarge a ZFS *partition*? If not, what's the way to go if I don't want to (or can't) re-install the entire system?
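
In case it's useful, this is how I'm checking how much unused space sits behind that partition (parted's free-space view; output left out here):
Code:
# show partitions together with any free space following them
parted /dev/sda unit GB print free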

Thinking ahead, would the following work?
1. Add second disk of same size (luckily I still have one)
2. Clone only sda1 and sda2 (since I'm booting straight to systemd via UEFI, no need to worry about GRUB)
3. Boot PVE from original SSD
4. Use fdisk(?) to create new partition sdb3, size 430G (to allow for a swap partition in the remaining free space in the future)
5. Create rpool on sdb3, full size
6. Copy data from sda3 to sdb3 (via zfs send/receive? -> suggested elsewhere; see the sketch after this list)
7. Shut down system
8. Swap SSDs
9. Cross fingers
10. Boot system
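
To make step 6 a bit more concrete, this is roughly the send/receive sequence I have in mind. Pool and device names are just placeholders from my plan above, none of it is tested, and the pool name and bootfs property would presumably still need sorting out before booting from the new disk:
Code:
# snapshot the existing pool recursively
zfs snapshot -r rpool@migrate
# create the new, larger pool on the new partition (name is a placeholder)
zpool create -f rpool-new /dev/sdb3
# replicate all datasets and their properties onto the new pool
zfs send -R rpool@migrate | zfs receive -F rpool-new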

Also, when I run gdisk (tried that in my quest to find a way to enlarge the partition) on the cloned partition, I get the following error:
Code:
root@node1:~# gdisk /dev/sda3
GPT fdisk (gdisk) version 1.0.3

Caution: invalid main GPT header, but valid backup; regenerating main header
from backup!

Caution! After loading partitions, the CRC doesn't check out!
Warning! Main and backup partition tables differ! Use the 'c' and 'e' options
on the recovery & transformation menu to examine the two tables.

Warning! One or more CRCs don't match. You should repair the disk!

Partition table scan:
  MBR: not present
  BSD: not present
  APM: not present
  GPT: damaged

Found invalid MBR and corrupt GPT. What do you want to do? (Using the
GPT MAY permit recovery of GPT data.)
 1 - Use current GPT
 2 - Create blank GPT

Your answer:
Not sure what to do here... I'm tempted to choose 'Create blank GPT' in the hope that it will allow me to expand the partition size, but a) I might lose all data (?) and b) this might not work anyway?
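
One thing I realised afterwards: I pointed gdisk at the partition (/dev/sda3) instead of the whole disk, and a bare partition doesn't carry a GPT of its own, which might explain at least part of the noise above. Looking at the actual disk seems saner, and after cloning to a bigger disk the backup GPT probably needs moving to the new end of the disk anyway:
Code:
# inspect the GPT of the whole disk, not a single partition
gdisk -l /dev/sda
# relocate the backup GPT if it is still where the old, smaller disk ended
sgdisk -e /dev/sda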
 
ZFS is not really useful on a single disk. If you're going to add a second disk anyway, consider creating a mirror, which then gives you the opportunity to expand the pool as described in the post mentioned above.
Regarding the GPT: if you recreate the partition boundaries at exactly the same positions, your data should still be there. But I don't see how that operation is going to be any better than a reinstall, with ext4 this time.
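
If you do end up going the mirror route, the attach itself is a one-liner (device paths below are only an example; /dev/disk/by-id/... paths are nicer in practice), and with autoexpand on, the pool grows once all members have room:
Code:
# attach a second partition to turn the single-disk vdev into a mirror
zpool attach rpool /dev/sda3 /dev/sdb3
# allow the pool to grow when every member has more space
zpool set autoexpand=on rpool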
 
Thanks for your reply. And good point. I'd really like to avoid having to reinstall, since that caused a lot of headaches the first time (i.e. a Sandy Bridge node causing the PVE installer to crash because of bugs in the Linux Intel graphics driver, which will never be fixed). That's why I went the cloning route first. I'm reading here that (at least on BSD) you can actually use a partitioning tool on a live USB to resize the partition. Would that work on Linux as well?

The primary reason I chose ZFS is snapshots. This, IMHO, is such an advantage over ext4 that I'd choose ZFS on servers (or Btrfs on desktops) any time. Unless I got this all wrong and there's a way to do snapshots with ext4 as well?

I'm not planning on adding a second disk to keep both in the system - it would only serve as a means of getting all the available space on the new disk into use.
Just to draw the picture of my situation here:
It's a home/SMB 3-node cluster (with maybe a fourth node soon) built primarily from old spare systems and parts. The only reason I invested in four server-class Samsung SSDs is that the wear-out levels of the old consumer SSDs I started with skyrocketed once they were used in the Proxmox + ZFS + cluster setup.
So the idea was to replace the existing consumer SSDs with server-class SSDs and be done with it, one for each node. I'm now in the middle of that migration process.

Unfortunately, I don't have the budget to invest in even more server-class SSDs to do mirroring. Besides that, I don't think the risk of a disk dying is worth the investment in my situation - I can move all VMs around freely since I have the cluster. Replication jobs (with shorter intervals for VMs such as Nextcloud) minimize the risk of data loss. But maybe I'm missing something?

Next to this, I'm still thinking about what to do with the old consumer SSDs once the migration to the server-class SSDs has completed on all nodes. I might re-add them to each node and use them for non-write-intensive data (i.e. VM templates, ISOs and multimedia). In that case, snapshots probably won't be required, so would you say ext4 is preferred then? Or would ZFS provide features that could be useful on those old disks as well (considering the primary disks will be ZFS anyway)?
 
Snapshots are surely one of the advantages of ZFS over ext4 (although you can take snapshots with LVM, too, but it's pretty slow). You basically can't make use of the self-healing with only one disk.
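For comparison, an LVM snapshot looks roughly like this (volume names are just an example from an LVM-based install):
Code:
# create a 1G copy-on-write snapshot of a logical volume
lvcreate -s -n root_snap -L 1G /dev/pve/root
# remove it again when no longer needed
lvremove /dev/pve/root_snap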
I usually resize partitions with parted but I have never tried that with zfs, to be honest, so this might or might not work. :)
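If you want to try it, the parted part would look roughly like this (disk and partition number taken from your lsblk output; no idea whether ZFS will be happy, so test on something non-critical first):
Code:
# grow partition 3 to the end of the disk
parted /dev/sda resizepart 3 100%
# then ask ZFS to pick up the new space on that vdev
zpool online -e rpool /dev/sda3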
 
This is the first result when googling "zfs expand larger disks" - might this be what helps you?

https://www.ateamsystems.com/tech-blog/expand-zfs-to-use-larger-disks-free-space/
This is my problem. The Proxmox wiki says to do the following, followed by installing the bootloader on the new disk.
Code:
# sgdisk <healthy bootable device> -R <new device>
# sgdisk -G <new device>
# zpool replace -f <pool> <old zfs partition> <new zfs partition>
The first line copies the partition table from the old, healthy drive to the new one. The issue is that if the new drive is larger, the copied partition sizes aren't correct. When ZFS looks to expand, I guess it doesn't see the unused space because the partition table has already been written with the old layout?

The link you provided doesn't use some of the commands listed on the wiki, so I'm wondering how that works with Proxmox. Will the correct partitions for the bootloader and such be created? There's also no mention of using zpool set autoexpand=on, which I thought was the way ZFS expanded in the first place.
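
So would the sane thing be to fix the copied table before doing the replace? Something like this is what I have in mind (untested, placeholders as in the wiki snippet, and the bootloader step from the wiki still applies afterwards):
Code:
# copy the partition table from the healthy disk, then randomize GUIDs
sgdisk <healthy bootable device> -R <new device>
sgdisk -G <new device>
# the backup GPT was copied for the old, smaller size; move it to the end of the new disk
sgdisk -e <new device>
# grow partition 3 (the ZFS partition) to the end of the disk
parted <new device> resizepart 3 100%
# replace the failed member with the now full-size partition and let it resilver
zpool replace -f rpool <old zfs partition> <new zfs partition>
# once every mirror member sits on a larger partition, claim the space
zpool online -e rpool <new zfs partition>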

I'm no expert, so I would love yours or anyone else's take on this.
 
Unfortunately, I'm also not a zfs expert ... But since everything runs smoothly as you said, did you try the command mentioned in the link? Even Oracle docs say to run it, so I would at least give it a shot if everything else works already.
 
Yup, I ran the command and then rebooted, but the pool is still showing the old capacity. The Oracle docs say you can use the command before or after the other steps, but maybe Proxmox gets confused if you run it after, like I did.
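
The only other thing on my list to try is expanding each vdev explicitly (the device names below are just how my partitions show up, and I'm not sure it helps while the partitions themselves are still the old size):
Code:
# ask ZFS to use any extra space that became available on each mirror member
zpool online -e rpool /dev/sda3
zpool online -e rpool /dev/sdb3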

Thanks for the help though.
 
It might well be related to the partitions being used rather than whole disks.
If you're brave, create a new GPT and add the three partitions manually with the same boundaries ... :)
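Something along these lines, I guess, though I'd note down the exact start sectors from sgdisk -p first; the start sector is a placeholder and the type code (BF01, the usual ZFS one on Proxmox) is an assumption on my part:
Code:
# print the current table and note the start sector of partition 3
sgdisk -p /dev/sda
# recreate only partition 3: same start sector, but ending at the end of the disk
sgdisk -d 3 /dev/sda
sgdisk -n 3:<original start sector>:0 -t 3:BF01 /dev/sda
# re-read the table and let ZFS grow onto the enlarged partition
partprobe /dev/sda
zpool online -e rpool /dev/sda3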
 
