Proxmox VE 9 existing ZFS vdev expansion?

liamlows

Well-Known Member
Jan 9, 2020
Hello,

So I recently upgraded from Proxmox VE 8 to 9, and I was excited to see that v9 now includes a form of ZFS expansion:

"ZFS now supports adding new devices to existing RAIDZ pools with minimal downtime."

I currently have a ZFS pool on one of my PVE nodes that consists of 12 × 1.2 TB disks. I purchased 12 more 1.2 TB disks (same size and features, just a different manufacturer) and intend to add the additional capacity to the pool. My hope was to use the newly added ZFS feature to perform a vdev expansion (rather than creating a new vdev and adding it to the pool), so the single vdev in my pool would go from 12 disks in raidz2 to 24 disks in raidz2 with no data loss. However, when I went onto the host and checked the feature flags for feature@vdev_expansion, I saw that it was not present, which implies this change is not possible.

Is this the case? Is my only option to create another vdev of 12 disks and add it to the pool, or to destroy the old one and create a new one with all 24 disks? I wasn't sure what the roadmap meant by the statement above, so I was hoping someone here could provide some guidance.

Thanks in advance!
 
My guess is that you should do
Code:
zpool upgrade POOL
in order to activate new features...
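To check whether a flag is already active (POOL again being a placeholder for your pool name), something like this should work:
Code:
# list pools whose feature flags are not all enabled yet
zpool upgrade

# show the state of the raidz expansion feature flag
zpool get feature@raidz_expansion POOL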
 
Oh nice, I wasn't aware that was needed. I went ahead and ran zpool upgrade ZFS1 and it looks like it added the feature raidz_expansion, but that isn't strictly a vdev expansion feature (or so it seems). Do you know if vdev expansion is supported with v9 of Proxmox?

EDIT: it looks like raidz_expansion is the intended feature to support vdev expansion. I am curious though, is this the best way to handle this expansion? Would it be better to simply add a second vdev rather than attempt to expand the first one? Are there any added benefits other than having parity across 2 vdevs instead of 1?
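For reference, from what I've read about raidz_expansion, the expansion itself is apparently done by attaching the new disks to the existing raidz vdev one at a time, roughly like this (raidz2-0 and the wwn path are placeholders for my actual vdev name and device):
Code:
# grow the existing raidz2 vdev by one disk (repeated once per new disk)
zpool attach ZFS1 raidz2-0 /dev/disk/by-id/wwn-[ID]

# the expansion progress then shows up in
zpool status ZFS1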
 
I currently have a ZFS pool on one of my PVE nodes that consists of 12 × 1.2 TB disks. I purchased 12 more 1.2 TB disks (same size and features, just a different manufacturer) and intend to add the additional capacity to the pool. My hope was to use the newly added ZFS feature to perform a vdev expansion (rather than creating a new vdev and adding it to the pool), so the single vdev in my pool would go from 12 disks in raidz2 to 24 disks in raidz2
No, no, no - do not do this.

I see that this is tempting, but a 12-disk-wide vdev is already considered "wide".

Additionally: two vdevs will double IOPS, which is always a good idea ;-)


Disclaimer: I've never set up a 24-wide vdev, not even for testing - and I never will!
Disclaimer2: just my 2 €¢ and ymmv - as usual...
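If you go the second-vdev route, zpool add has a dry-run flag that prints the resulting layout without touching the pool - worth doing before the real thing (pool name and disk paths are placeholders):
Code:
# preview the layout after adding a second raidz2 vdev; nothing is changed
zpool add -n ZFS1 raidz2 /dev/disk/by-id/wwn-[ID] ...   # list all 12 new disks here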
 
Thanks for the insight! Yeah, as I've been thinking about it, I actually feel like two 12-disk vdevs in the pool would be better than one. If I simply create the new vdev and add it to the pool, should I do anything after that to optimize the new storage layout?
 
If I simply create the new vdev and add it to the pool, should I do anything after that to optimize the new storage layout?
Well... after adding that second vdev, everything will work and the added capacity is available immediately. There is no additional step required.

But all "old" data is still stored on the old vdev. There is no automatic rebalancing. Last time I did this I utilized an external script to shuffle data around - which took days to finish.

If I remember correctly, newer ZFS releases offer an integrated helper mechanism, but unfortunately I have no reference available.
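You can see how (un)evenly the data sits on the vdevs at any time (using your pool name) with:
Code:
# capacity, allocation and free space per vdev
zpool list -v ZFS1

# ongoing I/O per vdev, refreshed every 5 seconds
zpool iostat -v ZFS1 5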
 
But all "old" data is still stored on the old vdev. There is no automatic rebalancing. Last time I did this I utilized an external script to shuffle data around - which took days to finish.
I know almost nothing about ZFS and related stuff, but shouldn't there be a resilvering in place at this point?

But I guess a resilver only happens when you, well, lose some vdev in the same raid-z, right?
 
I know almost nothing about ZFS and related stuff, but shouldn't there be a resilvering in place at this point?
You have a single vdev and you are adding a new, empty second one. No resilvering, no rebalancing, nothing of the sort will happen. Everything is fine as it is.

As already said: all data is on the old vdev at first. Technically that is fine! From a performance standpoint it is bad: reads will only use the disks in the old vdev, because all of the data is there, and writes will (mostly) go only to the new vdev - because that one is empty.

That is the reason one would wish to have the data rebalanced, i.e. distributed onto both (all) vdevs. This does not happen automatically. Only after deleting some data and writing a lot of new data will both vdevs eventually get filled ~equally.
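Rebalancing therefore boils down to rewriting the data. One rough way to do that - only a sketch, it needs enough free space, the dataset must not be in active use, and "media" here is just an invented dataset name - is to copy a dataset via send/receive, which writes it across all vdevs, and then swap the names:
Bash:
# snapshot the dataset and copy it to a new one (written across all vdevs)
zfs snapshot ZFS1/media@rebalance
zfs send ZFS1/media@rebalance | zfs receive ZFS1/media_new

# after verifying the copy, swap the datasets and clean up
zfs rename ZFS1/media ZFS1/media_old
zfs rename ZFS1/media_new ZFS1/media
zfs destroy -r ZFS1/media_old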

But I guess a resilver only happens when you, well, lose some vdev in the same raid-z, right?
You cannot lose "some vdev in the same raid-z"! One vdev is one RaidZ - or a RaidZ2 in your case.

A RaidZ2 vdev may lose two devices without losing data. You can lose two drives in the old vdev and two drives in the new vdev at the same time and the data is still readable.

At least that is the plan. But I would break a sweat if this actually happened to me...

Obligatory hint: Mirroring/RaidZ/Z2/Z3 (plus possibly multiple snapshots) does not count as a backup. Never.
 
@UdoB agreed, I am in no way treating the raidz2 array as having backup tendencies at all - only the resilience to two disks failing before data loss occurs in the array. A 3-2-1 backup strategy is what I would use to make my data resilient against serious failure, although I don't in this case, only because the storage array contains re-creatable data that I'm not concerned about losing in a total-failure situation.

As far as the whole process goes, I did end up going the route of simply creating a new vdev and adding it to the pool. If anyone is curious, I wrote a quick little script (with some help from ChatGPT) to clean the disks and add them as a new vdev:
Bash:
#!/usr/bin/env bash
set -euo pipefail

POOL="ZFS1"

# List of new disks
DISKS=(
/dev/disk/by-id/wwn-[ID]
/dev/disk/by-id/wwn-[ID]
# ... (10 more /dev/disk/by-id/wwn-[ID] entries)
)

echo "=== Checking disks ==="
for d in "${DISKS[@]}"; do
    if [ ! -e "$d" ]; then
        echo "❌ Disk not found: $d"
        exit 1
    fi
done
echo "✅ All disks found."

echo
echo "=== Clearing old labels and partition data ==="
for d in "${DISKS[@]}"; do
    echo "→ Clearing $d"
    zpool labelclear -f "$d" 2>/dev/null || true
    wipefs -a "$d" || true
done
echo "✅ All disks cleared."

echo
echo "=== Adding new RAIDZ2 vdev to pool: $POOL ==="
zpool add "$POOL" raidz2 "${DISKS[@]}"

echo
echo "✅ Vdev successfully added!"
echo
zpool status -P "$POOL"

I did see some information online about setting the following options for performance (as pretty much all the files on the array are media files) but was unsure if I should do it. Curious to know what others think:
Bash:
zfs set compression=zstd-3 ZFS1
zfs set atime=off ZFS1
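
For what it's worth, whether those settings took effect and whether compression actually buys anything for this data can be checked afterwards with:
Bash:
zfs get compression,atime,compressratio ZFS1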

I would also be interested in seeing if anyone knows about rebalancing the pool across both vdevs, how one would go about doing that, and whether it would be worth it.
 
There is a script that does what's needed at github.com/markusressel/zfs-inplace-rebalancing.

When I do potentially problematic things like changing the topology (and the current situation allows me to), I set a global checkpoint so I have a way back to the previous global state. See man zpool-checkpoint. It has drawbacks, but during maintenance it helps to stay calm if problems arise.
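Roughly like this (pool name is just an example; note that a checkpoint pins space on the pool, so it should not be kept around longer than needed):
Bash:
# set a checkpoint before the risky operation
zpool checkpoint ZFS1

# ... do the maintenance, e.g. zpool add ...

# everything fine -> discard the checkpoint
zpool checkpoint -d ZFS1

# things went wrong -> roll the whole pool back to the checkpoint
zpool export ZFS1
zpool import --rewind-to-checkpoint ZFS1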
 