Does PBS handle merging datastores gracefully?

NicJames2378

New Member
Jun 23, 2025
Our PBS instance unexpectedly ran out of storage on its HDDs after migrating a whole bunch of VMs from vCenter recently. Once the datastore (named "Backup") hit 100%, it pretty quickly stopped accepting new data (as is reasonable). We put it into Maintenance Mode and added more disks to the appliance before attempting to expand the raidz2, only to discover that this process can take multiple days per disk. As such, we decided that the fastest TTR may be to offload the entire "Backup" datastore to a temporary iSCSI storage appliance under a new datastore ("ME4024"), delete the "Backup" datastore, and rebuild it from scratch using all new, larger disks (10 total, the maximum the appliance will hold internally). Once the ZFS pool is recreated, we want to begin taking backups on the rebuilt 96TB "Backups" datastore again, but had concerns about moving all of our previous backups back to the machine.

We mapped the "ME4024" datastore via ISCSI as that is all the SAN supports. It isn't ideal to us to have our backup server rely on external storage, but this is only a temporary swap space and will suffice for now. The SAN is EoL and will not be permanent. Currently, we have 36TB of data copying to "ME4024" via a sync job. Once this is completed and verified, we are going to upgrade all of the datastore drives local to the PBS server and rebuild a new, larger raidz2 pool from them. This will give us approximately 3x the storage than we currently have. What we are unsure of is:
  1. Will we be able to start taking new backups immediately after rebuilding the "Backups" pool, or would we have to wait until the 36TB of existing data is synced back from "ME4024"?
  2. Would this impact the deduplication capabilities of PBS, or will it handle it like a champ?
  3. If we didn't want to have backup downtime for this entire duration, would it be a safe idea to point PVE to "ME4024" in the backup jobs for now, or could that cause issues as the 'old' data syncs over?
  4. Do you foresee any issues with this plan? All things considered, we have not irreversibly changed anything yet.
We are not too concerned with network congestion or bandwidth usage, since both the iSCSI appliance and the PBS server are on the same subnet over a 20Gb connection. We also aren't too concerned with the working hours required to complete the operation - most of this process will involve a single admin doing other things while monitoring it, after all. All we really care about is ensuring backup integrity with minimal missed backups and retaining access to the historical backups, since we have certain backup standards to keep.
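For anyone curious, the pool rebuild mentioned above should be nothing exotic - roughly something like this (disk IDs, pool and dataset names are placeholders, not our actual config):

  # recreate the raidz2 pool from the 10 new disks (disk IDs below are placeholders)
  zpool create -o ashift=12 backup-pool raidz2 \
      /dev/disk/by-id/ata-NEWDISK1 \
      /dev/disk/by-id/ata-NEWDISK2 \
      /dev/disk/by-id/ata-NEWDISK10   # ...list all 10 new disks here
  # dedicated dataset for the datastore, then register its mountpoint with PBS
  zfs create backup-pool/pbs-datastore
  proxmox-backup-manager datastore create Backups /backup-pool/pbs-datastore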

  1. Why not build a second PBS and migrate to it instead? We don't have the budget or spare servers lying around. We do have ample new drives, as we keep a certain percentage in reserve for emergencies.
  2. Why not keep data on the iSCSI device? This adds points of failure to the backup path (network stack, SAN device, more cables), which should in theory remain simple and rock-solid. The storage appliance is also EoL, and we will be getting rid of it soon since it doesn't support NFS/CIFS.
  3. Why not wait for the ZFS expansion instead of juggling all of this? The expansion was an attempt to quick-fix things. Since it was showing it would take a couple weeks of expansion time, we figured we would take this opportunity to install larger disks instead.
  4. Why not connect the new disks alongside the original pool for zero downtime? Our appliance only has 10 drive bays (technically 10+2, but the +2 are used for the PBS OS), so we have no space to connect them simultaneously.
  5. Why not create a second datastore with larger drives, migrate data onto them, then replace the smaller drives and expand the pool? This probably would have worked, but it seems like it would have taken even longer to complete, since expansion times only get worse with the larger drives. It also would have left the original question of how PBS handles merging datastores unanswered anyway. That said, it is a valid way to think about the new configuration if it simplifies things, and would likely be the preferred route for future readers who have on-board slots open and no SAN to use as swap space.
  6. How did you add the iSCSI to PBS? We installed open-iscsi and used iscsiadm to establish the connection, then gdisk and mkfs to partition and format it as ext4, and lastly mounted it to a directory under /mnt. Then we used "proxmox-backup-manager datastore create" to set it up as a datastore we could pull-sync to from the management GUI - roughly the commands sketched below. I can provide additional information if anyone needs it, though I would always recommend local storage over iSCSI for a reliable backup appliance.
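For completeness, this is approximately what that looked like (target IP, IQN, device node and mount path are placeholders for illustration):

  # discover and log in to the iSCSI target (IP/IQN are placeholders)
  apt install open-iscsi
  iscsiadm -m discovery -t sendtargets -p 192.0.2.10
  iscsiadm -m node -T iqn.2002-09.com.example:me4024.lun0 -p 192.0.2.10 --login
  # partition, format and mount the new block device (device name will differ)
  gdisk /dev/sdX
  mkfs.ext4 /dev/sdX1
  mkdir -p /mnt/me4024 && mount /dev/sdX1 /mnt/me4024
  # register the mount as a PBS datastore, then set up the pull sync job in the GUI
  proxmox-backup-manager datastore create ME4024 /mnt/me4024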
 
  1. Will we be able to start taking new backups immediately after rebuilding the "Backups" pool, or would we have to wait until the 36TB of existing data is synced back from "ME4024"?
No, sync jobs only sync backup snapshots that are newer than the latest snapshot already present in the target backup group. So if you are backing up to a backup group which has not been synced back yet, the sync will not pull in the old backups. This can be avoided by backing up to a dedicated temporary namespace while the offloaded backups are being synced back, then creating a local sync job to move the newly created backup snapshots out of the temporary namespace, and finally pointing the backup sources at the combined namespace once that is finished.
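A minimal sketch of that approach, assuming hypothetical datastore/namespace names; the namespace-related sync options depend on your PBS version, so check proxmox-backup-manager sync-job create --help for the exact parameters:

  # create a temporary namespace on the rebuilt datastore for new incoming backups
  proxmox-backup-client namespace create tmp-new --repository root@pam@localhost:Backups
  # local sync job (no --remote) pulling the offloaded data from ME4024 back into Backups
  proxmox-backup-manager sync-job create restore-old --store Backups --remote-store ME4024 --schedule hourly
  # afterwards, a second local sync can merge tmp-new back into the root namespace
  proxmox-backup-manager sync-job create merge-tmp --store Backups --remote-store Backups --remote-ns tmp-new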
  2. Would this impact the deduplication capabilities of PBS, or will it handle it like a champ?
No, this will not affect deduplication as long as the same datastore is used, as the deduplication boundary is at the datastore level. Namespaces can share the same chunks.
  3. If we didn't want to have backup downtime for this entire duration, would it be a safe idea to point PVE to "ME4024" in the backup jobs for now, or could that cause issues as the 'old' data syncs over?
This is a viable option as well, if the bandwidth and IO performance are acceptable.
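If you go that route, pointing PVE at the temporary datastore is just another PBS storage entry, roughly like this (server address, user, password and fingerprint are placeholders):

  # on the PVE side: add the ME4024 datastore as an additional PBS storage
  pvesm add pbs pbs-me4024 --server 192.0.2.20 --datastore ME4024 \
      --username backup@pbs --password 'secret' --fingerprint 'aa:bb:cc:dd:ee:ff'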
Will we be able to start taking new backups immediately after rebuilding the "Backups" pool, or would we have to wait until the 36TB of existing data is synced back from "ME4024"?
You will be able to take backups immediately; nothing really blocks that. But deduplication against the not-yet-synced data is temporarily unavailable, so the source PVE might have to upload more data in the first backup run. The sync job then no longer has to copy those chunks, so it is not extra work for your PBS, just a potentially significantly longer first backup job on the PVE side.
Additionally, syncing will put some extra IO load on the backing storage, so the IO bandwidth available for incoming new backups might be somewhat reduced during the sync.
That said, you are only expanding the backing storage of your existing main datastore, so the delta is not that big - just the last few days of backups - which means the effects should be limited for you. You could also start with a sync job that only syncs the most recent snapshots, so that this delta shrinks faster. Without knowing the exact churn and delta rates, plus the performance and bandwidth capacity of all involved parts, it is hard to tell for sure though.
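As a sketch of that, a sync job can be limited to the newest snapshots per group first (the transfer-last option exists on recent PBS versions; job and datastore names here are placeholders):

  # pull only the most recent snapshot of each backup group first, widen the sync later
  proxmox-backup-manager sync-job create restore-recent --store Backups --remote-store ME4024 --transfer-last 1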

Would this impact the deduplication capabilities of PBS, or will it handle it like a champ?
The above should already answer this indirectly, but to state it clearly here too: no, deduplication will not be impacted in the end. The content-addressable storage underneath PBS ensures that identical data chunks are always deduplicated, regardless of when or how they arrive.
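As a rough illustration of why that holds (the digest shown is a placeholder; the path layout is the usual chunk store scheme): every chunk is addressed by its SHA-256 digest, so two backups containing the same data reference the same file on disk.

  # conceptually: a chunk's SHA-256 digest determines its on-disk location in the datastore
  sha256sum some-chunk.bin
  # -> <digest>  some-chunk.bin
  # PBS stores it once as <datastore>/.chunks/<first 4 hex chars of digest>/<digest>;
  # any snapshot that contains the same data simply references that digest again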

If we didn't want to have backup downtime for this entire duration, would it be a safe idea to point PVE to "ME4024" in the backup jobs for now, or could that cause issues as the 'old' data syncs over?
This might cause some extra work, but it really should not cause any issues. PBS is designed to handle syncs and backups in parallel, and for PBS a sync and a new backup do not differ much in how the actual backup data is handled.
Do you foresee any issues with this plan? All things considered, we have not irreversibly changed anything yet.
I cannot see any technical issues. As mentioned, the performance hit might have some impact, but that should be temporary. One short note below, though:
We are not too concerned with network congestion or bandwidth usage, since both the iSCSI appliance and the PBS server are on the same subnet over a 20Gb connection.
That makes the sync fast, but how are the PVE systems attached to that PBS? If the sync is really fast and eats up a lot of bandwidth, and the delta from the main datastore running full is now rather significant, then new incoming backup jobs might be delayed quite a bit during that time.
 
I appreciate your thorough replies! It sounds like my best course of action will be to:
  1. Once the sync completes, recreate the zpool on the larger drives
  2. Create a new temporary namespace for new backups to go to while old backups are re-synced to the PBS server under the original namespace
    1. This will ensure we get backups back to functioning quickly, at the expense of some IO and network overhead
  3. After the re-sync completes and we disconnect the temporary iSCSI swap space, put the zpool datastore back into maintenance mode and sync our temporary namespace into the original one to merge the backups and backup history for the VMs (rough command sketch at the end of this post).
  4. Finally, delete the temporary namespace and disable maintenance mode. Ensure backup jobs on all PVE servers are correctly mapped and test a new backup for confirmation.
From my understanding, this should maintain the backup history of all VMs and keep our historical and new backups structured correctly while using the larger storage space. Deduplication will not be thrashed, since all backups were synced via PBS's own tooling into the same datastore and dedup is namespace-agnostic. The first backup to the new datastore will be substantially larger than usual because the chunks do not exist there yet, so backups of multi-TB VMs should perhaps be postponed until the re-sync completes to avoid excessive overhead and processing time.
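For my own notes (and future readers), steps 3-4 would look roughly like this. The maintenance-mode and namespace syntax may differ between PBS versions, and the names are from my setup, so treat it as a sketch rather than exact commands:

  # step 3: read-only maintenance while the temporary namespace gets merged
  proxmox-backup-manager datastore update Backups --maintenance-mode read-only
  # ... run the local sync job that merges the temporary namespace into the root namespace ...
  # step 4: once verified, remove the now-redundant temporary namespace and its groups,
  # then lift maintenance mode again and re-check the PVE backup jobs
  proxmox-backup-client namespace delete tmp-new --repository root@pam@localhost:Backups
  proxmox-backup-manager datastore update Backups --delete maintenance-mode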