Our PBS instance unexpectedly ran out of storage on its HDDs after we recently migrated a whole bunch of VMs from vCenter. Once the datastore (named "Backup") hit 100%, it pretty quickly stopped accepting any more data (as is reasonable). We put it into Maintenance Mode and added more disks to the appliance before attempting to expand the raidz2, only to discover that this process can take multiple days per disk. As such, we decided that the fastest TTR may be to just offload the entire "Backup" datastore to a temporary iSCSI storage appliance under a new datastore ("ME4024"), delete the "Backup" datastore, and rebuild it from scratch using all-new, larger disks (10 total, the max the appliance will hold internally). Once the ZFS pool is recreated, we want to begin taking backups on the rebuilt 96TB "Backup" datastore again, but we have concerns about moving all of our previous backups back to the machine.
We mapped the "ME4024" datastore via iSCSI, as that is all the SAN supports. It isn't ideal for us to have our backup server rely on external storage, but this is only temporary swap space and will suffice for now. The SAN is EoL and will not be permanent. Currently, we have 36TB of data copying to "ME4024" via a sync job. Once this is completed and verified, we are going to replace all of the datastore drives local to the PBS server and build a new, larger raidz2 pool from them. This will give us approximately 3x the storage we currently have. What we are unsure of is:
- Will we be able to start taking new backups immediately after rebuilding the "Backup" pool, or would we have to wait until the 36TB of existing data is synced back from "ME4024"?
- Would this impact the deduplication capabilities of PBS, or will it handle it like a champ?
- If we didn't want to have backup downtime for this entire duration, would it be a safe idea to point PVE to "ME4024" in the backup jobs for now, or could that cause issues as the 'old' data syncs over?
- Do you foresee any issues with this plan? All things considered, we have not irreversibly changed anything yet.
- Why not build a second PBS and migrate to it instead? We don't have the budget or spare servers lying around. We do have ample new drives, as we keep a certain percentage for emergencies.
- Why not keep data on the iSCSI device? This adds points of failure for the backup operations (network stack, SAN device, more cables), which should in theory remain simple and rock-solid. The storage appliance is also EoL, and we will be getting rid of it soon as it doesn't support NFS/CIFS.
- Why not wait for the ZFS expansion instead of juggling all of this? The expansion was an attempt to quick-fix things. Since it was showing it would take a couple weeks of expansion time, we figured we would take this opportunity to install larger disks instead.
- Why not connect the new disks alongside the original pool for zero downtime? Our appliance only has 10 drive bays (technically 10+2, but the +2 are used for the PBS OS), so we have no space to connect them simultaneously.
- Why not create a second datastore with larger drives, migrate data onto them, then replace the smaller drives and expand the pool? This probably would have worked, but it seems like it would've taken even longer to complete, since expansion times would be even longer on the larger drives. It still would've left the original questions as to how PBS handles merging the datastores anyway. I suppose it is a valid way to think about the new configuration if it simplifies things, and it would likely be preferred by future readers who have the on-board slots open and no SAN for swap space.
- How did you add the iSCSI storage to PBS? We installed open-iscsi and used iscsiadm to establish the connection, then used gdisk and mkfs to partition the disk and format it as ext4, and lastly mounted it to a directory in /mnt. Then we used "proxmox-backup-manager datastore create" to set it up as a datastore we could pull-sync to from the management GUI. I can provide additional information if anyone needs it, though I would always recommend relying on local storage over iSCSI for a reliable backup appliance.
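
For anyone who wants the iSCSI steps spelled out, here is a rough dry-run sketch of the sequence described above. It only prints the commands rather than running them. The portal address, target IQN, device name, and mount point are hypothetical placeholders (none of them are our real values), and sgdisk is swapped in as the scriptable counterpart to the interactive gdisk session we actually used, so treat this as an outline under those assumptions, not an exact record.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands, does not execute them.
# Every value below is a hypothetical placeholder, not taken from the post.
PORTAL="192.0.2.10"                          # SAN iSCSI portal (assumed)
TARGET_IQN="iqn.2001-04.com.example:me4024"  # target IQN (assumed)
DISK="/dev/sdX"                              # block device seen after login (assumed)
MOUNTPOINT="/mnt/me4024"                     # directory under /mnt (assumed)
DATASTORE="ME4024"                           # datastore name from the post

plan() {
  cat <<EOF
apt install open-iscsi gdisk
iscsiadm -m discovery -t sendtargets -p ${PORTAL}
iscsiadm -m node -T ${TARGET_IQN} -p ${PORTAL} --login
sgdisk --new=1:0:0 ${DISK}
mkfs.ext4 ${DISK}1
mkdir -p ${MOUNTPOINT}
mount ${DISK}1 ${MOUNTPOINT}
proxmox-backup-manager datastore create ${DATASTORE} ${MOUNTPOINT}
EOF
}

plan
```

Double-check the device name after login (for example with lsblk) before partitioning or formatting anything, since picking the wrong disk here is destructive.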