Saving partial backups or not uploading existing chunks

jan.pekar

Oct 5, 2022
Hello Everyone,

I'm trying to use proxmox-backup-client to upload approximately 1 TB of file data to PBS over a slow/unreliable WAN connection (50 Mbit), and the upload is always interrupted after about 20 hours of uploading (provider issue?).

The problem is that repeating the upload doesn't help, because I noticed that the manifest file is never saved when the upload is interrupted:

Error downloading .didx from previous manifest: Unable to open dynamic index "......pxar.didx" - No such file or directory (os error 2)

I also noticed in the log that chunks which were already uploaded to the server are uploaded again and again, so I never get one completed backup, because it is never finished.

When I was using BackupPC as a backup solution, the rsync client did not upload chunks that already existed on the server, but it looks like PBS does not check this before the upload and only discards already existing chunks after it has received them.

Not uploading existing chunks, or saving partial backups, would help with this problem and make PBS file-level backups more usable (bandwidth efficient).
Any suggestions?

Thank you
Jan Pekar
 
the way skipping on the client side works is that the previous snapshot is locked, and its index is cross-referenced with the generated chunks to see which chunks don't need to be uploaded.

you can try to improve the behaviour by doing "growing" backup attempts (so that you get an index for the chunks that you already uploaded, with hopefully most of those chunks being re-used in the next run). for example, for the first backup exclude all but the first top-level directory, then if that succeeds remove the second top-level directory from the exclude list for the second run, and so on. I am not sure which strategy will lead to the most re-use (it might be better to skip specific parts first instead of going by the top level of the hierarchy); it likely also depends on the structure of the file system you want to back up.
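
roughly like this, just as a sketch - the repository, archive name and directory names are placeholders, and it assumes a client version that supports --exclude on the command line (a .pxarexclude file in the source directory works as well):

# first run: exclude most of the large top-level directories (placeholder paths)
proxmox-backup-client backup root.pxar:/ \
  --repository backup@pbs@pbs.example.com:store1 \
  --exclude /srv/data2 --exclude /srv/data3

# second run: drop one exclude - chunks from the first snapshot are re-used, only /srv/data2 is new
proxmox-backup-client backup root.pxar:/ \
  --repository backup@pbs@pbs.example.com:store1 \
  --exclude /srv/data3

# final run: no excludes - the full backup, with most chunks already on the server
proxmox-backup-client backup root.pxar:/ \
  --repository backup@pbs@pbs.example.com:store1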
 
Thank you. That should work, but it involves user intervention and care. I was looking for a more automatic solution, like an option to keep incomplete backups, so that some index file exists and the chunks from that backup are re-used.
You can consider it a feature request that would improve the first massive upload.
 
that's not really feasible at the moment, as we don't want to treat such incomplete backups as backups in general (that would be rather dangerous) and old clients would have no way of differentiating.

you could manually create a snapshot with an index file + manifest that references *all* chunks that currently exist in your chunk store, but that requires some knowledge about how those files are structured (and obviously, such a snapshot would be completely bogus and should be deleted after it has served its purpose!)
 
If there is some documentation, I can try. I'm using encrypted backups and I expect that the file must be encrypted as well, so I would also need documentation on how to do that.

Now I'm trying to upload partial backups using excludes, as you proposed.
Thank you
 
I have a few notes regarding this problem; correct me if I'm wrong.

You cannot mix two different backups on the same host. When you run a different backup (with a different .pxar name) on the same host, it looks for the manifest in the previous backup, cannot find it, and uploads everything again. So an hourly backup of one directory and a daily backup of another directory (or daily/weekly) is not possible without uploading all the data again whenever the name.pxar of the previous backup changes.

When you put more than one .pxar name in one backup, the manifest is uploaded only after the whole command has finished. So you can finish one name.pxar, but when othername.pxar fails during upload, the backup is considered incomplete and no manifest/catalog is created.

I understand that saving the last incomplete backup is not a good solution. A backup can also fail right at the beginning, so the manifest would contain only a few files/chunks and you would be back at the start again.

I think the solution is for the server to respond to some PROPFIND .../chunk request, so the backup client can check whether the server already "knows" a chunk and skip the POST, saving bandwidth. It could also be solved by downloading a list of all chunks stored on the server (a list generated by GC?) when the backup starts, but that file can be large (when the datastore is shared), so in some situations it is not as optimal as the PROPFIND approach I mentioned above.
 
I have a few notes regarding this problem; correct me if I'm wrong.

You cannot mix two different backups on the same host. When you run a different backup (with a different .pxar name) on the same host, it looks for the manifest in the previous backup, cannot find it, and uploads everything again. So an hourly backup of one directory and a daily backup of another directory (or daily/weekly) is not possible without uploading all the data again whenever the name.pxar of the previous backup changes.

yes and no. a snapshot consists of three parts: type (vm/ct/host), id (VMID/CTID/hostname/something of your choice) and timestamp.
within that snapshot you have blobs (uploaded directly, like config files or the log file) and indices (fixed or dynamic) which refer to chunks.

the re-use of the previous manifest works based on the type+id pair. indices of the same name are then re-used from that manifest (so if your previous snapshot has a foobar.pxar.didx and the current backup writes one as well, the previous one will be downloaded and all common chunks won't be uploaded).
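
so for the hourly/daily example, giving each job its own backup group means each job's previous snapshot always contains an index with a matching name. a rough sketch, with placeholder repository, paths and --backup-id values:

# hourly job: backup group host/myhost-data, its snapshots always contain data.pxar
proxmox-backup-client backup data.pxar:/srv/data \
  --repository backup@pbs@pbs.example.com:store1 \
  --backup-id myhost-data

# daily job: separate group host/myhost-etc, so it never "replaces" the hourly job's previous manifest
proxmox-backup-client backup etc.pxar:/etc \
  --repository backup@pbs@pbs.example.com:store1 \
  --backup-id myhost-etc

each run then re-uses the chunks of the last snapshot in its own group.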

When you put more than one .pxar name in one backup, the manifest is uploaded only after the whole command has finished. So you can finish one name.pxar, but when othername.pxar fails during upload, the backup is considered incomplete and no manifest/catalog is created.
yes, the snapshot consists of all the files; a partial one is incomplete and will not be considered finished.
I understand that saving the last incomplete backup is not a good solution. A backup can also fail right at the beginning, so the manifest would contain only a few files/chunks and you would be back at the start again.

I think the solution is for the server to respond to some PROPFIND .../chunk request, so the backup client can check whether the server already "knows" a chunk and skip the POST, saving bandwidth. It could also be solved by downloading a list of all chunks stored on the server (a list generated by GC?) when the backup starts, but that file can be large (when the datastore is shared), so in some situations it is not as optimal as the PROPFIND approach I mentioned above.
the issue with that is locking - if you want to re-use chunks you have to ensure they don't disappear while you are doing a backup. locking each chunk does not scale (there are too many of them). so we lock the whole previous snapshot (preventing removal of it and its indices, which in turn prevents removal of the chunks referenced within them).

you can still do the approach I mentioned (it doesn't matter whether you have multiple pxar files or a single one).

If there is some documentation, I can try. I'm using encrypted backups and I expect that the file must be encrypted as well, so I would also need documentation on how to do that.

IIRC the only difference there is that you need to sign the manifest at the end. but there are no docs for that, you need to look at the source code..

one alternative if you have the space is to setup a local PBS, do the backup there, and then either sync over the slow link (sync works differently and can re-use partial results as long as you don't run a GC) or via sneakernet to the target PBS.
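
as a rough sketch of the sync variant (to be run on the target PBS; the remote name, credentials, fingerprint and datastore names are placeholders):

# register the on-site PBS as a remote on the target PBS
proxmox-backup-manager remote create site-pbs \
  --host site-pbs.example.com \
  --auth-id sync@pbs \
  --password 'SECRET' \
  --fingerprint '<sha256 fingerprint of site-pbs>'

# pull its datastore over the unstable link; an interrupted sync can pick up
# the already transferred snapshots on the next run
proxmox-backup-manager sync-job create site-to-central \
  --remote site-pbs \
  --remote-store store1 \
  --store central-store \
  --schedule daily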
 
the issue with that is locking - if you want to re-use chunks you have to ensure they don't disappear while you are doing a backup. locking each chunk does not scale (there are too many of them). so we lock the whole previous snapshot (preventing removal of it and its indices, which in turn prevents removal of the chunks referenced within them).
You could lock a chunk the same way a newly uploaded chunk is protected. How do you protect chunks when there is no previous snapshot (first upload)?
You could modify the creation/access time (i.e. touch it); GC should then avoid cleaning it up.

IIRC the only difference there is that you need to sign the manifest at the end. but there are no docs for that, you need to look at the source code..

one alternative if you have the space is to setup a local PBS, do the backup there, and then either sync over the slow link (sync works differently and can re-use partial results as long as you don't run a GC) or via sneakernet to the target PBS.
I was considering that (creating a local PBS), but then I need additional storage, and it is not a universal solution for all the use cases I need to cover (backing up a single NAS at a remote location, etc.).
 
You could lock a chunk the same way a newly uploaded chunk is protected. How do you protect chunks when there is no previous snapshot (first upload)?
You could modify the creation/access time (i.e. touch it); GC should then avoid cleaning it up.
newly uploaded chunks are protected by PBS tracking the active writers and using the oldest active writer's start timestamp as the cutoff for GC. so while a backup is ongoing (and a chunk is not yet properly referenced in an index) the chunk is protected by the cutoff. once the backup has finished (and the cutoff moved) all its chunks must be referenced, so GC will not remove them.

but we cannot use the same mechanism to protect arbitrary chunks (for example by providing a new method in the backup protocol to say "I already have these chunks X, Y and Z but won't upload them"), because a client is not allowed to re-use arbitrary chunks, only those where the server can verify that this specific client has access. the way we do that is ... by re-using the previous snapshot ;) all the chunks referenced by the previous snapshot are added to a "known chunks list", and this list + any fully uploaded chunks are what the client is allowed to add to the indices. the same restriction applies to reading backup snapshots - the server will only allow downloading chunks which are referenced by the snapshot.

if we were to remove this restriction, a client could do all sorts of shenanigans:
- create a backup referencing all possible chunks (preventing GC!)
- create a backup referencing arbitrary chunks, hoping to obtain (via restore) sensitive data uploaded by other clients (like SSH keys, /etc/shadow contents, ...)

so this is an obvious no-go from a security perspective.
 
Thank you for the explanation, now I understand the problem. I don't care about referencing "unknown" chunks, because I have separate datastores for different customers and use different encryption keys. But I understand that there are use cases without encryption where a fraudulent backup client could try to steal some data chunks.

So the only way to improve the behavior is a saved "partial backup", stored differently than a regular complete backup (a different backup folder name), so that old clients will not try to use it as a full backup but a new backup client can reference it when this incomplete backup is the newest one it can reference. Also, an incomplete backup should only be saved when the previous incomplete backup is smaller (fewer chunks) than the current one, to avoid discarding a better backup.

Now I will focus on manually excluding files and growing my backup in blocks that can be uploaded in one backup run.
Thank you again
Jan Pekar
 
some sort of checkpoint feature could be implemented, where the client writes out the indices and a checkpoint manifest every so often, and the chunk re-use logic then works using the last real snapshot plus the last temporary checkpoint, with the checkpoint being cleared once the next snapshot can be finalized (so there is always only one checkpoint that needs to be kept after the last finished proper snapshot).

but we then need to take care that this
- is entirely opt-in (doesn't break older clients either when doing a backup, or when accessing datastore contents)
- doesn't interfere with pruning/GC/tape/sync/...
- allows clearing of no-longer-needed checkpoints to prevent them from taking up space forever

so all in all, a not-so-trivial feature solving a single issue: backups with a big delta over unstable connections,

with an existing workaround of having a (semi)-local PBS with a stable connection and syncing over the unstable one.

so all in all, do-able, but not trivial, and not the highest on our list of todos.
 
