Restore individual files from S3 backup

gerco

Member
Sep 24, 2021
I'm trying out the new S3-backed datastore and realized that it doesn't seem to be possible to restore individual files from a backup, as can be done with container backups on a local disk. It's also not possible to download the .didx file, which is possibly the cause of this missing feature?

I would like to be able to download individual files, especially from an S3 backup since restoring the full thing will take a lot of time (download) and cost money (egress traffic, requests). For a remote datastore like S3, being able to restore a small part of it seems like an important feature.

Might it be possible to keep enough information locally to enable this option? The .didx file (if that would help) isn't very big and is certainly a lot smaller than the cache usage for the S3 datastore. At the moment, the S3 cache is the same size as the data in the bucket; is there a way to limit that size and keep most of the data only on S3?
 
I'm trying out the new S3-backed datastore and realized that it doesn't seem to be possible to restore individual files from a backup, as can be done with container backups on a local disk. It's also not possible to download the .didx file, which is possibly the cause of this missing feature?
Just for clarification: it is possible to do single-file restore for container/host backups to an S3-backed datastore, just like for regular datastores. I suppose what you mean is that you do not have a single file per backup snapshot which you could upload/transfer to some other storage.

I would like to be able to download individual files, especially from an S3 backup since restoring the full thing will take a lot of time (download) and cost money (egress traffic, requests). For a remote datastore like S3, being able to restore a small part of it seems like an important feature.

You can already restore an individual snapshot from an S3-backed datastore without having to re-download all the chunks. On restore, only the chunks which are required for that snapshot and not already present in the local datastore cache are downloaded. If you e.g. set up a new PBS and connect it to a pre-existing S3 store, an S3 refresh will only fetch the metadata, not the chunk data.

What is, however, not possible is to create a fully self-contained backup file as you would obtain with the PVE built-in backup tooling. That would require either first fully restoring and then backing up again, or some custom scripts which create such an archive using the proxmox-backup-client for restore.
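To illustrate the single-file restore path mentioned above, a minimal sketch with proxmox-backup-client is shown below. The repository string, snapshot name, archive name, and paths are all placeholders for illustration; adjust them to your setup.

```shell
# Placeholder repository: user@realm@server:datastore (adjust to your setup).
export PBS_REPOSITORY='backup@pbs@pbs.example.com:s3store'

# List the snapshots available in the datastore:
proxmox-backup-client list

# Restore a single archive from one snapshot; only the chunks
# referenced by that archive (and not already cached locally)
# are fetched from S3:
proxmox-backup-client restore \
    "host/myhost/2021-09-24T10:00:00Z" root.pxar /tmp/restore

# Alternatively, mount the archive read-only and copy out
# individual files instead of restoring everything:
proxmox-backup-client mount \
    "host/myhost/2021-09-24T10:00:00Z" root.pxar /mnt/pbs
```

The restored directory tree could then be re-archived with your tool of choice to get a single self-contained file, at the cost of the egress traffic for the chunks that restore downloads.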

Might it be possible to keep enough information locally to enable this option? The didx file (if that would help) isn't very big and is certainly a lot smaller than the cache usage for the S3 datastore. At the moment, the S3 cache is the same size as the data on the bucket, is there a way to limit that size and keep most of the data only on S3?
The cache uses as much space as the underlying filesystem allows, therefore either a dedicated filesystem or quotas are recommended, see https://pbs.proxmox.com/docs/storage.html#datastores-with-s3-backend
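As a sketch of the dedicated-filesystem approach, on a ZFS host you could give the local cache its own dataset with a hard quota so it can never grow beyond a chosen size. The pool, dataset name, and quota value here are hypothetical:

```shell
# Hypothetical pool/dataset names and quota; adjust to your setup.
# Create a dedicated dataset for the S3 datastore's local cache,
# capped at 100 GiB regardless of how large the bucket grows:
zfs create -o quota=100G rpool/pbs-s3-cache

# Verify the quota took effect:
zfs get quota rpool/pbs-s3-cache

# Use the dataset's mountpoint (/rpool/pbs-s3-cache here) as the
# local cache path when creating the S3-backed datastore.
```

With a quota in place, the cache fills up to the limit and chunk eviction keeps it there, while the full data set lives only in the bucket.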

The index files are always kept both in the local cache and on the S3 backend; data chunks, on the other hand, are evicted from the cache when no free cache slots are available.