Amazon S3 Glacier

Computerizer

New Member
Aug 6, 2025
I'm interested in trying out the new S3 datastore options for a secondary off-site backup. Since retrieval of backups would be very rare for this case (only if my primary and backup are both lost), I think using S3 Glacier to archive the data would be cost effective. Does it make sense to use this with PBS, though? If so, how would I implement it? Presumably I couldn't have all the objects sent to Glacier (e.g. via a Lifecycle rule) because some objects are needed for metadata.

Any suggestions or experiences yet?
 
Moving parts of the objects to cold storage will most likely cause you issues, at least for restores, where you need to re-fetch the objects. Backup or sync jobs should be different, as already known chunks do not need to be uploaded again. But this is not considered for the current tech preview. Please do open an issue on https://bugzilla.proxmox.com, referencing this thread, so this can be further evaluated.

What should be doable with the current state is archiving a whole datastore with its contents in case you don't actively use it anymore but want to keep it around. Then you can simply move the whole datastore to cold storage and restore it to hot storage when needed again. But this is provider specific and not managed by PBS.
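If you go that route, the archival happens entirely on the provider side. As a minimal sketch (the bucket name and rule ID are made up, and PBS is neither involved in nor aware of the transition), an AWS lifecycle rule that moves every object of a retired datastore bucket to Glacier could look like this:

```python
# Sketch only: archive an entire, no longer actively used, PBS datastore bucket
# to S3 Glacier via a lifecycle rule. Bucket name and rule ID are assumptions;
# PBS itself is not involved in or aware of this transition.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="pbs-offsite-datastore",               # hypothetical bucket backing the datastore
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-whole-datastore",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},         # empty prefix = every object in the bucket
                "Transitions": [
                    {"Days": 0, "StorageClass": "GLACIER"}   # or DEEP_ARCHIVE
                ],
            }
        ]
    },
)
```

Before the datastore is put back into use, all objects would of course have to be transitioned or restored to a retrievable storage class again.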
 
So, for now, we should stick with Instant retrieval?
How does S3 interact with Backup Verification?
Are the backups still verified on a regular basis or is that left to the object storage provider?
 
So, for now, we should stick with Instant retrieval?
Yes, as mentioned above, moving the contents to different storage classes is not really possible because of deduplication. You might get away with it as long as you only upload new contents (not tested), but on restore you will get errors for sure, unless all the chunk data is moved back to the corresponding storage class again.
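To make "moved back to the corresponding storage class" concrete, here is a rough sketch (the bucket name and the `.chunks/` prefix are assumptions about the object layout, not confirmed PBS behaviour) that issues temporary restore requests for every archived object before a restore is attempted:

```python
# Sketch only: bring every archived chunk object back to a retrievable tier
# before attempting a PBS restore. The ".chunks/" prefix is an assumption
# about the object layout, not something confirmed in this thread.
import boto3

s3 = boto3.client("s3")
bucket = "pbs-offsite-datastore"   # hypothetical

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=".chunks/"):
    for obj in page.get("Contents", []):
        if obj.get("StorageClass") in ("GLACIER", "DEEP_ARCHIVE"):
            s3.restore_object(
                Bucket=bucket,
                Key=obj["Key"],
                RestoreRequest={
                    "Days": 7,                                 # keep the restored copy for a week
                    "GlacierJobParameters": {"Tier": "Bulk"},  # cheapest, slowest retrieval tier
                },
            )
```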
How does S3 interact with Backup Verification?
Are the backups still verified on a regular basis or is that left to the object storage provider?
Data upload to the S3 backend is verified via checksumming, so the provider is then responsible for data integrity, but basic checks are also performed on retrieval.
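For illustration, this is roughly what the provider-side mechanism looks like with boto3. It is only a sketch of S3's checksum feature, not PBS's actual implementation, and the bucket and key names are invented:

```python
# Sketch only: S3-side checksum verification on upload and retrieval.
# This illustrates the provider mechanism, not PBS's own code.
import boto3

s3 = boto3.client("s3")
bucket, key = "pbs-offsite-datastore", ".chunks/abcd/abcd1234"   # hypothetical names

# Upload with an end-to-end SHA-256 checksum; S3 rejects the PUT if the data
# was corrupted in transit and stores the checksum alongside the object.
s3.put_object(
    Bucket=bucket,
    Key=key,
    Body=b"<chunk data>",
    ChecksumAlgorithm="SHA256",
)

# On retrieval, ask S3 to return the stored checksum so the client can re-verify.
resp = s3.get_object(Bucket=bucket, Key=key, ChecksumMode="ENABLED")
print(resp.get("ChecksumSHA256"))
```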

The provider can, however, not guarantee logical backup snapshot consistency, e.g. whether all the chunks referenced by a snapshot are still present. Currently it is possible to perform a verification just like for regular backup snapshots, which will however fetch and check all the data, so it will be very time and cost intensive. This might be improved upon by introducing a more lightweight verification, leaving the data integrity to the storage provider and only performing logical consistency checks, as already proposed in https://bugzilla.proxmox.com/show_bug.cgi?id=4594

Garbage collection already performs something similar, so for the time being a successful garbage collection is an indicator that the expected chunks are present on the S3 backend.
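Both garbage collection and the proposed lightweight verify essentially boil down to a chunk presence check. A rough sketch of that idea (the key scheme and the digest list are placeholders, not the real PBS index format):

```python
# Sketch only: confirm that every chunk referenced by a snapshot still exists,
# without downloading any data. Key layout and digest list are placeholders.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
bucket = "pbs-offsite-datastore"                      # hypothetical
referenced_digests = ["0123abcd", "4567ef89"]         # would come from the snapshot index

missing = []
for digest in referenced_digests:
    key = f".chunks/{digest[:4]}/{digest}"            # assumed key scheme
    try:
        s3.head_object(Bucket=bucket, Key=key)        # metadata only, no data transfer
    except ClientError as err:
        if err.response["Error"]["Code"] == "404":
            missing.append(digest)
        else:
            raise

print(f"{len(missing)} referenced chunks missing")
```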
 

Another option might be to introduce a configuration parameter which allows reading only a subset of all data. restic check has three modes:
  • Normal check: this is basically like the proposed lightweight verify for PBS (consistency checks etc.)
  • read-data: read all data to verify it is restorable
  • read-data-subset: check only a subset, be it a random subset of all data (in %) or a subset n/t, where n is the number of the chosen subset and t is the number of subsets the data is split into
The idea is basically that (to save cost and time) you create a cronjob or systemd timer (sketched below) to check a small subset every day, so that after some time you have actually checked and verified all data. E.g. with n/7, after seven days there is a high probability that all your data is in fine shape. I usually run daily subset checks, and from time to time I do a complete restore check.
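A minimal sketch of such a rotating subset job, assuming a restic repository on S3 with credentials and repository password provided via environment variables (the repository URL and the per-weekday schedule are made up):

```python
#!/usr/bin/env python3
# Sketch only: rotate through restic's n/7 data subsets, one per weekday, so
# that all data has been read once after a week. AWS credentials and
# RESTIC_PASSWORD are expected in the environment; the repo URL is invented.
import datetime
import subprocess

subset = datetime.date.today().isoweekday()   # 1..7, one subset per weekday

subprocess.run(
    [
        "restic",
        "-r", "s3:s3.amazonaws.com/my-restic-bucket",   # hypothetical repository
        "check",
        f"--read-data-subset={subset}/7",               # read only today's 1/7th of the data
    ],
    check=True,
)
```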
Maybe this is an approach worth looking into?

General doc:
https://restic.readthedocs.io/en/la...repos.html#checking-integrity-and-consistency
Restic Forum with a good example for the subset stuff: https://forum.restic.net/t/restic-check-read-data-read-data-subset-1-12/2549
 
Thanks for the pointer, I will take a closer look at what may make sense for the PBS S3 implementation!
 