Hi, new here and didn't find this thread before posting my own. (
https://forum.proxmox.com/threads/lets-talk-deduplication-and-immutability.182803/ )
But I'd like to explain how I've managed to use AWS S3 with Object Lock to achieve immutable, deduplicated backups with a couple of additional tools.
There are only 2 issues I can see with the current S3 implementation.
1. Old chunks from earlier backups have their Object Lock expire and can then be deleted, even if they are still referenced by newer backups.
2. If a bad actor deletes your chunks or metadata files for a backup, either via a compromised S3 account or a compromised PBS, those objects are marked for deletion via delete markers in the S3 bucket.
If PBS addressed these 2 issues alone, it would be a fully deduplicated, immutable backup solution.
Here's what I am currently using to achieve a working solution.
1. To solve the need for a rolling Object Lock on old chunks I looked to this blog post (
https://aws.amazon.com/blogs/storag...ding-amazon-s3-object-lock-retention-periods/ ), which describes an AWS CloudFormation stack template that does just that: it takes any currently Object Locked file and rolls its lock expiry date into the future by however many days you configure.
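The core of that rolling-lock idea can be sketched in Python with boto3. To be clear, this is my own minimal sketch, not the code from the AWS template: the function names and the 30-day window are assumptions, and the `s3` client is passed in so the date logic stays testable.

```python
from datetime import datetime, timedelta, timezone

# Assumed policy: keep every locked object's retain-until date
# sitting EXTEND_DAYS in the future (pick your own window).
EXTEND_DAYS = 30

def build_retention_update(key, version_id, now=None):
    """Build the kwargs for s3.put_object_retention() that push one
    object version's Object Lock expiry EXTEND_DAYS ahead of now."""
    now = now or datetime.now(timezone.utc)
    return {
        "Key": key,
        "VersionId": version_id,
        "Retention": {
            "Mode": "COMPLIANCE",
            # Extending retention is allowed even in compliance mode;
            # only shortening it is refused by S3.
            "RetainUntilDate": now + timedelta(days=EXTEND_DAYS),
        },
    }

def extend_all_locks(s3, bucket):
    """Walk every current object version in the bucket and roll its
    lock forward. `s3` is a boto3 S3 client; the caller needs the
    s3:PutObjectRetention permission."""
    paginator = s3.get_paginator("list_object_versions")
    for page in paginator.paginate(Bucket=bucket):
        for v in page.get("Versions", []):
            if v["IsLatest"]:
                s3.put_object_retention(
                    Bucket=bucket,
                    **build_retention_update(v["Key"], v["VersionId"]),
                )
```

Running `extend_all_locks` on a schedule (e.g. a daily cron or Lambda) is roughly what the CloudFormation stack automates for you.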
2. To "undelete" chunks and metadata files that have been deleted, accidentally or maliciously, a simple script can find and remove all delete markers, which simply makes the Object Locked but "deleted" files current again.
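The "undelete" step really is that simple: list the object versions, collect the delete markers, delete the markers. A minimal boto3 sketch (function names are mine; the marker-collection logic is split out so it can be tested without AWS):

```python
def collect_delete_markers(pages):
    """Extract {Key, VersionId} pairs for every delete marker found in
    a sequence of list_object_versions response pages."""
    targets = []
    for page in pages:
        for marker in page.get("DeleteMarkers", []):
            targets.append({"Key": marker["Key"], "VersionId": marker["VersionId"]})
    return targets

def undelete_bucket(s3, bucket):
    """Remove every delete marker in the bucket so the Object Locked
    versions underneath become current again. `s3` is a boto3 S3 client."""
    pages = s3.get_paginator("list_object_versions").paginate(Bucket=bucket)
    targets = collect_delete_markers(pages)
    # delete_objects accepts at most 1000 keys per request
    for i in range(0, len(targets), 1000):
        s3.delete_objects(Bucket=bucket, Delete={"Objects": targets[i:i + 1000]})
```

Deleting a delete marker by its VersionId is an ordinary versioned delete, so no special API is needed; the ransomware-style "delete everything" attack is undone in one pass.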
These two mitigation steps, combined with some lifecycle rules on the bucket to clean up files marked for deletion once their Object Lock has expired, leave you with a working immutable/deduplicated system on S3 storage (albeit Amazon-only, due to the CloudFormation stack I'm using, but an equivalent script could be written for other providers).
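For reference, the clean-up lifecycle rules look roughly like this, expressed as the dict you would hand to boto3's `put_bucket_lifecycle_configuration`. The day count and rule IDs are examples, not my exact values; the important property is that S3 lifecycle will not physically remove a version while its Object Lock is still active.

```python
# Sketch of the bucket lifecycle rules. NoncurrentDays should be
# comfortably longer than your Object Lock retention window.
LIFECYCLE = {
    "Rules": [
        {
            # Physically remove old (noncurrent) versions once they
            # have been superseded or deleted for 40 days.
            "ID": "purge-noncurrent-versions",
            "Status": "Enabled",
            "Filter": {},
            "NoncurrentVersionExpiration": {"NoncurrentDays": 40},
        },
        {
            # Tidy up delete markers that no longer hide any versions.
            "ID": "drop-expired-delete-markers",
            "Status": "Enabled",
            "Filter": {},
            "Expiration": {"ExpiredObjectDeleteMarker": True},
        },
    ]
}
```

Applied once with `s3.put_bucket_lifecycle_configuration(Bucket=bucket, LifecycleConfiguration=LIFECYCLE)`, this keeps the bucket from accumulating expired chunk versions forever.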
Now, if PBS could perform these 2 functions itself, the S3 storage would be usable without these add-ons.
1. The rolling Object Lock feature could be added in various places.
1.1. During deduplication, at the point a chunk is identified as a duplicate and therefore not uploaded to S3, its Object Lock could be renewed (this option would incur a potentially large number of duplicate updates for common chunks).
1.2. During garbage collection, when unwanted chunks are being marked for deletion, wanted chunks could have their Object Lock extended (this is more efficient, though I guess not strictly garbage collection).
1.3. During verification, Object Locked chunks could have their locks updated, and any deleted chunks needed for that backup could also be undeleted (delete marker removed). This is perhaps the best option as it solves the second issue too, and could be run monthly to minimize S3 transaction costs.
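To make option 1.3 concrete, here is a sketch of the per-chunk decision a verify pass might take. This is pure speculation on my part, not PBS internals: the function name, the 30-day window, and the action tuples are all my own, and the inputs are shaped like `list_object_versions` response entries for a single chunk key.

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 30  # assumed rolling retention window

def plan_chunk_actions(versions, delete_markers, now=None):
    """For one chunk key that verification found to be needed, decide
    what to do on S3: remove any delete markers hiding the chunk, then
    extend the Object Lock on the newest real version."""
    now = now or datetime.now(timezone.utc)
    actions = []
    # 1. Undelete: every delete marker for this key gets removed,
    #    making the locked version underneath current again.
    for m in delete_markers:
        actions.append(("remove_delete_marker", m["VersionId"]))
    # 2. Re-lock: push the newest version's retain-until date forward.
    latest = max(versions, key=lambda v: v["LastModified"], default=None)
    if latest is not None:
        actions.append(
            ("extend_lock", latest["VersionId"], now + timedelta(days=RETENTION_DAYS))
        )
    return actions
```

Run monthly as part of verification, this would cost two S3 requests per chunk at worst, and nothing extra for chunks that were never tampered with beyond the retention update.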
Anyway, as I'm new here I'm happy to be told I'm barking up the wrong tree, or please do chime in if you feel my current S3 backup add-ons aren't going to achieve what I think they are.