How to split chunks and snapshots off into different storage repos?

Jan 3, 2025
I'm aware of the ZFS special VDEV approach but that's not what I'm looking for here. I'm looking for two separate repos: one that handles all the snapshot files and the other that handles all the chunks. This way I'd be able to sync them separately to S3 and do immutable things that current iterations of PBS aren't doing.

Then again, if PBS is close to actually supporting immutable S3 repos I'll spare myself the engineering time to work around the issue.
 
PBS doesn't require ZFS.
PBS snapshots are just a file manifest (=index/catalog/map) of chunks required.
If you want the chunks to be immutable, then the snapshots should be too.
Sorry if I'm missing your point, I'm not a native English speaker.
This has nothing to do with the filesystem the backup repo is stored on. It's about splitting the repo in two: one for indexes/snapshots/etc. and the other for the .chunks directory.
 
But you were the first to mention ZFS :-/ and, indeed, it's irrelevant.
I was referencing a PBS wiki page that suggested adding a ZFS special device to host the snapshots, with chunks on spinning metal. That's all.
What is the point?
In a true DR scenario where we have to assume compromise, restoring directly from an immutable S3 repository is impossible - PBS won't mount the bucket successfully. The obvious solution is to duplicate the immutable bucket to a mutable bucket, then mount and restore from that. What I'm trying to do is reduce the amount of duplication required - the snapshot dir tree isn't very large compared to chunks, so if we can upload two filesystems to two buckets and only duplicate the smaller one at restore time... I think you see where I'm trying to go here.
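To make the split concrete, here is a minimal Python sketch of how the two file sets could be separated before syncing them to different buckets. It assumes only what is described in this thread: that the chunk store lives in a top-level `.chunks` directory and that everything else in the datastore is metadata. The function name `split_datastore` is made up for illustration; it is not part of PBS.

```python
import os
from pathlib import Path

CHUNK_DIR = ".chunks"  # directory name as mentioned in this thread


def split_datastore(root):
    """Return (metadata_files, chunk_files) as sorted lists of
    POSIX-style paths relative to the datastore root.

    Assumption: anything below the top-level .chunks directory is chunk
    data; every other file counts as metadata.
    """
    root = Path(root)
    metadata, chunks = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        rel_dir = Path(dirpath).relative_to(root)
        for name in filenames:
            rel = rel_dir / name
            # Only files *inside* .chunks count, hence the length check.
            if len(rel.parts) > 1 and rel.parts[0] == CHUNK_DIR:
                chunks.append(rel.as_posix())
            else:
                metadata.append(rel.as_posix())
    return sorted(metadata), sorted(chunks)
```

Each list could then be handed to whatever sync tool uploads to the corresponding bucket, so that only the metadata list has to be copied back to a mutable bucket at restore time.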

Really looking forward to seeing finalized support for immutable S3 buckets so I don't need to engineer around it though.
 
Argh ... I'm not a native English speaker.
ZFS special device to host the snapshots, with chunks on spinning metal.
is not correct. A ZFS special device stores the metadata of all files and can also store small files.
There is no backup data in the snapshots; they are on the order of MBs in size, and they are not the "differential data".
Snapshots are "just" manifest/catalog files listing the data stored in .chunks

I get your point that PBS doesn't like a "read-only datastore" (S3 or not).
I'm not an S3 user, but this can probably be worked around with a "write overlay"; perhaps there are already topics about it.

EDIT: But in the first place, I'm missing the point: how is the backup done if the datastore is read-only?! What is your workflow?
 
The term "snapshot" has multiple and technically different meanings, like four or five or so. "Somebody" should write an FAQ article we could point to...
 
EDIT: But in the first place, I'm missing the point: how is the backup done if the datastore is read-only?! What is your workflow?
The workflow is pretty simple: back up to local -> offload to S3 -> (outside PBS) sync the mutable S3 bucket to the immutable S3 bucket

Restore is similar: sync the immutable S3 bucket to the mutable S3 bucket -> attach PBS to that bucket -> restore

What I'm trying to do is minimize the downtime and cost of the sync step during restore.
The term "snapshot" has multiple and technically different meanings, like four or five or so. "Somebody" should write an FAQ article we could point to...
Fair point. In this case, I'm using "snapshot" to reference the directory tree that stores all the metadata for the backups, i.e.:
/backup/ns/{namespace}/vm/1000 and so on
With five sites, our entire /backup/ns directory is only 505M, with the .chunks directory being orders of magnitude larger.
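For anyone who wants to check the same ratio on their own datastore, the comparison can be sketched in Python as below. This is only an illustration under the assumption already used in this thread (chunks live in a top-level `.chunks` directory, everything else is metadata); the function names are hypothetical and it sums apparent file sizes, not on-disk usage.

```python
import os
from pathlib import Path


def tree_size(path):
    """Sum the apparent sizes (in bytes) of all files below path.
    A missing directory simply counts as 0 bytes."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            total += os.lstat(os.path.join(dirpath, name)).st_size
    return total


def restore_copy_estimate(root, chunk_dir=".chunks"):
    """Compare metadata size against chunk size for one datastore.

    Assumption: everything outside chunk_dir is metadata that would
    have to be copied to a mutable bucket before a restore.
    """
    root = Path(root)
    chunk_bytes = tree_size(root / chunk_dir)
    total_bytes = tree_size(root)
    metadata_bytes = total_bytes - chunk_bytes
    share = metadata_bytes / total_bytes if total_bytes else 0.0
    return {
        "metadata_bytes": metadata_bytes,
        "chunk_bytes": chunk_bytes,
        "metadata_share": share,
    }
```

Calling `restore_copy_estimate("/backup")` would return the byte counts and the fraction of the datastore that is metadata, which is the part that would need duplicating at restore time under the split-bucket idea.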