Is dedup per server or per datastore?

ozdjh

Renowned Member
Oct 8, 2019
Hi

We're evaluating PBS, and for our use case I created 2 datastores on the same underlying filesystem. The reason for the 2 datastores is that some of the backups are to be sync'ed offsite, but not all of them. So, 1 datastore for replicated backups and the other for local-only backups. That makes the sync task very easy to understand and configure.

But I see on the filesystem that the chunks directory is relative to the datastore. So, does that imply that dedup is relative to the datastore each chunk is written to? For example, if the same "chunk" was written to both datastores, would it be dedup'ed, or would it be written twice, once to each datastore?
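
For reference, this is roughly what I see on disk (the paths are just from my test setup, so treat them as an example):

    /mnt/pbs/replicated/.chunks/    <- chunk store for datastore "replicated"
    /mnt/pbs/local-only/.chunks/    <- chunk store for datastore "local-only"

Each datastore has its own .chunks tree, so it looks like identical data backed up to both would land in both trees.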

And if it's the latter, what's the best configuration for having a subset of backups sync'ed to an external PBS instance? Would that be a use case for namespaces?


Thanks
David
...
 
Hi,

Yes, deduplication is only per datastore (in the .chunks folder, as you rightly identified). That's the reason namespaces were introduced: to have a semantically separate place for backups while still sharing the deduplication.
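
For the "sync only a subset" part, a rough sketch with one datastore and two namespaces (the names are placeholders, this is from memory, and option spellings can differ between versions, so double-check with --help before using it):

    # create the namespaces on the primary PBS (can also be done in the GUI,
    # if your proxmox-backup-client version lacks the namespace subcommand)
    proxmox-backup-client namespace create replicated --repository backup@pbs@pbs1:store1
    proxmox-backup-client namespace create local-only --repository backup@pbs@pbs1:store1

    # clients then pick the namespace per backup, e.g.:
    proxmox-backup-client backup etc.pxar:/etc --repository backup@pbs@pbs1:store1 --ns replicated

    # on the offsite PBS, assuming a remote named "pbs1" is already configured,
    # pull only the "replicated" namespace:
    proxmox-backup-manager sync-job create sync-replicated \
        --store offsite-store \
        --remote pbs1 \
        --remote-store store1 \
        --remote-ns replicated \
        --schedule daily

Note that sync jobs pull from the remote, so in this sketch the job lives on the offsite instance; everything under the local-only namespace simply never gets pulled.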
 
Another thing to consider is whether the clients encrypt their backups. In that case the encryption key is another layer of separation: if two different encryption keys produced the same chunk, it wouldn't be good encryption. ;)
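
To make that concrete, roughly (placeholder repository and key paths, option names from memory):

    # two independent client encryption keys
    proxmox-backup-client key create clientA.key --kdf none
    proxmox-backup-client key create clientB.key --kdf none

    # the same source data encrypted with different keys produces different
    # (encrypted) chunks, so the two runs cannot dedup against each other,
    # even within the same datastore and namespace
    proxmox-backup-client backup data.pxar:/srv/data --repository backup@pbs@pbs1:store1 --keyfile clientA.key
    proxmox-backup-client backup data.pxar:/srv/data --repository backup@pbs@pbs1:store1 --keyfile clientB.key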