Client side deduplication?

syssifus

New Member
Apr 25, 2024
Hey there,
I've recently set up PBS to serve as a central backup server in my network, but I've run into quite an inconvenience: whenever I back up e.g. my home directory, it takes quite a while, since it has to transfer 200+ GB over the network. At about 120 MB/s (seemingly the HDD's upper limit) this takes maybe a little less than 20 minutes (Edit: It took 30 minutes), which is horrendously slow compared to backups to a hard drive in this computer. This seems to stem from the fact that it always copies all files to the backup server and then does the deduplication there.
Now I'm wondering: Is there a way to do some kind of client side deduplication before sending the files off to the server so that only the changed files are sent over? Doing backups this way is really pretty painful, even though PBS is almost the perfect solution for my use case.
Thanks!
 
This seems to stem from the fact that it always copies all files to the backup server and then does the deduplication there.
No, it shouldn't copy all that data. But it has to read and hash all of it each time to decide that nothing (or only the data that changed) needs to be uploaded. Google for "dirty-bitmapping". Dirty-bitmapping currently only works for QEMU block devices, and only if you don't restart the VM or server between your incremental backups.
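To illustrate the "read and hash everything, upload only what is new" behaviour, here is a minimal sketch. It is not the actual PBS client code: the fixed 4-byte chunk size, the SHA-256 digest, and the names `backup` and `known_digests` are assumptions chosen for the demo.

```python
import hashlib


def backup(data: bytes, known_digests: set, chunk_size: int = 4):
    """Read and hash every chunk, but 'upload' only chunks the server lacks.

    Hypothetical model: `known_digests` stands in for the server's chunk store.
    Returns (bytes_read, bytes_uploaded).
    """
    bytes_read = 0
    bytes_uploaded = 0
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        bytes_read += len(chunk)                     # every byte is read...
        digest = hashlib.sha256(chunk).hexdigest()   # ...and hashed
        if digest not in known_digests:              # only unknown chunks are sent
            known_digests.add(digest)
            bytes_uploaded += len(chunk)
    return bytes_read, bytes_uploaded


server = set()
first = backup(b"aaaabbbbccccdddd", server)   # initial backup: all chunks are new
second = backup(b"aaaabbbbXXXXdddd", server)  # one chunk changed since then
print(first, second)
```

In this model the second run still reads and hashes all 16 bytes but uploads only the single changed chunk, which is why an incremental backup can take a long time even when very little data actually goes over the network.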
 
Thanks for the info! While I'm running PBS in a VM, I'm using ZFS to mirror two drives, so no dirty-bitmapping (very interesting, that).
Maybe my previous testing didn't yield any great results because I was testing the initial setup with files that took up too little space overall.

Edit: Did another backup to check this theory, but while it was cut down to just a third of the initial backup time, 10 minutes is still quite a lot for 40 megs of apparent change. I guess the whole back-and-forth request thing is a downside of this system then. Oh well. I think this is solved then.
 
I'm using ZFS to mirror two drives, so no dirty-bitmapping
Wrong, the dirty bitmap comes from the QEMU process (edit: it only works with VMs, not with LXC), so it works with all storage formats, including ZFS, LVM-thin, qcow2...
It allows skipping the re-read of all source data.
Only the changed blocks are sent to PBS. It only works while the VM stays running, not across a shutdown or reboot.
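A rough sketch of the dirty-bitmap idea, purely as a toy model: the class `TrackedDisk` and its methods are hypothetical names for illustration and have nothing to do with QEMU's real interfaces. The point is that writes mark blocks as dirty, so the next backup only has to read those blocks, and that the tracking is lost when the process restarts.

```python
class TrackedDisk:
    """Toy block device that remembers which blocks changed since the last backup."""

    def __init__(self, num_blocks: int, block_size: int = 4):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]
        # Before the first backup nothing is known, so every block counts as dirty.
        self.dirty = [True] * num_blocks

    def write(self, block_no: int, data: bytes) -> None:
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        self.blocks[block_no] = data
        self.dirty[block_no] = True  # remember that this block changed

    def incremental_backup(self) -> dict:
        """Return only the dirty blocks, then clear the bitmap."""
        changed = {i: self.blocks[i] for i, is_dirty in enumerate(self.dirty) if is_dirty}
        self.dirty = [False] * len(self.blocks)
        return changed

    def restart(self) -> None:
        # Assumption of this model: the bitmap lives only in the running process,
        # so after a restart every block must be treated as dirty again.
        self.dirty = [True] * len(self.blocks)


disk = TrackedDisk(num_blocks=8)
disk.incremental_backup()                      # first backup reads all 8 blocks
disk.write(3, b"abcd")
print(sorted(disk.incremental_backup()))       # only block 3 has to be read
disk.restart()
print(len(disk.incremental_backup()))          # bitmap lost: all blocks again
```

Compared with hashing everything on each run, this avoids reading unchanged data at all, at the price of needing something (here the hypervisor process) that sees every write.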
 
Yes, so basically for all VMs, but not for LXCs or backups made via the proxmox-backup-client.
 
