PBS: large number of host files (ZFS backend, CIFS mounted)

Zamana

Renowned Member
Dec 20, 2015
Hello!

I'm trying to use PBS to back up my host files, but there is something I can't understand. Let me explain...

On one of my ZFS pools, I have a dataset with 3 TB of media files. This dataset is mounted in a Debian VM through CIFS. I wrote a script to back up this "folder" to PBS using proxmox-backup-client, and everything ran fine. The first backup took 10 hours to complete on my local network. No problem.
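The script itself is basically just a thin wrapper around proxmox-backup-client. A minimal sketch of it (repository, password and backup ID below are placeholders, not my real values):

#!/bin/bash
# Minimal host-file backup to PBS; adjust repository, credentials and paths.
export PBS_REPOSITORY="backup@pbs@pbs.example.lan:datastore1"
export PBS_PASSWORD="secret"    # or use an API token instead

# Archive the CIFS-mounted media folder as a pxar file archive.
proxmox-backup-client backup media.pxar:/mnt/media --backup-id media-vm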

On my second run, I thought (wrongly, apparently...) that PBS would back up incrementally, transferring only the difference between the first and the second run, but that wasn't the case. The backup took the same 10 hours again.

So, my question is: is that incremental feature only for CTs/VMs? Do host files not benefit from it? Or am I doing something wrong?

Thanks.
 
There are two types of incremental backups:
1. "Dirty" block level incremental
2. "tar"/streaming level incremental

1. is done for QEMU block devices, which keep a changed/dirty block list since the last backup. The rule is that the QEMU VM must not have been restarted, shut down and started again, or migrated between the two backups, otherwise the dirty block list is no longer valid and usable. THAT is blindingly fast for small incremental backups.
If it can't use a dirty block list, it falls back to 2:
2. The data goes through all the motions of the initial backup, i.e. the block devices are read IN FULL and the files are archived in the same manner as the first time. The difference is that after each backup chunk is compressed, the client asks the server whether a chunk with that SHA-256 hash already exists; if it does, that chunk is not sent again, so deduplication is handled per chunk by the server (see the toy sketch below).
This unfortunately is not ideal for LXCs (especially big ones with lots of small files getting updated) or for proxmox-backup-client file backups, as the whole data set has to be read on nearly every backup.
Yes, I have similar "troubles" with LXCs that are huge to back up.
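To make point 2 a bit more concrete, here is a toy sketch of chunk-based deduplication (this only illustrates the idea - it is not how PBS is implemented internally, and it uses fixed 4 MiB chunks whereas PBS chunks file archives dynamically). The data is read and split on every run, but a chunk whose SHA-256 hash is already in the store is skipped instead of being stored/uploaded again:

#!/bin/bash
# Toy content-addressed chunk store - illustrates deduplication only.
set -eu
export STORE="./chunk-store"
INPUT="$1"                  # the file to "back up"
mkdir -p "$STORE"

# Split the input into fixed 4 MiB chunks; each chunk is addressed by its
# SHA-256 hash, so an unchanged chunk is never stored twice.
split -b 4M --filter='
    tmp=$(mktemp)
    cat > "$tmp"
    hash=$(sha256sum "$tmp" | cut -d" " -f1)
    if [ -e "$STORE/$hash" ]; then
        echo "chunk $hash already present - skipped"
        rm "$tmp"
    else
        echo "storing new chunk $hash"
        mv "$tmp" "$STORE/$hash"
    fi
' "$INPUT"

On a second run over unchanged data, every chunk is reported as already present - the time is then spent almost entirely on reading and hashing, which is exactly the situation described in the first post.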
 
not 100% correct, but close enough:
- for VMs, there is a special shortcut: as long as the VM hasn't been stopped since the last backup, it will neither read nor upload unchanged chunks ("fast" incremental mode, client-side deduplication)
- for every backup, if a previous snapshot exists, the client will download its indices and, after reading & chunking the data, skip uploading those chunks that were already contained in the previous snapshot ("regular" incremental mode, client-side deduplication)
- the server will store each chunk only once - if a client uploads a chunk that already exists, it will be re-used (server-side deduplication)

so for your use case, the bottleneck seems to be read speed (since both the first and subsequent backups take roughly the same time). if you look at the logs, they should show that only the changed parts (plus some overhead - deduplication is not done on the file level but on chunks that might contain multiple files) were uploaded.
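if you want to verify that reading from the CIFS mount is the limiting factor, a quick sequential read test is usually enough (mount point and file name below are just placeholders; run as root). note that proxmox-backup-client benchmark measures hashing/compression/TLS throughput, not how fast the source data can be read.

# drop the page cache so cached data doesn't skew the result
sync && echo 3 > /proc/sys/vm/drop_caches

# sequential read test against the CIFS mount
dd if=/mnt/media/some-large-file.mkv of=/dev/null bs=1M status=progress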
 
