[SOLVED] No de-duplication when archive differs from previous

pssilversp

New Member
Jan 20, 2023
I recently installed PBS and am having problems getting the expected de-duplication when creating .pxar archives. See the test output below. PBS only reliably de-duplicates when the archive name and source path are exactly the same as in the previous backup.

I installed the server (no-subscription repo) on top of a bare metal Debian 11. I've tried to run backups from the client installed on various hosts running Ubuntu 20.04, as well as from the same host running the server. These tests, with inline comments, are from the client running on the same machine as the server.

Any ideas why it's not de-duplicating? The "Unable to open dynamic index" error seems to occur whenever the .pxar name doesn't match that of the immediately previous backup.

Code:
paul@spare:~$ proxmox-backup-client version
client version: 2.3.2
server version: 2.3.2

# First backup of source path #1.
# Observe: Behaves normally (uncompressed).
paul@spare:~$ proxmox-backup-client backup rsync-test.pxar:/srv/rsync/test
Starting backup: host/spare/2023-01-20T20:52:45Z
Client name: spare
Starting backup protocol: Fri Jan 20 15:52:45 2023
fingerprint: ea:ab:04:e0:76:39:19:b4:2b:58:91:fd:a1:72:95:45:44:cf:39:96:af:d5:ea:3e:3b:2b:dd:1d:6c:b9:66:15
Are you sure you want to continue connecting? (y/n): y
No previous manifest available.
Upload directory '/srv/rsync/test' to 'paul@pam!spare@spare:8007:active' as rsync-test.pxar.didx
rsync-test.pxar: had to backup 115.072 MiB of 115.072 MiB (compressed 21.267 MiB) in 1.10s
rsync-test.pxar: average backup speed: 104.403 MiB/s
Uploaded backup catalog (2.18 KiB)
Duration: 5.39s
End Time: Fri Jan 20 15:52:51 2023

# Second backup of source path #1 with zero file changes since first backup.
# Observe: Re-uses previous chunks as expected, 100% compression
paul@spare:~$ proxmox-backup-client backup rsync-test.pxar:/srv/rsync/test
Starting backup: host/spare/2023-01-20T20:53:03Z
Client name: spare
Starting backup protocol: Fri Jan 20 15:53:03 2023
Downloading previous manifest (Fri Jan 20 15:52:45 2023)
Upload directory '/srv/rsync/test' to 'paul@pam!spare@spare:8007:active' as rsync-test.pxar.didx
rsync-test.pxar: had to backup 0 B of 115.072 MiB (compressed 0 B) in 0.47s
rsync-test.pxar: average backup speed: 0 B/s
rsync-test.pxar: backup was done incrementally, reused 115.072 MiB (100.0%)
Uploaded backup catalog (2.18 KiB)
Duration: 0.72s
End Time: Fri Jan 20 15:53:03 202

# Create a new backup: different .pxar name, different source path (#2).
# Observe: Uncompressed, as expected.
paul@spare:~$ proxmox-backup-client backup rsync-test.pxar:/srv/rsync/test2
Starting backup: host/spare/2023-01-20T20:54:37Z
Client name: spare
Starting backup protocol: Fri Jan 20 15:54:37 2023
Downloading previous manifest (Fri Jan 20 15:53:03 2023)
Upload directory '/srv/rsync/test2' to 'paul@pam!spare@spare:8007:active' as rsync-test.pxar.didx
rsync-test.pxar: had to backup 442.023 MiB of 442.023 MiB (compressed 65.686 MiB) in 3.96s
rsync-test.pxar: average backup speed: 111.756 MiB/s
Uploaded backup catalog (318 B)
Duration: 5.19s
End Time: Fri Jan 20 15:54:43 2023

# Back up original source path (#1) a third time. Observe: no compression.
# Observe: Uncompressed, not reporting any compression.
paul@spare:~$ proxmox-backup-client backup rsync-test.pxar:/srv/rsync/test
Starting backup: host/spare/2023-01-20T20:55:15Z
Client name: spare
Starting backup protocol: Fri Jan 20 15:55:15 2023
Downloading previous manifest (Fri Jan 20 15:54:37 2023)
Upload directory '/srv/rsync/test' to 'paul@pam!spare@spare:8007:active' as rsync-test.pxar.didx
rsync-test.pxar: had to backup 115.072 MiB of 115.072 MiB (compressed 21.267 MiB) in 1.09s
rsync-test.pxar: average backup speed: 106.002 MiB/s
Uploaded backup catalog (2.18 KiB)
Duration: 1.34s
End Time: Fri Jan 20 15:55:16 2023

# Back up original source path (#1) a fourth time, under a new archive (.pxar) name.
# Observe: Uncompressed. Also reports error reading .didx from previous manifest, although this is the first time I have used this archive name.
paul@spare:~$ proxmox-backup-client backup rsync-test-NEWNAME.pxar:/srv/rsync/test
Starting backup: host/spare/2023-01-20T20:57:46Z
Client name: spare
Starting backup protocol: Fri Jan 20 15:57:46 2023
Downloading previous manifest (Fri Jan 20 15:55:15 2023)
Upload directory '/srv/rsync/test' to 'paul@pam!spare@spare:8007:active' as rsync-test-NEWNAME.pxar.didx
Error downloading .didx from previous manifest: Unable to open dynamic index "/mnt/pbs1/active/host/spare/2023-01-20T20:55:15Z/rsync-test-NEWNAME.pxar.didx" - No such file or directory (os error 2)
rsync-test-NEWNAME.pxar: had to backup 115.072 MiB of 115.072 MiB (compressed 21.267 MiB) in 1.06s
rsync-test-NEWNAME.pxar: average backup speed: 108.171 MiB/s
Uploaded backup catalog (2.188 KiB)
Duration: 1.37s
End Time: Fri Jan 20 15:57:47 2023

# Back up original source path (#1) a fifth time, under its original .pxar name.
# Observe: Uncompressed. And seems to expect the previous backup to have the same .pxar name.
paul@spare:~$ proxmox-backup-client backup rsync-test.pxar:/srv/rsync/test
Starting backup: host/spare/2023-01-20T21:26:14Z
Client name: spare
Starting backup protocol: Fri Jan 20 16:26:14 2023
Downloading previous manifest (Fri Jan 20 15:57:46 2023)
Upload directory '/srv/rsync/test' to 'paul@pam!spare@spare:8007:active' as rsync-test.pxar.didx
Error downloading .didx from previous manifest: Unable to open dynamic index "/mnt/pbs1/active/host/spare/2023-01-20T20:57:46Z/rsync-test.pxar.didx" - No such file or directory (os error 2)
rsync-test.pxar: had to backup 115.072 MiB of 115.072 MiB (compressed 21.267 MiB) in 1.09s
rsync-test.pxar: average backup speed: 106.039 MiB/s
Uploaded backup catalog (2.18 KiB)
Duration: 1.42s
End Time: Fri Jan 20 16:26:15 2023
 
Did you check the server logs for those backups? I think what you're seeing is the client trying (and failing, just like borg-backup sometimes does) to optimize what it sends to the server by sending only new/changed files. That is separate from what the server actually does with that data.
 
deduplication happens on multiple levels, like @mow indicated:
- the client will fetch the previous/last snapshot of the same backup group
- if it exists, it will fetch the index for each archive you want to back up
- any chunks that are part of both the previous snapshot's index and the currently generated archive are not uploaded (client-side deduplication)

- the server will discard uploaded chunks that are already available in the datastore (server-side deduplication)

for VMs, there's an additional layer: a running VM can keep track of which blocks have changed since the last backup, so unchanged blocks don't even need to be read and chunked, instead of just having their upload skipped.
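To make the two levels concrete, here is a rough toy model in Python of the behaviour described above. It is only a sketch and not how PBS is implemented: the names `ToyServer` and `backup` are made up, chunks are fixed-size (the `.didx` in the output above is a dynamic index, so real chunk boundaries are not fixed like this), and there is a single backup group. What it does keep is the key point: the client only knows about chunks listed in the index of the same archive name in the previous snapshot, while the server checks every uploaded chunk against the whole chunk store.

```python
import hashlib

CHUNK_SIZE = 4  # tiny fixed-size chunks, for illustration only


def chunk_digests(data):
    """Split data into fixed-size chunks and pair each with its SHA-256 digest."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    return [(hashlib.sha256(c).hexdigest(), c) for c in chunks]


class ToyServer:
    """Hypothetical stand-in for a datastore holding one backup group."""

    def __init__(self):
        self.chunk_store = {}  # digest -> chunk bytes, shared by all snapshots
        self.snapshots = []    # each snapshot maps archive name -> list of digests

    def previous_index(self, archive_name):
        """Return the index of archive_name in the latest snapshot, or None."""
        if not self.snapshots:
            return None
        return self.snapshots[-1].get(archive_name)

    def upload_chunk(self, digest, chunk):
        """Server-side dedup: store the chunk only if it is not already present."""
        if digest in self.chunk_store:
            return False
        self.chunk_store[digest] = chunk
        return True


def backup(server, archive_name, data):
    """Back up data as a new snapshot and report uploaded vs. newly stored chunks."""
    previous = server.previous_index(archive_name)
    known = set(previous) if previous is not None else set()
    uploaded = stored = 0
    index = []
    for digest, chunk in chunk_digests(data):
        index.append(digest)
        if digest in known:
            continue  # client-side dedup: chunk is in the previous index, skip upload
        uploaded += 1
        if server.upload_chunk(digest, chunk):
            stored += 1
        known.add(digest)
    server.snapshots.append({archive_name: index})
    return {"uploaded": uploaded, "stored": stored}


server = ToyServer()
path1 = b"aaaabbbbcccc"  # stands in for /srv/rsync/test (3 chunks)
path2 = b"ddddeeee"      # stands in for /srv/rsync/test2 (2 chunks)

print(backup(server, "rsync-test.pxar", path1))          # {'uploaded': 3, 'stored': 3}
print(backup(server, "rsync-test.pxar", path1))          # {'uploaded': 0, 'stored': 0}
print(backup(server, "rsync-test.pxar", path2))          # {'uploaded': 2, 'stored': 2}
print(backup(server, "rsync-test.pxar", path1))          # {'uploaded': 3, 'stored': 0}
print(backup(server, "rsync-test-NEWNAME.pxar", path1))  # {'uploaded': 3, 'stored': 0}
```

The last two calls mirror the third and fourth backups of source path #1 in the test output: the client re-uploads everything because the previous index does not list those chunks, but the server stores nothing new.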

for your test case:
Code:
# Create a new backup: different .pxar name, different source path (#2).

you didn't actually change the archive name ;) but yeah, if you want client-side deduplication you need to
- not mix different source paths (e.g., backup /a in one snapshot, then /b in the next)
- use the same archive names

if you want to have different backup chains for different paths (e.g., backups of /a and backups of /b, without combining them into single backup snapshots covering both), you can specify --backup-id to override the default (local hostname). then each chain should be able to properly reference the previous snapshot within that chain, and deduplicate accordingly.
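As a small illustration of the `--backup-id` suggestion, the following Python sketch builds one command line per source path, each with its own backup ID. The helper name `backup_command` and the IDs `rsync-test` and `rsync-test2` are made up for this example, and it assumes the repository is configured the same way as in the commands from the original post (which pass no repository argument). It only prints the commands and does not run them.

```python
import shlex


def backup_command(source_path, archive_name, backup_id=None):
    """Build the argv list for one `proxmox-backup-client backup` call."""
    argv = ["proxmox-backup-client", "backup", f"{archive_name}:{source_path}"]
    if backup_id is not None:
        # Override the default backup ID (the local hostname) so this
        # source path gets its own chain of snapshots.
        argv += ["--backup-id", backup_id]
    return argv


# Hypothetical IDs: one backup chain per source path.
chains = [
    ("rsync-test", "/srv/rsync/test"),
    ("rsync-test2", "/srv/rsync/test2"),
]

for backup_id, path in chains:
    print(shlex.join(backup_command(path, "rsync-test.pxar", backup_id)))
```

With separate IDs, each run should find a previous snapshot that contains the same archive name for the same path, so the client-side index lookup has something to match against.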
 
Thanks, you and @mow are correct. I hadn't realized there were these two sides to de-duplication. Indeed, when I check the server-side logs, only 1 chunk is added when duplication is ~100%. Running the "find" and "du" utilities on the datastore filesystem confirms this.
As long as the server de-duplicates, I can live with the inefficiency of the client not de-duplicating when backup paths change. It only takes a few more seconds for a small backup to finish. I may use your --backup-id suggestion to separate smaller backups from larger ones, and ensure that the latter always find the latest snapshot/index.
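For anyone who wants to repeat the "find" and "du" check, a minimal Python equivalent could look like the sketch below. The function name `chunk_usage` is made up. The path to pass in is an assumption: the datastore in this thread lives at /mnt/pbs1/active according to the error message, and the chunk files are assumed to sit in a `.chunks` directory below it. Note that this sums apparent file sizes, whereas "du" reports disk usage by default, so the byte totals can differ slightly.

```python
import os


def chunk_usage(chunk_dir):
    """Return (file_count, total_bytes) for all regular files below chunk_dir."""
    count = 0
    total = 0
    for root, _dirs, files in os.walk(chunk_dir):
        for name in files:
            path = os.path.join(root, name)
            count += 1
            # lstat so that a symlink is not followed and counted by its target size
            total += os.lstat(path).st_size
    return count, total
```

Calling it before and after a backup, for example with `chunk_usage("/mnt/pbs1/active/.chunks")`, and comparing the two results shows how many chunk files the server actually added, independent of what the client reports as uploaded.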