Corrupt backups are not re-synced

dleidert

Member
Jun 17, 2023
Hi,

I have multiple backup servers. One of them, consider it a secondary server, frequently runs a sync job from a master server. Due to a faulty RAM stick, the whole system kept restarting during some of these synchronizations until I discovered the issue. Now multiple backups from that time period are compromised. The garbage collector produces errors that look like this:

Code:
TASK ERROR: marking used chunks failed: can't open dynamic index '"[..]/drive.ppxar.didx"': index too small (0)

Looking at the files, some of them indeed have a size of zero bytes:

Code:
total 0
-rw-r--r-- 1 backup backup 0 Sep 20 05:49 efi.mpxar.didx
-rw-r--r-- 1 backup backup 0 Sep 20 05:49 efi.ppxar.didx
-rw-r--r-- 1 backup backup 0 Sep 20 05:49 index.json.blob
-rw-r--r-- 1 backup backup 0 Sep 20 05:49 root.mpxar.didx
-rw-r--r-- 1 backup backup 0 Sep 20 05:49 root.ppxar.didx
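
To see how widespread the damage is, I would expect a find invocation like the following to list every zero-byte index and blob file in the datastore (the path is just a placeholder):

Code:
# list all zero-byte index/blob files below the datastore root (adjust the path)
find /path/to/datastore -type f \( -name '*.didx' -o -name '*.fidx' -o -name '*.blob' \) -size 0 -ls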

But for some reason, the sync job doesn't pick up these corrupt backups, although I have enabled the resync-corrupt option. Then I thought they might need to be marked as corrupt by a verification first, so I ran verification jobs. Unfortunately, the backups are still not re-synced, even after the verification fails. And the verification job prints these weird errors:

Code:
2025-09-24T04:04:00+02:00: "can't verify chunk, load failed - store '[..]', unable to load chunk '4e4a9cfda5f6b53b3ec7cf2bd14928e236189f2c8619bbcef66a2e5723e99e38' - blob too small (0 bytes)."
2025-09-24T04:04:00+02:00: failed to get s3 backend while trying to rename bad chunk: 4e4a9cfda5f6b53b3ec7cf2bd14928e236189f2c8619bbcef66a2e5723e99e38
2025-09-24T04:04:00+02:00: corrupted chunk renamed to "[..]/.chunks/4e4a/4e4a9cfda5f6b53b3ec7cf2bd14928e236189f2c8619bbcef66a2e5723e99e38.0.bad"

But these are local backups. Why is the error saying something about an S3 backend? I haven't configured any S3 storage.

Why aren't the corrupt backups re-synced? How can I force a re-sync?

I could just copy over the correct .didx files, and I'm fairly sure that would fix it. I could probably also remove all corrupt backups and transfer them again. I'm just not sure whether either of those would fix the garbage collector. What is the best course of action here?
 
Hi,
But for some reason, the sync job doesn't pick up these corrupt backups, although I have enabled the resync-corrupt option. Then I thought they might need to be marked as corrupt by a verification first, so I ran verification jobs. Unfortunately, the backups are still not re-synced, even after the verification fails.
There was a regression in PBS version 4.0 with the resync-corrupt option. A fix is already available in git but not yet packaged at the time of writing, see https://git.proxmox.com/?p=proxmox-backup.git;a=commit;h=2fb481c4432f890612d9b0cd684c050ccb72ad2f
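
Once the packaged update lands, you can check which versions are installed, e.g. with (a sketch; the output will of course differ on your system):

Code:
# show the installed proxmox-backup package versions
proxmox-backup-manager versions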

And the verification job prints these weird errors:

Code:
2025-09-24T04:04:00+02:00: "can't verify chunk, load failed - store '[..]', unable to load chunk '4e4a9cfda5f6b53b3ec7cf2bd14928e236189f2c8619bbcef66a2e5723e99e38' - blob too small (0 bytes)."
2025-09-24T04:04:00+02:00: failed to get s3 backend while trying to rename bad chunk: 4e4a9cfda5f6b53b3ec7cf2bd14928e236189f2c8619bbcef66a2e5723e99e38
2025-09-24T04:04:00+02:00: corrupted chunk renamed to "[..]/.chunks/4e4a/4e4a9cfda5f6b53b3ec7cf2bd14928e236189f2c8619bbcef66a2e5723e99e38.0.bad"

But these are local backups. Why is the error saying something about an S3 backend? I haven't configured any S3 storage.
I had a quick glance: this is indeed an incomplete branch condition, leading to the message being shown even though there is no S3 backend. The message is informative only; I will send a patch to adapt this.

Why aren't the corrupt backups re-synced? How can I force a re-sync?

I could just copy over the correct .didx files, and I'm fairly sure that would fix it. I could probably also remove all corrupt backups and transfer them again. I'm just not sure whether either of those would fix the garbage collector. What is the best course of action here?
You will either have to wait for the regression fix to be packaged with the next version of the proxmox-backup packages, or delete all snapshots of the affected group from the newest one down to and including the corrupt one, and then re-sync the group. A sync only pulls snapshots newer than the newest one present locally, which is why the newer, intact snapshots have to be removed as well.
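
If you want to do this on the CLI, a rough sketch could look like this; the snapshot, repository, remote, and datastore names are placeholders for your setup:

Code:
# on the secondary server: forget the corrupt snapshot and every newer one in the group
# (snapshot path and repository are examples)
proxmox-backup-client snapshot forget vm/100/2025-09-20T03:49:00Z \
    --repository root@pam@localhost:secondary-store
# then pull the group again from the configured remote
proxmox-backup-manager pull master-remote source-store secondary-store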