Remote has 351 blob too small (0 bytes) during verify, primary ok, how to resync?

guerby · Nov 15, 2021

Hi,

During our weekly verify one week ago on the PBS datastore "datastore2" (machine "backup2", PBS 1.1.13-2) we got 351 "blob too small (0 bytes)" errors. The verify job renamed each zero-sized chunk, appending ".0.bad":

Code:

# extract from LOGFILE=/var/log/proxmox-backup/tasks/...:
2021-11-07T13:33:39+01:00: verify datastore2:vm/10000/2021-06-18T21:30:01Z
2021-11-07T13:33:39+01:00:   check qemu-server.conf.blob
2021-11-07T13:33:39+01:00:   check drive-scsi0.img.fidx
2021-11-07T13:37:24+01:00: can't verify chunk, load failed - store 'datastore2', unable to load chunk '01dbac89d42d6d75d5d43878d5d3142d86941e082c41b7b2c268b3aef42c2264' - blob too small (0 bytes).
2021-11-07T13:37:24+01:00: corrupted chunk renamed to "/mnt/datastore/datastore2/.chunks/01db/01dbac89d42d6d75d5d43878d5d3142d86941e082c41b7b2c268b3aef42c2264.0.bad"

root@backup2:~# ls -l  /mnt/datastore/datastore2/.chunks/01db/01dbac89d42d6d75d5d43878d5d3142d86941e082c41b7b2c268b3aef42c2264.0.bad
-rw-r--r-- 1 backup backup 0 Oct 13 17:51 /mnt/datastore/datastore2/.chunks/01db/01dbac89d42d6d75d5d43878d5d3142d86941e082c41b7b2c268b3aef42c2264.0.bad
root@backup2:~# ls -l  /mnt/datastore/datastore2/.chunks/01db/01dbac89d42d6d75d5d43878d5d3142d86941e082c41b7b2c268b3aef42c2264
ls: cannot access '/mnt/datastore/datastore2/.chunks/01db/01dbac89d42d6d75d5d43878d5d3142d86941e082c41b7b2c268b3aef42c2264': No such file or directory

root@backup2:~# grep "blob too small" $LOGFILE|wc -l
351

This datastore2 is a daily remote sync of "datastore1" on machine "backup1" (also PBS 1.1.13-2). On that machine the corresponding chunks seem to be present with a non-zero size; here is the first one:

Code:

root@backup1:~# ls -l  /mnt/datastore/datastore1/.chunks/01db/01dbac89d42d6d75d5d43878d5d3142d86941e082c41b7b2c268b3aef42c2264*
-rw-r--r-- 1 backup backup 2948386 Jun 18 14:09 /mnt/datastore/datastore1/.chunks/01db/01dbac89d42d6d75d5d43878d5d3142d86941e082c41b7b2c268b3aef42c2264

Last weekend's verify on datastore2 was successful; it probably didn't recheck those problematic zero-sized chunks.

Is there a way to force resync of those zero sized chunks?

I can do it manually but I wonder if there's a better way to make sure a primary and remote are well in sync.

Note: datastore1 and datastore2 are both about 10 TB. The underlying ZFS pools have been scrubbed with zero errors.

dcsapak · Nov 16, 2021

guerby said:
I can do it manually but I wonder if there's a better way to make sure a primary and remote are well in sync.

you could delete the offending snapshots and sync again (though this will only sync the snapshots newer than any valid snapshots on the target)
but doing it manually (or with rsync for example) is ok
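
For the manual route, here is a minimal shell sketch, assuming the `.chunks` layout and the `<digest>.0.bad` naming shown in the verify log above (the numeric counter before `.bad` is an assumption based on that single example). The function names `list_bad_chunks` and `bad_to_relpath` are made up for illustration and are not part of PBS. The sketch only lists zero-sized `.bad` files and prints the relative path of the chunk each one replaced; it copies nothing.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative and not part of PBS.

# Print every zero-sized "*.bad" file below a chunk directory.
list_bad_chunks() {
    find "$1" -type f -name '*.bad' -size 0
}

# Turn ".../.chunks/01db/<digest>.0.bad" into "01db/<digest>",
# i.e. the chunk path relative to the .chunks directory.
# Reads the ".bad" paths from stdin, one per line.
bad_to_relpath() {
    chunks_dir=$1
    while IFS= read -r bad; do
        rel=${bad#"$chunks_dir"/}
        # Strip the ".<counter>.bad" suffix (assumed to be numeric).
        printf '%s\n' "$rel" | sed -E 's/\.[0-9]+\.bad$//'
    done
}

# Example (paths taken from this thread; adjust to your setup):
# CHUNKS=/mnt/datastore/datastore2/.chunks
# list_bad_chunks "$CHUNKS" | bad_to_relpath "$CHUNKS" > /tmp/missing-chunks.txt
```

The resulting list could then be handed to rsync's `--files-from` option, with the primary's `.chunks` directory as source and the local one as destination, so that only the affected chunks are transferred. Check file ownership afterwards (the listings above show `backup:backup`) and run a verify on the affected snapshots.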

the more interesting question is why do those chunks not contain any data? do you have any idea how that might have happened?

guerby · Nov 16, 2021

dcsapak said:
you could delete the offending snapshots and sync again (though this will only sync the snapshots newer than any valid snapshots on the target)
but doing it manually (or with rsync for example) is ok

the more interesting question is why do those chunks not contain any data? do you have any idea how that might have happened?

We had various crashes with older PBS versions, and the PBS machine was undersized (CPU, RAM); we also had some reboot issues. So far my manual checks show that all the chunks that are zero-sized on "backup2" are dated Jun 18 on "backup1".

So probably not a direct PBS bug. Found another "blob too small" in the forum:

https://forum.proxmox.com/threads/remote-offsite-backup-options.72846/

However, it would be nice to have a PBS-provided tool that checks whether a primary and a remote are in sync and, if not, suggests actions along with hints about how safe they are: replacing a zero-sized chunk with the non-zero-sized one from the primary should be pretty safe.

Currently it looks like if a verify finds something wrong you have to fix it manually, which is not something you want to do often on your backup infrastructure.

I'm using rsync with dry-run and checksum to sort out the issue right now.
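
For reference, the dry-run comparison could look roughly like the sketch below. It assumes rsync is available on both machines (over SSH as root when the source is remote) and uses the datastore paths mentioned earlier in the thread; the wrapper name `compare_chunks` is made up. `--dry-run` means nothing is transferred, and `--checksum` makes rsync compare file contents rather than size and mtime, so it reads every chunk on both sides and will take a long time on roughly 10 TB.

```shell
#!/bin/sh
# Hypothetical wrapper: list files that are missing or different on the
# destination chunk directory, without copying anything.
compare_chunks() {
    src=$1   # e.g. root@backup1:/mnt/datastore/datastore1/.chunks/
    dst=$2   # e.g. /mnt/datastore/datastore2/.chunks/
    # -r recurse, -n dry run, -c compare by checksum, -i itemize changes
    rsync -rnci "$src" "$dst"
}

# Example (assumed hosts and paths from this thread):
# compare_chunks root@backup1:/mnt/datastore/datastore1/.chunks/ \
#                /mnt/datastore/datastore2/.chunks/
```

Because verify renamed the zero-sized chunks to `.0.bad`, they should show up here as files missing on the destination; the `.0.bad` files themselves exist only on the destination and are not listed, since no delete option is used.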

Is there a PBS tool to display info on a given chunk?

Thanks again!

dcsapak · Nov 16, 2021

mhmm.. currently the sync is based on snapshots, so it will not sync chunks for existing snapshots.. this would probably blow up the sync time dramatically
maybe we can make some "chunk healing" during sync, where we look at the local verification status, and if it's bad and the remote one is ok, we sync again..
would you mind opening an enhancement request on our bugtracker so that we can properly track it ? https://bugzilla.proxmox.com/

guerby said:
Is there a PBS tool to display info on a given chunk?

there is 'proxmox-backup-debug' which can decode single chunks and indexes, as well as calculating the crc32 checksum, but this could be extended. what information did you have in mind?

guerby · Nov 16, 2021

dcsapak said:
mhmm.. currently the sync is based on snapshots, so it will not sync chunks for existing snapshots.. this would probably blow up the sync time dramatically
maybe we can make some "chunk healing" during sync, where we look at the local verification status, and if it's bad and the remote one is ok, we sync again..
would you mind opening an enhancement request on our bugtracker so that we can properly track it ? https://bugzilla.proxmox.com/

there is 'proxmox-backup-debug' which can decode single chunks and indexes, as well as calculating the crc32 checksum, but this could be extended. what information did you have in mind?

Done (we have "BASIC" level support on PBS but I haven't looked into how to link the account to it yet):

https://bugzilla.proxmox.com/show_bug.cgi?id=3727

For proxmox-backup-debug it looks like it's not there in 1.x, will test a 2.x ASAP, thanks for the hint!
