[solved by re-re-verify] All four snapshots in a backup fail re-verification, new snapshot fails verification as well

wbk

TL;WR:
This must have been a one-off coincidence of circumstances
  • 4 snapshots of a backup failed re-verification
    • 8 corrupt chunks in snapshot 1
  • running 5th snapshot backup while verifying 1-4
  • 5th snapshot also fails verification
    • 3 different chunks
  • restore snapshot 1 from tape (2 attempts) does not write any chunks to disk
  • re-verification of snapshot 1 passes
  • sequential re-verification of snapshots 2, 3, 4 passes
  • re-verification of snapshot 5 still fails, will create new snapshot after upgrading to PVE8 / PBS8
Lessons learned:
  1. Patience
  2. Maybe you cannot just throw everything at once at low-end hardware
  3. Bit rot does exist
Thanks for the help!


Original post follows:
-------------------------------------------------

Hi all,

On re-verification of snapshots in a backup, all four of them fail.

The 'verify all' job was running while a new snapshot was being created in this group.

The newly created snapshot fails the initial verification.

There is only one container in this backup.

There is another backup group with multiple containers. They were re-verified in the same job run, and passed.

I have the first snapshot of the failed group on tape. It is being restored now.

I am a bit surprised that five snapshots of a specific container fail verification at once. Can a problem in the first snapshot cause subsequent snapshots of that container to fail verification? Because of deduplication, a large number of chunks in the first snapshot are also part of subsequent snapshots.

More importantly, 'Now what?'

The source container is running without problems, so I have no stress ;-) I could create a new backup set with an initial backup for this container, but that takes a day and a half, which makes it less than a 'quick check'.

Thanks for reading!
 
Hi,
the verification of a backup fails if one or more of the chunks indexed by that backup is corrupt, meaning the hash of its content no longer matches the expected hash. Because chunks are deduplicated, the same chunk can be indexed by multiple backups (the data stored in it is identical for all of them), so verification fails for every backup that references the chunk, and the bad chunk is flagged as such.
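As a rough illustration of why one damaged chunk takes several snapshots down with it, here is a minimal sketch of a deduplicated, content-addressed chunk store. It is a toy model, not PBS code: the `ChunkStore` class, the `verify` function, the sample data, and the use of SHA-256 via `hashlib` are all assumptions made for the example.

```python
import hashlib


def digest(data: bytes) -> str:
    """Content hash used as the chunk's name (SHA-256 assumed for this sketch)."""
    return hashlib.sha256(data).hexdigest()


class ChunkStore:
    """Hypothetical in-memory store: one copy per distinct chunk content."""

    def __init__(self):
        self.chunks = {}  # digest -> stored bytes

    def insert(self, data: bytes) -> str:
        d = digest(data)
        # Deduplication: identical content is stored only once.
        self.chunks.setdefault(d, data)
        return d


def verify(store: ChunkStore, index: list) -> list:
    """Return the digests in a snapshot index whose stored content is missing or altered."""
    return [
        d for d in index
        if d not in store.chunks or digest(store.chunks[d]) != d
    ]


store = ChunkStore()
# Two snapshots of the same container; only the last chunk differs.
snap1 = [store.insert(b"etc"), store.insert(b"var"), store.insert(b"home")]
snap2 = [store.insert(b"etc"), store.insert(b"var"), store.insert(b"home-v2")]

# Simulate bit rot in a chunk that both snapshots reference.
shared = snap1[0]
store.chunks[shared] = b"etC"

bad1 = verify(store, snap1)
bad2 = verify(store, snap2)
print(bad1 == [shared], bad2 == [shared])
```

Both snapshot indexes point at the same stored chunk, so a single altered chunk is reported for each of them, even though only four distinct chunks exist on disk.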

If you back up the same container again, the chunk will be rewritten (assuming the data in the container did not change since then), and the new and older backups should verify as okay again.

If however the data changed in-between, the chunk cannot be recreated, the old backups will still be corrupt, but the new one will be fine again.

In your case, the new backup run seems to still have indexed the corrupt chunk, so re-run the backup once again and re-run the verify jobs after a successful backup.
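The three cases described above (backup taken before the chunk was flagged, backup re-run after flagging, and data that changed in between) can be sketched with a toy model. This is only an assumption about the general mechanism, not the actual PBS implementation: the `ChunkStore` class, its `bad` set, and the rule "skip a chunk that already exists unless it has been flagged" are hypothetical simplifications.

```python
import hashlib


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class ChunkStore:
    """Hypothetical store that remembers which chunks verification has flagged."""

    def __init__(self):
        self.chunks = {}  # digest -> stored bytes
        self.bad = set()  # digests flagged as corrupt by verify()

    def insert(self, data: bytes) -> str:
        d = digest(data)
        if d in self.chunks and d not in self.bad:
            return d  # dedup: existing chunk is trusted and not rewritten
        self.chunks[d] = data  # new chunk, or rewrite of a flagged one
        self.bad.discard(d)
        return d

    def verify(self, index: list) -> list:
        failed = [d for d in index if digest(self.chunks[d]) != d]
        self.bad.update(failed)
        return failed


store = ChunkStore()
old = [store.insert(b"config v1"), store.insert(b"data")]
store.chunks[old[0]] = b"config vX"  # silent corruption, not yet flagged

# Case 1: backup taken before verification flagged the chunk.
new1 = [store.insert(b"config v1"), store.insert(b"data")]
first_result = store.verify(new1)  # still fails; chunk is now flagged

# Case 2: backup re-run after flagging, source data unchanged.
new2 = [store.insert(b"config v1"), store.insert(b"data")]
healed_old, healed_new = store.verify(old), store.verify(new2)

# Case 3: another chunk rots, and the source data changed in between.
store.chunks[old[1]] = b"dat\x00"
rot_result = store.verify(old)
new3 = [store.insert(b"config v1"), store.insert(b"data v2")]
still_bad_old, fine_new = store.verify(old), store.verify(new3)
print(first_result, healed_old, healed_new, still_bad_old, fine_new)
```

In this model the first new backup inherits the corrupt chunk because the store still considers it present, which matches the advice to run the backup once more after verification has flagged the chunk.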
 
Hi Chris & forum,

Thanks for your explanation! Sorry for not getting back any sooner. I was writing a reply while checking and comparing logs when my graphics card crashed, losing my draft and 'research'.

I'll try to edit this post tonight with some useful reply!
<edit 10 days later>
Sorry, I didn't make 'tonight', nor did I re-run my analysis to have anything useful to add yet. I am running a tape restore with the owner of the job matching the owner of the backup, and I am postponing my new backup + verification until that has finished.
 
There were eight chunks marked as corrupt in the first snapshot, which caused all subsequent snapshots to fail verification.

The tape restore (of the first snapshot) did not restore any chunks, as it did not find any deficiencies in the snapshot.

Re-verification of the first snapshot afterwards did not find any corrupt chunks.

I verified the next three snapshots without failing any.

My fifth snapshot, which I created while having the other four fail verification, failed verification directly after creation and also on verification after the other four passed verification. There are three corrupt chunks.

PBS runs on a fairly low-specced box (elderly quad-core Atom D525 with 4 GB of RAM, no ECC, with ZFS on a single spinning SAS disk as datastore).

The failed verifications occurred during the fifth backup, i.e., I was running the backup while also starting verifications on existing snapshots. That is no reason for things to break, but in the heat of the battle a single bit might topple over the wrong way.

It might be interesting to speculate over the exact order of events that might have caused the error, but I think that that would not be helpful to any next visitor of this thread.
 
