Verification: how are chunks selected?

reto

Feb 12, 2022
How are snapshots verified? Always in full or are recently verified chunks skipped?

Let's say:
  • You have 3 snapshots of 100 GB each, and all of them share 90 GB of chunks.
  • All three are due for verification on consecutive days.
  • What happens?
    • Option a) Every day, 100 GB of chunks are read and verified?
    • Option b) The first day reads 100 GB, the second day reads 10 GB (because the other 90 GB of chunks were recently verified), etc.?
I assume Option a) is being used, but this would mean that, for n mostly identical snapshots, almost n times as much data is being read as is really necessary, right?
 
I assume Option a) is being used,
you're right, this is the way it's implemented, because saving a 'last verified' timestamp for each chunk would be taxing on the storage, in addition to the already taxing chunking

also, just because the chunks were recently verified does not mean that they are still intact now. if a chunk breaks between yesterday and today, and I verify a backup today, it would be marked as verified when in reality it's not. when always verifying all necessary chunks, we can be sure that the timeframe in which a broken chunk goes unnoticed is as short as possible

when verifying multiple snapshots in a single job (e.g. all of a backup group), we do not verify a chunk twice (for that job)
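
To make the difference concrete, here is a minimal sketch of the example from the question. It is not Proxmox Backup Server code: the 1 GB chunk size, the chunk IDs and the `chunks_read` helper are all assumptions made up for illustration. It only models the rule described above, that a chunk is read at most once per job.

```python
# Hypothetical model, not PBS code: every chunk is assumed to be 1 GB
# and is identified by a made-up string ID.

def chunks_read(jobs):
    """Count chunks read when every job reads each distinct chunk once.

    `jobs` is a list of jobs; each job is a list of snapshots;
    each snapshot is a set of chunk IDs.
    """
    total = 0
    for snapshots in jobs:
        seen = set()  # chunks already verified within this job
        for snapshot in snapshots:
            seen |= snapshot
        total += len(seen)
    return total

# 3 snapshots of 100 chunks each, sharing 90 chunks.
shared = {f"shared-{i}" for i in range(90)}
snapshots = [shared | {f"snap{n}-{i}" for i in range(10)} for n in range(3)]

separate_jobs = chunks_read([[s] for s in snapshots])  # one job per snapshot
single_job = chunks_read([snapshots])                  # all snapshots in one job

print(separate_jobs)  # 300
print(single_job)     # 120
```

Under these assumptions, three separate jobs read 300 GB, while a single job covering all three snapshots reads 120 GB.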
 
also, just because the chunks were recently verified does not mean that they are still intact now. if a chunk breaks between yesterday and today, and I verify a backup today, it would be marked as verified when in reality it's not. when always verifying all necessary chunks, we can be sure that the timeframe in which a broken chunk goes unnoticed is as short as possible

Yes, good point, but I have already accepted that my data is only verified every 'N days'. With the current system I'm verifying almost everything daily, assuming 95% of the data is unchanged.
when verifying multiple snapshots in a single job (e.g. all of a backup group), we do not verify a chunk twice (for that job)
Ah, that sounds useful. To be honest, maybe I'll run the verification only once a month and verify everything. It's probably much, much less IO overall? Another idea might be to verify different backup groups on different days; this would at least spread out the load.
 
Ah, that sounds useful. To be honest, maybe I'll run the verification only once a month and verify everything. It's probably much, much less IO overall?
yes, since each chunk is only read once per job
 
@dcsapak this feels much, much better: down from 1-2h every day (~45h per month) to 2.5h per month. Most of my backups are large and rarely change; the situation might look different in other use cases.
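
As a rough check of the numbers above, the following sketch assumes the midpoint of the 1-2h range and a 30-day month. Both are assumptions, not measured values.

```python
# Assumed values: 1.5h is the midpoint of "1-2h", and a month has 30 days.
daily_hours = 1.5
days_per_month = 30
monthly_job_hours = 2.5  # single monthly job, as reported above

daily_total = daily_hours * days_per_month  # hours per month with daily jobs
ratio = daily_total / monthly_job_hours     # how much longer the daily schedule runs

print(daily_total)  # 45.0
print(ratio)        # 18.0
```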

Maybe point out this issue more clearly in the documentation? Maybe suggest the once-a-month validation as another recommended pattern?

Just my two cents, this feels much healthier ;).
 
