Verify too liberal / slow

Big4SMK

I have my PBS set up to do daily verifies, and re-verify set to 30 days. With a daily backup of several hundred GB of data that hardly changes, I was expecting this to be a good setup. However, I am facing multi-hour verifies each day, which seems surprising.

The verify log shows plenty of "SKIPPED" messages as I'd expect, but the (new) ones that do need to be verified take ages, even though my backup log showed that 0 bytes were transferred for the new (incremental) backup.

Could it be that the verify process is ignoring previously verified chunks? And thus effectively checks almost all my chunks on disk on a daily basis?
 
Hey,

the verification is not "incremental", so yes, everything will possibly be verified multiple times a day, depending on your backup frequency. Maybe consider reducing the number of verifications; something like once a week for new backups and re-verification every 1-2 months seems reasonable. Something you should also consider is how reliable your underlying storage is, since many modern file systems already do error detection, and possibly correction.

Note: Caching chunk verification results could speed up the whole process; however, it would also introduce the problem of possibly basing a verification on outdated chunk verification results.
 
I fail to see how your suggestions would help me. Wouldn't it just change the problem from one 6-hour verify daily to one 6x7=42 hour verify weekly? Or is the process smart enough to do chunk-verification deduplication within a single run, just not across multiple runs?

On your note, I would argue that that is exactly what I would expect from my "re-verify after 30 days" job. The goal would be to verify the data on disk every 30 days (or maybe every 2 months per your suggestion), and an "OK" signals exactly that. It is identical to what is shown in the UI now: all my backups are marked "OK", even though my daily verify job might have last checked them 29 days ago.
On top of that, I would expect to be able to launch a one-time verify job that either just re-verifies all chunks regardless of the last verify timestamp, or gives me a choice between something like "force reverification" and "only verify older than <somedate>".

[/my 2 cents]
 
I would hope that verifying multiple snapshots that have few incremental changes in succession would result in only marginal increases in verification time (we are verifying the chunks themselves, no?). If this is true, then it seems it would be better to run verifications in batches rather than after each backup, as @Hannes Laimer suggests. But I'm just guessing here. Interested to hear staff's response.

I'm considering giving up on verification, as my backup storage filesystem is ZFS, which already detects bit-rot & corruption.

EDIT: The proof is in the pudding. Looking at a group verification (log below), I can see that the first snapshot has a much higher verification size than the following snapshots. So @Big4SMK, you shouldn't expect to see a 7x multiplier on a weekly verification job. It would be some marginal factor (~1.6x in my example below, which covers 13 snapshots).

Code:
2021-03-26T09:53:00-04:00: verify group backup:ct/100 (13 snapshots)
2021-03-26T09:53:00-04:00: verify backup:ct/100/2020-11-24T08:00:02Z
2021-03-26T09:53:00-04:00:   check pct.conf.blob
2021-03-26T09:53:00-04:00:   check root.pxar.didx
2021-03-26T09:53:49-04:00:   verified 319.21/816.04 MiB in 48.94 seconds, speed 6.52/16.67 MiB/s (0 errors)
2021-03-26T09:53:49-04:00:   check catalog.pcat1.didx
2021-03-26T09:53:49-04:00:   verified 0.20/0.46 MiB in 0.15 seconds, speed 1.32/2.99 MiB/s (0 errors)
2021-03-26T09:53:49-04:00: percentage done: 0.55% (0 of 14 groups, 1 of 13 group snapshots)
2021-03-26T09:53:49-04:00: verify backup:ct/100/2020-11-23T08:00:02Z
2021-03-26T09:53:49-04:00:   check pct.conf.blob
2021-03-26T09:53:49-04:00:   check root.pxar.didx
2021-03-26T09:53:50-04:00:   verified 28.71/78.04 MiB in 1.26 seconds, speed 22.74/61.83 MiB/s (0 errors)
2021-03-26T09:53:50-04:00:   check catalog.pcat1.didx
2021-03-26T09:53:50-04:00:   verified 0.20/0.46 MiB in 0.11 seconds, speed 1.92/4.34 MiB/s (0 errors)
2021-03-26T09:53:50-04:00: percentage done: 1.10% (0 of 14 groups, 2 of 13 group snapshots)
2021-03-26T09:53:50-04:00: verify backup:ct/100/2020-11-22T08:00:02Z
2021-03-26T09:53:50-04:00:   check pct.conf.blob
2021-03-26T09:53:50-04:00:   check root.pxar.didx
2021-03-26T09:53:54-04:00:   verified 32.25/96.36 MiB in 3.53 seconds, speed 9.13/27.29 MiB/s (0 errors)
2021-03-26T09:53:54-04:00:   check catalog.pcat1.didx
2021-03-26T09:53:54-04:00:   verified 0.20/0.46 MiB in 0.15 seconds, speed 1.39/3.16 MiB/s (0 errors)
2021-03-26T09:53:54-04:00: percentage done: 1.65% (0 of 14 groups, 3 of 13 group snapshots)
2021-03-26T09:53:54-04:00: verify backup:ct/100/2020-11-21T08:00:02Z
2021-03-26T09:53:54-04:00:   check pct.conf.blob
2021-03-26T09:53:54-04:00:   check root.pxar.didx
2021-03-26T09:53:56-04:00:   verified 32.30/94.01 MiB in 1.55 seconds, speed 20.88/60.79 MiB/s (0 errors)
2021-03-26T09:53:56-04:00:   check catalog.pcat1.didx
2021-03-26T09:53:56-04:00:   verified 0.20/0.46 MiB in 0.09 seconds, speed 2.34/5.30 MiB/s (0 errors)
2021-03-26T09:53:56-04:00: percentage done: 2.20% (0 of 14 groups, 4 of 13 group snapshots)
2021-03-26T09:53:56-04:00: verify backup:ct/100/2020-11-20T08:00:02Z
2021-03-26T09:53:56-04:00:   check pct.conf.blob
2021-03-26T09:53:56-04:00:   check root.pxar.didx
2021-03-26T09:53:59-04:00:   verified 28.21/75.46 MiB in 2.98 seconds, speed 9.46/25.31 MiB/s (0 errors)
2021-03-26T09:53:59-04:00:   check catalog.pcat1.didx
2021-03-26T09:53:59-04:00:   verified 0.20/0.46 MiB in 0.14 seconds, speed 1.48/3.36 MiB/s (0 errors)
2021-03-26T09:53:59-04:00: percentage done: 2.75% (0 of 14 groups, 5 of 13 group snapshots)
2021-03-26T09:53:59-04:00: verify backup:ct/100/2020-11-19T08:00:02Z
2021-03-26T09:53:59-04:00:   check pct.conf.blob
2021-03-26T09:53:59-04:00:   check root.pxar.didx
2021-03-26T09:54:00-04:00:   verified 19.93/54.23 MiB in 1.16 seconds, speed 17.19/46.78 MiB/s (0 errors)
2021-03-26T09:54:00-04:00:   check catalog.pcat1.didx
2021-03-26T09:54:00-04:00:   verified 0.20/0.46 MiB in 0.21 seconds, speed 0.96/2.17 MiB/s (0 errors)
2021-03-26T09:54:00-04:00: percentage done: 3.30% (0 of 14 groups, 6 of 13 group snapshots)
2021-03-26T09:54:00-04:00: verify backup:ct/100/2020-11-18T08:00:03Z
2021-03-26T09:54:00-04:00:   check pct.conf.blob
2021-03-26T09:54:00-04:00:   check root.pxar.didx
2021-03-26T09:54:02-04:00:   verified 28.09/74.53 MiB in 1.34 seconds, speed 20.90/55.44 MiB/s (0 errors)
2021-03-26T09:54:02-04:00:   check catalog.pcat1.didx
2021-03-26T09:54:02-04:00:   verified 0.20/0.46 MiB in 0.15 seconds, speed 1.31/2.98 MiB/s (0 errors)
2021-03-26T09:54:02-04:00: percentage done: 3.85% (0 of 14 groups, 7 of 13 group snapshots)
2021-03-26T09:54:02-04:00: verify backup:ct/100/2020-11-17T08:00:03Z
2021-03-26T09:54:02-04:00:   check pct.conf.blob
2021-03-26T09:54:02-04:00:   check root.pxar.didx
2021-03-26T09:54:05-04:00:   verified 24.81/74.13 MiB in 3.53 seconds, speed 7.02/20.98 MiB/s (0 errors)
2021-03-26T09:54:05-04:00:   check catalog.pcat1.didx
2021-03-26T09:54:05-04:00:   verified 0.20/0.46 MiB in 0.17 seconds, speed 1.16/2.62 MiB/s (0 errors)
2021-03-26T09:54:05-04:00: percentage done: 4.40% (0 of 14 groups, 8 of 13 group snapshots)
2021-03-26T09:54:05-04:00: verify backup:ct/100/2020-11-15T08:00:02Z
2021-03-26T09:54:05-04:00:   check pct.conf.blob
2021-03-26T09:54:05-04:00:   check root.pxar.didx
2021-03-26T09:54:09-04:00:   verified 24.66/68.19 MiB in 3.38 seconds, speed 7.29/20.17 MiB/s (0 errors)
2021-03-26T09:54:09-04:00:   check catalog.pcat1.didx
2021-03-26T09:54:09-04:00:   verified 0.20/0.46 MiB in 0.14 seconds, speed 1.43/3.24 MiB/s (0 errors)
2021-03-26T09:54:09-04:00: percentage done: 4.95% (0 of 14 groups, 9 of 13 group snapshots)
2021-03-26T09:54:09-04:00: verify backup:ct/100/2020-11-08T08:00:03Z
2021-03-26T09:54:09-04:00:   check pct.conf.blob
2021-03-26T09:54:09-04:00:   check root.pxar.didx
2021-03-26T09:54:11-04:00:   verified 31.00/87.62 MiB in 1.99 seconds, speed 15.56/43.96 MiB/s (0 errors)
2021-03-26T09:54:11-04:00:   check catalog.pcat1.didx
2021-03-26T09:54:11-04:00:   verified 0.20/0.46 MiB in 0.20 seconds, speed 1.00/2.26 MiB/s (0 errors)
2021-03-26T09:54:11-04:00: percentage done: 5.49% (0 of 14 groups, 10 of 13 group snapshots)
2021-03-26T09:54:11-04:00: verify backup:ct/100/2020-11-01T08:00:02Z
2021-03-26T09:54:11-04:00:   check pct.conf.blob
2021-03-26T09:54:11-04:00:   check root.pxar.didx
2021-03-26T09:54:13-04:00:   verified 12.39/42.51 MiB in 1.26 seconds, speed 9.86/33.83 MiB/s (0 errors)
2021-03-26T09:54:13-04:00:   check catalog.pcat1.didx
2021-03-26T09:54:14-04:00:   verified 0.20/0.46 MiB in 0.34 seconds, speed 0.60/1.35 MiB/s (0 errors)
2021-03-26T09:54:14-04:00: percentage done: 6.04% (0 of 14 groups, 11 of 13 group snapshots)
2021-03-26T09:54:14-04:00: verify backup:ct/100/2020-10-25T07:00:02Z
2021-03-26T09:54:14-04:00:   check pct.conf.blob
2021-03-26T09:54:14-04:00:   check root.pxar.didx
2021-03-26T09:54:16-04:00:   verified 29.85/84.28 MiB in 1.96 seconds, speed 15.23/43.01 MiB/s (0 errors)
2021-03-26T09:54:16-04:00:   check catalog.pcat1.didx
2021-03-26T09:54:17-04:00:   verified 0.20/0.46 MiB in 0.26 seconds, speed 0.77/1.75 MiB/s (0 errors)
2021-03-26T09:54:17-04:00: percentage done: 6.59% (0 of 14 groups, 12 of 13 group snapshots)
2021-03-26T09:54:17-04:00: verify backup:ct/100/2020-10-18T07:00:02Z
2021-03-26T09:54:17-04:00:   check pct.conf.blob
2021-03-26T09:54:17-04:00:   check root.pxar.didx
2021-03-26T09:54:21-04:00:   verified 27.92/77.02 MiB in 3.91 seconds, speed 7.14/19.70 MiB/s (0 errors)
2021-03-26T09:54:21-04:00:   check catalog.pcat1.didx
2021-03-26T09:54:21-04:00:   verified 0.20/0.46 MiB in 0.24 seconds, speed 0.85/1.94 MiB/s (0 errors)
2021-03-26T09:54:21-04:00: percentage done: 7.14% (0 of 14 groups, 13 of 13 group snapshots)
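
For anyone wanting to reproduce that number: the ~1.6x factor can be pulled out of the log above by summing the per-snapshot verification times and dividing by the time spent on the first snapshot. A minimal, ad-hoc sketch (this is just log scraping, not a PBS tool, and the file name is made up):

Code:
import re

# Sum all "verified ... in N seconds" times from a pasted verify log and
# compare the total to the first snapshot (its root.pxar + catalog lines).
def batch_factor(log_text: str) -> float:
    times = [float(t) for t in re.findall(r"in ([0-9.]+) seconds", log_text)]
    first_snapshot = sum(times[:2])   # first snapshot logs two "verified" lines
    return sum(times) / first_snapshot

with open("verify-group.log") as f:   # the log pasted above, saved locally
    print(f"~{batch_factor(f.read()):.1f}x")   # prints ~1.6x for this group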
 
@Hannes Laimer I'm trying to test like @Elliott Partridge, but somehow there is still a verification job. Here's a screenshot of my last daily backup:

[screenshot: Capture.PNG]


I can't find the source of the verify task.

1) The verify jobs screen for my datastore is empty.
2) Same for one level up.
3) Same for the CLI.
4) Verify jobs from the task summary show Mar 26 as the last runtime, which is when I removed the scheduled verification, so that does make sense.
5) There are no hints in syslog that a verify task was started.
6) I rebooted the server about 48 hours ago, to no avail.


How do I stop that verification from happening so I can start testing the incremental verification?

This is a PBS that is syncing from a remote that DOES have a daily verify job. Can it be that the verify status is synced accidentally?

[EDIT] Oh SH*T that's it

Here are my steps to reproduce. Let's say we have PBS-A, which I use for backing up my containers from PVE, and PBS-B, which is syncing from PBS-A as the remote.

1) Make a backup from PVE to PBS-A
2) On PBS-A I see an unverified backup, as the daily verify didn't run yet
3) On PBS-B, do a manual remote sync. I now have an unverified backup from my container here
4) On PBS-A, manually run a verify job for the backup. I now have a verified backup here
5) On PBS-B, do another manual remote sync. I now have a VERIFIED backup from my container here

That sure sounds like a bug to me. I have never asked PBS-B to verify the snapshot.
 

I would hope that verifying multiple snapshots that have few incremental changes in succession would result in only marginal increases in verification time (we are verifying the chunks themselves, no?). If this is true, then it seems it would be better to run verifications in batches rather than after each backup, as @Hannes Laimer suggests. But I'm just guessing here. Interested to hear staff's response.

I'm considering giving up on verification, as my backup storage filesystem is ZFS, which already detects bit-rot & corruption.

EDIT: The proof is in the pudding. Looking at a group verification (log below), I can see that the first snapshot has a much higher verification size than the following snapshots. So @Big4SMK, you shouldn't expect to see a 7x multiplier on a weekly verification job. It would be some marginal factor (~1.6x in my example below, which covers 13 snapshots).
yes, as you've correctly deduced with your experiment, a single chunk will only be verified once in a verification run/job/task. but the verify result (and result caching) has the snapshot as granularity, not the chunk.
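
to make that concrete, here is a minimal sketch of the idea (illustrative only, not the actual PBS implementation, which is written in Rust): the verify task remembers which chunk digests it has already checked, so a chunk shared by many snapshots is read at most once per task, while the OK/failed result is still recorded per snapshot.

Code:
from typing import Callable, Dict, Iterable, Set

def verify_group(snapshots: Dict[str, Iterable[str]],
                 load_and_check_chunk: Callable[[str], bool]) -> Dict[str, bool]:
    """snapshots maps a snapshot name to the chunk digests it references."""
    verified_ok: Set[str] = set()   # digests already checked in this task
    corrupt: Set[str] = set()
    results: Dict[str, bool] = {}
    for snapshot, digests in snapshots.items():
        ok = True
        for digest in digests:
            if digest in verified_ok:
                continue            # shared chunk: already read in this task, skip
            if digest in corrupt or not load_and_check_chunk(digest):
                corrupt.add(digest)
                ok = False
            else:
                verified_ok.add(digest)
        results[snapshot] = ok      # but the result granularity is the snapshot
    return results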
 
[EDIT] Oh SH*T that's it

Here are my steps to reproduce. Let's say we have PBS-A, which I use for backing up my containers from PVE, and PBS-B, which is syncing from PBS-A as the remote.

1) Make a backup from PVE to PBS-A
2) On PBS-A I see an unverified backup, as the daily verify didn't run yet
3) On PBS-B, do a manual remote sync. I now have an unverified backup from my container here
4) On PBS-A, manually run a verify job for the backup. I now have a verified backup here
5) On PBS-B, do another manual remote sync. I now have a VERIFIED backup from my container here

That sure sounds like a bug to me. I have never asked PBS-B to verify the snapshot.

pulling/syncing actually does a verification as part of the download (else we'd have to take the remote's word for it, which is not a good idea ;)), so we keep the verification result around as well. uploading a backup does the same btw (otherwise a client could upload broken chunks). note that in both cases the verification is just in-memory before writing the chunk to disk, so it's cheaper than a plain verification, which also has to load the chunk from disk first.

the "verify after backup" is just for those who are extra paranoid about the backup handling having bugs (e.g., a bug that forgets to reference some deduplicated chunk), or their storage being broken and sometimes not syncing or bit-flipping or whatever.
 
I'm getting confused now. While I 100% agree that it would be a bad idea to trust the remote's verification status, I don't see how verification is being handled on PBS-B. I guess I have a couple of questions that might help me understand your statements.

1) What is meant when a backup is marked as "verified"? I was expecting verification to mean "make sure the bits on disk are ok". I don't get how verification in memory before writing to disk would solve that issue; to me it seems like that would only verify that data was transferred over the network correctly, not whether it was written to disk without errors.

2) When PBS-B syncs an unverified backup from PBS-A, why does it show as unverified on PBS-B? This seems to contradict your statement that "pulling/syncing actually does a verification as part of the download". If it does a verify in the process, shouldn't a synced backup always be marked as verified on the receiving end? (this goes to observation 3 in the edit of my previous post)

3) What happens when PBS-B has synced an unverified backup from PBS-A in the past, but on a subsequent sync finds the backup is verified on the remote? (observation 5 in the edit of my previous post)
3a) Does the data get transferred again? (if so, this would push me towards scheduling my sync job on PBS-B after the verify job on PBS-A is done to make sure I don't transfer the backup twice over the internet)
3b) Does the data on disk get reverified? If so, can you explain why this is so much faster than running a separate verify job on PBS-B, which is what I opened this topic for in the first place? Is this because only new chunks get transferred to the client, so only those need to be (re)verified instead of all chunks like in a normal verify job?

4) When a verified backup is synced to PBS-B, why does it show that verification happened at the time the backup was verified on PBS-A? Shouldn't this be the time the sync happened, if that's when verification is happening?
 
I'm getting confused now. While I 100% agree that it would be a bad idea to trust the remote's verification status, I don't see how verification is being handled on PBS-B. I guess I have a couple of questions that might help me understand your statements.

1) What is meant when a backup is marked as "verified"? I was expecting verification to mean "make sure the bits on disk are ok". I don't get how verification in memory before writing to disk would solve that issue; to me it seems like that would only verify that data was transferred over the network correctly, not whether it was written to disk without errors.

a backup is marked as verified if a verify task was successful (consistency of the index, and verification of all referenced chunks: load from disk, check the CRC, if unencrypted also verify the digest - a rough sketch of these checks follows at the end of this post). the latter part also happens when chunks are uploaded as part of a backup, or downloaded as part of a pull/sync job - in that case, it is just used as a safeguard, and DOES NOT write a verification result to the backup metadata.

2) When PBS-B syncs an unverified backup from PBS-A, why does it show as unverified on PBS-B? This seems to contradict your statement that "pulling/syncing actually does a verification as part of the download". If it does a verify in the process, shouldn't a synced backup always be marked as verified on the receiving end? (this goes to observation 3 in the edit of my previous post)
no, you misunderstood what I wrote. a pull will never create a verification result where there was none before. if a backup snapshot already has a verification result before the pull, that result will be synced and not thrown away or modified - the verification result is part of the metadata. this is okay since the verification result contains a timestamp showing it was done before the sync, and also since we verify when pulling that what we download is what we expect.
3) What happens when PBS-B has synced an unverified backup from PBS-A in the past, but on a subsequent sync finds the backup is verified on the remote? (observation 5 in the edit of my previous post)
3a) Does the data get transferred again? (if so, this would push me towards scheduling my sync job on PBS-B after the verify job on PBS-A is done to make sure I don't transfer the backup twice over the internet)
no, because once a snapshot has been synced it won't be synced again (the last step of the sync puts the index.json in place, and after that the sync will ignore it) - except for the very last one, which will be re-checked to see if a client log has been uploaded and needs to be synced.
3b) Does the data on disk get reverified? If so, can you explain why this is so much faster than running a separate verify job on PBS-B, which is what I opened this topic for in the first place? Is this because only new chunks get transferred to the client, so only those need to be (re)verified instead of all chunks like in a normal verify job?
yes. and also, because the pull (like a large verification) is optimized to reduce work and only downloads + verifies a chunk once even if referenced by multiple snapshots in the group.
4) When a verified backup is synced to PBS-B, why does it show that verification happened at the time the backup was verified on PBS-A? Shouldn't this be the time the sync happened, if that's when verification is happening?
no, because the actual last "full verify" was on PBS-A so that is the timestamp the verification result should reflect. the verification result also contains the information on which node/system it was verified (although I just noticed that this is not displayed well in the GUI ;))
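
as referenced in the answer to (1), here is a rough sketch of what checking a single chunk amounts to (illustrative only - the real on-disk blob format and the Rust implementation differ): load it from disk, check a stored CRC, and for unencrypted chunks recompute the SHA-256 digest and compare it against the chunk ID.

Code:
import hashlib
import zlib
from pathlib import Path

def verify_chunk(path: Path, stored_crc: int, encrypted: bool) -> bool:
    data = path.read_bytes()                 # load from disk
    if zlib.crc32(data) != stored_crc:       # CRC check
        return False
    if not encrypted:                        # digest check: chunk ID = SHA-256
        return hashlib.sha256(data).hexdigest() == path.name
    return True                              # encrypted: digest not re-checked here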
 
Thank you for the thorough response; the information is greatly appreciated. I have everything I need to come up with a backup/verify plan that's adequate for my different use cases.

The main points to me are:
1) backups are verified even though a verify result is not always registered. This happens on backup as well as on sync/transfer.
2) Re-verify will check all chunks of all backups that need to be verified, but checks each chunk only once within a single job; therefore you are better off doing large verifies infrequently rather than small verifies often.

"Trust is good, control is better "
 
Thank you for the thorough response; the information is greatly appreciated. I have everything I need to come up with a backup/verify plan that's adequate for my different use cases.

The main points to me are:
1) backups are verified even though a verify result is not always registered. This happens on backup as well as on sync/transfer.
yeah. but like you said, those are in-memory verifications only. we do always write out to temp files and rename those into their final place, and we always write chunks before indices before the manifest (a small sketch of that pattern follows at the end of this post), so a storage would have to be very ill-behaved for PBS to think everything was written out okay while it is in fact broken.

2) Re-verify will check all chunks of all backups that need to be verified, but checks each chunk only once within a single job; therefore you are better off doing large verifies infrequently rather than small verifies often.
that is correct. if your storage already takes care of bit-rot (e.g., ZFS), you might also decide that extra verification on top is not worth the extra load (or just do it once a month/quarter/.. to benefit more from the intra-task deduplication logic).
"Trust is good, control is better "
 
