[SOLVED] Verification failed on these snapshots/groups:

gramels

Hi,

I do get

Code:
Datastore: nas0

Verification failed on these snapshots/groups:


  vm/113/2023-04-22T02:25:46Z
  vm/113/2023-04-22T01:08:44Z
  vm/113/2023-04-21T22:26:17Z
  vm/113/2023-04-21T20:25:32Z
  vm/113/2023-04-21T18:25:38Z
  vm/113/2023-04-21T16:25:25Z
  vm/113/2023-04-21T14:25:45Z
  vm/113/2023-04-21T12:25:38Z
  vm/113/2023-04-21T10:25:36Z
  vm/113/2023-04-21T08:25:42Z
  vm/113/2023-04-21T06:25:47Z
  vm/113/2023-04-21T04:25:55Z

and do not understand how to fix this.
Likely something happened when the storage was full.

Here is the log of the verify task.
 


PBS can't fix them. If you have a filesystem that can heal itself, like for example ZFS, you could tell the filesystem to repair that corrupted data. If not, you can try to back up those VMs again and PBS will try to replace the damaged chunks with healthy ones. This of course will only work if those chunks still exist.
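As a minimal sketch (assuming a ZFS pool named "tank" backing the datastore, and a PVE storage entry called "pbs-nas0" for VM 113; adjust the names to your setup):

Code:
# On a self-healing filesystem like ZFS, a scrub can repair corrupted data
# from redundancy (mirror/raidz):
zpool scrub tank
zpool status tank

# On the PVE host, trigger a fresh backup of the VM so PBS can re-upload
# any chunks it is missing:
vzdump 113 --storage pbs-nas0 --mode snapshot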
 
Hi Dunuin, thanks for your answer.
If not, you can try to back up those VMs again and PBS will try to replace the damaged chunks with healthy ones.
How would I do that? I tried to restart the guest to mark the whole disk dirty, but it continues to back up and the verify still fails. It seems some chunks got lost while cleaning up the full filesystem.

What do I need to do to get a clean full backup again?
 
Please do not create new threads or post the same post in other, much older threads!

If a chunk is corrupt, you can only try to recreate it, either from another backup location further down your backup chain, for example a remote PBS to which the backups are synced, or a tape.
If a new backup of the VM does not recreate the chunk either, you are out of luck. That data no longer exists in the VM in the form it had when that now-corrupt chunk was created.

The PBS will let you know which backups are affected. You could remove them.

This is why, as @Dunuin mentioned, using a storage for the backups that can recover from bit flips is quite useful, or otherwise having additional copies of the backups.
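If you prefer the CLI over the GUI for removing affected snapshots, something along these lines should work (a sketch; the repository string and snapshot name are examples taken from this thread, adjust to your setup):

Code:
# Remove one corrupted snapshot from the datastore; repository format is
# user@realm@host:datastore
proxmox-backup-client forget vm/113/2023-04-22T02:25:46Z --repository root@pam@pbs1:nas0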
 
Sorry for the cross-post, but there was no reaction for weeks.

I still do not understand what concrete steps to take. The guest gets backed up every 2 hours and the daily verify has been failing for 6 weeks now. How do I get to a clean backup?

How do I force a chunk to be recreated?
How can I trigger a new clean backup?
 
Newer backups are verified without a problem, right? Then those backups that contain the corrupted chunk will not be able to do a full restore anymore. Unless you really need data that exists only in these backups, I would remove them.

The other question though, when looking through the log, is why the chunk went missing:
Code:
2023-04-22T09:10:46+02:00: can't verify chunk, load failed - store 'nas0', unable to load chunk 'd12bef61b509848a8f00774db4849e1610dcd965bf08c6eb886b258a9f4b5ef5' - No such file or directory (os error 2)

and why we see errors like these:
Code:
2023-04-22T09:17:18+02:00: SKIPPED: verify nas0:vm/113/2023-04-21T01:19:30Z - could not acquire snapshot lock: unable to open snapshot directory "/mnt/nas0-nfs-pbs/vm/113/2023-04-21T01:19:30Z" for locking - ENOENT: No such file or directory
2023-04-22T09:17:18+02:00: SKIPPED: verify nas0:vm/113/2023-04-20T22:26:05Z - could not acquire snapshot lock: unable to open snapshot directory "/mnt/nas0-nfs-pbs/vm/113/2023-04-20T22:26:05Z" for locking - ENOENT: No such file or directory
Are these directories gone or still present?

Files and directories should not get lost :-/

What kind of storage is it? Looks like an NFS share. But what kind of NAS, and what is the filesystem backing the share?

Is there any kind of script running that might interact with the files themselves? Or was the data moved at some point?
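One thing you could check directly on the mount: PBS stores each chunk under <datastore>/.chunks/<first four hex digits of the digest>/<full digest>, so with the digest and paths from the log above, something like this would show whether the files are really gone:

Code:
# Check whether the missing chunk file still exists on the datastore mount:
ls -l /mnt/nas0-nfs-pbs/.chunks/d12b/d12bef61b509848a8f00774db4849e1610dcd965bf08c6eb886b258a9f4b5ef5

# And whether the skipped snapshot directories are still present:
ls -ld /mnt/nas0-nfs-pbs/vm/113/2023-04-21T01:19:30Z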
 
Newer backups are verified without a problem, right? Then those backups that contain the corrupted chunk will not be able to do a full restore anymore. Unless you really need data that exists only in these backups, I would remove them.
They do not get verified; that is what confuses me.
I deleted all failed backups, but the new ones afterwards continue to fail.

The other question though, when looking through the log, is why the chunk went missing:
Code:
2023-04-22T09:10:46+02:00: can't verify chunk, load failed - store 'nas0', unable to load chunk 'd12bef61b509848a8f00774db4849e1610dcd965bf08c6eb886b258a9f4b5ef5' - No such file or directory (os error 2)

and why we see errors like these:
Code:
2023-04-22T09:17:18+02:00: SKIPPED: verify nas0:vm/113/2023-04-21T01:19:30Z - could not acquire snapshot lock: unable to open snapshot directory "/mnt/nas0-nfs-pbs/vm/113/2023-04-21T01:19:30Z" for locking - ENOENT: No such file or directory
2023-04-22T09:17:18+02:00: SKIPPED: verify nas0:vm/113/2023-04-20T22:26:05Z - could not acquire snapshot lock: unable to open snapshot directory "/mnt/nas0-nfs-pbs/vm/113/2023-04-20T22:26:05Z" for locking - ENOENT: No such file or directory
Are these directories gone or still present?

Files and directories should not get lost :-/

Code:
root@pbs1:/mnt/nas0-nfs-pbs/vm/113# cd 2023-04-20T22:26:05Z
-bash: cd: 2023-04-20T22:26:05Z: No such file or directory
root@pbs1:/mnt/nas0-nfs-pbs/vm/113# cd 2023-04-21T01:19:30Z
-bash: cd: 2023-04-21T01:19:30Z: No such file or directory
root@pbs1:/mnt/nas0-nfs-pbs/vm/113#

What kind of storage is it? Looks like an NFS share. But what kind of NAS, and what is the filesystem backing the share?
Synology NAS, btrfs.

Is there any kind of script running that might interact with the files themselves?
No.
Or was the data moved at some point?

The errors started to appear after the storage ran out of space.
It might have happened due to an inappropriate cleanup.

The main question for me is how to get back to a verifiable backup, even without those lost chunks. VM 113 is running fine, but all backup verifications since then fail.

 
The errors started to appear after the storage ran out of space.
It might have happened due to an inappropriate cleanup.
Okay, that could be an explanation.

With the chunk definitely missing and not coming back, the easiest way would probably be to remove these backups that show the failed verification. You can do so by clicking the red garbage can icon.

Once they are gone, observe the next verification. If it doesn't go through, please post the logs, so we can investigate further.
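You can also kick off a verification of the datastore manually instead of waiting for the scheduled job; something like this on the PBS host should start the task (datastore name "nas0" taken from this thread):

Code:
# Start a manual verification of the whole datastore:
proxmox-backup-manager verify nas0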
 
I tried to restart the guest to mark the whole disk dirty, but it continues to back up and the verify still fails. It seems some chunks got lost while cleaning up the full filesystem.
One more thing here. Did you reboot from within the VM or was the VM completely powered off at some point?
You should be able to access the task log from the backup after the reboot. Does it mention that a new dirty-bitmap was created?
 
Okay, then it must have been a fresh dirty-bitmap, and the chunk would have been sent to the PBS if it still existed in that form.
 
With the chunk definitely missing and not coming back, the easiest way would probably be to remove these backups that show the failed verification. You can do so by clicking the red garbage can icon.
I did that already, but will try again.
Likely I missed the not-yet-verified ones.
I deleted all failed and not-yet-verified backups and will report back.

Thanks for your help!
 
Seems solved.
Verification passed on all new backups.

The issue (besides the inappropriate deletion of a chunk) was:

Unverified backups had not been deleted, only the failed ones. I assume those backups still referenced the lost chunk.
Once I deleted all failed and unverified backups, a proper backup was created again.

One open question remains:

Why does PBS not recreate the lost chunk?

@aaron Thanks for your help!
 
Why does PBS not recreate the lost chunk?
PBS can't fix them. If you have a filesystem that can heal itself, like for example ZFS, you could tell the filesystem to repair that corrupted data. If not, you can try to back up those VMs again and PBS will try to replace the damaged chunks with healthy ones. This of course will only work if those chunks still exist.
If that data doesn't exist anymore on your PVE's virtual disk, there is nothing it could recreate the chunks from.
 
Adding to the explanation: if the datastore ran out of space… well, the chunk might not have been created at all and in the meantime the data on the VM changed, so a new backup created a different chunk.
 
Adding to the explanation: if the datastore ran out of space… well, the chunk might not have been created at all and in the meantime the data on the VM changed, so a new backup created a different chunk.
But then, why did new backups fail to verify? Why would they reference a chunk that isn't needed anymore?
 
