[SOLVED] Verification failed on these snapshots/groups:

gramels

Hi,

I do get

Code:
Datastore: nas0

Verification failed on these snapshots/groups:


  vm/113/2023-04-22T02:25:46Z
  vm/113/2023-04-22T01:08:44Z
  vm/113/2023-04-21T22:26:17Z
  vm/113/2023-04-21T20:25:32Z
  vm/113/2023-04-21T18:25:38Z
  vm/113/2023-04-21T16:25:25Z
  vm/113/2023-04-21T14:25:45Z
  vm/113/2023-04-21T12:25:38Z
  vm/113/2023-04-21T10:25:36Z
  vm/113/2023-04-21T08:25:42Z
  vm/113/2023-04-21T06:25:47Z
  vm/113/2023-04-21T04:25:55Z

and do not understand how to fix this.
Likely something happened when the storage was full.

Here is the log of the verify task.
 


PBS can't fix them. If you have a filesystem that can heal itself, like for example ZFS, you could tell the filesystem to repair that corrupted data. If not, you can try to back up those VMs again and PBS will try to replace the damaged chunks with healthy ones. This of course will only work if those chunks still exist.
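As a minimal sketch (assuming a ZFS pool named "tank" backing the datastore, and a PVE storage entry called "pbs-nas0" for VM 113; adjust the names to your setup):

Code:
# On a self-healing filesystem like ZFS, a scrub can repair corrupted data
# from redundancy (mirror/raidz):
zpool scrub tank
zpool status tank

# On the PVE host, trigger a fresh backup of the VM so PBS can re-upload
# any chunks it is missing:
vzdump 113 --storage pbs-nas0 --mode snapshot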
 
Hi Dunuin, thanks for your answer.
If not, you can try to back up those VMs again and PBS will try to replace the damaged chunks with healthy ones.
How would I do that? I tried to restart the guest to mark the whole disk dirty, but it continues to back up and the verify still fails. It seems some chunks got lost while cleaning up the full filesystem.

What do I need to do to get a clean full backup again?
 
Please do not create new threads or post the same post in other, much older threads!

If a chunk is corrupt, you can only try to recreate it, either from another backup location further down your backup chain, for example a remote PBS to which the backups are synced, or a tape.
If a new backup of the VM does not recreate the chunk either, you are out of luck. That data no longer exists in the VM in the form it had when that now-corrupt chunk was created.

The PBS will let you know which backups are affected. You could remove them.

This is why, as @Dunuin mentioned, using a storage for the backups that can recover from bit flips is quite useful, or otherwise having additional copies of the backups.
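If you prefer the CLI over the GUI for removing affected snapshots, something along these lines should work (a sketch; the repository string and snapshot name are examples taken from this thread, adjust to your setup):

Code:
# Remove one corrupted snapshot from the datastore; repository format is
# user@realm@host:datastore
proxmox-backup-client forget vm/113/2023-04-22T02:25:46Z --repository root@pam@pbs1:nas0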
 
Sorry for the cross-post, but there was no reaction for weeks.

I still do not understand what concrete steps to take. The guest gets backed up every 2 hours and the daily verify has been failing for 6 weeks now. How do I get to a clean backup?

How do I force a chunk to be recreated?
How can I trigger a new clean backup?
 
Newer backups are verified without a problem, right? Then those backups that contain the corrupted chunk will not be able to do a full restore anymore. Unless you really need data that exists only in these backups, I would remove them.

The other question though, when looking through the log, is why the chunk went missing:
Code:
2023-04-22T09:10:46+02:00: can't verify chunk, load failed - store 'nas0', unable to load chunk 'd12bef61b509848a8f00774db4849e1610dcd965bf08c6eb886b258a9f4b5ef5' - No such file or directory (os error 2)

and why we see errors like these:
Code:
2023-04-22T09:17:18+02:00: SKIPPED: verify nas0:vm/113/2023-04-21T01:19:30Z - could not acquire snapshot lock: unable to open snapshot directory "/mnt/nas0-nfs-pbs/vm/113/2023-04-21T01:19:30Z" for locking - ENOENT: No such file or directory
2023-04-22T09:17:18+02:00: SKIPPED: verify nas0:vm/113/2023-04-20T22:26:05Z - could not acquire snapshot lock: unable to open snapshot directory "/mnt/nas0-nfs-pbs/vm/113/2023-04-20T22:26:05Z" for locking - ENOENT: No such file or directory
Are these directories gone or still present?

Files and directories should not get lost :-/

What kind of storage is it? Looks like an NFS share. But what kind of NAS, and what is the filesystem backing the share?

Is there any kind of script running that might interact with the files themselves? Or was the data moved at some point?
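One thing you could check directly on the mount: PBS stores each chunk under <datastore>/.chunks/<first four hex digits of the digest>/<full digest>, so with the digest and paths from the log above, something like this would show whether the files are really gone:

Code:
# Check whether the missing chunk file still exists on the datastore mount:
ls -l /mnt/nas0-nfs-pbs/.chunks/d12b/d12bef61b509848a8f00774db4849e1610dcd965bf08c6eb886b258a9f4b5ef5

# And whether the skipped snapshot directories are still present:
ls -ld /mnt/nas0-nfs-pbs/vm/113/2023-04-21T01:19:30Z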
 
Newer backups are verified without a problem, right? Then those backups that contain the corrupted chunk will not be able to do a full restore anymore. Unless you really need data that exists only in these backups, I would remove them.
They do not get verified; that is what confuses me.
I deleted all failed backups, but the new ones afterwards continue to fail.

The other question though, when looking through the log, is why the chunk went missing:
Code:
2023-04-22T09:10:46+02:00: can't verify chunk, load failed - store 'nas0', unable to load chunk 'd12bef61b509848a8f00774db4849e1610dcd965bf08c6eb886b258a9f4b5ef5' - No such file or directory (os error 2)

and why we see errors like these:
Code:
2023-04-22T09:17:18+02:00: SKIPPED: verify nas0:vm/113/2023-04-21T01:19:30Z - could not acquire snapshot lock: unable to open snapshot directory "/mnt/nas0-nfs-pbs/vm/113/2023-04-21T01:19:30Z" for locking - ENOENT: No such file or directory
2023-04-22T09:17:18+02:00: SKIPPED: verify nas0:vm/113/2023-04-20T22:26:05Z - could not acquire snapshot lock: unable to open snapshot directory "/mnt/nas0-nfs-pbs/vm/113/2023-04-20T22:26:05Z" for locking - ENOENT: No such file or directory
Are these directories gone or still present?

Files and directories should not get lost :-/

Code:
root@pbs1:/mnt/nas0-nfs-pbs/vm/113# cd 2023-04-20T22:26:05Z
-bash: cd: 2023-04-20T22:26:05Z: No such file or directory
root@pbs1:/mnt/nas0-nfs-pbs/vm/113# cd 2023-04-21T01:19:30Z
-bash: cd: 2023-04-21T01:19:30Z: No such file or directory
root@pbs1:/mnt/nas0-nfs-pbs/vm/113#

What kind of storage is it? Looks like an NFS share. But what kind of NAS, and what is the filesystem backing the share?
Synology NAS, btrfs.

Is there any kind of script running that might interact with the files themselves?
No.
Or was the data moved at some point?

The errors started to appear after the storage ran out of space.
It might have happened due to an inappropriate cleanup.

The main question for me is how to get back to a verifiable backup, even without those lost chunks. VM 113 is running fine, but all backup verifications since then fail.

 
The errors started to appear after the storage ran out of space.
It might have happened due to an inappropriate cleanup.
Okay, that could be an explanation.

With the chunk definitely missing and not coming back, the easiest way would probably be to remove these backups that show the failed verification. You can do so by clicking the red garbage can icon.

Once they are gone, observe the next verification. If it doesn't go through, please post the logs, so we can investigate further.
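You can also kick off a verification of the datastore manually instead of waiting for the scheduled job; something like this on the PBS host should start the task (datastore name "nas0" taken from this thread):

Code:
# Start a manual verification of the whole datastore:
proxmox-backup-manager verify nas0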
 
I tried to restart the guest to mark the whole disk dirty, but it continues to back up and the verify still fails. It seems some chunks got lost while cleaning up the full filesystem.
One more thing here. Did you reboot from within the VM or was the VM completely powered off at some point?
You should be able to access the task log from the backup after the reboot. Does it mention that a new dirty-bitmap was created?
 
Okay, then it must have been a fresh dirty-bitmap, and the chunk would have been sent to the PBS if it still existed in that form.
 
With the chunk definitely missing and not coming back, the easiest way would probably be to remove these backups that show the failed verification. You can do so by clicking the red garbage can icon.
I did that already, but will try again.
Likely I missed the not-yet-verified ones.
I deleted all failed and not-yet-verified backups and will report back.

Thanks for your help!
 
Seems solved.
Verification passed on all new backups.

The issue (besides the inappropriate deletion of a chunk) was:

Unverified backups had not been deleted, only the failed ones. I assume those backups still referenced the lost chunk.
Once I deleted all failed and unverified backups, a proper backup was created again.

One open question remains:

Why does PBS not recreate the lost chunk?

@aaron Thanks for your help!
 
Why does PBS not recreate the lost chunk?
PBS can't fix them. If you have a filesystem that can heal itself, like for example ZFS, you could tell the filesystem to repair that corrupted data. If not, you can try to back up those VMs again and PBS will try to replace the damaged chunks with healthy ones. This of course will only work if those chunks still exist.
If that data doesn't exist anymore on your PVE's virtual disk, there is nothing it could recreate the chunks from.
 
Adding to the explanation: if the datastore ran out of space… well, the chunk might not have been created at all and in the meantime the data on the VM changed, so a new backup created a different chunk.
 
Adding to the explanation: if the datastore ran out of space… well, the chunk might not have been created at all and in the meantime the data on the VM changed, so a new backup created a different chunk.
But then, why did new backups fail to verify? Why would they reference a chunk that isn't needed anymore?
 
