PBS - verify state sometimes fails

Giovanne

New Member
Aug 23, 2022
Hello,

On my PBS, a verify job runs over the backups every day. I've noticed that for one VM backup the verify state is "All OK" on one day, but on the next day there are a lot of failures. This happens often, and the odd part is that on some days I get failures and on others I don't, and only for some VMs. Is there a specific reason for this?

The error shown in the task viewer log is:

can't verify chunk, load failed - store 'PBS', unable to load chunk


Thanks
 
Hi Giovanne,

can you maybe post the entire task log? The error indicates that a chunk for the backup couldn't be loaded. If that only occurs sometimes, the verify task would fail on a day when it can't load a given chunk and succeed on another day when it can.

Can you also post your datastore configuration? It should be located at `/etc/proxmox-backup/datastore.cfg`.
 
Sure, you mean like this?

2022-09-12T14:39:45-03:00: can't verify chunk, load failed - store 'PBS', unable to load chunk 'af820b1adc0201eeee6b9f837b1d8eac085f7ad75ca8ded078c7bb392d8647f1' - No such file or directory (os error 2)
2022-09-12T14:39:45-03:00: verified 38.17/580.00 MiB in 1.95 seconds, speed 19.58/297.53 MiB/s (1 errors)
2022-09-12T14:39:45-03:00: verify PBS:vm/42XX/2022-08-31T03:31:26Z/drive-sata2.img.fidx failed: chunks could not be verified
2022-09-12T14:39:45-03:00: check drive-sata1.img.fidx
2022-09-12T14:39:45-03:00: verified 2.39/20.00 MiB in 0.09 seconds, speed 25.61/214.08 MiB/s (0 errors)
2022-09-12T14:39:45-03:00: check drive-sata0.img.fidx
2022-09-12T14:39:46-03:00: verified 19.97/212.00 MiB in 0.90 seconds, speed 22.12/234.83 MiB/s (0 errors)
2022-09-12T14:39:46-03:00: percentage done: 65.84% (47/72 groups, 13/32 snapshots in group #48)
2022-09-12T14:39:46-03:00: verify PBS:vm/42XX/2022-08-30T03:10:46Z
2022-09-12T14:39:46-03:00: check qemu-server.conf.blob
2022-09-12T14:39:46-03:00: check drive-sata2.img.fidx
2022-09-12T14:39:46-03:00: chunk af820b1adc0201eeee6b9f837b1d8eac085f7ad75ca8ded078c7bb392d8647f1 was marked as corrupt
2022-09-12T14:39:48-03:00: verified 26.11/556.00 MiB in 2.09 seconds, speed 12.47/265.57 MiB/s (1 errors)
2022-09-12T14:39:48-03:00: verify PBS:vm/42XX/2022-08-30T03:10:46Z/drive-sata2.img.fidx failed: chunks could not be verified
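For a long task log it can help to condense the failures before reading it line by line. The following is only a rough sketch in Python: the regular expressions are modelled on the lines quoted above, the function name `summarize_verify_log` is made up for illustration, and other PBS versions may word these messages differently.

```python
import re

# Patterns modelled on the log lines quoted above (an assumption, not a stable format).
MISSING = re.compile(r"unable to load chunk '([0-9a-f]{64})'")
CORRUPT = re.compile(r"chunk ([0-9a-f]{64}) was marked as corrupt")
FAILED = re.compile(r"verify (\S+) failed: ")

def summarize_verify_log(lines):
    """Collect missing chunks, chunks marked corrupt, and failed index files."""
    summary = {"missing": set(), "corrupt": set(), "failed": []}
    for line in lines:
        if (m := MISSING.search(line)):
            summary["missing"].add(m.group(1))
        elif (m := CORRUPT.search(line)):
            summary["corrupt"].add(m.group(1))
        elif (m := FAILED.search(line)):
            summary["failed"].append(m.group(1))
    return summary
```

Run over the excerpt above, this would report a single digest that is both missing and later marked as corrupt, and two failed `drive-sata2.img.fidx` indexes. That suggests one lost chunk shared by several snapshots rather than many independent errors.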


datastore.cfg:

datastore: PBS
comment Backup VMs
gc-schedule 21:00
notify-user root@pam
path /PBS
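With the datastore path known, it is possible to look for the chunk from the log directly on disk. This is a minimal sketch that assumes the usual chunk store layout of `<datastore>/.chunks/<first 4 hex characters>/<digest>` and assumes that chunks flagged as corrupt are renamed with a `.bad` suffix. Both are assumptions worth checking against your PBS version, and the function names are made up for illustration.

```python
from pathlib import Path

def chunk_path(datastore: Path, digest: str) -> Path:
    """Expected location of a chunk, assuming <datastore>/.chunks/<digest[:4]>/<digest>."""
    if len(digest) != 64 or any(c not in "0123456789abcdef" for c in digest):
        raise ValueError("expected a 64-character lowercase hex digest")
    return datastore / ".chunks" / digest[:4] / digest

def chunk_state(datastore: Path, digest: str) -> str:
    """Classify a chunk as 'present', 'marked bad', or 'missing'."""
    path = chunk_path(datastore, digest)
    if path.is_file():
        return "present"
    # Assumption: corrupt chunks are renamed to "<digest>.<n>.bad" next to the original.
    if path.parent.is_dir() and any(path.parent.glob(digest + ".*.bad")):
        return "marked bad"
    return "missing"
```

For example, `chunk_state(Path("/PBS"), "af820b1a...")` with the full digest from the log would show whether the file is really gone or only appears to be gone at certain times.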

 

  • Is /PBS local or a network share mount point? If it is a share, which protocol?
  • What filesystem is on there?
  • Is it on a RAID? If so, which kind? HW? SW? If SW, which exactly?
  • What drives does it use? Exact model number(s) would be good.
  • Did you already run long SMART tests on the disk(s)?
  • Maybe even run a memtest.
 

It's local: /PBS.

The filesystem is ext4.

There's no RAID.

No, I haven't run a memtest or a SMART test recently, because this only happens sometimes, not every day. I don't blame the disks because we replaced them a month ago. Do you think it's necessary anyway?
 

I personally "burn in" and intensively test every new HDD before I take it into production. The fact that the drives are new does not count for much. New things can also be faulty or DoA (HDDs especially, for example because of damage in transport).

But I would recommend at least regular SMART-tests (especially long ones) anyway.

And since your logs tell us about missing and corrupted data, it might be a really good time to run those long SMART-tests and most likely also a memtest. Should not hurt anyway (aside from downtime, at least for the memtest).

Did you have the same problem with the old drives too, before you exchanged them?

Those are my thoughts; let's wait and see what @sterzy recommends. :)
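The long SMART tests mentioned above are normally started with `smartctl` from smartmontools. As a small sketch, the commands could be wrapped in Python like this; `/dev/sdX` is a placeholder for the real device, the helper names are hypothetical, and actually running the commands needs root and an installed smartmontools package.

```python
import subprocess

def smart_commands(device: str) -> dict:
    """Build smartctl invocations for one drive (the device name is a placeholder)."""
    return {
        "start_long_test": ["smartctl", "-t", "long", device],  # start a long self-test
        "selftest_log": ["smartctl", "-l", "selftest", device],  # show self-test results
        "health": ["smartctl", "-H", device],  # overall health assessment
    }

def run(cmd):
    """Run one command and return (exit code, stdout) without raising on failure."""
    proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return proc.returncode, proc.stdout
```

A long self-test can take hours on large disks, so the self-test log is usually read some time after the test was started, for example with `run(smart_commands("/dev/sdX")["selftest_log"])`.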
 
I think your suggestions are pretty good, @Neobin. SMART and memtest can't hurt. Although if you actually had faulty RAM, I would be surprised if "chunks only verifying sometimes" was your only issue.

It would also be interesting to know whether a verify job takes longer than a day. If it does, two verify jobs could overlap, which in turn could cause some unexpected behavior.
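Checking for overlapping runs comes down to comparing the start and end times from the task list. A minimal sketch, assuming you have copied those times by hand into `(start, end)` pairs; the function name and the timestamps in the usage note are invented for illustration:

```python
from datetime import datetime

def overlapping_runs(runs):
    """Return pairs of (start, end) runs whose time ranges overlap."""
    ordered = sorted(runs)
    overlaps = []
    for i, (start_a, end_a) in enumerate(ordered):
        for start_b, end_b in ordered[i + 1:]:
            if start_b >= end_a:
                break  # sorted by start, so later runs cannot overlap this one either
            overlaps.append(((start_a, end_a), (start_b, end_b)))
    return overlaps
```

For instance, a run from `datetime(2022, 9, 11, 3, 0)` to `datetime(2022, 9, 12, 4, 30)` and a second run starting at `datetime(2022, 9, 12, 3, 0)` would be reported as one overlapping pair.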

You could also check the file system on the disk for errors and repair it if necessary with e2fsck.
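A read-only check such as `e2fsck -f -n /dev/sdX1` (placeholder device, file system unmounted) reports problems without changing anything. Its exit status is a sum of flags documented in the e2fsck man page, which the sketch below decodes; the helper name `decode_e2fsck_status` is hypothetical.

```python
# Exit status flags as documented in the e2fsck man page.
E2FSCK_FLAGS = {
    1: "file system errors corrected",
    2: "file system errors corrected, system should be rebooted",
    4: "file system errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "e2fsck canceled by user request",
    128: "shared library error",
}

def decode_e2fsck_status(code: int) -> list:
    """Translate an e2fsck exit status into human-readable messages."""
    if code == 0:
        return ["no errors"]
    return [text for bit, text in E2FSCK_FLAGS.items() if code & bit]
```

With `-n`, a status that includes 4 means errors were found and left in place, so a repair run without `-n` would be the next step.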
 
