Is this fix included in 4.0.16? Still having the same issue on this version with Backblaze, but I haven't deleted any '.bad' chunks yet or looked into that.

But you are getting an error because the chunk is actually missing on the S3 object store, so this is not a bug but that snapshot being corrupt. Can you maybe see if that chunk was incorrectly flagged as corrupt by checking if a corresponding object with a .0.bad extension exists on your S3 object store, to be located in <datastore-name>/.chunks/3cef/. If that is the case, try renaming that object by dropping the extension and do a re-verify of that snapshot.
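If it helps, here is a minimal sketch of how that check and rename could look from the command line, assuming an rclone remote named b2remote that points at the bucket (the remote, bucket and datastore names as well as <chunk-digest> are placeholders):

# list any objects flagged as bad under that chunk prefix
rclone lsf --files-only --include '*.0.bad' b2remote:my-bucket/<datastore-name>/.chunks/3cef/

# rename the flagged object back by dropping the .0.bad extension
rclone moveto b2remote:my-bucket/<datastore-name>/.chunks/3cef/<chunk-digest>.0.bad \
  b2remote:my-bucket/<datastore-name>/.chunks/3cef/<chunk-digest>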
The fix unfortunately did not make it into the package as expected, see https://bugzilla.proxmox.com/show_bug.cgi?id=6665#c4

But independent from that, what exact errors do you get during verification? Please post the full verify task log.

Further, I do suggest to rename the chunks currently marked as bad by dropping the .0.bad extension via external tooling and then try the re-verify of snapshots currently marked as corrupt. If the chunks are truly bad, they will of course be renamed nevertheless.

Is there any way to reset the state of verification jobs without having to manually delete the .bad chunks? My B2 bucket is so large that it's taking forever to list all of the chunk files.
I switched to 4.0.15-1 from pbs-test and tried verifying a few random backups that have never had verification attempted before. It's still erroring almost immediately on all of them. This is a Backblaze B2 bucket; verify task log attached.
Thanks for the info. I understand that previously verified backups (verified on the bad version) will need to be fixed like this. But I'm trying to verify random backups, ones that have **never** had a verify job run on them before, and it still fails with these same missing chunk errors every time. I'm on 4.0.15-1 from pbs-test. Shouldn't new, never-before-verified backups work?

But this is expected if the chunk is no longer present, as indicated by your verify task log, e.g. because it has been incorrectly flagged as bad. Please note that chunks can be shared between consecutive backups: the backup client will reuse already known chunks from the last snapshot in the group if that snapshot was verified or its verify state is unknown. This fast incremental mode allows only new data chunks to be uploaded. It is not checked whether a reused chunk is actually present, as that would defeat the purpose and speedup of the fast incremental method.
As an attempt to recover from your current situation, I suggest you first set the datastore into maintenance mode offline and then rename all the .0.bad chunks you currently have. You must do this on both the local datastore cache and the S3 bucket, dropping their .0.bad filename extension. For example, from your log output we see a currently missing chunk 70b99ddaaca175fedde2c67c57be08f454ae4736969272697e05db44421aa033. You should find a file/object at <your-base-path>/.chunks/70b9/70b99ddaaca175fedde2c67c57be08f454ae4736969272697e05db44421aa033.0.bad in both your bucket and local cache. Repeat that for all bad chunks.
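A rough sketch of how the bulk rename could be done (paths, the rclone remote and bucket names are placeholders; test on a few chunks first):

# local datastore cache: strip the .0.bad extension from all flagged chunks
find <your-base-path>/.chunks -type f -name '*.0.bad' -print0 \
  | while IFS= read -r -d '' f; do mv -n "$f" "${f%.0.bad}"; done

# bucket side, e.g. via an rclone remote: list all flagged objects and rename them one by one
rclone lsf --files-only -R --include '*.0.bad' b2remote:my-bucket/<datastore-name>/.chunks \
  | while IFS= read -r obj; do
      rclone moveto "b2remote:my-bucket/<datastore-name>/.chunks/$obj" \
        "b2remote:my-bucket/<datastore-name>/.chunks/${obj%.0.bad}"
    done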
This will make sure that chunks which previously got renamed incorrectly because of the bug are present again. Once the renaming is done, you should clear the maintenance mode again and run a verification job for at least the last snapshot of each backup group. By this, the snapshot will either be verified or fail verification. If it failed verification, subsequent backup jobs will not reuse known chunks, but re-upload them. This could also heal some of the already present snapshots, if they referenced the same chunks.
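As a side note, if triggering these per-group verifications via the GUI is tedious, a verification of the whole datastore (which also covers the last snapshot of each group, but verifies everything and may therefore take a while) can be started from the CLI, e.g.:

proxmox-backup-manager verify <datastore-name>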
If you do sync jobs, you can also run a pull sync job with the re-sync corrupt option enabled, which will also re-sync the snapshots currently marked as corrupt.

Last but not least, you might want to hold off until the packaged fix reaches your system, as any transient networking issue can lead to chunks being incorrectly renamed.
Newly created snapshots might reuse pre-existing chunks from the previous backup snapshot in that group. So if that snapshot has not yet been verified at the time the new backup snapshot is created, the new snapshot can re-index chunks which have, however, been moved by another, unrelated verify job. You have to break the chain by verifying the last snapshot of a group. If the next backup sees this snapshot as verify failed, it will not reuse chunks, but rather re-upload them.

I understand this, I'm trying backups from 'groups' that have never been verified. I haven't had a single verification job pass on PBS 4.0, ever. I don't run verify jobs on any schedule or cron; 99% of the VM 'groups' have never had verification attempted. I've attempted to verify at least 5 manually, all on groups that have never had a verify job run on any of their backups, and all still fail with these same errors (4.0.15-1).
Can you check the output of stat <your-cache-base-path>/.chunks/70b9/70b99ddaaca175fedde2c67c57be08f454ae4736969272697e05db44421aa033.0.bad and stat <your-cache-base-path>/.chunks/70b9/70b99ddaaca175fedde2c67c57be08f454ae4736969272697e05db44421aa033 on the local datastore cache, and whether a corresponding object is present in <your-datastore-name>/.chunks/70b9/ on the bucket, e.g. in the backblaze web interface.
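If browsing the bucket in the web interface is too slow, the same check could also be done with an S3-capable CLI, for example via an rclone remote (remote and bucket names are placeholders):

# lists both the plain chunk and a .0.bad variant, if either exists
rclone lsf b2remote:my-bucket/<your-datastore-name>/.chunks/70b9/ \
  --include '70b99ddaaca175fedde2c67c57be08f454ae4736969272697e05db44421aa033*'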