S3 Buckets constantly failing verification jobs

TrustyHippo

I have configured Backblaze S3, and backups seem to run just fine. However, every day I get failed verifications. I have two local NFS/NAS datastores, and I've never seen a failed verification there. I know S3 support is not GA, but this is a bit worrisome.


[Attached screenshot: 2025-08-16_11-13-13.jpg]
 
Can you please provide the verification task log from the failed jobs as well as the systemd journal from the timespan of the verification job? Also, check if this is related to the issue reported here: https://bugzilla.proxmox.com/show_bug.cgi?id=6665
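
For reference, one way to pull both together is sketched below (a rough outline only; the proxmox-backup-manager task subcommands and the on-disk task log location are assumed from a default PBS install and may differ by version):

Code:
# list recent tasks and note the UPID of the failed verification job
proxmox-backup-manager task list

# print the full task log for that UPID (task logs are also kept on disk,
# typically under /var/log/proxmox-backup/tasks/ on a default install)
proxmox-backup-manager task log 'UPID:...'

# systemd journal restricted to the timespan of the verification job
journalctl --since "YYYY-MM-DD HH:MM" --until "YYYY-MM-DD HH:MM"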
Thanks. Here is a verification job which failed earlier. I manually re-ran it after the failure and it worked again.

Using a MinIO S3 object store which is local on my network. Backups are working fine - no issues from multiple nodes. Verification jobs sometimes fail and then pass after re-runs. I was experiencing the same error when using an AWS S3 bucket, so I tried something internal to test whether it could be network related, but I see the same issue.
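
(For anyone wanting to reproduce this against a local S3 target, a throwaway MinIO instance can be spun up roughly like this - a sketch only; the ports, credentials and data path below are placeholder values for a test setup:)

Code:
# throwaway local MinIO instance to use as an S3 endpoint for testing
docker run -d --name minio-test \
  -p 9000:9000 -p 9001:9001 \
  -e MINIO_ROOT_USER=testadmin \
  -e MINIO_ROOT_PASSWORD=testsecret123 \
  -v /srv/minio-data:/data \
  quay.io/minio/minio server /data --console-address ":9001"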

From the task log -

Code:
2025-08-19T08:21:27+12:00: verify datastore pbs-s3
2025-08-19T08:21:27+12:00: found 2 groups
2025-08-19T08:21:27+12:00: verify group pbs-s3:vm/101 (1 snapshots)
2025-08-19T08:21:27+12:00: verify pbs-s3:vm/101/2025-08-18T13:30:02Z
2025-08-19T08:21:27+12:00:   check qemu-server.conf.blob
2025-08-19T08:21:27+12:00:   check drive-scsi0.img.fidx
2025-08-19T08:44:45+12:00: verify pbs-s3:vm/101/2025-08-18T13:30:02Z/drive-scsi0.img.fidx failed: error reading a body from connection
2025-08-19T08:44:45+12:00:   check drive-efidisk0.img.fidx
2025-08-19T08:44:46+12:00:   verified 0.02/0.52 MiB in 0.17 seconds, speed 0.12/3.00 MiB/s (0 errors)
2025-08-19T08:44:46+12:00: percentage done: 50.00% (1/2 groups)
2025-08-19T08:44:46+12:00: verify group pbs-s3:vm/105 (2 snapshots)
2025-08-19T08:44:46+12:00: verify pbs-s3:vm/105/2025-08-18T13:56:17Z
2025-08-19T08:44:46+12:00:   check qemu-server.conf.blob
2025-08-19T08:44:46+12:00:   check drive-scsi0.img.fidx
2025-08-19T08:53:53+12:00:   verified 3999.12/11292.00 MiB in 546.80 seconds, speed 7.31/20.65 MiB/s (0 errors)
2025-08-19T08:53:53+12:00: percentage done: 75.00% (1/2 groups, 1/2 snapshots in group #2)
2025-08-19T08:53:53+12:00: verify pbs-s3:vm/105/2025-08-18T10:36:24Z
2025-08-19T08:53:53+12:00:   check qemu-server.conf.blob
2025-08-19T08:53:53+12:00:   check drive-scsi0.img.fidx
2025-08-19T08:54:14+12:00:   verified 137.71/832.00 MiB in 21.11 seconds, speed 6.52/39.41 MiB/s (0 errors)
2025-08-19T08:54:15+12:00: percentage done: 100.00% (2/2 groups)
2025-08-19T08:54:15+12:00: Failed to verify the following snapshots/groups:
2025-08-19T08:54:15+12:00:     vm/101/2025-08-18T13:30:02Z
2025-08-19T08:54:15+12:00: TASK ERROR: verification failed - please check the log for details

journalctl doesn't show much else apart from -

Code:
Aug 19 08:38:16 pbs proxmox-backup-proxy[745]: rrd journal successfully committed (36 files in 0.016 seconds)
Aug 19 08:54:15 pbs proxmox-backup-proxy[745]: TASK ERROR: verification failed - please check the log for details
Aug 19 09:08:16 pbs proxmox-backup-proxy[745]: rrd journal successfully committed (36 files in 0.016 seconds)
 
2025-08-19T08:44:45+12:00: verify pbs-s3:vm/101/2025-08-18T13:30:02Z/drive-scsi0.img.fidx failed: error reading a body from connection
But this seems to be network related...? After all, the error states that the client was not able to read the response. Does this always occur for the same snapshot, and how reproducible is this?
 
But this seems to be network related...? After all, the error states that the client was not able to read the response. Does this always occur for the same snapshot, and how reproducible is this?
I've tried AWS S3 and Cloudflare R2 with the closest regions, so ~12 ms latency. Backups are perfectly fine and fast with no problems.
I've tried local S3 via MinIO - on the network, so < 1 ms latency.

Failures are random. Sometimes a retry works, other times it doesn't and I have to retry again.
I have 2 backups in Cloudflare R2 at the moment which just won't verify. Tried 4 times. Others are working fine.

It's certainly reproducible, and I think it's related to larger images. I say this because verification against my local MinIO S3 seems to work for smaller images, while larger ones seem to stall/time out and eventually fail. A re-run sometimes works. The other day I had to run verification 3 times to get a backup verified against my local S3 instance.

Locally there are no network issues. Those servers are under constant monitoring, so I would know if there were network drops.
 
I have 1 node which is failing every verification against Cloudflare R2 backups, and retries don't work either.

Backups sync quickly and it's on a high-speed fibre connection. All other operations work fine - prune, GC, etc.

The only difference is that the cache disk is mounted on an external HDD. I wonder if the performance of this could be the cause. Not much info to go on apart from what I shared above, since there's not much in the logs.

Will try a new cache disk - ssd and see if that makes any difference.

Thanks
 
Please share the exact error messages you get when verification fails, as that will help to identify issues! Just stating that the verification failed does not give enough context on why it failed in the particular setup and configuration you tested.

Locally there are no network issues. Those servers are under constant monitoring, so I would know if there were network drops.
I rather suspect that the data being transferred maxes out your bandwidth, leading to partial transfers of chunk contents and/or DNS lookup failures (at least that is what I observed when investigating an unrelated issue, which is however also most likely caused by network congestion: https://forum.proxmox.com/threads/pbs-4-0-wasabi-post-fixed_chunk-400-bad-request.169620/).

I'm therefore currently evaluating how to best handle this, probably by implementing a shared rate limit for the s3 client implementation in PBS.
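
(Until such a limiter is available in PBS itself, a crude host-level stopgap could be to cap egress on the PBS uplink with Linux traffic control - a sketch only; the interface name and rate are assumptions, and this throttles all outgoing traffic on that interface, not just the S3 traffic:)

Code:
# cap all egress on the PBS uplink with a token bucket filter
# (replace eth0 and the values with something that fits your link)
tc qdisc add dev eth0 root tbf rate 500mbit burst 1mbit latency 50ms

# remove the limit again
tc qdisc del dev eth0 root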

Failures are random. Retries will work and at other times not. Have to retry again.
This also points towards the actual chunk data being valid, the failures instead being caused by network congestion combined with the retry logic for failed chunk fetches not being implemented yet.

Will try a new cache disk - ssd and see if that makes any difference.
The cache is bypassed for verification jobs; only on a failed verification is the locally cached chunk invalidated as well. So this will not change the verification issues as reported.

What you could try instead of verification jobs is to sync your snapshots from the S3-backed datastore to a local datastore. This also checks chunk presence and validity. Maybe that can be used as a workaround, but you might run into the same issues again.
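
(A manual pull roughly along these lines might serve as that workaround - a sketch only; the remote name local-pbs and the datastore name local-store are placeholders, and the exact proxmox-backup-manager options may differ by version, so check the man page first:)

Code:
# one-off pull of everything from the S3-backed datastore into a local one,
# assuming a remote "local-pbs" pointing back at this host already exists
proxmox-backup-manager pull local-pbs pbs-s3 local-store

# or set it up as a scheduled sync job instead
proxmox-backup-manager sync-job create s3-to-local \
  --remote local-pbs --remote-store pbs-s3 --store local-store \
  --schedule daily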
 
I'm seeing:

Code:
2025-08-21T00:01:15-07:00: SKIPPED: verify Backblaze:ct/111/2025-08-18T14:29:19Z (recently verified)
2025-08-21T00:01:15-07:00: percentage done: 56.25% (9/16 groups)
2025-08-21T00:01:15-07:00: verify group Backblaze:ct/113 (5 snapshots)
2025-08-21T00:01:15-07:00: verify Backblaze:ct/113/2025-08-21T05:07:14Z
2025-08-21T00:01:15-07:00:   check pct.conf.blob
2025-08-21T00:01:15-07:00:   check root.pxar.didx
2025-08-21T00:04:14-07:00: "can't verify chunk, load failed - client error (SendRequest)"
2025-08-21T00:04:14-07:00: corrupted chunk renamed to "/mnt/backblaze/.chunks/f3ae/f3ae24280a491318b53cd0b43df10a57082a4bccf50d97e5bdfd9adc30d02046.0.bad"
2025-08-21T00:04:40-07:00:   verified 1768.08/3116.49 MiB in 204.99 seconds, speed 8.63/15.20 MiB/s (1 errors)
2025-08-21T00:04:40-07:00: verify Backblaze:ct/113/2025-08-21T05:07:14Z/root.pxar.didx failed: chunks could not be verified
2025-08-21T00:04:40-07:00:   check catalog.pcat1.didx
2025-08-21T00:04:41-07:00:   verified 0.24/0.68 MiB in 0.85 seconds, speed 0.29/0.80 MiB/s (0 errors)
2025-08-21T00:04:41-07:00: percentage done: 57.50% (9/16 groups, 1/5 snapshots in group #10)
2025-08-21T00:04:41-07:00: SKIPPED: verify Backblaze:ct/113/2025-08-20T05:06:56Z (recently verified)
2025-08-21T00:04:41-07:00: percentage done: 58.75% (9/16 groups, 2/5 snapshots in group #10)
2025-08-21T00:04:41-07:00: SKIPPED: verify Backblaze:ct/113/2025-08-19T13:56:33Z (recently verified)
2025-08-21T00:04:41-07:00: percentage done: 60.00% (9/16 groups, 3/5 snapshots in group #10)
2025-08-21T00:04:41-07:00: SKIPPED: verify Backblaze:ct/113/2025-08-19T11:06:19Z (recently verified)
2025-08-21T00:04:41-07:00: percentage done: 61.25% (9/16 groups, 4/5 snapshots in group #10)
2025-08-21T00:04:41-07:00: SKIPPED: verify Backblaze:ct/113/2025-08-18T14:45:01Z (recently verified)
2025-08-21T00:04:41-07:00: percentage done: 62.50% (10/16 groups)
2025-08-21T00:04:41-07:00: verify group Backblaze:ct/115 (5 snapshots)
2025-08-21T00:04:41-07:00: verify Backblaze:ct/115/2025-08-21T05:09:04Z
2025-08-21T00:04:41-07:00:   check pct.conf.blob
2025-08-21T00:04:42-07:00:   check root.pxar.didx
2025-08-21T00:04:57-07:00: "can't verify chunk, load failed - client error (SendRequest)"
2025-08-21T00:04:57-07:00: corrupted chunk renamed to "/mnt/backblaze/.chunks/1581/158149be41b8488bee40e202dc4e599b80333d76016b3c4484e47fdc5bd63e2b.0.bad"
2025-08-21T00:12:58-07:00:   verified 3859.46/8820.43 MiB in 496.54 seconds, speed 7.77/17.76 MiB/s (1 errors)
2025-08-21T00:12:58-07:00: verify Backblaze:ct/115/2025-08-21T05:09:04Z/root.pxar.didx failed: chunks could not be verified
 
Code:
Aug 21 00:00:39 pbs systemd[1]: Starting dpkg-db-backup.service - Daily dpkg database backup service...
Aug 21 00:00:39 pbs systemd[1]: dpkg-db-backup.service: Deactivated successfully.
Aug 21 00:00:39 pbs systemd[1]: Finished dpkg-db-backup.service - Daily dpkg database backup service.
Aug 21 00:02:48 pbs systemd[1]: Starting logrotate.service - Rotate log files...
Aug 21 00:02:48 pbs systemd[1]: logrotate.service: Deactivated successfully.
Aug 21 00:02:48 pbs systemd[1]: Finished logrotate.service - Rotate log files.
Aug 21 00:03:30 pbs chronyd[747]: Selected source 216.240.36.24 (2.debian.pool.ntp.org)
Aug 21 00:04:14 pbs proxmox-backup-proxy[780]: can't verify chunk, load failed - client error (SendRequest)
Aug 21 00:04:57 pbs proxmox-backup-proxy[780]: can't verify chunk, load failed - client error (SendRequest)
 
Thanks, could you also provide the (estimated) bandwidth you have available for up and download via the s3 endpoint? Would help in order to better reproduce the issue.
 
Thanks, could you also provide the (estimated) bandwidth you have available for up and download via the s3 endpoint? Would help in order to better reproduce the issue.
1 Gbps symmetrical. During the backup job, PBS can literally push a sustained 940 Mbps for the entire 110 GB backup.
 
More Backblaze failures (basically daily):

Code:
2025-08-28T06:59:41-07:00: percentage done: 68.75% (11/16 groups)
2025-08-28T06:59:41-07:00: verify group Backblaze:vm/101 (12 snapshots)
2025-08-28T06:59:41-07:00: verify Backblaze:vm/101/2025-08-28T12:00:07Z
2025-08-28T06:59:41-07:00:   check qemu-server.conf.blob
2025-08-28T06:59:41-07:00:   check drive-tpmstate0-backup.img.fidx
2025-08-28T06:59:41-07:00:   verified 0.01/4.00 MiB in 0.16 seconds, speed 0.04/24.57 MiB/s (0 errors)
2025-08-28T06:59:41-07:00:   check drive-scsi0.img.fidx
2025-08-28T07:08:05-07:00: "can't verify missing chunk with digest 3cefbf7ffcce046be3506b67593d87d2c95f84c4f309d18e63cc798db0d6d5bb"
2025-08-28T07:08:05-07:00: <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Error>
<Code>InvalidRequest</Code>
<Message>no such file: 4_z4b1ba7b2820bbbd09f890a1a_f404bd9644e5df75e_d20250827_m140349_c004_v7007000_t0000_u01756303429853</Message>
</Error>

2025-08-28T07:08:05-07:00: failed to copy corrupt chunk on s3 backend: 3cefbf7ffcce046be3506b67593d87d2c95f84c4f309d18e63cc798db0d6d5bb
2025-08-28T07:30:23-07:00:   verified 30545.96/31988.00 MiB in 1842.16 seconds, speed 16.58/17.36 MiB/s (1 errors)
2025-08-28T07:30:23-07:00: verify Backblaze:vm/101/2025-08-28T12:00:07Z/drive-scsi0.img.fidx failed: chunks could not be verified
2025-08-28T07:30:23-07:00:   check drive-efidisk0.img.fidx
 
2025-08-28T07:08:05-07:00: <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Error>
<Code>InvalidRequest</Code>
<Message>no such file: 4_z4b1ba7b2820bbbd09f890a1a_f404bd9644e5df75e_d20250827_m140349_c004_v7007000_t0000_u01756303429853</Message>
</Error>
Thanks for the report; this does however look more like an issue on Backblaze's side, or like the request got mangled in transit.

Nevertheless, there are now patches available on the developer mailing list to fix most of the verification related issues, see https://lore.proxmox.com/pbs-devel/20250825131007.626777-1-c.ebner@proxmox.com/T/
 
proxmox-backup-server version 4.0.15-1 is available in the pbs-test repository. You might want to test if the included bugfixes solve your issue, thanks. To activate the test repo, please see https://pbs.proxmox.com/docs/installation.html#proxmox-backup-test-repository
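
(On a Debian 13 based PBS 4 installation, activating the test repository would look roughly like the following - a sketch only; please double-check the exact repository line against the linked documentation for your release:)

Code:
# add the pbs-test repository (verify the line against the linked docs;
# "trixie" and "pbstest" here are assumptions for a PBS 4 / Debian 13 setup)
echo "deb http://download.proxmox.com/debian/pbs trixie pbstest" \
  > /etc/apt/sources.list.d/pbstest.list

apt update
apt install proxmox-backup-server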

A traffic rate limiter implementation for the s3 client is also in the works, but not yet included in this bugfix release: https://lore.proxmox.com/pbs-devel/20250916124147.513342-1-c.ebner@proxmox.com/T/#t
Thanks for the work on this. Running this version and it seems to be working well so far - against AWS S3 buckets.