Hi,
Shortly after PBS 4.0 came out, I configured an S3 backup datastore for about 600GB of data. I had verification, prune, and garbage collection jobs running daily after my backup completed. I got a surprise bill 5x my typical amount after configuring PBS to use my Backblaze S3 bucket. The verification jobs always ran for multiple hours each day, even with a minimal backup change rate. As you can see in my bill, PBS downloaded 3.5TB of data over the month, even though I did zero restores. It also made 10 MILLION S3 GetObject requests. I could also see from my internet usage, via my ISP and firewall, that PBS was using massive amounts of bandwidth during those verification jobs.
This is expected, I'll try to explain what is going on here. The backup protocol is optimized to upload only the necessary data from the client to the server, and from the server to the S3 backend. The client (PVE) will use the fast incremental mode if possible, to avoid re-reading and re-uploading unchanged data blocks (chunks). Only chunks which changed are uploaded and re-indexed; unchanged ones are only re-indexed. For security reasons, the client is only allowed to re-index and skip the upload for chunks it has uploaded in the previous backup snapshot; it will re-upload all others. The server then checks if a chunk is already known and present in its local datastore cache, avoiding the upload to S3 when possible. This helps to reduce upload requests and bandwidth to the S3 API. Data consistency when uploading to the S3 backend is further assured by checksumming, so it is guaranteed that the data the server has sent is also what is persisted to the S3 object store.
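To make the incremental logic above concrete, here is a minimal sketch of the "re-index known chunks, upload the rest" decision. The function names and the digest-set bookkeeping are purely illustrative, not the actual PBS implementation:

```python
# Hypothetical sketch of the fast-incremental upload decision:
# chunks whose digest the client already uploaded in the previous
# snapshot are only re-indexed, everything else is re-uploaded.
import hashlib

def plan_upload(chunks, known_digests):
    """Split chunks into (to_upload, reindex_only) based on the
    digests uploaded with the previous backup snapshot."""
    to_upload, reindex_only = [], []
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        if digest in known_digests:
            reindex_only.append(digest)       # skip upload, just reference it
        else:
            to_upload.append((digest, chunk))  # new or changed data
    return to_upload, reindex_only

# One unchanged chunk, one changed chunk:
prev = {hashlib.sha256(b"unchanged").hexdigest()}
up, idx = plan_upload([b"unchanged", b"changed"], prev)
# only the changed chunk is queued for upload
```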
If you now perform a verification of a backup snapshot, the index file referencing all of its data chunks is used to identify all objects which belong to this snapshot and fetch them from the object store, not just the newly uploaded ones. Once fetched, each chunk is verified to assure that it still contains the data it must contain. This is analogous to what happens on regular datastores. So if you are verifying after each backup snapshot, you are effectively downloading all data chunks, just as if you were doing a full restore. There is room for improvement regarding the verification process's performance and cost effectiveness, with ideas floating around on how to do that, but nothing implemented yet.
For the time being, a verification will always fetch the chunk objects from S3, unconditionally. So I would strongly recommend reducing the frequency at which you do verifications, and doing verifications in bigger batches rather than single snapshots. This is because the verification job has an optimization which avoids re-downloading and re-verifying already verified chunks within the same verification job run.
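For example, a verify job could be scheduled weekly and told to skip snapshots that were verified recently. A sketch of what such a job might look like in `/etc/proxmox-backup/verification.cfg` (the job id and store name are placeholders, and you should check the current PBS documentation for the exact option names and schedule syntax rather than copying this verbatim):

```
verification: s3-weekly-verify
	store my-s3-datastore
	schedule sat 02:00
	ignore-verified true
	outdated-after 90
```

The idea is that `ignore-verified` skips snapshots with a still-valid verification result, and `outdated-after` only forces a re-check once that result is older than the given number of days, so the job no longer re-downloads every chunk on every run.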
PBS S3 datastores have had consistent verification failures, although they seem to occur much less often in the latest builds.
But these verification failures were not caused by the data being corrupt, but rather by a bug in the verification logic for S3-backed datastores, which has since been fixed, see
https://bugzilla.proxmox.com/show_bug.cgi?id=6665 and
https://forum.proxmox.com/threads/s3-buckets-constantly-failing-verification-jobs.169875/post-809182. So the verification jobs actually flagged perfectly valid chunks as corrupt because of transient networking issues.
In general, the recommendation is to monitor your costs if you are using a datastore with an S3 backend [0] and to reduce the required operations to a minimum. Also, since pricing is highly provider specific, which operations lead to which costs depends on your provider.
[0]
https://pbs.proxmox.com/docs/storage.html#datastores-with-s3-backend