PBS cannot restore large disk from S3

Hashfastr

New Member
Feb 13, 2026
Hello, I've been struggling with this recently, and seeing that S3 support is new, I suppose it's time to make a post.

I have PBS 4.1.2 running as a VM on, and supporting, a PVE 8.4.16 installation. PVE has 32c/64t with 768GB RAM. PBS has 32c with 32GB RAM, along with an 8TB cache disk for S3. S3 storage is on Backblaze B2. I am trying to restore a ~4TB VM disk from S3 to a ZFS pool roughly 142TB in size. The ZFS pool has a dedicated 512GB NVMe cache device and 128GB of RAM for the ARC. Because of ZFS, large backups (for example from NFS) will cause IO wait issues and lock the server up if left to pull full bandwidth.

Originally PBS had 8GB RAM and 4 cores, along with only a 2TB cache disk for S3. During this time I could back up small (and large) LXC containers without issue, and could create the ~4TB backup of my VM in S3. Restoring LXC containers (16GB at most each) worked just fine. Restoring this large VM, however, would always finish the boot disk (128GB) fairly easily, but would always fail about 12% (10 minutes) into restoring the 4TB data disk. Changing the download rate here does nothing; it still fails in roughly the same spot.

From PVE:
Code:
restore failed: error reading a body from connection

From PBS:
Code:
2026-02-12T16:09:41-07:00: found empty chunk 'c3fac2e62cc0744fad01d9209c1c7f0f28a56f3ccb68cf891c894b5851b4e8f0' in store Backblaze, overwriting
2026-02-12T16:09:41-07:00: found empty chunk '0bf6e34979e63902185d54db944b76990c085ae2b8307ef7d1c369e4f6e093e4' in store Backblaze, overwriting
2026-02-12T16:09:41-07:00: GET /chunk
2026-02-12T16:09:41-07:00: GET /chunk
2026-02-12T16:09:41-07:00: GET /chunk: 400 Bad Request: error reading a body from connection
2026-02-12T16:09:41-07:00: reader finished successfully

With this config it would sometimes completely saturate IO; to alleviate this I added more CPU and RAM, to great success. I also increased the cache disk after failing to restore the individual disk from the CLI. Increasing verification workers/readers also seemed to help, but now it feels like I'm just throwing resources at the problem to race it. On the last run I got 46% (18.8hrs) done before getting the same error, a drastic improvement.

Regardless of the run, logs like this would appear:

Code:
2026-02-12T16:09:39-07:00: GET /chunk
2026-02-12T16:09:39-07:00: GET /chunk
2026-02-12T16:09:40-07:00: found empty chunk '869b1ea0d0f801de6891c27a05a8b8b14879763d24ccedc5ad4583288bae0350' in store Backblaze, overwriting
2026-02-12T16:09:40-07:00: GET /chunk
2026-02-12T16:09:40-07:00: found empty chunk 'cbf7b9f741452ea6c4e696ef4cf5949cb99aa88107a8719a43714655996b5de0' in store Backblaze, overwriting
2026-02-12T16:09:40-07:00: GET /chunk
2026-02-12T16:09:40-07:00: found empty chunk 'e4a3d7bf3ac09cf89664cb85698f72ab97cd937fc9ec2a9372dac5159fd149e5' in store Backblaze, overwriting
2026-02-12T16:09:41-07:00: found empty chunk 'b1cb6388393cab95dacb07a6e0c79c5b0adf2545a183e585439434f369815255' in store Backblaze, overwriting
2026-02-12T16:09:41-07:00: GET /chunk

I'm unsure if this is nonsense, but it still generates warnings. Attached are the logs from PVE for the backup cited above, along with trimmed logs for the PBS task downloading that disk. The original PBS log file was too large (68MB), so I attached a file with just the first 1000 and last 1000 lines.
 

Hi,
2026-02-12T16:09:41-07:00: GET /chunk: 400 Bad Request: error reading a body from connection
this would indicate that the connection from Backblaze got closed while the PBS, acting as S3 client, was trying to fetch the contents. It might be that the API is overloaded with requests and dropping some connections. Do you see any further error messages in the systemd journal of the PBS with respect to this?
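
For reference, something along these lines should show the journal of the two PBS services around the failure (the time window below is just an example):

Code:
journalctl -u proxmox-backup-proxy.service -u proxmox-backup.service --since "2026-02-12 15:00" --until "2026-02-12 17:00"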

If your PBS storage is large enough, you might want to try and pull the backup snapshot to the PBS local storage first, and only then restore it to PVE from the PBS local datastore.

Changing the download rate here does nothing; it still fails in roughly the same spot.
Where did you set the rate limit? On the PVE restore job or on the S3 endpoint? Please try to set the rate limit on the S3 endpoint config to limit the download bandwidth from the S3 API.

2026-02-12T16:09:41-07:00: found empty chunk 'b1cb6388393cab95dacb07a6e0c79c5b0adf2545a183e585439434f369815255' in store Backblaze, overwriting
These are benign warnings in case of datastores backed by S3. The local datastore cache truncates local chunks to size 0 when they get evicted; on re-insertion (which happens during your download), the chunk is re-inserted and the empty chunk marker file is overwritten by the full chunk again. Will see if it makes sense to silence these for S3 stores to avoid log flooding.
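
If you are curious how many chunks are currently evicted (truncated to size 0) in the local cache, a plain find over the chunk store shows it; the path below assumes the cache disk is mounted at /mnt/datastore/Backblaze, adjust to your setup:

Code:
# count the zero-length chunk marker files in the local cache store
find /mnt/datastore/Backblaze/.chunks -type f -size 0 | wc -l
# compare against the total number of locally cached chunks
find /mnt/datastore/Backblaze/.chunks -type f | wc -l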
 
There is nothing abnormal in the systemd logs during that timeframe, unfortunately.
If your PBS storage is enough, you might want to try and pull the backup snapshot to the PBS local storage first, only then restoring it to PVE from the PBS local datastore.
I did try this using the PBS tools from the CLI and got the same behavior. Haven't tried with any traditional tools yet, but I will now.
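
For anyone following along, a single-disk pull from the CLI is roughly of this shape; the VM ID, snapshot timestamp, repository and target path are placeholders, and the archive name can be checked in the snapshot's file list:

Code:
# restore only the data disk archive to a local raw image
proxmox-backup-client restore vm/100/2026-02-12T16:00:00Z drive-scsi1.img.fidx /mnt/restore/vm-100-disk-1.raw --repository root@pam@pbs.local:Backblaze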

Where did you set the rate limit?
On the PVE restore job; it did noticeably change the transfer speed, but I'll try it from the S3 endpoint config as well.

These are benign warnings in case of datastores backed by S3.
Figured as much, but when you're looking for something in the logs and out of ideas, it can be worrisome.
 
Hi, just checking in here: if I have a VM that is ~300GB in size and my local cache is just 256GB (a dedicated disk), will I still be able to restore that 300GB VM, or is this a limitation?
 
Success! It took 32 hours; progress was quick at the start but tapered off to very slow afterwards. This is with the S3 endpoint rate limit set to 50MB/s instead of uncapped.

Hi, just checking in here: if I have a VM that is ~300GB in size and my local cache is just 256GB (a dedicated disk), will I still be able to restore that 300GB VM, or is this a limitation?
I don't think so; my cache disk never exceeded 1.5TB of usage on the 4TB restore.
 
Hi, just checking in here: if I have a VM that is ~300GB in size and my local cache is just 256GB (a dedicated disk), will I still be able to restore that 300GB VM, or is this a limitation?
If the VM is larger than the cache, the restore will of course still work. The local storage is only used as a cache: to keep track of already known chunks to avoid re-uploads to the S3 backend, to store namespace/group/snapshot metadata for fast access and listing without the need for S3 API calls, and to keep chunks in a least-recently-used cache to save download bandwidth. For the cache to work as expected, it needs to at least fit the metadata for snapshots etc. and the inodes for keeping track of the chunks. The available slots for the least-recently-used chunk cache are then calculated based on the available unused storage space.
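
As a rough back-of-the-envelope illustration (assuming the 4 MiB fixed chunk size PBS uses for VM images, ignoring compression and metadata overhead), a 256GB cache can only hold part of a 300GB image's chunks, so the least-recently-used part of the cache simply cycles during the restore:

Code:
# ~300GB VM image split into 4 MiB fixed chunks (before compression/dedup)
echo $(( 300 * 1024 / 4 ))   # ~76800 chunks referenced by the restore
# worst-case chunk slots on a 256GB cache disk (uncompressed chunks)
echo $(( 256 * 1024 / 4 ))   # ~65536 slots, minus space needed for metadata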
 
Success! It took 32 hours; progress was quick at the start but tapered off to very slow afterwards. This is with the S3 endpoint rate limit set to 50MB/s instead of uncapped.


I don't think so; my cache disk never exceeded 1.5TB of usage on the 4TB restore.
Glad to hear that it worked. We will see how we can improve error handling, and also add retry logic in case the server does not return the expected response body on GET requests.