PBS cannot restore large disk from S3

Hashfastr

New Member
Feb 13, 2026
Hello, I've been struggling with this recently, and seeing that S3 support is new, I suppose it's time to make a post.

I have PBS 4.1.2 running as a VM on, and supporting, a PVE 8.4.16 installation. PVE has 32c/64t with 768GB RAM. PBS has 32c with 32GB RAM, along with an 8TB cache disk for S3. S3 storage is on Backblaze B2. I am trying to restore a ~4TB VM disk from S3 to a ZFS pool roughly 142TB in size. The ZFS pool has a dedicated 512GB NVMe cache device and 128GB of RAM for the ARC. Because of ZFS, large backups (for example from NFS) will cause IO wait issues and lock the server up if left to pull full bandwidth.

Originally PBS had 8GB RAM and 4 cores, along with only a 2TB cache disk for S3. During this time I could back up small (and large) LXC containers without issue, and could create the ~4TB backup of my VM in S3. Restoring LXC containers (16GB at most each) worked just fine. Restoring this large VM, however, would always finish the boot disk (128GB) fairly easily, but would always fail about 12% (10 minutes) into restoring the 4TB data disk. Changing the download rate here does nothing; it still fails in roughly the same spot.

From PVE:
Code:
restore failed: error reading a body from connection

From PBS:
Code:
2026-02-12T16:09:41-07:00: found empty chunk 'c3fac2e62cc0744fad01d9209c1c7f0f28a56f3ccb68cf891c894b5851b4e8f0' in store Backblaze, overwriting
2026-02-12T16:09:41-07:00: found empty chunk '0bf6e34979e63902185d54db944b76990c085ae2b8307ef7d1c369e4f6e093e4' in store Backblaze, overwriting
2026-02-12T16:09:41-07:00: GET /chunk
2026-02-12T16:09:41-07:00: GET /chunk
2026-02-12T16:09:41-07:00: GET /chunk: 400 Bad Request: error reading a body from connection
2026-02-12T16:09:41-07:00: reader finished successfully

With this config it would sometimes completely saturate IO; to alleviate this I added more CPU and RAM, to great success. I also increased the cache disk after failing to restore the individual disk from the CLI. Increasing verification workers/readers also seemed to help, but now it feels like I'm just throwing resources at the problem to race it. On the last run I got 46% (18.8hrs) done before getting the same error, a drastic improvement.

Regardless of the run, logs like this would appear:

Code:
2026-02-12T16:09:39-07:00: GET /chunk
2026-02-12T16:09:39-07:00: GET /chunk
2026-02-12T16:09:40-07:00: found empty chunk '869b1ea0d0f801de6891c27a05a8b8b14879763d24ccedc5ad4583288bae0350' in store Backblaze, overwriting
2026-02-12T16:09:40-07:00: GET /chunk
2026-02-12T16:09:40-07:00: found empty chunk 'cbf7b9f741452ea6c4e696ef4cf5949cb99aa88107a8719a43714655996b5de0' in store Backblaze, overwriting
2026-02-12T16:09:40-07:00: GET /chunk
2026-02-12T16:09:40-07:00: found empty chunk 'e4a3d7bf3ac09cf89664cb85698f72ab97cd937fc9ec2a9372dac5159fd149e5' in store Backblaze, overwriting
2026-02-12T16:09:41-07:00: found empty chunk 'b1cb6388393cab95dacb07a6e0c79c5b0adf2545a183e585439434f369815255' in store Backblaze, overwriting
2026-02-12T16:09:41-07:00: GET /chunk

I'm unsure if this is nonsense, but it still generates warnings. Attached are the logs from PVE for the backup cited above, along with trimmed logs for the PBS task downloading that disk. The original PBS log file was too large (68MB), so I attached a file with just the first 1000 and last 1000 lines.
 

Hi,
2026-02-12T16:09:41-07:00: GET /chunk: 400 Bad Request: error reading a body from connection
this would indicate that the connection from Backblaze got closed while the PBS, acting as S3 client, was trying to fetch the contents. It might be that the API is overloaded with requests and dropping some connections. Do you see any further error messages in the systemd journal of the PBS with respect to this?
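
For reference, something along these lines should show the journal of the two PBS services around the failure (the time window below is just an example):

Code:
journalctl -u proxmox-backup-proxy.service -u proxmox-backup.service --since "2026-02-12 15:00" --until "2026-02-12 17:00"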

If your PBS storage is large enough, you might want to try and pull the backup snapshot to the PBS local storage first, and only then restore it to PVE from the PBS local datastore.

Changing the download rate here does nothing; it still fails in roughly the same spot.
Where did you set the rate limit? On the PVE restore job or on the S3 endpoint? Please try to set the rate limit on the S3 endpoint config to limit the download bandwidth from the S3 API.

2026-02-12T16:09:41-07:00: found empty chunk 'b1cb6388393cab95dacb07a6e0c79c5b0adf2545a183e585439434f369815255' in store Backblaze, overwriting
These are benign warnings in case of datastores backed by S3. The local datastore cache truncates local chunks to size 0 when they get evicted; on re-insertion (which happens during your download), the chunk is re-inserted and the empty chunk marker file is overwritten by the full chunk again. Will see if it makes sense to silence these for S3 stores to avoid log flooding.
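
If you are curious how many chunks are currently evicted (truncated to size 0) in the local cache, a plain find over the chunk store shows it; the path below assumes the cache disk is mounted at /mnt/datastore/Backblaze, adjust to your setup:

Code:
# count the zero-length chunk marker files in the local cache store
find /mnt/datastore/Backblaze/.chunks -type f -size 0 | wc -l
# compare against the total number of locally cached chunks
find /mnt/datastore/Backblaze/.chunks -type f | wc -l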
 
There is nothing abnormal in the systemd logs during that timeframe, unfortunately.
If your PBS storage is enough, you might want to try and pull the backup snapshot to the PBS local storage first, only then restoring it to PVE from the PBS local datastore.
I did try this using the PBS tools from the CLI and got the same behavior. Haven't tried with any traditional tools yet, but I will now.
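
For anyone following along, a single-disk pull from the CLI is roughly of this shape; the VM ID, snapshot timestamp, repository and target path are placeholders, and the archive name can be checked in the snapshot's file list:

Code:
# restore only the data disk archive to a local raw image
proxmox-backup-client restore vm/100/2026-02-12T16:00:00Z drive-scsi1.img.fidx /mnt/restore/vm-100-disk-1.raw --repository root@pam@pbs.local:Backblaze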

Where did you set the rate limit?
On the PVE restore job; it did noticeably change the transfer speed, but I'll try it from the S3 endpoint config as well.

These are benign warnings in case of datastores backed by S3.
Figured as much, but when you're looking for something in the logs and out of ideas, it can be worrisome.
 
Hi, just checking in here: if I have a VM that is ~300GB in size and my local cache is just 256GB (a dedicated disk), will I still be able to restore that 300GB VM, or is this a limitation?
 
Success! It took 32 hours; progress was quick at the start but tapered off to very slow afterwards. This is with the S3 endpoint rate limit set to 50MB/s instead of uncapped.

Hi, just checking in here: if I have a VM that is ~300GB in size and my local cache is just 256GB (a dedicated disk), will I still be able to restore that 300GB VM, or is this a limitation?
I don't think so; my cache disk never exceeded 1.5TB of usage on the 4TB restore.
 
Hi, just checking in here: if I have a VM that is ~300GB in size and my local cache is just 256GB (a dedicated disk), will I still be able to restore that 300GB VM, or is this a limitation?
If the VM is larger than the cache, the restore will of course still work. The local storage is only used as a cache: to keep track of already known chunks to avoid re-uploads to the S3 backend, to store namespace/group/snapshot metadata for fast access and listing without the need for S3 API calls, and to keep chunks in a least-recently-used cache to save download bandwidth. For the cache to work as expected, it needs to at least fit the metadata for snapshots etc. and the inodes for keeping track of the chunks. The available slots for the least-recently-used chunk cache are then calculated based on the available unused storage space.
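
As a rough back-of-the-envelope illustration (assuming the 4 MiB fixed chunk size PBS uses for VM images, ignoring compression and metadata overhead), a 256GB cache can only hold part of a 300GB image's chunks, so the least-recently-used part of the cache simply cycles during the restore:

Code:
# ~300GB VM image split into 4 MiB fixed chunks (before compression/dedup)
echo $(( 300 * 1024 / 4 ))   # ~76800 chunks referenced by the restore
# worst-case chunk slots on a 256GB cache disk (uncompressed chunks)
echo $(( 256 * 1024 / 4 ))   # ~65536 slots, minus space needed for metadata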
 
Success! It took 32 hours; progress was quick at the start but tapered off to very slow afterwards. This is with the S3 endpoint rate limit set to 50MB/s instead of uncapped.


I don't think so; my cache disk never exceeded 1.5TB of usage on the 4TB restore.
Glad to hear that it worked. We will see how we can improve error handling, and also add retry logic in case the server does not return the expected response body on GET requests.