S3: Restore failing: "failed to extract file: failed to copy file contents: unexpected status code 504 Gateway Timeout"

dekiesel
Apr 30, 2023
Hi,

I am moving all my services from one node to another using "backup and restore".
I have moved 14 LXCs successfully so far, but one of them just doesn't want to restore on the other node.

PVE log:
Code:
recovering backed-up configuration from 'PBS-Hetzner:backup/ct/123/2026-02-19T07:49:01Z'
Using encryption key from file descriptor..   
Fingerprint: XX
restoring 'PBS-Hetzner:backup/ct/123/2026-02-19T07:49:01Z' now..
Using encryption key from file descriptor..   
Fingerprint: XX
Error: error extracting archive - encountered unexpected error during extraction: error at entry "4_5866121381472637091.mp4": failed to extract file: failed to copy file contents: unexpected status code 504 Gateway Timeout
TASK ERROR: unable to restore CT 123 - command 'lxc-usernsexec -m u:0:100000:65536 -m g:0:100000:65536 -- /usr/bin/proxmox-backup-client restore '--crypt-mode=encrypt' '--keyfd=14' ct/123/2026-02-19T07:49:01Z root.pxar /var/lib/lxc/123/rootfs --allow-existing-dirs --repository backupuser@pbs@proxmox-backup-server.lan:hetznerbucket' failed: exit code 255

PBS log (full log attached):

Code:
[...]
2026-02-19T10:31:01+00:00: found empty chunk '3eb79034b6adc78bb81768ab8baab97222263eae23ac458a45abcadd46c5306c' in store hetznerbucket, overwriting
2026-02-19T10:31:01+00:00: GET /chunk
2026-02-19T10:32:01+00:00: GET /chunk: 400 Bad Request: error reading a body from connection
2026-02-19T10:32:01+00:00: reader finished successfully
2026-02-19T10:32:01+00:00: TASK WARNINGS: 1432

Here is the LXC conf from the original node:
Code:
root@pve:/etc/pve/lxc# cat 123.conf
## Alpine-Docker LXC
#  ### https://tteck.github.io/Proxmox/
#mp1: /dev/data/thinvoltest,mp=/media/data
arch: amd64
cores: 1
features: keyctl=1,nesting=1
hostname: sambaserver
memory: 512
mp2: wdreddata:vm-123-disk-0,mp=/media/storage,backup=1,mountoptions=noatime,size=500G
net1: name=eth101,bridge=vmbr0,firewall=1,hwaddr=2E:C2:FC:5E:CB:09,ip=dhcp,tag=101,type=veth
onboot: 1
ostype: alpine
rootfs: local-lvm:vm-123-disk-0,size=3G
startup: order=5,up=15
swap: 0
tags: prod
unprivileged: 1

Does anybody have an idea why this keeps failing?
Is there a timeout value I could experiment with?

Thanks!
 

Just got a new error during a retry:

Code:
2026-02-19T13:26:29+00:00: found empty chunk 'b512f2323878ba3883d3b691a938f912eb5ac7a2f9dd299adfe8e5b3ca37b393' in store hetznerbucket, overwriting
2026-02-19T13:26:29+00:00: GET /chunk
2026-02-19T13:27:29+00:00: <?xml version="1.0" encoding="UTF-8"?>
<Error>
    <Code>GatewayTimeout</Code>
    <Message>The server did not respond in time.</Message>
    <RequestId>N/A</RequestId>
    <HostId>N/A</HostId>
</Error>
2026-02-19T13:27:29+00:00: GET /chunk: 400 Bad Request: unexpected status code 504 Gateway Timeout
2026-02-19T13:27:29+00:00: reader finished successfully
2026-02-19T13:27:29+00:00: TASK WARNINGS: 61

Seems like increasing the timeout somehow would be the way to go?
 
The 504 errors are typically encountered if the provider API is overwhelmed with requests (see also https://forum.proxmox.com/threads/f...ected-status-code-504-gateway-timeout.176833/)

You can set a bandwidth rate limit (both for up- and download) on the S3 endpoint configuration.
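For reference, a sketch of what setting such a limit could look like on the CLI. This is NOT verified syntax: the subcommand path, the option names `rate-in`/`rate-out`, and the endpoint id `hetzner` are all assumptions here; check `man proxmox-backup-manager` or the web UI for the real parameters on your PBS version:

```shell
# Hypothetical sketch, not verified syntax: limit up- and download
# bandwidth on an S3 endpoint. The subcommand, option names, and the
# endpoint id "hetzner" are placeholders; consult the PBS documentation.
proxmox-backup-manager s3 endpoint update hetzner \
    --rate-in 10MiB --rate-out 10MiB
```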
I've already set a 10 MiB/s limit, but let me retry with 5.
Maybe I shouldn't have chosen Hetzner... Are there any European S3 providers the Proxmox team recommends?

/edit: I am getting the same error at 5 MiB/s

/edit: same at 1 MiB/s
 
> I've already set a 10 MiB/s limit, but let me retry with 5.
> Maybe I shouldn't have chosen Hetzner... Are there any European S3 providers the Proxmox team recommends?
I politely decline to give any recommendations.
> /edit: I am getting the same error at 5 MiB/s
>
> /edit: same at 1 MiB/s
Okay, so the request rate as seen from your initially posted restore task log also does not seem that excessive. Maybe you could contact Hetzner support directly about why these requests cannot be processed. Maybe there is a more stringent request rate limit at play?
 

People here are having similar issues (as you probably remember) and somebody contacted Hetzner, but they only gave a standard reply citing their limits:

https://docs.hetzner.com/de/storage/object-storage/overview/#limits
 

> People here are having similar issues (as you probably remember) and somebody contacted Hetzner, but they only gave a standard reply citing their limits:
>
> https://docs.hetzner.com/de/storage/object-storage/overview/#limits
Yes, but from the logs you provided it would not seem that any of these limits are being reached. Do you have any other concurrent S3 operations ongoing, using the same bucket?
 
Unfortunately not. The node isn't busy and neither is the container PBS is running in.
It's weird: I moved 14 other containers without issue, but this one, by far the biggest, is giving me a headache.

If there are any logs that could help debug this please just let me know. It'll take some time though, I am trying out a different provider.
 
> If there are any logs that could help debug this please just let me know. It'll take some time though, I am trying out a different provider.
The issue here is that the API responds with unexpected 504 status codes or drops the connection before sending the expected data, but understanding why would require insight on the S3 API hosting side. While we are working on improving the back-off and retry logic in the PBS client (e.g. [0]), this issue warrants some further investigation on the provider side as well, IMHO.

[0] https://lore.proxmox.com/pbs-devel/20260123145835.625914-1-c.ebner@proxmox.com/T/
 
Got this very helpful answer:


Code:
Dear customer,

We would like to inform you that most S3 buckets created before January 22 do in fact show reduced performance.
We are aware that this is not always possible, but only new buckets created at the NBG location show normal performance.

Newly created buckets in NBG will perform better, as they are provisioned on newly deployed, independent, and isolated hardware.
Older buckets in NBG, as well as those at the FSN and HEL locations, will continue to show reduced performance for the time being. This is due to user load, which is not expected to decrease on its own anytime soon.

We apologize for the inconvenience.
https://status.hetzner.com/de/incident/ebd62173-d902-4e75-939a-265c0b3f1ddb


Kind regards,
Brian P

So: if your bucket is in a location other than NBG, create a new bucket in NBG.
If your bucket is in NBG but was created before January 22, create a new bucket in NBG.
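If you want to script that workaround, a sketch with the AWS CLI might look like the following. The endpoint URL `nbg1.your-objectstorage.com` and the bucket name are assumptions based on Hetzner's usual naming scheme; verify them against Hetzner's Object Storage documentation before use:

```shell
# Sketch only: create a fresh bucket at the NBG location.
# Endpoint URL and bucket name are assumptions, not verified values.
aws s3 mb s3://hetznerbucket-nbg \
    --endpoint-url https://nbg1.your-objectstorage.com
```

You would then point the PBS S3 datastore configuration at the new bucket (and migrate any existing backup data you still need).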
 
I'm having the same issue on Hetzner.

It also happens during backup and verify; it is not strictly related to restore.
It also happens with other software; it is not strictly related to Proxmox VE.

What other software does is wait a brief moment and retry: it works.

Can we have a retry count/setting/mechanism in Proxmox too? I think it would apply to a wide range of situations and is definitely a useful feature.
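Until something like that lands in PBS itself, the kind of retry the post asks for can be approximated by wrapping the restore command in a small shell function with exponential back-off. This is a generic sketch, not PBS functionality; the wrapped command (e.g. a `proxmox-backup-client restore` invocation) is up to you:

```shell
#!/bin/sh
# Sketch: rerun a command with exponential back-off until it succeeds
# or the attempt limit is reached. Usage: retry MAX INITIAL_DELAY CMD...
retry() {
    max=$1; delay=$2; shift 2
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $max attempts" >&2
            return 1
        fi
        sleep "$delay"
        delay=$((delay * 2))     # back-off: d, 2d, 4d, ...
        attempt=$((attempt + 1))
    done
}
```

For example, `retry 5 10 pct restore 123 ...` would retry the (hypothetical) restore command up to five times, waiting 10, 20, 40, and 80 seconds between attempts. Note this retries on any non-zero exit code, not just on 5xx responses, since the client does not expose the HTTP status separately.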
 
> The issue here is that the API responds with unexpected 504 status codes or drops the connection before sending the expected data, but understanding why would require insight on the S3 API hosting side. While we are working on improving the back-off and retry logic in the PBS client (e.g. [0]), this issue warrants some further investigation on the provider side as well, IMHO.
>
> [0] https://lore.proxmox.com/pbs-devel/20260123145835.625914-1-c.ebner@proxmox.com/T/

I've had a look at the patches and I believe this is the solution.
It should not apply to just 500 and 503 but to any 5xx.

Can I test it somehow?