[SOLVED] S3 Hetzner failures with garbage collection

harunk
Hello,

I have an S3 datastore configured with Hetzner, and garbage collection is failing consistently with the following error. The backups to S3 are running fine without any issues. Any ideas on what to check?

Thanks!

Code:
2026-05-13T02:30:20+02:00: marked 99% (755 of 762 index files)
2026-05-13T02:30:20+02:00: marked 100% (762 of 762 index files)
2026-05-13T02:30:20+02:00: Start GC phase2 (sweep unused chunks)
2026-05-13T02:33:41+02:00: queued notification (id=ea91cebe-b5d6-4515-8ea5-81e880da4bce)
2026-05-13T02:33:41+02:00: TASK ERROR: error reading a body from connection: unexpected EOF during chunk size line
 
During phase 2 of garbage collection for datastores backed by S3, PBS iterates through all the chunks present on the backend and deletes those which are no longer in use (as detected during phase 1). The error indicates that the response body of one of these ListObjectsV2 S3 API calls was unexpectedly truncated.
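
If you want to reproduce such a listing outside of PBS, something along these lines should work (a sketch using awscli; the endpoint placeholder and the .chunks/ prefix are assumptions about your setup):

Code:
# Page through the bucket roughly the way GC phase 2 does:
# ListObjectsV2 with a prefix and a capped page size.
aws s3api list-objects-v2 \
    --endpoint-url https://<your-hetzner-endpoint> \
    --bucket <your-bucket> \
    --prefix .chunks/ \
    --max-keys 1000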

How is the PBS host's memory usage during garbage collection? Do you run into memory constraints? Is the PBS running on hardware or in a VM? If it is not memory constrained, this is most likely either a networking issue or an issue on the provider side.
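
To rule that out, you could log memory usage while the GC runs, for example with a simple shell loop (just a generic helper, not a PBS tool):

Code:
# Record a memory snapshot every 30 seconds during the GC window.
while true; do
    date
    free -h
    sleep 30
done >> /root/gc-memory.log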
 
The PBS is a standalone machine and, according to the statistics page (see image below), there are no memory constraints. The S3 garbage collection runs at 2:30 every day; here is the system summary from the Server Administration page. Both CPU and memory have enough headroom available.

Interestingly, if it were a provider issue, then the tasks that run before and after (backup, pruning) should also have issues, right? But they don't exhibit the same problem.



(screenshot: system summary from the Server Administration page)
 
Interestingly, if it were a provider issue, then the tasks that run before and after (backup, pruning) should also have issues, right?
Not necessarily: during phase 2, the GC performs very specific API requests with prefix and limit parameters. Do you see any errors around the GC run in the systemd journal? How is the network between you and the provider? Are there any proxies, VPNs, etc. which might interfere with the traffic?
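
To narrow down the journal, you can use the timestamps from the task log, e.g.:

Code:
# Show journal entries for the PBS proxy around the failed GC run.
journalctl -u proxmox-backup-proxy.service \
    --since "2026-05-13 02:25" --until "2026-05-13 02:40"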

Also, please post the output of `proxmox-backup-manager version --verbose`.

Edit: Also, please see if this applies to you: https://forum.proxmox.com/threads/f...s-code-504-gateway-timeout.176833/post-839329
 
Thanks for the response.


There are no proxies, and the server is whitelisted in the firewall. The systemd logs show only the following:

May 13 02:33:41 pbs0 proxmox-backup-proxy[137566]: TASK ERROR: error reading a body from connection: unexpected EOF during chunk size line
May 13 02:33:44 pbs0 proxmox-backup-api[137538]: notified via target `mail-to-root`

proxmox-backup-manager version --verbose

Code:
proxmox-backup-manager version --verbose
proxmox-backup                      4.2.0         running kernel: 6.17.13-2-pve
proxmox-backup-server               4.2.0-1       running version: 4.2.0       
proxmox-kernel-helper               9.0.4                                     
proxmox-kernel-7.0                  7.0.2-2                                   
proxmox-kernel-7.0.2-2-pve-signed   7.0.2-2                                   
proxmox-kernel-7.0.0-3-pve-signed   7.0.0-3                                   
proxmox-kernel-6.17.13-7-pve-signed 6.17.13-7                                 
proxmox-kernel-6.17                 6.17.13-7                                 
proxmox-kernel-6.17.13-6-pve-signed 6.17.13-6                                 
proxmox-kernel-6.17.13-4-pve-signed 6.17.13-4                                 
proxmox-kernel-6.17.13-2-pve-signed 6.17.13-2                                 
proxmox-kernel-6.17.13-1-pve-signed 6.17.13-1                                 
proxmox-kernel-6.17.2-1-pve-signed  6.17.2-1                                   
ifupdown2                           3.3.0-1+pmx12                             
libjs-extjs                         7.0.0-5                                   
proxmox-backup-docs                 4.2.0-1                                   
proxmox-backup-client               4.2.0-1                                   
proxmox-mail-forward                1.0.3                                     
proxmox-mini-journalreader          1.6                                       
proxmox-offline-mirror-helper       0.7.3                                     
proxmox-widget-toolkit              5.1.9                                     
pve-xtermjs                         5.5.0-3                                   
smartmontools                       7.4-pve1                                   
zfsutils-linux                      2.4.1-pve1



This bucket is new and was created, I think, only a couple of months back, so I don't think we are affected by that issue.
 
Internal testing with an S3 bucket hosted at Hetzner showed no issues. Could you also share the network config? I would also suggest reaching out to Hetzner; they might have more insight into whether the connection is prematurely closed on their side.
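
For the network config, the output of the usual iproute2 tools plus the interfaces file would already help, e.g.:

Code:
# Gather the relevant network details on the PBS host.
ip -br addr
ip route
cat /etc/network/interfaces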
 
Update: I have changed the schedule from 2:30 to 7:00 and it seems to be running fine without any issues so far. I have also created a support ticket with Hetzner to ask whether they are doing any rate limiting during that time.
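
For reference, I changed the schedule via the web UI, but something like this should be equivalent on the CLI (assuming the datastore option is called --gc-schedule, so treat it as a sketch):

Code:
# Move the daily garbage collection run from 2:30 to 7:00.
proxmox-backup-manager datastore update <datastore> --gc-schedule '7:00'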
 
I have heard back from Hetzner: the reply was that the S3 bucket was hitting some 412 errors. They also mentioned that the workload is `bursty`, doing a lot of requests within a short span, and the recommendation was to reduce the number of threads or parallel tasks.

I checked the documentation, and under proxmox-backup-manager s3 endpoint update <id> [OPTIONS] there is a `--put-rate-limit` option. I used it to update the config, but I don't see it reflected in `s3.cfg` under `/etc/proxmox-backup`, so I manually added the line put-rate-limit 750. The command I had run was:

proxmox-backup-manager s3 endpoint update <my s3 id> --put-rate-limit 750


I guess this might not solve the issue, since the rate limit applies only to PUT requests and not to all API calls. Maybe there is a global configuration available to limit the requests to a specific value?
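
For reference, the entry in /etc/proxmox-backup/s3.cfg now looks roughly like this (keys redacted; the exact section layout is from memory, so treat it as a sketch):

Code:
s3-endpoint: <my s3 id>
    endpoint <your-hetzner-endpoint>
    region <region>
    access-key <access-key>
    secret-key <redacted>
    put-rate-limit 750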
 
I have heard back from Hetzner: the reply was that the S3 bucket was hitting some 412 errors. They also mentioned that the workload is `bursty`, doing a lot of requests within a short span, and the recommendation was to reduce the number of threads or parallel tasks.

Thanks for sharing this feedback. Status code 412 indicates precondition failed errors. This is, however, normal operation: during backup chunk upload, the PBS S3 client performs conditional uploads, meaning a precondition is added requiring that the object not already be present, since the upload can be skipped in that case. This is done to reduce upload traffic. Were you performing some concurrent backups, which would explain this?
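
To illustrate, a conditional upload is an ordinary PutObject with an If-None-Match: * precondition; repeating it against an existing object is what produces the 412. A rough sketch with awscli (requires a CLI version with conditional-write support; endpoint and bucket are placeholders):

Code:
# The first upload succeeds because the object does not exist yet.
aws s3api put-object \
    --endpoint-url https://<your-hetzner-endpoint> \
    --bucket <your-bucket> --key test-object --body ./test-file \
    --if-none-match '*'

# Running the exact same command again fails with
# "PreconditionFailed" (HTTP status 412), and the client
# can skip the upload.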

I checked the documentation, and under proxmox-backup-manager s3 endpoint update <id> [OPTIONS] there is a `--put-rate-limit` option. I used it to update the config, but I don't see it reflected in `s3.cfg` under `/etc/proxmox-backup`, so I manually added the line put-rate-limit 750. The command I had run was:

proxmox-backup-manager s3 endpoint update <my s3 id> --put-rate-limit 750


I guess this might not solve the issue, since the rate limit applies only to PUT requests and not to all API calls. Maybe there is a global configuration available to limit the requests to a specific value?

This by itself will have no effect on garbage collection; the requests there use the GET and POST/DELETE methods. But rescheduling to a time when not much else is performing concurrent requests will help. Please also note that 750 requests/s is a lot, so this would probably not make a difference.

However, please open an enhancement request at https://bugzilla.proxmox.com for a mechanism to rate limit all the other methods as well, not just PUT, and for an option to control how many in-flight requests there are to a given S3 endpoint.
 
Nope, there are no other concurrent backups/jobs.

This is a standalone server connected to Hetzner, and interestingly, during the times when the backup or garbage collection runs, there are no other requests/jobs running against Hetzner. I am just as surprised, since the timing was chosen specifically so that PBS could have dedicated access to the S3 storage at Hetzner.

I have not changed any of the backup options to speed up the process (all are at the defaults). Also, the backed-up data per day is ~50-100 GB, so not that much traffic.

I have opened an enhancement request, and you can close this issue. I will open a new one in case the current operations also show failures in the future. For now, changing the time slot and adding the `put-rate-limit` (which seems not to be necessary, since our requests per second are much lower than that limit, at ~20/s) has resulted in no failures, at least over the past two days.

Thanks for the quick response and feedback.