Crashing API PBS 4.0.11

kitsune242

Hi,
I’m running into a rather strange and specific issue.
I have PBS 4.0.11, currently the latest version, installed on bare metal. It uses an S3 endpoint as the main storage and an NFS share as the cache. The whole setup is configured correctly and works perfectly fine for single backups.
However, when I start running dozens of backups in parallel, after anywhere from a few minutes up to ~3 hours, the system essentially becomes unavailable. More precisely:
  • The PBS web interface on port 8007 stops responding.
  • According to the logs, the proxmox-backup-proxy service is still running normally (no errors or crashes are shown).
  • Backups stop progressing.
If I run:
systemctl restart proxmox-backup-proxy.service
everything immediately goes back to normal and backups work again, but only for a while, until the issue repeats.
All traffic runs over 10 Gb NICs. I’m not sure where to look for the root cause: PBS itself, the proxy service, or something related to network/storage performance.
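To pin down exactly when the web interface stops responding, a small probe loop can help. This is only a minimal sketch: the function names `classify_probe` and `probe_pbs`, the 5-second timeout, and the `https://localhost:8007/` URL are assumptions for illustration, and `-k` is used on the assumption that the default self-signed certificate is in place.

```shell
# Map a curl exit status to a short label.
# 0 means the request completed; 28 is curl's "operation timed out" code.
classify_probe() {
    case "$1" in
        0)  echo "up" ;;
        28) echo "timeout" ;;
        *)  echo "error($1)" ;;
    esac
}

# Probe the PBS web interface once and print a timestamped result.
# -k skips certificate verification, -s silences progress output,
# -m 5 caps the whole request at 5 seconds (an arbitrary choice).
probe_pbs() {
    url="${1:-https://localhost:8007/}"
    rc=0
    curl -k -s -o /dev/null -m 5 "$url" || rc=$?
    printf '%s %s\n' "$(date '+%F %T')" "$(classify_probe "$rc")"
}
```

Running something like `while sleep 30; do probe_pbs; done >> /var/tmp/pbs-probe.log` during a parallel backup run gives a timestamp for the first "timeout" entry, which can then be compared against `journalctl -u proxmox-backup-proxy.service` for the same period.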

Any pointers on how to debug this would be greatly appreciated.
 
What are the specs of your hardware? Have you monitored I/O pressure, memory & CPU usage, etc.?
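One way to collect the numbers asked for above is to sample the kernel's pressure stall information (PSI) files under `/proc/pressure` while the backups run. The following is a rough sketch, assuming PSI is enabled on the host; the helper names `psi_avg10` and `sample_pressure` are made up for this example.

```shell
# Read PSI-formatted text on stdin and print the avg10 value of the "some" line.
# A PSI line looks like: some avg10=0.00 avg60=0.00 avg300=0.00 total=0
psi_avg10() {
    awk '$1 == "some" { sub("avg10=", "", $2); print $2 }'
}

# Print one timestamped sample per resource; unreadable files are skipped.
sample_pressure() {
    for res in cpu io memory; do
        f="/proc/pressure/$res"
        if [ -r "$f" ]; then
            printf '%s %s some_avg10=%s\n' "$(date '+%F %T')" "$res" "$(psi_avg10 < "$f")"
        fi
    done
    return 0
}
```

For example, `while sleep 10; do sample_pressure; done >> /var/tmp/psi.log` leaves a log that shows whether I/O or memory pressure climbs in the minutes before the interface stops responding.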
 
As a coworker just pointed out to me (thanks Chris!), you might be running into a bug that was (very) recently fixed.

If possible, can you try activating the pbs-test repository and installing proxmox-backup-server version 4.0.16-1 or greater, and then check whether the issue persists?
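After upgrading, it may be worth confirming that the installed package really is at 4.0.16-1 or newer. The sketch below uses `sort -V` as an approximation of Debian version ordering (it is not identical to dpkg's comparison in every corner case); the function names `version_at_least` and `installed_pbs_version` are hypothetical.

```shell
# Return success if version $1 is greater than or equal to version $2.
# Uses GNU sort -V, which approximates Debian version ordering.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Print the installed proxmox-backup-server package version.
installed_pbs_version() {
    dpkg-query -W -f='${Version}' proxmox-backup-server
}
```

Usage: `version_at_least "$(installed_pbs_version)" 4.0.16-1 && echo "fixed version installed"`.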