Hi,
I’m running into a rather strange and specific issue.
I have the latest version of PBS 4.0.11 installed on bare metal. It’s connected to an S3 endpoint as the main storage and an NFS share used as cache. The whole setup is configured correctly and works perfectly fine for single backups.
However, when I start running dozens of backups in parallel, after anywhere from a few minutes up to ~3 hours, the system essentially becomes unavailable. More precisely:
- The PBS web interface on port 8007 stops responding.
- According to the logs, the proxmox-backup-proxy service is still running normally (no errors or crashes are shown).
- Backups stop progressing.
If I run
systemctl restart proxmox-backup-proxy.service
everything immediately goes back to normal, and backups work again, but only for a while, until the issue repeats.
All traffic runs over 10Gb NICs. I’m not sure where exactly to look for the root cause — whether it’s PBS itself, the proxy service, or something related to network/storage performance.
Any pointers on how to debug this would be greatly appreciated.