Scheduled backups intermittently fail, manual run works: could not activate storage, 500 can't connect

realyogensha

Nov 15, 2024
I have a 3-node cluster configured to back up to Proxmox Backup Server. When I manually run the backup by clicking "Run now" on the Datacenter > Backup page of the web UI, a backup task is launched on each of the 3 hosts, and all 3 consistently complete without issue. However, when the backup runs automatically on its configured schedule, one or more of the backup tasks fail with the error:

Code:
TASK ERROR: could not activate storage 'backup': backup: error fetching datastores - 500 Can't connect to pbs.local.domain:8007

I can browse the datastore from all 3 hosts without issue and they are all able to connect from the CLI:

Code:
root@pvehost:~# proxmox-backup-client version --repository pbs.local.domain:backup
Password for "root@pam": *******************
client version: 3.2.8
server version: 3.2.2

All 3 PVE nodes and the PBS host have the latest updates.

The error is pretty vague. Is there a way to enable more verbose logging? Any other suggestions?
 
I did a packet capture on the PBS host when a failure occurred and confirmed that the PVE host was actually able to connect. It appears PBS stops responding during TLS negotiation and the connection times out. This sorta points to a load problem, so instead of a single backup job covering all PVE hosts, I created a separate backup job for each PVE host. This seems more reliable, but it has some downsides.
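
To narrow down whether the stall happens in the TCP connect or in the TLS handshake, a small probe along these lines could be run from a PVE node around the scheduled backup window. This is only a minimal sketch, not part of any Proxmox tooling: `probe_tls` is a hypothetical helper name, the 10 second timeout is an arbitrary choice, and certificate verification is switched off on the assumption that only timing matters here.

```python
import socket
import ssl
import time


def probe_tls(host, port=8007, timeout=10.0):
    """Time one connection attempt.

    Returns (tcp_seconds, tls_seconds, error). A value is None when that
    phase did not complete; error is None on success.
    """
    # Assumption: we only care about timing, so skip certificate checks
    # (PBS installs commonly use a self-signed certificate).
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    start = time.monotonic()
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return None, None, f"tcp connect failed: {exc}"
    connected = time.monotonic()

    try:
        with sock:
            sock.settimeout(timeout)
            # wrap_socket performs the TLS handshake before returning.
            with ctx.wrap_socket(sock, server_hostname=host):
                handshaken = time.monotonic()
                return connected - start, handshaken - connected, None
    except OSError as exc:  # includes ssl.SSLError and timeouts
        return connected - start, None, f"tls handshake failed: {exc}"
```

Calling `probe_tls("pbs.local.domain")` in a loop while the scheduled job starts would show whether the TCP connect succeeds quickly while the handshake phase times out, which is what the packet capture suggests.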

The PBS host meets the recommended specs, and I wouldn't expect 3 hosts with only 9 VMs/CTs to overwhelm PBS so thoroughly that it can't even complete a TLS handshake.

Is there some setting that may be causing this? Any way to enable more verbose logging?