PBS Sync stalls after recent v4.1.0 upgrades

tkw

New Member
Sep 11, 2024
I've got a PBS VM on each of my two PVE hosts, and each host backs up to the other host's PBS VM as part of my home lab. Once a day, at different times, each PBS VM does a pull sync of the content from the PBS VM on the other PVE host as an extra copy.

This worked fine for many months, with sync jobs finishing in a few minutes at most unless there were significant changes. However, since upgrading both PBS installs from v3.4 to v4.1.0, and both PVE hosts to v9.1.2, I've twice had a sync job not finish (the two events were a few days apart). One ran for 30 hours and the other for 4 hours before I noticed and cancelled the tasks. The hosts are connected with 10Gb fibre.

The sync task logs show the sync running fine at first, but after a line referring to the next sync archive fidx, nothing else is logged. I see little disk I/O on the pulling PBS while it is stuck, although the IO delay stays above idle levels the whole time until I cancel. On the sending side, after the point where the pull gets stuck, the log still shows chunk downloads, but often more than 10 minutes apart instead of many per second, until the task is cancelled.
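In case it helps anyone compare logs, here is a rough sketch of how the pauses between log entries could be measured from a saved copy of a task log. It assumes each line starts with an ISO 8601 timestamp followed by ": " and the message; that layout, the `find_gaps` name, and the 10 minute threshold are my assumptions for illustration, not anything taken from the PBS tooling.

```python
import sys
from datetime import datetime, timedelta


def find_gaps(lines, threshold=timedelta(minutes=10)):
    """Return (previous_line, next_line, gap) for every pause longer than threshold.

    Assumes each log line looks like '<ISO 8601 timestamp>: <message>'.
    Lines without a parseable timestamp are skipped.
    """
    gaps = []
    prev_ts = None
    prev_line = None
    for line in lines:
        stamp, sep, _message = line.partition(": ")
        if not sep:
            continue  # no 'timestamp: message' separator on this line
        try:
            ts = datetime.fromisoformat(stamp.strip())
        except ValueError:
            continue  # the prefix is not a timestamp
        if prev_ts is not None and ts - prev_ts > threshold:
            gaps.append((prev_line.rstrip(), line.rstrip(), ts - prev_ts))
        prev_ts, prev_line = ts, line
    return gaps


if __name__ == "__main__":
    # Usage: python find_gaps.py <saved-task-log>
    with open(sys.argv[1], encoding="utf-8") as log:
        for before, after, gap in find_gaps(log):
            print(f"{gap} between:\n  {before}\n  {after}")
```

Running it against the logs from both sides would show whether the long pauses start at the same moment on the pulling and the sending PBS. It expects all timestamps in a file to use the same format (all with a UTC offset, or all without).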

Between these two events, syncs have completed in the normal time frame. After each failure I immediately re-ran the sync task and it finished within a couple of minutes.

I've read the other posts about slowing syncs and stuck backups, but I'm not sure whether this is a different issue, given that a sync didn't finish even after more than a day, or another symptom of the same root issue.