Sync job failing with "connection reset"

R.Warps

Jan 2, 2026
Hello,
I have three Proxmox Backup Server (PBS) instances: one main and two remotes. The remotes share the same hardware and are configured identically.
One of the remotes has recently been moved off-site to replace an existing remote PBS.
Prior to the move, sync jobs were succeeding and the PBS was able to pull backups from the main PBS; both were in the same subnet.
The second remote, still in the same subnet as the main, syncs just fine.

Both remotes have been configured to limit the transfer speed, as they are planned to sit behind relatively slow 200 Mbit/s connections.
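For reference, the limit is set on the sync job on the pulling side, roughly like this (job ID and value are examples; the exact option name may differ between versions):
Code:
# limit incoming traffic for the pull sync job (job ID and value are examples,
# exact option name may differ between PBS versions)
proxmox-backup-manager sync-job update s-main-pull --rate-in 20MiB
# check the resulting job configuration
proxmox-backup-manager sync-job list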

After its move, the remote PBS is failing to pull backups of specific VMs.
Smaller containers and VMs sync fine, but the larger ones fail to sync.
The schedule has been staggered so both remotes do not end up pulling from the main PBS at the same time.
I also made sure all PBSes involved sync their time, so their clocks match.
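(The time check itself was nothing special, just the standard tools on each node, depending on which NTP client is installed:)
Code:
# confirm NTP sync is active and the clocks agree
timedatectl status
chronyc tracking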

On the remote PBS, the sync job log states the following (snippet, not the full log):
Code:
2026-01-02T20:33:57+01:00: percentage done: 47.37% (9/19 groups)
2026-01-02T20:33:58+01:00: skipped: 14 snapshot(s) (2025-08-31T16:00:01Z .. 2025-12-30T17:00:00Z) - older than the newest snapshot present on sync target
2026-01-02T20:33:58+01:00: re-sync snapshot vm/100/2025-12-31T17:00:02Z
2026-01-02T20:33:58+01:00: no data changes
2026-01-02T20:33:58+01:00: percentage done: 48.25% (9/19 groups, 1/6 snapshots in group #10)
2026-01-02T20:33:58+01:00: sync snapshot vm/100/2026-01-01T17:00:04Z
2026-01-02T20:33:58+01:00: sync archive qemu-server.conf.blob
2026-01-02T20:33:58+01:00: sync archive drive-scsi1.img.fidx
2026-01-02T20:54:37+01:00: removing backup snapshot "/mnt/datastore/Backups/vm/100/2026-01-01T17:00:04Z"
2026-01-02T20:54:37+01:00: percentage done: 49.12% (9/19 groups, 2/6 snapshots in group #10)
2026-01-02T20:54:37+01:00: sync group vm/100 failed - connection reset
Note the 20-minute gap where seemingly nothing happens.

The corresponding task log on the main PBS shows the following:
Code:
2026-01-02T20:36:13+01:00: download chunk "/mnt/datastore/Backup-Storage/.chunks/ffb0/ffb0d0e6ac16f638b6fd11b182c67260517bf73e141c1a18458c053e81cf4859"
2026-01-02T20:36:13+01:00: download chunk "/mnt/datastore/Backup-Storage/.chunks/2306/2306e23cf49e0481b9ccb8f1122715f2cbafb287eb93abab7223df7d36d937ae"
2026-01-02T20:36:13+01:00: GET /chunk
2026-01-02T20:36:13+01:00: GET /chunk
2026-01-02T20:36:13+01:00: download chunk "/mnt/datastore/Backup-Storage/.chunks/424a/424a350bac4c213f39dea17c6aa7d18daa56a032753de7d69867d85a67109674"
2026-01-02T20:36:13+01:00: download chunk "/mnt/datastore/Backup-Storage/.chunks/0b7d/0b7d050a00034be1d10f75ff4612349b01d0e5a6ea8b78e4beef75a4e507b75e"
2026-01-02T20:36:13+01:00: download chunk "/mnt/datastore/Backup-Storage/.chunks/5db7/5db79553825aafedca116fd1d84e1853bab60b04397cd9825dd4d52bcd0f806b"
2026-01-02T20:36:13+01:00: GET /chunk
2026-01-02T20:36:13+01:00: GET /chunk
2026-01-02T20:36:13+01:00: GET /chunk
2026-01-02T20:36:13+01:00: download chunk "/mnt/datastore/Backup-Storage/.chunks/c0a8/c0a819cf27ca15dedc63165e64058d7f92d19cba2de00440fd2880564429d70d"
2026-01-02T20:36:13+01:00: download chunk "/mnt/datastore/Backup-Storage/.chunks/968f/968f7c358ff85e3ff56bd8ed3068556f0ae6e278b340a0969fc822e10215a11f"
2026-01-02T20:36:13+01:00: download chunk "/mnt/datastore/Backup-Storage/.chunks/6234/62345bca737dd39ec5f48d7719d4f86d0393cda20d84dcdf1d7c78eb8efb6a85"
2026-01-02T20:52:51+01:00: TASK ERROR: connection error: timed out
It seems to be transferring chunks successfully until something happens, and about 20 minutes later the connection times out.

The remote PBS that was replaced was previously able to sync. I opened up the firewall to test, but the larger VMs still failed to sync.
As other backups are syncing, I don't think there is a connectivity issue.
Both remote PBSes have the same transfer-rate limit, so transfer speed isn't the issue either.

To eliminate the possibility of something being off with the particular chunks being synced, I removed the backup that was failing to sync.
However, it simply fails on the next backup for that same VM.
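For completeness, removing the snapshot boils down to a forget on the affected backup, e.g. (repository string is a placeholder for my setup):
Code:
# drop the snapshot that keeps failing (repository is a placeholder)
proxmox-backup-client forget vm/100/2026-01-01T17:00:04Z --repository root@pam@localhost:Backups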

I have been unable to find a log with more details as to what exactly is going on.
Does the main stop responding, or does the remote stop pulling? And of course, why?
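In case it matters, the obvious next place I can think of is the service journals on both sides (time window taken from the failing run above):
Code:
# service journals on both the main and the remote PBS around the stall
journalctl -u proxmox-backup-proxy --since "2026-01-02 20:30" --until "2026-01-02 21:00"
journalctl -u proxmox-backup --since "2026-01-02 20:30" --until "2026-01-02 21:00"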

Searching suggested making some changes to the kernel configuration to increase TCP timeouts, but that didn't solve the problem either.
All PBSes involved are running version 4.1.1.
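The changes in question were the usual TCP keepalive/retransmission sysctls, something along these lines (values are just what I experimented with, not a recommendation):
Code:
# applied on both ends; values are examples only
sysctl -w net.ipv4.tcp_keepalive_time=120
sysctl -w net.ipv4.tcp_keepalive_intvl=15
sysctl -w net.ipv4.tcp_keepalive_probes=9
sysctl -w net.ipv4.tcp_retries2=20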


Has someone else seen this before?
Or perhaps some tips on how to further troubleshoot backup sync issues?
 
Well, after seeing another error about checksums not matching and a blob that could not be parsed, I removed more backups, and the sync problem seems to be resolved that way.
Of course, what caused the mess to begin with? That is a question for another thread.
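Given the checksum error, a verification of the source datastore is probably also worth running, something like (datastore name taken from the task log above):
Code:
# verify all backups in the datastore on the main PBS
proxmox-backup-manager verify Backup-Storage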
 
What removing the backups accomplished is that there were no new large backups to sync, so no errors.
The problem reoccurs for newly created backups.

What I did not originally notice is that the remote PBS that is syncing fine runs on Proxmox VE 8.4.16, whereas the remote that is failing to sync runs on 9.1.4.
(PBS runs in a VM on PVE to allow for flexibility when deploying remotely.)
I am going to update the other PVE as well and see whether that remote PBS then also starts failing to sync. It seems unlikely that the PVE could cause a timeout, but so far I have no other leads.
 
As I figured, after the upgrade to PVE 9.1.4 the local pulling PBS still pulls backups just fine.
From the logs I understand that the pulling PBS stops sending requests for chunks.
The number of "GET /chunk" and "download chunk" messages logged is the same.
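(For anyone wanting to check the same thing, the counts are easy to get by grepping a downloaded copy of the task log; the file name below is a placeholder:)
Code:
# count the request and download lines in the task log of the failing run
grep -c 'GET /chunk' main-pbs-reader-task.log
grep -c 'download chunk' main-pbs-reader-task.log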

How can I further troubleshoot the issue?
With this amount of data being sent, a full packet capture is not exactly an option.
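Unless the capture can be narrowed down a lot, e.g. recording only the connection teardown packets on the PBS port, something along these lines (hostname is a placeholder):
Code:
# capture only RST/FIN packets to/from the main PBS API port (8007), headers only
tcpdump -i any -s 128 -w pbs-resets.pcap \
  'host main-pbs.example.com and port 8007 and tcp[tcpflags] & (tcp-rst|tcp-fin) != 0'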