Hello,
I have three Proxmox Backup Server (PBS) instances: one main and two remotes. The remotes share the same hardware and are configured identically.
One of the remotes has recently been moved off site to replace an existing remote PBS.
Prior to the move, sync jobs were succeeding and the PBS was able to pull backups from the main PBS; both were in the same subnet.
The second remote, still in the same subnet as the main, syncs just fine.
Both remotes have been configured to limit the transfer speed, as they are planned to sit behind relatively slow 200 Mbit/s connections.
After its move, the remote PBS is failing to pull backups of specific VMs.
Smaller containers and VMs sync fine, but the larger ones fail to sync.
The schedule has been staggered so both remotes do not end up pulling from the main PBS at the same time.
I also made sure all PBSes involved sync their time so their clocks match.
The sync job log on the remote PBS and the corresponding task log on the main PBS are quoted below.
In the remote's log, note the 20-minute gap where seemingly nothing happens.
On the main, chunks seem to transfer successfully until something happens, and about 20 minutes later the task times out.
Previously the remote PBS that was replaced was able to sync. I did open up the firewall to test but it still failed to sync those larger VMs.
As other backups are syncing, I don't think there is a connectivity issue.
Both remote PBSes have the same transfer-rate limit, so transfer speed isn't the issue either.
To rule out something funky with the particular chunks being synced, I removed the backup that was failing to sync.
However, it simply fails on the next backup for that same VM.
I have been unable to find a log with more details as to what exactly is going on.
Does the main stop responding or does the remote stop pulling? And of course, why?
Searching suggested making some changes to the kernel config to increase TCP timeouts, but that didn't solve the problem either.
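For reference, a minimal sketch for dumping the TCP timeout-related kernel settings on each PBS so they can be compared side by side. Which settings were actually changed is not stated here, so the sysctl names in `TCP_SYSCTLS` are only an assumed, typical set, and `read_tcp_sysctls` is a hypothetical helper name.

```python
from pathlib import Path

# Assumed set of TCP timeout-related sysctls; adjust to whatever was actually changed.
TCP_SYSCTLS = (
    "tcp_keepalive_time",
    "tcp_keepalive_intvl",
    "tcp_keepalive_probes",
    "tcp_retries2",
)


def read_tcp_sysctls(base="/proc/sys/net/ipv4", names=TCP_SYSCTLS):
    """Return {sysctl name: current value as string, or None if unreadable}."""
    values = {}
    for name in names:
        try:
            values[name] = (Path(base) / name).read_text().strip()
        except OSError:
            # Missing file or no permission: report it instead of failing.
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_tcp_sysctls().items():
        print(f"net.ipv4.{name} = {value}")
```

Running this on the main and on both remotes makes it easy to spot whether the changed values are actually in effect on every host.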
All PBSes involved are running version 4.1.1.
Has anyone else seen this before?
Or does anyone have tips on how to further troubleshoot backup sync issues?
On the remote PBS, the sync job log states the following (a snippet, not the full log):
Code:
2026-01-02T20:33:57+01:00: percentage done: 47.37% (9/19 groups)
2026-01-02T20:33:58+01:00: skipped: 14 snapshot(s) (2025-08-31T16:00:01Z .. 2025-12-30T17:00:00Z) - older than the newest snapshot present on sync target
2026-01-02T20:33:58+01:00: re-sync snapshot vm/100/2025-12-31T17:00:02Z
2026-01-02T20:33:58+01:00: no data changes
2026-01-02T20:33:58+01:00: percentage done: 48.25% (9/19 groups, 1/6 snapshots in group #10)
2026-01-02T20:33:58+01:00: sync snapshot vm/100/2026-01-01T17:00:04Z
2026-01-02T20:33:58+01:00: sync archive qemu-server.conf.blob
2026-01-02T20:33:58+01:00: sync archive drive-scsi1.img.fidx
2026-01-02T20:54:37+01:00: removing backup snapshot "/mnt/datastore/Backups/vm/100/2026-01-01T17:00:04Z"
2026-01-02T20:54:37+01:00: percentage done: 49.12% (9/19 groups, 2/6 snapshots in group #10)
2026-01-02T20:54:37+01:00: sync group vm/100 failed - connection reset
The corresponding task log on the main PBS shows the following:
Code:
2026-01-02T20:36:13+01:00: download chunk "/mnt/datastore/Backup-Storage/.chunks/ffb0/ffb0d0e6ac16f638b6fd11b182c67260517bf73e141c1a18458c053e81cf4859"
2026-01-02T20:36:13+01:00: download chunk "/mnt/datastore/Backup-Storage/.chunks/2306/2306e23cf49e0481b9ccb8f1122715f2cbafb287eb93abab7223df7d36d937ae"
2026-01-02T20:36:13+01:00: GET /chunk
2026-01-02T20:36:13+01:00: GET /chunk
2026-01-02T20:36:13+01:00: download chunk "/mnt/datastore/Backup-Storage/.chunks/424a/424a350bac4c213f39dea17c6aa7d18daa56a032753de7d69867d85a67109674"
2026-01-02T20:36:13+01:00: download chunk "/mnt/datastore/Backup-Storage/.chunks/0b7d/0b7d050a00034be1d10f75ff4612349b01d0e5a6ea8b78e4beef75a4e507b75e"
2026-01-02T20:36:13+01:00: download chunk "/mnt/datastore/Backup-Storage/.chunks/5db7/5db79553825aafedca116fd1d84e1853bab60b04397cd9825dd4d52bcd0f806b"
2026-01-02T20:36:13+01:00: GET /chunk
2026-01-02T20:36:13+01:00: GET /chunk
2026-01-02T20:36:13+01:00: GET /chunk
2026-01-02T20:36:13+01:00: download chunk "/mnt/datastore/Backup-Storage/.chunks/c0a8/c0a819cf27ca15dedc63165e64058d7f92d19cba2de00440fd2880564429d70d"
2026-01-02T20:36:13+01:00: download chunk "/mnt/datastore/Backup-Storage/.chunks/968f/968f7c358ff85e3ff56bd8ed3068556f0ae6e278b340a0969fc822e10215a11f"
2026-01-02T20:36:13+01:00: download chunk "/mnt/datastore/Backup-Storage/.chunks/6234/62345bca737dd39ec5f48d7719d4f86d0393cda20d84dcdf1d7c78eb8efb6a85"
2026-01-02T20:52:51+01:00: TASK ERROR: connection error: timed out
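To put numbers on the stall, here is a rough sketch that measures the silent period in each of the two snippets above. The four timestamps are copied from those snippets; `gap_seconds` and `fmt` are hypothetical helper names, and since the snippets are not the full logs, the result only describes what is quoted here.

```python
from datetime import datetime

# Timestamps copied from the two log snippets in this post.
REMOTE_LAST_ACTIVITY = "2026-01-02T20:33:58+01:00"  # sync archive drive-scsi1.img.fidx
REMOTE_FAILURE = "2026-01-02T20:54:37+01:00"        # removing backup snapshot
MAIN_LAST_CHUNK = "2026-01-02T20:36:13+01:00"       # last "download chunk" line
MAIN_TASK_ERROR = "2026-01-02T20:52:51+01:00"       # TASK ERROR: timed out


def gap_seconds(start: str, end: str) -> int:
    """Seconds elapsed between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return int(delta.total_seconds())


def fmt(seconds: int) -> str:
    """Format a number of seconds as e.g. '3m07s'."""
    minutes, rest = divmod(seconds, 60)
    return f"{minutes}m{rest:02d}s"


if __name__ == "__main__":
    print("remote stall:", fmt(gap_seconds(REMOTE_LAST_ACTIVITY, REMOTE_FAILURE)))
    print("main stall:  ", fmt(gap_seconds(MAIN_LAST_CHUNK, MAIN_TASK_ERROR)))
```

For the quoted lines this works out to 20m39s of silence on the remote and 16m38s between the last logged chunk and the timeout on the main.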