PBS pull sync job slows down by about 20x after a while.

rahman

Hi,

We have two PBS servers on the same LAN in different buildings, connected with 10 Gbit fiber. The PVE servers back up directly to one PBS, and the other PBS pulls/syncs (the last 2 snapshots) from it. We have about 10 TiB of backup data to sync to the remote PBS. The problem is that when I start the sync job, it transfers data at about 800 Mbit/s to 1.4 Gbit/s, which is fine for the initial sync. But after a few hours the sync speed crawls down to 50-100 Mbit/s. When I stop the sync job and immediately start it again, the speed is normal (800-1400 Mbit/s) for a couple of hours, but eventually it slows down again. This has been going on for 3 days now. The job should have finished in about a full day, but after 3 days it still hasn't finished, with 3-4 TiB of data remaining.
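For context, this is roughly how the pull job can be inspected on the syncing PBS (I set it up via the GUI, so treat the commands below as a sketch):

Code:
# list the configured sync jobs on the syncing PBS
proxmox-backup-manager sync-job list

# the raw job definitions (remote, remote-store, snapshot limit, schedule)
# live in this config file:
cat /etc/proxmox-backup/sync.cfg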

So why does it stay stuck at such a slow speed for hours until I intervene (stop/start)? Any ideas?

See the attached screenshots:
1: start of the sync job
2: when it slowed down and stayed at that speed
3: I wondered why it had not finished yet and saw it had been too slow for hours, so I stopped and started the job again
4: it slows down again and stays at that speed
5: I check the job status and stop/start it again.

The job is running at 800-1300 Mbit/s for now.

Here is some info about the PBS servers.
primary PBS server:
16 x Intel(R) Xeon(R) CPU E5620 @ 2.40GHz (2 Sockets) / 192 GB RAM
single datastore, ZFS RAIDZ2 with 10x 8 TB 7200 rpm SATA HDDs + mirrored special device SSDs + mirrored SLOG SSDs

second PBS server:
32 x Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz (2 Sockets) / 320 GB RAM
Dell PERC H710P/1G/BBU, RAID10 with 6x 8 TB 7200 rpm SATA HDDs.
 

Attachments

  • PBS-1.png
  • PBS2.png
It could be a problem of network congestion (packet loss/retransmissions), high latency, or a problem with the buffers of your network equipment. If these problems occur, the TCP window size will be reduced.
What are the MTU settings?
Try a ping check with a large packet size before and after the problem occurs, to check for packet loss.
Some network equipment has only small buffers, and when a buffer cannot absorb the packets quickly enough, the TCP window will shrink.
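For example, a check along these lines (the target IP is a placeholder) sends near-MTU-sized packets with the don't-fragment bit set, so loss or fragmentation issues show up directly:

Code:
# 1472 bytes of payload + 28 bytes of ICMP/IP header = full 1500-byte frames
ping -M do -s 1472 -c 200 <remote-pbs-ip>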
 

This is a local network with 0.1-0.2 ms ping latency, all MTUs are the default 1500, and iperf can saturate 10 Gbit/s between the two PBS servers.
 
iperf shows performance over a short time, but as I said, if the buffers of the network equipment collapse, the speed will drop. The speed will also drop if the destination PBS cannot write fast enough.
I ran into this kind of buffer collapse with MikroTik switches at a customer site a few weeks ago.
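Assuming iperf3 is installed on both PBS hosts, a longer run with periodic reporting is more likely to catch that kind of collapse than a short burst, e.g.:

Code:
# on the primary PBS:
iperf3 -s

# on the syncing PBS: run for 10 minutes, report every 10 seconds
iperf3 -c <primary-pbs-ip> -t 600 -i 10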
 
OK, but the stable speed continued for about 4-5 hours the first time and 8-9 hours the second time without a problem, as seen in the graphs. So I'm not sure this is about network buffers.
 
could you post `proxmox-backup-manager versions --verbose` for both sides, and the sync task log?
 
Sure.

This is the syncing-side PBS:
Code:
:~# proxmox-backup-manager versions --verbose
proxmox-backup                      4.0.0         running kernel: 6.17.2-2-pve
proxmox-backup-server               4.1.0-1       running version: 4.1.0
proxmox-kernel-helper               9.0.4
proxmox-kernel-6.17.2-2-pve-signed  6.17.2-2
proxmox-kernel-6.17                 6.17.2-2
proxmox-kernel-6.17.2-1-pve-signed  6.17.2-1
proxmox-kernel-6.14.11-4-pve-signed 6.14.11-4
proxmox-kernel-6.14                 6.14.11-4
proxmox-kernel-6.2.16-20-pve        6.2.16-20
proxmox-kernel-6.2                  6.2.16-20
pve-kernel-6.2.16-3-pve             6.2.16-3
ifupdown2                           3.3.0-1+pmx11
libjs-extjs                         7.0.0-5
proxmox-backup-docs                 4.1.0-1
proxmox-backup-client               4.1.0-1
proxmox-mail-forward                1.0.2
proxmox-mini-journalreader          1.6
proxmox-offline-mirror-helper       unknown
proxmox-widget-toolkit              5.1.2
pve-xtermjs                         5.5.0-3
smartmontools                       7.4-pve1
zfsutils-linux                      2.3.4-pve1

This is the primary PBS:
Code:
:~# proxmox-backup-manager versions --verbose
proxmox-backup                      4.0.0         running kernel: 6.17.2-1-pve
proxmox-backup-server               4.1.0-1       running version: 4.1.0
proxmox-kernel-helper               9.0.4
pve-kernel-5.15                     7.4-4
proxmox-kernel-6.17.2-1-pve-signed  6.17.2-1
proxmox-kernel-6.17                 6.17.2-2
proxmox-kernel-6.14.11-4-pve-signed 6.14.11-4
proxmox-kernel-6.14                 6.14.11-4
proxmox-kernel-6.14.11-1-pve-signed 6.14.11-1
proxmox-kernel-6.8.12-13-pve-signed 6.8.12-13
proxmox-kernel-6.8                  6.8.12-13
pve-kernel-5.15.108-1-pve           5.15.108-1
pve-kernel-5.15.74-1-pve            5.15.74-1
ifupdown2                           3.3.0-1+pmx11
libjs-extjs                         7.0.0-5
proxmox-backup-docs                 4.1.0-1
proxmox-backup-client               4.1.0-1
proxmox-mail-forward                1.0.2
proxmox-mini-journalreader          1.6
proxmox-offline-mirror-helper       unknown
proxmox-widget-toolkit              5.1.2
pve-xtermjs                         5.5.0-3
smartmontools                       7.4-pve1
zfsutils-linux                      2.3.4-pve1

Edit: Added Sync job logs.
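(For reference, the same log can also be dumped on the CLI; the UPID below is just a placeholder taken from the task list.)

Code:
proxmox-backup-manager task list
proxmox-backup-manager task log 'UPID:pbs1:...'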
 

could you try booting both ends with the 6.14 kernel? there is a known regression that seems to affect some PBS setups with the 6.17 kernels:

 
Thanks for the hint, I am going to read it. As this is the initial sync, I can delete all the contents of the datastore, change the kernels, and start syncing from scratch for testing.
 
@fabian OK, I downgraded both kernels to 6.14 with "proxmox-boot-tool kernel pin 6.14.11-4-pve --next-boot", rebooted the servers, and ran "uname -a" to be sure the correct kernels had booted (sequence sketched at the end of this post). It's many times worse with the 6.14 kernels: iftop shows 0-150 KB of traffic, as if a single chunk is transferred per minute :/. No other job is running.

Edit: Also, stopping the sync job does not seem to stop it. I see the abort in journalctl, but the GUI shows it's still running:

Code:
Dec 11 16:12:53 pbs1 proxmox-backup-proxy[1339]: received abort request ...

Edit: Fixed a wrong mention :)
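For anyone following along, the test sequence was roughly this. Note that with --next-boot the pin only applies to the next boot, so a later reboot returns to the default kernel; a permanent pin (without --next-boot) could be removed with kernel unpin:

Code:
# pin the older kernel for the next boot only, then verify after rebooting
proxmox-boot-tool kernel pin 6.14.11-4-pve --next-boot
reboot
uname -r    # should now report 6.14.11-4-pve

# a permanent pin would be reverted with:
proxmox-boot-tool kernel unpin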
 
Rebooted both servers with the default 6.17 kernels, but it's still problematic :/ 0 to 30 Mbit/s.
 
okay, so this must be a different issue then!

stopping the sync job does work here, FWIW. does the slowdown happen at specific groups/snapshots?
 
It does not seem to happen on the same snapshot. See the attached screenshot; it is from the last job, started again after rebooting both PBS with the default kernel. The graph does not show the job start, but it started at 16:38 at kbit/s speeds. From the graph, it somehow recovered and sped up around 17:00, then stalled to kbit/s speeds around 17:20. It sped up again around 17:50, then stalled again around 18:01, but this time not to kbit/s, settling at a steady 50-70 Mbit/s instead.

I did not start from scratch while testing the 6.14 kernel, BTW. I re-ran the sync job, so it skipped the snapshots that were already synced. I can test with an empty datastore if you think it would make any difference.

Edit: It seems I can't stop the sync job while it is stalled. I can cancel it when it speeds up. When it slows down and I press the stop button, I see the abort request in the journalctl logs, but the job seems to continue pulling chunks at very, very slow speeds. I don't know, maybe it's waiting for already-requested chunks to finish.
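For reference, this is roughly how I check and try to stop it from the CLI as well (the UPID is a placeholder):

Code:
# running tasks on the syncing PBS, then stop the sync task by its UPID
proxmox-backup-manager task list
proxmox-backup-manager task stop 'UPID:pbs1:...'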
 

Attachments

  • pbs-3.png