Backup sync performance over VPN

broth-itk

New Member
Dec 30, 2024
Hello all!

Given are two PBS servers, A and B, where A is configured to pull data from B.
Both servers are connected via an IPsec VPN tunnel. Ping latency is around 10 ms, and the available bandwidth for pulling is 100 Mbit/s.

Previously when I used Ve**am to copy backups, the link was always used at around maximum bandwidth.

Now, with the PBS sync job, only around 20 Mbit/s are used for data transfer.
No bandwidth limit is configured.

I know that pulling chunks one by one is slow-ish and latency might play a role but does it really need to be so slow?
What can be done to improve performance? I'd like to use my link at full capacity.

Thoughts:

Multistream TCP or multiple parallel jobs could certainly be done, but it might be hard to implement.
(I found that comment in an older forum post.)

What is PBS actually using in the background to fetch data?
It looks like some kind of HTTP/HTTPS requests to a custom port/API.

Is PBS making a new connection for each synchronized chunk?
If yes, can multiple requests be tunneled over the same TCP connection, e.g. HTTP keep-alive or even HTTP/2?
Establishing a new TCP connection for each chunk would add noticeable latency (the 3-way handshake costs at least one extra round trip, so roughly 10 ms per connection here).

Be happy and stay safe!
Cheers,
Bernhard
 
Hi,
This has been discussed in this issue [0]. Sync jobs use an HTTP/2 connection to pull or push the contents from/to the remote; multiple chunks are uploaded/downloaded in parallel over this connection. Can you share the task logs for the source and target as well as the sync job configuration?

[0] https://bugzilla.proxmox.com/show_bug.cgi?id=4182
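
(For a quick sanity check on the pulling side, the following is just one way to watch that single connection; it assumes the remote is reached on the default PBS API port 8007.)

Code:
# Show TCP connections to the remote's API port (8007 is the PBS default);
# -i also prints cwnd, rtt and the congestion algorithm per connection.
watch -n1 "ss -tni 'dport = :8007'"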
 
Hi Chris, thanks for your feedback!

Here is the sync job configuration and a shortened log from the source attached.

The logs on the destination side do not look very informative:
Code:
2025-04-02T18:38:01+00:00: starting new backup reader datastore 'Backup': "/data"
2025-04-02T18:38:01+00:00: protocol upgrade done
2025-04-02T18:38:01+00:00: GET /download
2025-04-02T18:38:01+00:00: download "/data/ns/LOCAL/ct/112/2025-03-28T20:44:26Z/index.json.blob"
2025-04-02T18:38:01+00:00: GET /download
2025-04-02T18:38:01+00:00: download "/data/ns/LOCAL/ct/112/2025-03-28T20:44:26Z/root.pxar.didx"
2025-04-02T18:38:01+00:00: register chunks in 'root.pxar.didx' as downloadable.

Here is the sync job configuration:

Code:
sync: s-aac8c2e0-4207
    ns UMB-OFFICE
    owner root@pam
    remote UMB-OFFICE
    remote-ns LOCAL
    remote-store Backup
    remove-vanished true
    resync-corrupt false
    schedule 12:00
    store Backup

The parallel-groups option is promising but not easy to find :)
I'll wait for this to be released in production and try it out.


BTW: on PBS 3.3.4 (all updates installed), when clicking on an active job, it shows the log from the previous one:

1743620288921.png

The window title looks fine but the log seems odd.

Best regards,
Bernhard
 


I don't know if this is related, but when using proxmox-backup-client to back up a Linux system, it nicely shows parallel upload tasks in the log (I believe four).
 
Well, nothing wrong here.
The logs on the destination side do not look very informative:
Please do share these as well, I would be interested in the numbers you get.

The logs you shared from the source side show that multiple chunks are being requested and downloaded over the same connection, just as expected.

I don't know if this is related, but when using proxmox-backup-client to back up a Linux system, it nicely shows parallel upload tasks in the log (I believe four).
If you have multiple backups running at the same time, then yes, each backup will run in its own dedicated task. Each task then uploads chunks via the HTTP/2 connection.
 
BTW: on PBS 3.3.4 (all updates installed), when clicking on an active job, it shows the log from the previous one:
Cannot reproduce this here; if I click on a task log (even if it is still running), I do get the correct log.
 
Thanks for your feedback!

This is all I got from the pulling side from the GUI:
1743695482117.png

... not much more from CLI:

1743695639433.png

The compressed raw log files from /var/log/proxmox-backup/tasks/4E are around 17 MB, too big for the forum.
I uploaded the file here: http://159.69.13.18/pbs_forum_164450_logs.tar.gz
Hope this helps to find the root cause :)

This is the 20% of available capacity that the process has been using for the past 20 hours:

1743696625242.png
 
Both servers are connected via an IPsec VPN tunnel. Ping latency is around 10 ms, and the available bandwidth for pulling is 100 Mbit/s.
Can you verify that the connection between the two sites is actually able to achieve these values, e.g. via iperf? Also, double-check the route and ping, e.g. with mtr.

Nothing strange in the logs, and the outbound traffic you show just seems artificially bound to 20 Mbit/s, so it is worth double-checking the network with your current setup.
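
(Something along these lines should give clean numbers, with the sync job paused; the address is a placeholder, B being the pull source and A the pulling PBS.)

Code:
# On the pull source (B): start the iperf3 server
iperf3 -s

# On the pulling side (A): measure TCP throughput in the download direction
# (-R reverses the test, so the server sends and the client receives)
iperf3 -c <address-of-B> -t 30 -R

# Check the path for loss and latency spikes
mtr -rw -c 100 <address-of-B>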
 
Sure, here are my results:

iperf3 UDP test using iperf3 -c 10.200.1.48 -t 10s -u --bidir -b 200M

Code:
[  5][RX-S]   0.00-10.06  sec  58.2 MBytes  48.5 Mbits/sec  0.481 ms  137483/181933 (76%)  receiver
[  7][RX-C]   0.00-10.06  sec  88.1 MBytes  73.5 Mbits/sec  0.188 ms  114450/181720 (63%)  receiver

RX-C is expected since sync is still running in the background.

The tests were performed directly on both PBS instances (they run inside a PVE container).

TCP test is expected to be lower but there is still headroom left:

Code:
[  5][RX-S]   0.00-10.02  sec  58.5 MBytes  49.0 Mbits/sec                  receiver
[  8][TX-S]   0.00-10.02  sec  17.2 MBytes  14.4 Mbits/sec   42             sender

Flood ping from the remote to the central site:

Code:
root@pbs:~# ping -f 10.200.1.48
PING 10.200.1.48 (10.200.1.48) 56(84) bytes of data.
.^C     
--- 10.200.1.48 ping statistics ---
2044 packets transmitted, 2043 received, 0.0489237% packet loss, time 25964ms
rtt min/avg/max/mdev = 7.951/15.801/90.009/4.722 ms, pipe 9, ipg/ewma 12.708/15.627 ms
root@pbs:~#

Summary: I can't identify a potential bottleneck on the infrastructure side. No traffic shapers are active on the firewall.
The same VPN was used for Veeam as well with very different results.

I'm going to play around with different TCP congestion algos to see if there are notable differences.
 
iperf3 -c 10.200.1.48 -t 10s -u --bidir -b 200M
Why UDP, and why the bidirectional test? I would rather suggest setting the pull source as the server and the pull target as the client and measuring the download over TCP, as that is what you are interested in. And all of that without the sync running in the background, if you want clean metrics.

[  5][RX-S]   0.00-10.06  sec  58.2 MBytes  48.5 Mbits/sec  0.481 ms  137483/181933 (76%)  receiver
[  7][RX-C]   0.00-10.06  sec  88.1 MBytes  73.5 Mbits/sec  0.188 ms  114450/181720 (63%)  receiver
You do seem to have a lot of UDP packet loss. However, that is not relevant for the TCP connection...

[  5][RX-S]   0.00-10.02  sec  58.5 MBytes  49.0 Mbits/sec                  receiver
[  8][TX-S]   0.00-10.02  sec  17.2 MBytes  14.4 Mbits/sec   42             sender
Who is the sender (sync source or target)? As suggested, I would measure the bandwidth without other traffic going over the network to get a clean metric.
 
I appreciate your help.
First, we need to understand how iperf works and why a UDP test is preferred over TCP for measuring raw link bandwidth.
As you can see in the command example, I've chosen a 200 Mbit/s UDP data stream over a 100/120 Mbit/s link.
Obviously there is packet loss. The results do actually show the achievable bandwidth over that link:

- 48 Mbit/s TX at the remote location
- 93 Mbit/s (73 from the test + 20 Mbit/s background sync) RX at the remote location

Turning to the TCP test: even with load (the background sync job) we get 14 Mbit/s.

So in theory the sync job should reach at least 34 Mbit/s, which it doesn't.

From my point of view the numbers match fine. TCP could be better but the protocol has many dependencies (which is why it's usually not used for bandwidth checking): congestion algorithms, window sizes, link latency and so on....
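
(A rough back-of-the-envelope check of the window-size angle, using numbers from this thread: to keep a 100 Mbit/s link full, TCP needs roughly one bandwidth-delay product of unacknowledged data in flight.)

Code:
# Bandwidth-delay product = link rate * RTT (bytes that must be in flight)
echo $((100000000 / 8 * 10 / 1000))   # ~125000 bytes at 10 ms RTT
echo $((100000000 / 8 * 90 / 1000))   # ~1125000 bytes at the 90 ms worst-case RTT seen above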
 
Let's get back from our excursion into iperf and IP and focus on PBS.

Tuning the Linux TCP buffers does the trick:

Code:
sysctl -w net.ipv4.tcp_rmem="8192 262144 536870912"
sysctl -w net.ipv4.tcp_wmem="4096 16384 536870912"
sysctl -w net.ipv4.tcp_adv_win_scale=-2
sysctl -w net.ipv4.tcp_notsent_lowat=131072
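
(To compare against the defaults, the current values can be read back before and after the change:)

Code:
sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem
sysctl net.ipv4.tcp_adv_win_scale net.ipv4.tcp_notsent_lowat
sysctl net.ipv4.tcp_congestion_control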


What a f**ing surprise:

1743767128285.png


1743767243110.png

These values were set both within the container and on the host.
I need to check where they are really necessary.

@Chris: These settings could eventually be a candidate to expose in the GUI as a "TCP WAN optimization" option ;-) What do you think?
 
Tuning the Linux TCP buffers does the trick:

sysctl -w net.ipv4.tcp_rmem="8192 262144 536870912"
sysctl -w net.ipv4.tcp_wmem="4096 16384 536870912"
sysctl -w net.ipv4.tcp_adv_win_scale=-2
sysctl -w net.ipv4.tcp_notsent_lowat=131072
Ah great to hear, and thanks for sharing your networking experience and insights!

@Chris: These settings could eventually be a candidate to expose in the GUI as a "TCP WAN optimization" option ;-) What do you think?
Not sure we want to have this as an option in the UI, but it is definitely worth exploring further. Could you add this information to the bugtracker issue https://bugzilla.proxmox.com/show_bug.cgi?id=4182 (I don't want to take credit for your finding ;) )
 
You're welcome!
I'll do some more investigation to track down the exact setting.

Then I'll post the findings in the tracker and we can discuss possible solutions.
The more I think about the possible root cause, the more I think the GUI option makes no sense.

It should rather be considered a general host-level improvement, since it can have a positive effect on other TCP-related communication as well (iSCSI, for example).
 
In the end it boils down to congestion control. The default CUBIC algorithm is the root cause of the drop to only 20% of the available bandwidth.
With some additional buffer tuning on top, my link gets completely saturated by the sync job. Great.

To test the settings, you can enter the following in your shell.
It changes the congestion control to BBR and adjusts the buffer sizes.
The maximum TCP window size is set to 64 MB (up from the default 16 MB):

Code:
sysctl -w net.core.default_qdisc=fq
sysctl -w net.ipv4.tcp_congestion_control=bbr
sysctl -w net.ipv4.tcp_notsent_lowat=16384

sysctl -w net.ipv4.tcp_rmem="8192 262144 67108864"
sysctl -w net.ipv4.tcp_wmem="4096 16384 67108864"
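
(One caveat, depending on the kernel build: bbr is provided as a module and may not be usable until it is loaded.)

Code:
# List the congestion control algorithms the kernel currently offers
sysctl net.ipv4.tcp_available_congestion_control

# If bbr is missing, load the module (add tcp_bbr to /etc/modules to persist it)
modprobe tcp_bbr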

To make this persistent, put this in your /etc/sysctl.conf file:

Code:
net.core.default_qdisc=fq
net.ipv4.tcp_congestion_control=bbr
net.ipv4.tcp_notsent_lowat=16384
net.ipv4.tcp_rmem=8192 262144 67108864
net.ipv4.tcp_wmem=4096 16384 67108864


Note: if you run PBS as a container on PVE, you have to split up the settings.
The host gets

net.core.default_qdisc=fq

and in the container you configure

Code:
net.ipv4.tcp_congestion_control=bbr
net.ipv4.tcp_notsent_lowat=16384
net.ipv4.tcp_rmem=8192 262144 67108864
net.ipv4.tcp_wmem=4096 16384 67108864
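
(To apply the persistent settings without a reboot:)

Code:
sysctl -p          # reload /etc/sysctl.conf
sysctl --system    # or reload all sysctl configuration files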

This is what I am using in my environment

1743799018997.png

Constant and steady data transfer, just as we like to see.


References:
https://blog.cloudflare.com/http-2-prioritization-with-nginx/
https://blog.cloudflare.com/optimizing-tcp-for-high-throughput-and-low-latency/
 