PBS sync job slow on some connections (e.g. Starlink)

biotuxic · Sep 8, 2023

Hello, I've been a heavy PVE user for years, and I started using PBS to move my backups off-site two years ago. Before that I used ZFS snapshots and sync to do the job.
I recently subscribed to Starlink internet (no fiber connection available).
The connection works like a charm: the bandwidth is huge and the latency is not that bad.
I started to sync my backups to a local server, and I ran into the harsh reality that the PBS sync process is limited to around 2 Mbit/s. So I ran some iperf tests on the connection, and indeed:
if everything goes through a single TCP connection, the result is 2 Mbit/s at most,
but if I enable 8 parallel connections for testing, I get more bandwidth in total.

Here is my iperf dump:

Code:

Test Complete. Summary Results:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.53 MBytes  1.28 Mbits/sec   37             sender
[  5]   0.00-10.06  sec  1.34 MBytes  1.12 Mbits/sec                  receiver
[  7]   0.00-10.00  sec  1.52 MBytes  1.28 Mbits/sec   40             sender
[  7]   0.00-10.06  sec  1.34 MBytes  1.12 Mbits/sec                  receiver
[  9]   0.00-10.00  sec  1.59 MBytes  1.34 Mbits/sec   44             sender
[  9]   0.00-10.06  sec  1.41 MBytes  1.18 Mbits/sec                  receiver
[ 11]   0.00-10.00  sec  1.53 MBytes  1.28 Mbits/sec   49             sender
[ 11]   0.00-10.06  sec  1.40 MBytes  1.17 Mbits/sec                  receiver
[ 13]   0.00-10.00  sec  1.50 MBytes  1.26 Mbits/sec   42             sender
[ 13]   0.00-10.06  sec  1.33 MBytes  1.11 Mbits/sec                  receiver
[ 15]   0.00-10.00  sec  1.50 MBytes  1.26 Mbits/sec   39             sender
[ 15]   0.00-10.06  sec  1.39 MBytes  1.15 Mbits/sec                  receiver
[ 17]   0.00-10.00  sec  1.50 MBytes  1.26 Mbits/sec   42             sender
[ 17]   0.00-10.06  sec  1.40 MBytes  1.17 Mbits/sec                  receiver
[ 19]   0.00-10.00  sec  1.50 MBytes  1.26 Mbits/sec   46             sender
[ 19]   0.00-10.06  sec  1.36 MBytes  1.14 Mbits/sec                  receiver
[SUM]   0.00-10.00  sec  12.2 MBytes  10.2 Mbits/sec  339             sender
[SUM]   0.00-10.06  sec  11.0 MBytes  9.15 Mbits/sec                  receiver
CPU Utilization: local/sender 0.7% (0.1%u/0.6%s), remote/receiver 1.4% (0.3%u/1.1%s)
snd_tcp_congestion cubic
rcv_tcp_congestion newreno

And here is an extract from the sync task log:

Code:

2023-09-08T16:26:29+02:00: sync snapshot vm/9001/2023-08-14T03:36:27Z done
2023-09-08T16:26:29+02:00: percentage done: 87.33% (10/12 groups, 23/48 snapshots in group #11)
2023-09-08T16:26:30+02:00: sync snapshot vm/9001/2023-08-15T03:38:02Z
2023-09-08T16:26:30+02:00: sync archive qemu-server.conf.blob
2023-09-08T16:26:30+02:00: sync archive drive-scsi0.img.fidx
2023-09-08T16:28:10+02:00: downloaded 89011730 bytes (0.85 MiB/s)
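
To compare the log figure with the iperf results, it helps to put both in the same unit. The sketch below recomputes the rate from the log extract above. It assumes the transfer time is the gap between the `sync archive drive-scsi0.img.fidx` line and the `downloaded` line; the helper name `transfer_rate` is made up for illustration.

```python
from datetime import datetime

MIB = 1024 * 1024  # bytes per MiB


def transfer_rate(start_iso, end_iso, n_bytes):
    """Return (MiB/s, Mbit/s) for n_bytes moved between two ISO timestamps."""
    start = datetime.fromisoformat(start_iso)
    end = datetime.fromisoformat(end_iso)
    seconds = (end - start).total_seconds()
    bytes_per_s = n_bytes / seconds
    return bytes_per_s / MIB, bytes_per_s * 8 / 1e6


# Timestamps and byte count copied from the task log extract.
mib_s, mbit_s = transfer_rate(
    "2023-09-08T16:26:30+02:00",  # assumed start: "sync archive drive-scsi0.img.fidx"
    "2023-09-08T16:28:10+02:00",  # "downloaded 89011730 bytes"
    89011730,
)
print(f"{mib_s:.2f} MiB/s = {mbit_s:.2f} Mbit/s")  # 0.85 MiB/s = 7.12 Mbit/s
```

Note that iperf reports Mbit/s while the PBS task log reports MiB/s, so the two numbers differ by a factor of roughly 8.4 for the same rate.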

The question is: is it possible to enable some kind of TCP multiplexing in the sync job worker, or perhaps simultaneous chunk syncing, to reduce the total sync time?

Best regards to the PVE team and other PVE users ;-)

dcsapak · Sep 12, 2023

No, there is currently no way to use multiple TCP connections for a single sync job (I don't know if it's possible with our current architecture at all), so for now it seems to be simply a limitation of "high" latency connections. You can open a bug report/feature request on https://bugzilla.proxmox.com, but no promises if or how we could fix that...
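
As a rough illustration of why a single TCP stream can be capped while several streams in parallel scale almost linearly, here is a sketch of the well-known Mathis approximation for loss-limited, Reno-style TCP throughput: rate ≈ (MSS / RTT) × (C / √p), with C = √(3/2). The thread does not report RTT or packet loss, so the MSS, RTT and loss values below are purely hypothetical, and this model is only an approximation, not a statement about what actually limits this particular link.

```python
import math


def mathis_rate_bps(mss_bytes, rtt_s, loss_prob):
    """Approximate steady-state throughput of one TCP stream in bit/s."""
    c = math.sqrt(3 / 2)  # constant from the Mathis et al. model
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_prob))


# Hypothetical link parameters, NOT measured in this thread.
MSS_BYTES = 1460
RTT_S = 0.050
LOSS_PROB = 0.01

single = mathis_rate_bps(MSS_BYTES, RTT_S, LOSS_PROB)
for streams in (1, 4, 8):
    # Each stream is limited independently, so the aggregate grows with the count.
    print(f"{streams} stream(s): ~{streams * single / 1e6:.1f} Mbit/s")
```

Under this model the cap applies per connection, which matches the pattern in the iperf dump where eight streams together reach about eight times the rate of one.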

itNGO · Sep 12, 2023

Maybe creating multiple sync jobs and filtering by VMID could help?

biotuxic · Sep 12, 2023

dcsapak said:
No, there is currently no way to use multiple TCP connections for a single sync job (I don't know if it's possible with our current architecture at all), so for now it seems to be simply a limitation of "high" latency connections. You can open a bug report/feature request on https://bugzilla.proxmox.com, but no promises if or how we could fix that...

Thanks for this honest answer.
I've been looking into software aggregation with a VPN or some kind of proxy, but it seems to take a lot of tricks to implement.
I've also seen some literature about a multipath TCP kernel patch from Google, but it is only supported in some kernels.

Best regards
If anyone has a proxy solution I can add, I'll be delighted.

biotuxic · Sep 12, 2023

itNGO said:
Maybe creating multiple sync jobs and filtering by VMID could help?

That's actually a damned simple and clever trick ;-)
A bit annoying to monitor, but I'll try.

biotuxic · Sep 12, 2023

biotuxic said:
That's actually a damned simple and clever trick ;-)
A bit annoying to monitor, but I'll try.

With some regex in the group filter, it's working: my sync time was divided by 4.

Best regards, @itNGO
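
The exact regexes used above were not posted. One way to split backup groups across several sync jobs is by the last digit of the VMID, so that every group lands in exactly one job. The sketch below generates such patterns and checks the assignment locally. It assumes group names of the form `vm/<id>` or `ct/<id>` and that each pattern is entered as a `regex:` group filter on its own sync job; the function names are hypothetical, and `host/...` groups would need a separate job.

```python
import re


def last_digit_filters(n_jobs):
    """Return one regex per job, partitioning IDs by their last digit."""
    if not 1 <= n_jobs <= 10:
        raise ValueError("n_jobs must be between 1 and 10")
    buckets = [[] for _ in range(n_jobs)]
    for digit in range(10):
        buckets[digit % n_jobs].append(str(digit))  # round-robin the digits
    return [rf"^(vm|ct)/\d*[{''.join(bucket)}]$" for bucket in buckets]


def job_for_group(group, filters):
    """Return the index of the single filter that matches the group name."""
    matches = [i for i, pattern in enumerate(filters) if re.search(pattern, group)]
    if len(matches) != 1:
        raise ValueError(f"{group!r} matched {len(matches)} filters")
    return matches[0]


filters = last_digit_filters(4)
for i, pattern in enumerate(filters):
    print(f"sync job {i}: regex:{pattern}")

# Group name taken from the task log earlier in the thread.
print("vm/9001 -> sync job", job_for_group("vm/9001", filters))
```

Splitting by last digit keeps the jobs roughly balanced when VMIDs are evenly spread; if a few guests dominate the backup size, hand-picked filters per job may balance better.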
