PBS sync job slow on some connections (e.g. Starlink)

biotuxic

New Member
Sep 8, 2023
Hello, I have been a heavy PVE user for years, and I started using PBS two years ago to externalise my backups. Before that I used ZFS snapshots and sync to do the job.
I recently subscribed to Starlink internet (no fiber connection available).
The connection works like a charm: the bandwidth is huge and the latency is not that bad.
I started syncing my backups to a local server and ran into the hard reality that the PBS sync process is limited to around 2 Mbit/s, so I ran some iperf tests on the connection, and indeed:
if everything goes through a single TCP connection, the result is 2 Mbit/s max,
but if I enable 8 parallel connections for testing, I get more bandwidth :)
Here is my iperf dump:
Code:
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.53 MBytes  1.28 Mbits/sec   37             sender
[  5]   0.00-10.06  sec  1.34 MBytes  1.12 Mbits/sec                  receiver
[  7]   0.00-10.00  sec  1.52 MBytes  1.28 Mbits/sec   40             sender
[  7]   0.00-10.06  sec  1.34 MBytes  1.12 Mbits/sec                  receiver
[  9]   0.00-10.00  sec  1.59 MBytes  1.34 Mbits/sec   44             sender
[  9]   0.00-10.06  sec  1.41 MBytes  1.18 Mbits/sec                  receiver
[ 11]   0.00-10.00  sec  1.53 MBytes  1.28 Mbits/sec   49             sender
[ 11]   0.00-10.06  sec  1.40 MBytes  1.17 Mbits/sec                  receiver
[ 13]   0.00-10.00  sec  1.50 MBytes  1.26 Mbits/sec   42             sender
[ 13]   0.00-10.06  sec  1.33 MBytes  1.11 Mbits/sec                  receiver
[ 15]   0.00-10.00  sec  1.50 MBytes  1.26 Mbits/sec   39             sender
[ 15]   0.00-10.06  sec  1.39 MBytes  1.15 Mbits/sec                  receiver
[ 17]   0.00-10.00  sec  1.50 MBytes  1.26 Mbits/sec   42             sender
[ 17]   0.00-10.06  sec  1.40 MBytes  1.17 Mbits/sec                  receiver
[ 19]   0.00-10.00  sec  1.50 MBytes  1.26 Mbits/sec   46             sender
[ 19]   0.00-10.06  sec  1.36 MBytes  1.14 Mbits/sec                  receiver
[SUM]   0.00-10.00  sec  12.2 MBytes  10.2 Mbits/sec  339             sender
[SUM]   0.00-10.06  sec  11.0 MBytes  9.15 Mbits/sec                  receiver
CPU Utilization: local/sender 0.7% (0.1%u/0.6%s), remote/receiver 1.4% (0.3%u/1.1%s)
snd_tcp_congestion cubic
rcv_tcp_congestion newreno
and here is an extract from the sync task log:
Code:
2023-09-08T16:26:29+02:00: sync snapshot vm/9001/2023-08-14T03:36:27Z done
2023-09-08T16:26:29+02:00: percentage done: 87.33% (10/12 groups, 23/48 snapshots in group #11)
2023-09-08T16:26:30+02:00: sync snapshot vm/9001/2023-08-15T03:38:02Z
2023-09-08T16:26:30+02:00: sync archive qemu-server.conf.blob
2023-09-08T16:26:30+02:00: sync archive drive-scsi0.img.fidx
2023-09-08T16:28:10+02:00: downloaded 89011730 bytes (0.85 MiB/s)
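
To compare the two dumps in the same unit, here is a minimal sketch that converts the figures from the sync task log into Mbit/s. It assumes the download ran from the `sync archive drive-scsi0.img.fidx` line to the `downloaded` line; the log only has one-second resolution, so the result is approximate.

```python
from datetime import datetime

# Timestamps and byte count copied from the sync task log above.
# Assumption: the transfer spans exactly these two log lines.
start = datetime.fromisoformat("2023-09-08T16:26:30+02:00")
end = datetime.fromisoformat("2023-09-08T16:28:10+02:00")
downloaded_bytes = 89011730

elapsed_s = (end - start).total_seconds()
mib_per_s = downloaded_bytes / elapsed_s / (1024 * 1024)   # unit used by PBS
mbit_per_s = downloaded_bytes * 8 / elapsed_s / 1_000_000  # unit used by iperf

print(f"{elapsed_s:.0f} s, {mib_per_s:.2f} MiB/s, {mbit_per_s:.2f} Mbit/s")
# prints: 100 s, 0.85 MiB/s, 7.12 Mbit/s
```

Under that assumption the log works out to roughly 7 Mbit/s, compared with about 1.3 Mbit/s per stream and 10.2 Mbit/s in total in the iperf run.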

The question is: is it possible to enable some kind of TCP multiplexing in the sync job worker, or maybe simultaneous chunk sync, to reduce the total sync time?

Best regards to the PVE team and other PVE users ;-)
 
No, there is currently no way to use multiple TCP connections for a single sync job (I don't know if it's possible with our current architecture at all), so for now it seems to be simply a limitation of "high"-latency connections. You can open a bug report/feature request on https://bugzilla.proxmox.com, but no promises on whether or how we could fix that...
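
To illustrate why latency can cap a single TCP connection, here is a rough sketch of the usual window/RTT bound: one stream can have at most one window of data in flight per round trip. The window sizes and round-trip times below are hypothetical; nothing in this thread measures them. The function name is also just for illustration.

```python
def max_tcp_throughput_mbit(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound for one TCP stream: one window per round trip, in Mbit/s."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1_000_000


# Hypothetical values, chosen only to show the trend.
for window_kib in (16, 64, 256):
    for rtt_ms in (5, 50, 100):
        bound = max_tcp_throughput_mbit(window_kib * 1024, rtt_ms)
        print(f"window {window_kib:>3} KiB, RTT {rtt_ms:>3} ms: <= {bound:8.2f} Mbit/s")

# N parallel streams each get their own window, so the bound scales with N.
print(f"8 streams, 64 KiB, 50 ms: <= {8 * max_tcp_throughput_mbit(64 * 1024, 50):.2f} Mbit/s")
```

This ignores packet loss. The non-zero `Retr` column in the iperf dump suggests retransmissions also play a role, which would shrink the effective window further.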
 
Maybe creating multiple sync jobs, each filtered by VM ID, could help?
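
As a rough sketch of that idea, the snippet below splits a list of VM IDs round-robin into a few buckets and prints one `proxmox-backup-manager sync-job create` command per bucket. It assumes your PBS version supports a repeatable `--group-filter` option in the `group:vm/<id>` form (check `proxmox-backup-manager sync-job create --help`). The job IDs, the remote and datastore names, and all VM IDs except 9001 are made up for illustration. Whether the jobs then actually run in parallel depends on how they are scheduled.

```python
def plan_sync_jobs(vmids, jobs, remote="remote1", remote_store="store1",
                   local_store="local1"):
    """Split VM IDs round-robin into `jobs` buckets and build one command each.

    The remote/store names and the sync-partN job IDs are placeholders.
    """
    buckets = [[] for _ in range(jobs)]
    for i, vmid in enumerate(sorted(vmids)):
        buckets[i % jobs].append(vmid)

    commands = []
    for n, bucket in enumerate(buckets):
        if not bucket:
            continue  # fewer VMs than jobs: skip empty buckets
        # Assumption: --group-filter can be repeated, once per backup group.
        filters = " ".join(f"--group-filter group:vm/{vmid}" for vmid in bucket)
        commands.append(
            f"proxmox-backup-manager sync-job create sync-part{n} "
            f"--store {local_store} --remote {remote} "
            f"--remote-store {remote_store} {filters}"
        )
    return commands


# 9001 appears in the task log above; the other IDs are hypothetical.
for cmd in plan_sync_jobs([9001, 100, 101, 102], jobs=2):
    print(cmd)
```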
 
Thanks for this honest answer.
I've been looking into software aggregation, with a VPN or some kind of proxy, but it seems to take a lot of tricks to implement.
I've also seen some literature about a multipath TCP kernel patch from Google, but it is only supported in some kernels.

Best Regards
If anyone has a proxy solution I could add, I'd be delighted :)
 
