[SOLVED] Remote Sync Bandwidth Slow

dbayer

Renowned Member
Apr 15, 2016
Hello,

I've been dealing with slow sync speed to my remote server for over a year. It almost always maxes out at exactly 1 MB/s. I've only seen it higher twice, at around 3.5 MB/s, which would be great if it could be maintained. The only detail I noticed when it reached 3.5 MB/s was that none of the VMs were using bandwidth. As soon as one of the VMs had traffic, the backup sync dropped back to 1 MB/s.

Over the course of this year, at the remote site, I've gotten a new, modern server, changed ISPs, and upgraded to 100 Mb/s service. None of those things improved the speed.

I've checked whether my firewall or the Proxmox firewall is doing bandwidth limiting; neither is.

I'm wondering if Proxmox does some sort of limiting that I'm not aware of.

I've been troubleshooting it randomly over that time. I'm officially out of ideas and I'm hoping someone can give me a new direction to try.

An overview of my setup is below. If you need any other information, let me know.

Thanks in Advance,
Daniel

Local System (Currently)
Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz 16 Cores
3 LSI HBAs
ZFS Setup (all devices are HDDs except the log and cache devices, which are SSDs)
(DATA POOL)
NAME                                  STATE   READ WRITE CKSUM
dpool                                 ONLINE     0     0     0
  mirror-0                            ONLINE     0     0     0
    wwn-0x5000cca24d24764             ONLINE     0     0     0
    wwn-0x5000cca255262649            ONLINE     0     0     0
  mirror-1                            ONLINE     0     0     0
    wwn-0x5000cca24d350986            ONLINE     0     0     0
    wwn-0x5000cca24d361894            ONLINE     0     0     0
  mirror-2                            ONLINE     0     0     0
    wwn-0x5000cca24d26237             ONLINE     0     0     0
    wwn-0x5000cca2556f2097            ONLINE     0     0     0
logs
  mirror-3                            ONLINE     0     0     0
    wwn-0x55cd2e414d5e2865-part1      ONLINE     0     0     0
    wwn-0x55cd2e414d5e2098-part1      ONLINE     0     0     0
cache
  wwn-0x55cd2e414d5e0982-part2        ONLINE     0     0     0
  wwn-0x55cd2e414d5e1175-part2        ONLINE     0     0     0
spares
  wwn-0x5000cca0bd095534              AVAIL
(BACKUP POOL)
bpool                                 ONLINE     0     0     0
  mirror-0                            ONLINE     0     0     0
    wwn-0x5000cca2531f0cb0            ONLINE     0     0     0
    wwn-0x5000cca25320ca50            ONLINE     0     0     0
  mirror-1                            ONLINE     0     0     0
    wwn-0x5000cca25321ca90            ONLINE     0     0     0
    wwn-0x5000cca25321c880            ONLINE     0     0     0

Network:
2 Bonded 10Gig Ports

Internet:
100 Mb/s up/down (measured at around 80 Mb/s with speedtest.net and iperf)

PBS Benchmark Test (From Remote to Local)
proxmox-backup-client benchmark --repository root@pam@local.server.com:8007:nst-store
Password for "root@pam": *
Uploaded 22 chunks in 7 seconds.
Time per request: 339997 microseconds.
TLS speed: 12.34 MB/s
SHA256 speed: 525.27 MB/s
Compression speed: 805.35 MB/s
Decompress speed: 1663.37 MB/s
AES256/GCM speed: 3607.88 MB/s
Verify speed: 404.07 MB/s
┌───────────────────────────────────┬─────────────────────┐
│ Name │ Value │
╞═══════════════════════════════════╪═════════════════════╡
│ TLS (maximal backup upload speed) │ 12.34 MB/s (1%) │
├───────────────────────────────────┼─────────────────────┤
│ SHA256 checksum computation speed │ 525.27 MB/s (26%) │
├───────────────────────────────────┼─────────────────────┤
│ ZStd level 1 compression speed │ 805.35 MB/s (107%) │
├───────────────────────────────────┼─────────────────────┤
│ ZStd level 1 decompression speed │ 1663.37 MB/s (139%) │
├───────────────────────────────────┼─────────────────────┤
│ Chunk verification speed │ 404.07 MB/s (53%) │
├───────────────────────────────────┼─────────────────────┤
│ AES256 GCM encryption speed │ 3607.88 MB/s (99%) │
└───────────────────────────────────┴─────────────────────┘
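
As a rough cross-check of the output above, the reported TLS speed appears to follow from the time per request if each request carries one 4 MiB chunk. The chunk size is an assumption made for this sketch, not something the benchmark output states, and `upload_speed_mb_s` is a hypothetical helper that only redoes the arithmetic.

```python
# Assumption: each benchmark request uploads one 4 MiB chunk.
CHUNK_BYTES = 4 * 1024 * 1024


def upload_speed_mb_s(time_per_request_us: float,
                      chunk_bytes: int = CHUNK_BYTES) -> float:
    """Return upload speed in MB/s (10^6 bytes per second)."""
    seconds = time_per_request_us / 1_000_000
    return chunk_bytes / seconds / 1_000_000


if __name__ == "__main__":
    # 339997 microseconds is the "Time per request" value quoted above.
    print(f"{upload_speed_mb_s(339_997):.2f} MB/s")  # prints 12.34 MB/s
```

Under that assumption the result matches the 12.34 MB/s in the table, which is close to what a 100 Mb/s link can carry.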
 
could you also do the benchmark in the other direction?
 
I think I might have used the wrong terminology. My offsite PBS has my business PBS listed as a remote. My business PBS has no remotes listed.

So is it still possible to do a benchmark test from my business PBS to my offsite PBS?

I tried and got this error...

Error: error trying to connect: error connecting to https://offsite.server.com:8007/ - tcp connect error: deadline has elapsed
 
yeah, the benchmark doesn't require any remote configuration at all (it's client -> PBS server ;)). maybe you have a firewall interfering? you can also leave out the repository part and just run proxmox-backup-client benchmark, I was mainly interested in the hashing/crypto performance values.
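
The "deadline has elapsed" error above is a TCP connect timeout, so one quick way to tell a blocked port apart from a PBS problem is to check whether port 8007 is reachable at all. A minimal Python sketch follows; `can_connect` is a hypothetical helper written for this illustration and is not part of any Proxmox tool.

```python
import socket


def can_connect(host: str, port: int = 8007, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS failures.
        return False
```

Calling `can_connect("offsite.server.com")` from the business PBS would return `False` if a firewall drops the traffic or the name does not resolve. It only tests the TCP handshake and says nothing about TLS or throughput.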
 
Ok, I just did the test without the repository

SHA256 speed: 297.54 MB/s
Compression speed: 336.21 MB/s
Decompress speed: 789.87 MB/s
AES256/GCM speed: 1750.21 MB/s
Verify speed: 227.84 MB/s
┌───────────────────────────────────┬────────────────────┐
│ Name │ Value │
╞═══════════════════════════════════╪════════════════════╡
│ TLS (maximal backup upload speed) │ not tested │
├───────────────────────────────────┼────────────────────┤
│ SHA256 checksum computation speed │ 297.54 MB/s (15%) │
├───────────────────────────────────┼────────────────────┤
│ ZStd level 1 compression speed │ 336.21 MB/s (45%) │
├───────────────────────────────────┼────────────────────┤
│ ZStd level 1 decompression speed │ 789.87 MB/s (66%) │
├───────────────────────────────────┼────────────────────┤
│ Chunk verification speed │ 227.84 MB/s (30%) │
├───────────────────────────────────┼────────────────────┤
│ AES256 GCM encryption speed │ 1750.21 MB/s (48%) │
└───────────────────────────────────┴────────────────────┘
 
not amazing, but likely not the bottleneck. how's the CPU load during the sync?

edit: translated, sorry ;)
 
I translated your post. :)

The CPU on the business server barely registers anything during a sync, and averages around 1% IO delay.

The CPU on the off-site server is a little higher, around 2% IO delay.

Essentially, none of the CPUs appear stressed.
 
thanks! if you monitor the traffic at the network level, is it a consistent 1-3 MB/s or do you see spikes? how's your link's latency? the upload benchmark shows it hitting line speed (100 Mbit/s -> 12.5 MB/s), but the code path is not exactly the same for download/sync and upload.
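
The units in this thread switch between megabits and megabytes, so the line-speed figure is worth spelling out. A short sketch of the conversion, using only numbers quoted in the thread; the function names are made up for illustration.

```python
def mbit_to_mbyte(mbit_per_s: float) -> float:
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbit_per_s / 8


def utilisation(observed_mb_s: float, link_mbit_s: float) -> float:
    """Fraction of the link's nominal rate that the observed speed uses."""
    return observed_mb_s / mbit_to_mbyte(link_mbit_s)


if __name__ == "__main__":
    print(mbit_to_mbyte(100))              # nominal link: 12.5 MB/s
    print(mbit_to_mbyte(80))               # measured with iperf: 10.0 MB/s
    print(f"{utilisation(1.0, 100):.0%}")  # steady sync speed: 8%
    print(f"{utilisation(3.5, 100):.0%}")  # best observed sync speed: 28%
```

Protocol overhead is ignored here, so the real ceiling is somewhat below 12.5 MB/s.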
 
Latency is around 30 ms average.

It is usually a steady 1 MB/s. Once in a while it will show spikes, and those syncs will be less than 1 MB/s.
The couple of times I've seen 3.5 MB/s, it was also steady.
 
it seems like the combination of not-too-fast hardware and a not-so-fast (both bandwidth- and latency-wise) link causes you to only get 10% of the theoretically possible throughput. it would be interesting to know if you get better results when running multiple syncs in parallel (e.g., by setting up one more datastore on each end and syncing that pair as well), as that would indicate that you'd possibly benefit from more parallelism in the code.
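
To illustrate how latency and bandwidth interact, the bandwidth-delay product gives the amount of data that must be in flight to keep a link full, and any fixed amount of in-flight data caps throughput at that amount divided by the round-trip time. The sketch below assumes the 30 ms figure is a round-trip time and uses a hypothetical 64 KiB in-flight window purely as an example; neither value describes what PBS actually does.

```python
def bdp_bytes(link_mbit_s: float, rtt_ms: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to fill the link."""
    return link_mbit_s * 1e6 / 8 * rtt_ms / 1000


def window_limited_mb_s(window_bytes: float, rtt_ms: float) -> float:
    """Throughput ceiling in MB/s when at most window_bytes are in flight."""
    return window_bytes / (rtt_ms / 1000) / 1e6


if __name__ == "__main__":
    # Assumption: 30 ms is the round-trip time of the 100 Mb/s link.
    print(f"{bdp_bytes(100, 30) / 1000:.0f} kB needed in flight")
    # Hypothetical example window of 64 KiB per round trip.
    print(f"{window_limited_mb_s(64 * 1024, 30):.2f} MB/s ceiling")
```

Running several transfers in parallel, as suggested above, raises the total amount of data in flight, which is why it can help on links like this.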
 
I was getting this same sync speed when I only had a 10 MB/s connection. I would like to think there is some other issue that is the problem. The other thing that has always stuck out to me is that it is exactly 1 MB/s. That seems too much like a throttled setting rather than just a chance best-effort speed.

What does the traffic look like to a firewall? Is it just HTTPS traffic on port 8007, or could it look like something else?

I really need this solved. If something screws up a sync for a couple of days, it takes a month to get all the VMs synced again.
 
it's a mix of regular HTTP (queries like which groups and snapshots exist and so on) and HTTP/2 (chunk downloading) over TLS on port 8007. the upload benchmark is HTTP/2 as well, but in the other direction.
 
I just wanted to update everyone on how this was finally resolved.

It turns out our firewall was limiting this traffic, not through any particular setting but simply from being overloaded. We upgraded our firewall hardware and our bandwidth jumped dramatically, to between 6 MB/s and 9 MB/s.

I appreciate Fabian's help in tracking this down, by proving it was not PBS or Proxmox that was the problem.
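
To put the improvement in perspective, the month-long resync mentioned earlier can be turned into a rough data volume and then into a resync time at the new speeds. This is back-of-the-envelope arithmetic only: it assumes a 30-day month and a constant transfer rate, and the helper names are invented for this sketch.

```python
SECONDS_PER_DAY = 86_400


def data_moved_tb(rate_mb_s: float, days: float) -> float:
    """Terabytes (10^12 bytes) transferred at a constant rate over some days."""
    return rate_mb_s * SECONDS_PER_DAY * days / 1e6


def sync_days(data_tb: float, rate_mb_s: float) -> float:
    """Days needed to transfer data_tb terabytes at a constant rate."""
    return data_tb * 1e6 / rate_mb_s / SECONDS_PER_DAY


if __name__ == "__main__":
    # Assumption: "a month" at a steady 1 MB/s means 30 full days of transfer.
    backlog_tb = data_moved_tb(1.0, 30)
    print(f"backlog: about {backlog_tb:.1f} TB")
    for rate in (6.0, 9.0):
        print(f"at {rate:.0f} MB/s: about {sync_days(backlog_tb, rate):.1f} days")
```

Under those assumptions, the same backlog that took a month at 1 MB/s would take roughly three to five days at 6 to 9 MB/s.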
 