Slow backup performance

Did you try this benchmark?
proxmox-backup-client benchmark --repository root@pam@IP:REMOTE_STORAGE

Mine is showing around 7MB/s (TLS) performance.
https://forum.proxmox.com/threads/is-pbs-using-single-thread-only.73612/

Below is the result against the remote PBS. Against a local PBS, TLS speed is much higher - around 250 MB/s.

Uploaded 10 chunks in 21 seconds.
Time per request: 2157121 microseconds.
TLS speed: 1.94 MB/s
SHA256 speed: 459.57 MB/s
Compression speed: 1249.56 MB/s
Decompress speed: 3983.17 MB/s
AES256/GCM speed: 2114.12 MB/s
┌───────────────────────────────────┬────────────────────┐
│ Name                              │ Value              │
╞═══════════════════════════════════╪════════════════════╡
│ TLS (maximal backup upload speed) │ 1.94 MB/s (0%)     │
├───────────────────────────────────┼────────────────────┤
│ SHA256 checksum computation speed │ 459.57 MB/s (22%)  │
├───────────────────────────────────┼────────────────────┤
│ ZStd level 1 compression speed    │ 1249.56 MB/s (58%) │
├───────────────────────────────────┼────────────────────┤
│ ZStd level 1 decompression speed  │ 3983.17 MB/s (49%) │
├───────────────────────────────────┼────────────────────┤
│ AES256 GCM encryption speed       │ 2114.12 MB/s (56%) │
└───────────────────────────────────┴────────────────────┘
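
A raw single-stream TCP test over the same path can help separate plain network/TCP behaviour from PBS itself - a minimal sketch, assuming iperf3 is available on both ends (the address is a placeholder):

Code:
# on the PBS side
iperf3 -s
# on the client side: a single TCP stream for 30 seconds towards the PBS host
iperf3 -c REMOTE_PBS_IP -t 30 -P 1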
 
I'm reading some stuff about how HTTP/2 has its own flow control. Perhaps over a low-latency link the existing PBS code works well, but with the added latency of a WAN connection (in my case 35 ms) it doesn't work well?

Poking around in the code, I saw some mention regarding various flow control parameters:
Code:
src/api2/backup.rs:                // increase window size: todo - find optiomal size
src/api2/reader.rs:                        // increase window size: todo - find optiomal size
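
As a rough sanity check of that theory: HTTP/2's default initial flow-control window is 65,535 bytes, and a stream can only have one window of unacknowledged data in flight per round trip, so if the window is never grown, a 35 ms RTT caps a single stream at roughly:

Code:
# 65535 bytes per 35 ms round trip, in bytes per second (~1.9 MB/s)
echo $((65535 * 1000 / 35))

which is in the same ballpark as the 1.8-2.25 MB/s TLS speeds reported in this thread.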
 
are all of your affected systems at Hetzner? could you post the specs of both client and server, as well as what the connection in-between looks like? thanks!
 
are all of your affected systems at Hetzner? could you post the specs of both client and server, as well as what the connection in-between looks like? thanks!

Yes, all of my affected systems are at Hetzner.

server:
SX132:

Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
128 GB RAM
10x ST10000NM0568 10TB SATA + 2x SAMSUNG_MZQLB960HAJR DC SSDs
Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (Intel X520)
zpool in raidz2 mode

client:
AX61-NVME

AMD Ryzen 9 3900 12-Core Processor
128 GB RAM
Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
2x SAMSUNG_MZQLB1T9HAJR

zpool in mirror mode

Both servers are in the same Proxmox cluster, connected via the Hetzner vSwitch, but tests were made both over the vSwitch and over the interface with the public IP.

The vSwitch is configured with an MTU of 1400.
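
A quick way to verify that 1400-byte path over the vSwitch is to ping with the don't-fragment flag and a payload of 1400 - 28 = 1372 bytes (the peer address below is a placeholder):

Code:
# 1372 bytes payload + 8 bytes ICMP header + 20 bytes IPv4 header = 1400 bytes on the wire
ping -M do -s 1372 10.23.40.10
# a larger payload, e.g. -s 1373, should fail with "message too long" if the path MTU is really 1400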

Sample /etc/network/interfaces:

# interfaces(5) file used by ifup(8) and ifdown(8)
# Include files from /etc/network/interfaces.d:
#source-directory /etc/network/interfaces.d

auto lo
iface lo inet loopback

auto enp35s0
iface enp35s0 inet static
    address publicipv4/26
    gateway publicipv4gateway

iface enp35s0 inet6 static
    address publicipv6/64
    gateway fe80::1

auto enp35s0.4000
iface enp35s0.4000 inet manual
    up ifconfig enp35s0.4000 0.0.0.0 up
    post-up ip link set enp35s0.4000 mtu 1400
    vlan-raw-device enp35s0

auto vmbr4
iface vmbr4 inet static
    mtu 1400
    address 10.23.40.9/21
    bridge_ports enp35s0.4000
    bridge_stp off
    bridge_fd 0

auto vmbr6
iface vmbr6 inet6 static
    address ipv6netforvms/117
    bridge_ports none
    bridge_stp off
    bridge_fd 0

Best regards
 
I'm reading some stuff about how HTTP/2 has its own flow control. Perhaps over a low-latency link the existing PBS code works well, but with the added latency of a WAN connection (in my case 35 ms) it doesn't work well?

Poking around in the code, I saw some mention regarding various flow control parameters:
Code:
src/api2/backup.rs:                // increase window size: todo - find optiomal size
src/api2/reader.rs:                        // increase window size: todo - find optiomal size

My ping between the machines is 1 ms over the WAN. I'm still getting 7 MB/s TLS performance in the benchmark.
 
My ping between the machines is 1 ms over the WAN. I'm still getting 7 MB/s TLS performance in the benchmark.

as indicated above, please answer the following questions:
- are (all of) your affected systems at Hetzner?
- could you post the specs of both client and server?
- what does the connection in-between look like?

thanks!
 
@fabian, to be as clear as possible, here are the results so far:

2 Hetzner servers, both with 1 GbE network cards = no problem; fast backup and restore speeds over both the vSwitch and the non-vSwitch connection
2 Hetzner servers, one with a 1 GbE network card and one with a 10 GbE network card (Intel X520) = backup speed is only fast with nginx (stream or http proxy mode) in front of the proxmox-backup-proxy, and restore speed is still slow (2-3 MB/s)
 
I can confirm the issue, and the mitigation.
I have an upstream bandwidth limit of 40Mbps.

Direct connection to PBS:

~$ proxmox-backup-client benchmark --repository 10.X.Y.9:filestore
Uploaded 10 chunks in 18 seconds.
Time per request: 1861884 microseconds.
TLS speed: 2.25 MB/s

Connection over NGINX LB in separate container on the same host:


~$ proxmox-backup-client benchmark --repository 10.X.Y.2:filestore
Uploaded 13 chunks in 11 seconds.
Time per request: 873551 microseconds.
TLS speed: 4.80 MB/s

NGINX stream config:

stream {
    upstream backend {
        server 10.X.Y.9:8007;
    }
    server {
        listen 8007;
        proxy_connect_timeout 1s;
        proxy_timeout 3s;
        proxy_pass backend;
        proxy_next_upstream on;
    }
}

The remote host is at Hetzner; connectivity between the hosts runs over WireGuard tunnels between VyOS guests on the hosts themselves. MTU 1420, MSS clamping to 1380.
Resource consumption is also much lower on the PBS host. Memory consumption with only VyOS and PBS running used to hover around 50% while uploading, and is now at 3%.
PBS host specs: Xeon 1650v3, 128GB RAM, 2 x 512GB NVMe as root, 2 x 2TB SATA as backup store, Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
 
Thanks for that post, @ceejay. I tried nginx with that config, created a new PBS storage that points to it, but it doesn't seem to be helping in my case. I'm still getting the same slow speed doing a backup, a little over 2MB/s. :(
 
as indicated above, please answer the following questions:
- are (all of) your affected systems at Hetzner?
- could you post the specs of both client and server?
- what does the connection in-between look like?

thanks!

Could you try to set up a PBS with a VPS?
I can confirm the issue, and the mitigation.
I have an upstream bandwidth limit of 40Mbps.

Direct connection to PBS:

~$ proxmox-backup-client benchmark --repository 10.X.Y.9:filestore
Uploaded 10 chunks in 18 seconds.
Time per request: 1861884 microseconds.
TLS speed: 2.25 MB/s

Connection over NGINX LB in separate container on the same host:


~$ proxmox-backup-client benchmark --repository 10.X.Y.2:filestore
Uploaded 13 chunks in 11 seconds.
Time per request: 873551 microseconds.
TLS speed: 4.80 MB/s

NGINX stream config:

stream {
    upstream backend {
        server 10.X.Y.9:8007;
    }
    server {
        listen 8007;
        proxy_connect_timeout 1s;
        proxy_timeout 3s;
        proxy_pass backend;
        proxy_next_upstream on;
    }
}

The remote host is at Hetzner; connectivity between the hosts runs over WireGuard tunnels between VyOS guests on the hosts themselves. MTU 1420, MSS clamping to 1380.
Resource consumption is also much lower on the PBS host. Memory consumption with only VyOS and PBS running used to hover around 50% while uploading, and is now at 3%.
PBS host specs: Xeon 1650v3, 128GB RAM, 2 x 512GB NVMe as root, 2 x 2TB SATA as backup store, Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)

It is still pretty slow even after using the nginx proxy?
 
It is still pretty slow even after using the nginx proxy?

That's true, but now I'm limited by my ISP's policing instead of suboptimal TCP throughput.
My upload is limited to 40 Mbps, and I get sustained transfer rates that are >95% of that limit.

Code:
Interfaces                     │ RX bps       pps     %│ TX bps       pps     %
>wg2110                       │  38.79Mb    3.34K     │   1.00Mb    1.56K

Before that backups ran at 17-18 Mbps, so I'm pretty pleased with that result.

I have yet to capture and analyze some traffic for RCA, but this seems to indicate that PBS uses a TCP profile that is highly optimized for maximizing throughput over low-latency links, while the NGINX default TCP server profile is tuned more towards maximizing throughput over higher-latency links (delayed ACKs/SACK, window size/scaling, Nagle's algorithm, ...).
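
One way to compare those profiles without a full capture is to look at the live socket parameters on the PBS host while an upload is running; ss shows the negotiated window scale, congestion window and RTT per connection (a sketch, assuming the default proxmox-backup-proxy port 8007):

Code:
# TCP internals (wscale, cwnd, rtt, delivery rate) for connections involving port 8007
ss -tino state established '( sport = :8007 or dport = :8007 )'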

UPDATE

<TLDR>

Code:
echo 'net.ipv4.tcp_adv_win_scale=2' >> /etc/sysctl.conf
sysctl -p
</TLDR>

Analysis of packet captures does indeed reveal that the receiving host (PBS) does not scale the TCP window as one would expect. Even though the host signals its TCP window scaling capability in the SYN/ACK, it never uses it, and so bandwidth suffers on higher-latency links. This might be because of my specific setup (the TCP session terminates on a Linux bridge without any physical NIC connected to that bridge); I can't verify that at this time.
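
For anyone who wants to reproduce that check: the window scale factor is only exchanged on the SYN and SYN/ACK, so capturing the handshake of a new backup connection is enough to see what both sides offered (interface and port are the obvious placeholders):

Code:
# print TCP options (look for "wscale N") on SYN / SYN-ACK packets of new connections to port 8007
tcpdump -nn -i any -v 'tcp port 8007 and (tcp[tcpflags] & tcp-syn) != 0'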

A bit of Google-fu produced this link: https://www.acc.umu.se/~maswan/linux-netperf.txt, which states
Code:
TCP performance is limited by latency and window size (and overhead, which
reduces the effective window size)
...
The overhead is: window/2^tcp_adv_win_scale (tcp_adv_win_scale default is 2)
Debian 10 has that parameter set to 1 - which I always figured simply meant 'on' - but apparently the value dictates, via the formula stated above, how much of the TCP receive buffer cannot be used for windowing. A value of 1 means that only half of the TCP buffer memory can be used to grow the TCP window. Setting the value to 2 changes that ratio from half of the buffer to three quarters, and that has in fact solved my throughput issues on my high-latency link.
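
To put numbers on that, taking the stock tcp_rmem maximum of 6291456 bytes as an example (the same value that shows up later in this thread):

Code:
# usable window = buffer - buffer / 2^tcp_adv_win_scale
echo $((6291456 - 6291456 / 2))   # tcp_adv_win_scale=1 -> 3145728 bytes (half the buffer)
echo $((6291456 - 6291456 / 4))   # tcp_adv_win_scale=2 -> 4718592 bytes (three quarters)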

I now get to use the full upload capacity of my ISP connection for backups to PBS, without a reverse proxy/loadbalancer in between:

Code:
~$ proxmox-backup-client benchmark --repository 10.X.Y.9:filestore
Uploaded 13 chunks in 11 seconds.
Time per request: 858702 microseconds.
TLS speed: 4.88 MB/s
SHA256 speed: 432.77 MB/s
Compression speed: 1417.94 MB/s
Decompress speed: 6037.03 MB/s
AES256/GCM speed: 3010.37 MB/s
 
@ceejay, very interesting information. I tweaked that sysctl and did get more throughput, though only about double: 4 MB/s vs. 2 MB/s. So it's still terrible for me on a 1 Gbps link. I'll double-check shortly, though, to make sure I did it right.
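
One thing that may be worth checking as part of that (just a suggestion, not something confirmed in this thread): as far as I can tell tcp_adv_win_scale affects how the receive buffer is carved up, so for backup uploads it is the PBS (receiving) side where the setting matters.

Code:
# on the PBS host
sysctl net.ipv4.tcp_adv_win_scale
cat /proc/sys/net/ipv4/tcp_rmem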
 
Maybe not relevant, but FYI: Hetzner's vSwitches are currently limited to about 1 Gbps within a vSwitch VLAN.

This problem seems very interesting; we usually see nearly zero networking issues with Hetzner across the multiple projects we maintain in their infrastructure.
 
As I wrote a few posts earlier, the problem, at least for us, occurs between a host with a 1 GbE network card and one with a 10 GbE card. If all your hosts are connected via 1 GbE there is, as far as I know, no problem.
 
@ceejay, very interesting information. I tweaked that sysctl and did get more throughput, though only about double: 4 MB/s vs. 2 MB/s. So it's still terrible for me on a 1 Gbps link. I'll double-check shortly, though, to make sure I did it right.


Did you try higher values, i.e. 4?
What is the output of this command?

cat /proc/sys/net/ipv4/tcp_rmem

What kind of latency do you see between these hosts?
Do you have a packet capture available (including TCP handshake)?
 
Did you try higher values, i.e. 4?
What is the output of this command?

cat /proc/sys/net/ipv4/tcp_rmem

What kind of latency do you see between these hosts?
Do you have a packet capture available (including TCP handshake)?

Yes, I tried higher values, and the transfer rate went up a little more, but not much. If I use rsync to push a local PBS datastore to the remote one, just as a test, with no attempt to optimize, I get around 60 MB/s. These are the same hosts involved in doing a PVE backup to PBS.

Latency is about 35-38ms average.

tcp_rmem is: 4096 131072 6291456

I have root on all systems, so can do tcpdumps as needed.
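
A capture that includes the handshake (so the window scaling negotiation is visible) could look like the sketch below - start it before kicking off a backup; interface and output path are placeholders:

Code:
# capture only headers (-s 96) of PBS traffic to a file for later analysis in Wireshark
tcpdump -i any -s 96 -w /tmp/pbs-8007.pcap 'tcp port 8007'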
 
How to solve slow Proxmox Backup TLS performance with nginx (@fichte's method), without recompiling Proxmox Backup:
/etc/sysctl.conf:
net.ipv4.ip_forward = 1

Code:
sysctl -p

iptables:
Code:
iptables -t nat -A PREROUTING -p tcp -i eno1-or-your-network-interface --dport 8007 -j DNAT --to-destination your-public-ip:8006

iptables -A FORWARD -p tcp -d  your-public-ip --dport 8006 -m state --state NEW,ESTABLISHED,RELATED -j ACCEPT



apt install iptables-persistent

cp /etc/iptables/rules.v4 /etc/iptables/rules.v4.backup
iptables-save > /etc/iptables/rules.v4

nginx.conf:
Code:
user www-data;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

events {
        worker_connections 20480;
        # multi_accept on;
}

stream {

    server {
        listen 8006;
        proxy_pass 127.0.0.1:8007;
    }

}
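
A few optional sanity checks after applying the above (same placeholders as before; note that clients keep pointing at port 8007, the DNAT rule silently hands the connection to nginx on 8006):

Code:
# DNAT rule is present and counting packets
iptables -t nat -L PREROUTING -n -v --line-numbers
# nginx listening on 8006, proxmox-backup-proxy still on 8007
ss -ltnp | grep -E ':800(6|7)'
# re-run the benchmark from a remote client; the repository spec does not change
proxmox-backup-client benchmark --repository root@pam@your-public-ip:your-datastore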



Before the changes:
4 MB/s

After the changes:
33 MB/s

Regards,
 
does this also fix slow restore speed?

cheers
fichte

How to solve slow Proxmox Backup TLS performance with nginx (@fichte's method), without recompiling Proxmox Backup:
/etc/sysctl.conf:
net.ipv4.ip_forward = 1

Code:
sysctl -p

iptables:
Code:
iptables -t nat -A PREROUTING -p tcp -i eno1-or-your-network-interface --dport 8007 -j DNAT --to-destination your-public-ip:8006

iptables -A FORWARD -p tcp -d  your-public-ip --dport 8006 -m state --state NEW,ESTABLISHED,RELATED -j ACCEPT



apt install iptables-persistent

cp /etc/iptables/rules.v4 /etc/iptables/rules.v4.backup
iptables-save > /etc/iptables/rules.v4

nginx.conf:
Code:
user www-data;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

events {
        worker_connections 20480;
        # multi_accept on;
}

stream {

    server {
        listen 8006;
        proxy_pass 127.0.0.1:8007;
    }

}



Before the changes:
4 MB/s

After the changes:
33 MB/s

Regards,
 
does this also fix slow restore speed?

cheers
fichte

Hello,

This improves the speed of backups; it is a temporary fix until they solve it, but restores are still slow :'c I'm trying to solve that too, but for now restores remain slow.

Regards,
 
I decided to try PBS over a long-distance WAN (35 ms ping), with a 1 Gb/s connection on both sides.
I was getting only 1.8 MB/s, and then saw the @fichte post about nginx in stream mode, so I tried it.

I have a NAT rule on my firewall to redirect traffic sent to port 8007 to port 8107, and then configured nginx as follows:
NGINX:
stream {
  server {
    listen 8107;
    proxy_pass 127.0.0.1:8007;
  }
}

and the speed improved to 20 MB/s! That's about the same as I get with a single iperf3 connection.

Before: 1.8 MB/s
Now: 20 MB/s
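
For reference, if the firewall doing that redirect is Linux-based and sits on the PBS host itself, the 8007 -> 8107 hop can be expressed with a single REDIRECT rule (a sketch, not the poster's actual configuration; the interface name is a placeholder):

Code:
# send incoming PBS traffic on 8007 to the local nginx stream listener on 8107
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 8007 -j REDIRECT --to-ports 8107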
 
