PBS parallel backups & PBS performance

Oct 30, 2025
Hi,

referencing this post: https://forum.proxmox.com/threads/how-to-enable-concurrent-pbs-backups.181357/

In my case I have a 25G network, and both source and destination storage are NVMe. Yet my backups run at only around 3 Gbit/s.
Both network throughput and disk read/write speed have been tested and confirmed to be faster than 3 Gbit/s.

In my opinion, this is because the CPUs in the PVE hosts don't support SHA-NI.
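
Whether a host reports the SHA extensions can be checked from the CPU flags. The following is a minimal sketch, assuming a Linux x86 host where the kernel lists the feature as `sha_ni` in `/proc/cpuinfo`; `has_cpu_flag` is a hypothetical helper name, not part of any Proxmox tool.

```shell
# Sketch: succeed if a CPU flag appears in a cpuinfo-style file.
# has_cpu_flag is a hypothetical helper, not a Proxmox command.
# $1 = flag name, $2 = file to inspect (defaults to /proc/cpuinfo)
has_cpu_flag() {
    # Take the first "flags" line; -w matches whole words only,
    # so "sha" does not match "sha_ni".
    grep -m1 '^flags' "${2:-/proc/cpuinfo}" | grep -qw -- "$1"
}

# Only meaningful on a Linux host that exposes /proc/cpuinfo
if [ -r /proc/cpuinfo ]; then
    if has_cpu_flag sha_ni; then
        echo "sha_ni: present"
    else
        echo "sha_ni: missing"
    fi
fi
```

The same helper works for other flags, for example `has_cpu_flag aes`.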

Code:
SHA256 speed: 459.98 MB/s
Compression speed: 471.19 MB/s
Decompress speed: 779.01 MB/s
AES256/GCM speed: 3378.67 MB/s
Verify speed: 289.79 MB/s
┌───────────────────────────────────┬────────────────────┐
│ Name                              │ Value              │
╞═══════════════════════════════════╪════════════════════╡
│ TLS (maximal backup upload speed) │ not tested         │
├───────────────────────────────────┼────────────────────┤
│ SHA256 checksum computation speed │ 459.98 MB/s (23%)  │
├───────────────────────────────────┼────────────────────┤
│ ZStd level 1 compression speed    │ 471.19 MB/s (63%)  │
├───────────────────────────────────┼────────────────────┤
│ ZStd level 1 decompression speed  │ 779.01 MB/s (65%)  │
├───────────────────────────────────┼────────────────────┤
│ Chunk verification speed          │ 289.79 MB/s (38%)  │
├───────────────────────────────────┼────────────────────┤
│ AES256 GCM encryption speed       │ 3378.67 MB/s (93%) │
└───────────────────────────────────┴────────────────────┘

During "proxmox-backup-client benchmark" or actual VM backups, the CPU (Xeon Gold 6226R) is never saturated.

So now I'm thinking that I could easily run multiple backups at once, since neither CPU, network, nor storage is saturated during backups.

thanks
 
Interesting. Network throughput during the backup is strangely close to 1 Gbit/s. The VMs use a different bond.
Bash:
while true; do   T1=$(cat /sys/class/net/bond0/statistics/tx_bytes);   sleep 1;   T2=$(cat /sys/class/net/bond0/statistics/tx_bytes);   echo "TX: $(( (T2-T1) / 1024 / 1024 )) MB/s"; done
TX: 106 MB/s
TX: 110 MB/s
TX: 103 MB/s
TX: 119 MB/s
TX: 92 MB/s
TX: 139 MB/s
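
Since the loop above divides the byte counter by 1024 twice, the values it labels "MB/s" are really MiB/s. A small sketch to convert such a sample to Gbit/s for comparison with link speed (`mib_to_gbit` is a hypothetical helper name):

```shell
# Sketch: convert MiB/s (as printed by the loop above) to decimal Gbit/s.
# mib_to_gbit is a hypothetical helper name.
mib_to_gbit() {
    # 1 MiB = 1048576 bytes, 8 bits per byte, 1 Gbit = 1e9 bits
    LC_ALL=C awk -v m="$1" 'BEGIN { printf "%.2f\n", m * 1048576 * 8 / 1e9 }'
}

mib_to_gbit 110   # one of the samples above, prints 0.92
```

So the samples around 110 MiB/s do correspond to roughly 0.9 Gbit/s on the wire.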

But iperf looks fine.

Bash:
Connecting to host x, port 5202
[  5] local x port 53930 connected to x port 5202
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   947 MBytes  7.94 Gbits/sec  2694    874 KBytes
[  5]   1.00-2.00   sec   933 MBytes  7.82 Gbits/sec  1651    660 KBytes
[  5]   2.00-3.00   sec   964 MBytes  8.09 Gbits/sec  1504   1.06 MBytes
[  5]   3.00-4.00   sec   967 MBytes  8.11 Gbits/sec  486   1.18 MBytes
[  5]   4.00-5.00   sec   969 MBytes  8.13 Gbits/sec  683   1.03 MBytes
[  5]   5.00-6.00   sec   993 MBytes  8.32 Gbits/sec  488   1.31 MBytes
[  5]   6.00-7.00   sec   777 MBytes  6.52 Gbits/sec  3119    559 KBytes
[  5]   7.00-8.00   sec   960 MBytes  8.06 Gbits/sec  1354    822 KBytes
[  5]   8.00-9.00   sec   953 MBytes  8.00 Gbits/sec  136   1.19 MBytes
[  5]   9.00-10.00  sec   971 MBytes  8.14 Gbits/sec  555    587 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  9.21 GBytes  7.91 Gbits/sec  12670            sender
[  5]   0.00-10.00  sec  9.21 GBytes  7.91 Gbits/sec                  receiver

And just to be complete, this is the read speed on PVE
Bash:
Timing buffered disk reads: 4522 MB in  3.00 seconds = 1506.02 MB/sec

And write on PBS
Bash:
dd if=/dev/zero of=/mnt/datastore/testfile bs=1M count=10000 oflag=direct status=progress
10245636096 bytes (10 GB, 9.5 GiB) copied, 8 s, 1.3 GB/s 
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB, 9.8 GiB) copied, 8.14487 s, 1.3 GB/s

Taskinfo
Bash:
INFO:   0% (784.0 MiB of 127.0 GiB) in 3s, read: 261.3 MiB/s, write: 233.3 MiB/s
INFO:   1% (1.4 GiB of 127.0 GiB) in 7s, read: 172.0 MiB/s, write: 172.0 MiB/s
INFO:   2% (2.7 GiB of 127.0 GiB) in 14s, read: 184.0 MiB/s, write: 184.0 MiB/s
INFO:   3% (3.8 GiB of 127.0 GiB) in 20s, read: 196.7 MiB/s, write: 196.7 MiB/s
INFO:   4% (5.1 GiB of 127.0 GiB) in 27s, read: 187.4 MiB/s, write: 187.4 MiB/s
INFO:   5% (6.4 GiB of 127.0 GiB) in 34s, read: 186.9 MiB/s, write: 186.9 MiB/s
INFO:   6% (7.7 GiB of 127.0 GiB) in 41s, read: 187.4 MiB/s, write: 187.4 MiB/s
INFO:   7% (8.9 GiB of 127.0 GiB) in 48s, read: 183.4 MiB/s, write: 183.4 MiB/s
INFO:   8% (10.2 GiB of 127.0 GiB) in 55s, read: 181.1 MiB/s, write: 181.1 MiB/s
INFO:   9% (11.5 GiB of 127.0 GiB) in 1m 1s, read: 217.3 MiB/s, write: 186.0 MiB/s
INFO:  10% (12.7 GiB of 127.0 GiB) in 1m 8s, read: 185.1 MiB/s, write: 182.3 MiB/s
 
Bash:
Uploaded 187 chunks in 5 seconds.
Time per request: 27121 microseconds.
TLS speed: 154.65 MB/s
SHA256 speed: 318.66 MB/s
Compression speed: 358.02 MB/s
Decompress speed: 502.67 MB/s
AES256/GCM speed: 1150.99 MB/s
Verify speed: 203.95 MB/s
┌───────────────────────────────────┬────────────────────┐
│ Name                              │ Value              │
╞═══════════════════════════════════╪════════════════════╡
│ TLS (maximal backup upload speed) │ 154.65 MB/s (13%)  │
├───────────────────────────────────┼────────────────────┤
│ SHA256 checksum computation speed │ 318.66 MB/s (16%)  │
├───────────────────────────────────┼────────────────────┤
│ ZStd level 1 compression speed    │ 358.02 MB/s (48%)  │
├───────────────────────────────────┼────────────────────┤
│ ZStd level 1 decompression speed  │ 502.67 MB/s (42%)  │
├───────────────────────────────────┼────────────────────┤
│ Chunk verification speed          │ 203.95 MB/s (27%)  │
├───────────────────────────────────┼────────────────────┤
│ AES256 GCM encryption speed       │ 1150.99 MB/s (32%) │
└───────────────────────────────────┴────────────────────┘

The CPU in my PBS test server is very old: 4 sockets with Xeon(R) CPU E5-4627 v2.
These days it is difficult to acquire test hardware.
But it supports AES-NI. Is it missing anything that PBS needs for reasonable throughput?
 
Can someone with a beefy PBS host run this benchmark? I have now run it on a VM with a Xeon 6226R CPU, but the results are still bad.
I found comparison results here, but those are without TLS, which seems to be the limiting factor: https://pve.proxmox.com/wiki/PBS_Client_CPU_Benchmark

Code:
root@pbsprimary:/# proxmox-backup-client benchmark --repository pbs-backup
Uploaded 114 chunks in 5 seconds.
Time per request: 44650 microseconds.
TLS speed: 93.94 MB/s
SHA256 speed: 342.77 MB/s
Compression speed: 322.73 MB/s
Decompress speed: 469.63 MB/s
AES256/GCM speed: 429.54 MB/s
Verify speed: 230.05 MB/s
┌───────────────────────────────────┬───────────────────┐
│ Name                              │ Value             │
╞═══════════════════════════════════╪═══════════════════╡
│ TLS (maximal backup upload speed) │ 93.94 MB/s (8%)   │
├───────────────────────────────────┼───────────────────┤
│ SHA256 checksum computation speed │ 342.77 MB/s (17%) │
├───────────────────────────────────┼───────────────────┤
│ ZStd level 1 compression speed    │ 322.73 MB/s (43%) │
├───────────────────────────────────┼───────────────────┤
│ ZStd level 1 decompression speed  │ 469.63 MB/s (39%) │
├───────────────────────────────────┼───────────────────┤
│ Chunk verification speed          │ 230.05 MB/s (30%) │
├───────────────────────────────────┼───────────────────┤
│ AES256 GCM encryption speed       │ 429.54 MB/s (12%) │
└───────────────────────────────────┴───────────────────┘
 
What speeds do you expect?
I think that 3 Gbit/s is already pretty fast.
What target speed are you aiming for?
Is the current speed a problem? Or are you just surprised that you can't reach line speed and suspect an issue with your setup?

Either way, here is my not so beefy setup.

PVE is a 12 x Intel(R) Xeon(R) E-2276G CPU @ 3.80GHz.
PBS is remote, some cheap consumer AMD 12 x AMD Ryzen 5 8500G, with only 1GBit/s WAN.
So TLS speeds above 120MB/s would be impossible.

PVE:
Code:
root@pve:~# proxmox-backup-client benchmark --repository \
'test@pbs@[redactedIPv6]:8007:poolBackup'
Password for "test@pbs": *********
Uploaded 134 chunks in 5 seconds.
Time per request: 41668 microseconds.
TLS speed: 100.66 MB/s
SHA256 speed: 611.79 MB/s
Compression speed: 563.96 MB/s
Decompress speed: 858.65 MB/s
AES256/GCM speed: 4709.50 MB/s
Verify speed: 355.68 MB/s
┌───────────────────────────────────┬─────────────────────┐
│ Name                              │ Value               │
╞═══════════════════════════════════╪═════════════════════╡
│ TLS (maximal backup upload speed) │ 100.66 MB/s (8%)    │
├───────────────────────────────────┼─────────────────────┤
│ SHA256 checksum computation speed │ 611.79 MB/s (30%)   │
├───────────────────────────────────┼─────────────────────┤
│ ZStd level 1 compression speed    │ 563.96 MB/s (75%)   │
├───────────────────────────────────┼─────────────────────┤
│ ZStd level 1 decompression speed  │ 858.65 MB/s (72%)   │
├───────────────────────────────────┼─────────────────────┤
│ Chunk verification speed          │ 355.68 MB/s (47%)   │
├───────────────────────────────────┼─────────────────────┤
│ AES256 GCM encryption speed       │ 4709.50 MB/s (129%) │
└───────────────────────────────────┴─────────────────────┘

PBS:

Code:
root@pbs:~# proxmox-backup-client benchmark
SHA256 speed: 2492.09 MB/s   
Compression speed: 725.79 MB/s   
Decompress speed: 991.39 MB/s   
AES256/GCM speed: 11307.45 MB/s   
Verify speed: 711.66 MB/s   
┌───────────────────────────────────┬──────────────────────┐
│ Name                              │ Value                │
╞═══════════════════════════════════╪══════════════════════╡
│ TLS (maximal backup upload speed) │ not tested           │
├───────────────────────────────────┼──────────────────────┤
│ SHA256 checksum computation speed │ 2492.09 MB/s (123%)  │
├───────────────────────────────────┼──────────────────────┤
│ ZStd level 1 compression speed    │ 725.79 MB/s (97%)    │
├───────────────────────────────────┼──────────────────────┤
│ ZStd level 1 decompression speed  │ 991.39 MB/s (83%)    │
├───────────────────────────────────┼──────────────────────┤
│ Chunk verification speed          │ 711.66 MB/s (94%)    │
├───────────────────────────────────┼──────────────────────┤
│ AES256 GCM encryption speed       │ 11307.45 MB/s (310%) │
└───────────────────────────────────┴──────────────────────┘

Maybe I am wrong, since I am not that experienced with the benchmark tool, so take the next part with a huge grain of salt.

According to your benchmark, it is pretty clear that the TLS upload speed is your bottleneck. Why? Well, it is the lowest number you got ;)
Problem with that TLS? It does not test one thing, but many things.

First of all, it benchmarks how fast you can do TLS encryption on PVE. This is an easy task. The speed should be about the same as the last row, AES256, since my guess is that this is what it uses.

Second part is the network. Or to be even more precise, the peering from PVE to PBS. You can simply rule out that bottleneck by running iperf.
In some cases, there might be some firewall or NAT shenanigans going on, so it is possible that iperf performs differently from the real world. Just like this tool, iperf is only a benchmark after all :) In German we say, "wer misst, misst Mist": who measures, measures crap.

The third part is the chunk verification speed on PBS. Emphasis on PBS, not PVE. You get 290 MB/s, so that is a CPU bottleneck on the PBS side as well.

The fourth part would be storage. According to @Chris, PBS writes randomly, not sequentially.
Although, since the chunks are around 2 MB or larger, I don't think that is as bad as writing random 4k (which is another often-used benchmark).
I don't know whether it does sync or async writes. My guess is that it writes async, since the verify option would catch a lost write.
I also don't know whether it writes in parallel. My guess is yes.
So instead of a random 4k Q1D1 write, a more true-to-life benchmark would probably be 4K Q64T8. That is another possible bottleneck.
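
The "lowest number is the bottleneck" reading can be sketched as a small filter over the plain `<name> speed: <n> MB/s` lines that the benchmark prints. `slowest_stage` is a hypothetical helper name, and the sample input is copied from the benchmark output posted earlier in this thread.

```shell
# Sketch: print the stage with the lowest MB/s figure.
# slowest_stage is a hypothetical helper; it reads benchmark text on stdin
# and ignores every line that is not of the form "<name> speed: <n> MB/s".
slowest_stage() {
    LC_ALL=C awk -F': ' '
        / speed: [0-9.]+ MB\/s/ {
            v = $2 + 0                      # "154.65 MB/s" -> 154.65
            if (!seen || v < min) { min = v; name = $1; seen = 1 }
        }
        END { if (seen) printf "%s (%.2f MB/s)\n", name, min }
    '
}

# Figures taken from the benchmark posted above
slowest_stage <<'EOF'
TLS speed: 154.65 MB/s
SHA256 speed: 318.66 MB/s
Compression speed: 358.02 MB/s
Decompress speed: 502.67 MB/s
AES256/GCM speed: 1150.99 MB/s
Verify speed: 203.95 MB/s
EOF
```

For that input it prints `TLS speed (154.65 MB/s)`. Piping live benchmark output into the function should work the same way, although whether the speed lines arrive on stdout or stderr is an assumption to check on your system.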
 
I'm currently comparing against Veeam, which achieves around 10 Gbit/s. So Veeam is about 3 times faster, which would put my backup window in danger if we chose to switch to PBS.

Thanks for the test! Can you run the same test directly on the PBS?

I already posted some iperf results earlier, between PVE and PBS. No issues here.

I think it's quite likely that the CPU is the culprit. I just can't find any data supporting it. When I run a backup and check the core usage with "mpstat -P ALL 1", no core is ever fully utilized. Not on PBS and not on PVE. On PVE a single core sometimes spikes to 50% but then quickly goes back to low utilization.
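
To make the per-core check less of an eyeball exercise, the averages at the end of an mpstat run can be reduced to the single busiest core. This is a rough sketch, assuming the sysstat `mpstat` layout where summary lines start with `Average:`, the second column is the CPU number and the last column is `%idle`; `busiest_core` is a hypothetical helper name.

```shell
# Sketch: report the busiest core from mpstat "Average:" lines.
# busiest_core is a hypothetical helper; it assumes the last column is %idle.
busiest_core() {
    LC_ALL=C awk '
        $1 == "Average:" && $2 ~ /^[0-9]+$/ {   # per-core rows only, skip "all"
            busy = 100 - $NF
            if (!seen || busy > max) { max = busy; cpu = $2; seen = 1 }
        }
        END { if (seen) printf "cpu%s %.1f%% busy\n", cpu, max }
    '
}

# Run while a backup is in progress; skipped if sysstat is not installed
if command -v mpstat >/dev/null 2>&1; then
    LC_ALL=C mpstat -P ALL 1 3 | busiest_core
fi
```

A single-threaded limit would typically show up as one core near 100% here, so a maximum around 50% is at least a hint that the limit lies elsewhere.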

The source and target storage is a Pure X50 system. I really can't imagine this being a bottleneck.

I found this post where someone with quite a similar setup is experiencing the same problems.
He says that CPUs that support SHA-NI give a big bump in throughput. Sadly, I don't have access to a PVE or PBS system with such a CPU.
Maybe I'll try with a current-generation desktop PC.

Edit: Also, all this discussion leads me back to the question "can I run multiple VM backups at the same time without splitting them up into multiple jobs".

Edit2: The conclusion of the old post basically was that you need a newer CPU with SHA-NI to reach speeds over 200 MB/s:
It was an eternity ago and still the same issue today.
9374F -> 1gb/s limit
Xeon 4210R -> 200mb/s limit
 
Thanks for the test! Can you run the same test directly on the PBS?
Second test was on the PBS.
Not sure if I can follow.
I think its quite likely that the CPU is the culprit - I just can't find any data supporting it.
I would even argue that the culprit is both CPUs.
The first clear bottleneck seems to be TLS.
I gave you 4 parts of TLS.

We can rule out 1.

According to you, we can also rule out 2.

For 3, you get 203MB/s. That is IMHO a clear bottleneck.
My cheap consumer AMD CPU gets 700MB/s.

So if your goal is to reach 10GBit/s, you need at least 1250MB/s in that benchmark.
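
The arithmetic behind that figure, plus how far a measured stage is from the target, as a small sketch (both function names are hypothetical, and decimal units are assumed, i.e. 1 Gbit/s = 1000 Mbit/s):

```shell
# Sketch: size a per-stage MB/s requirement from a line-rate goal.
# required_mbs and pct_of_target are hypothetical helper names.

# $1 = target in Gbit/s -> required MB/s (divide bits by 8)
required_mbs() {
    LC_ALL=C awk -v g="$1" 'BEGIN { printf "%.0f\n", g * 1000 / 8 }'
}

# $1 = measured MB/s, $2 = target in Gbit/s -> percent of the requirement
pct_of_target() {
    LC_ALL=C awk -v m="$1" -v g="$2" 'BEGIN { printf "%.1f\n", 100 * m / (g * 1000 / 8) }'
}

required_mbs 10            # prints 1250
pct_of_target 203.95 10    # prints 16.3 (chunk verification from the benchmark above)
pct_of_target 711.66 10    # prints 56.9 (the consumer AMD box)
```

Under these assumptions, even the faster of the two chunk verification results covers only a bit more than half of what 10 Gbit/s would need.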

And of course you also need to account for #4, which is storage speed. We don't know what you got there.

But if your goal is 10 GBit/s, you will run into multiple bottlenecks, both on PVE and PBS. As soon as you get the PBS chunk verification speed out of the way (which will in turn fix your PVE TLS speed), your next bottlenecks will be the SHA256 checksum computation speed and the ZStd level 1 compression speed on PVE.

Making stuff parallel on the same hosts will not change any of that.

PS: Remember that this is all based on my unproven assumptions on how the process works in my previous post.
 
Second test was on the PBS.
Not sure if I can follow.
Sorry, I meant whether you can use the --repository option on the PBS as well.
Also, I didn't fully understand your post initially, but I have a better grasp of it now.

I don't fully understand how you came to that conclusion. The first bottleneck seems to be TLS.
I came to this conclusion from this thread, especially this post. Someone with a very similar setup to mine gets similar results, and they improve after changing to newer CPUs (that support SHA-NI).


For 3, you get 203MB/s. That is IMHO a clear bottleneck.
My cheap consumer AMD CPU gets 700MB/s.

So if your goal is to reach 10GBit/s, you need at least 1250MB/s in that benchmark.
Your cheap consumer AMD CPU is probably much, much newer and therefore has different instruction sets, especially SHA-NI (as far as I know, AMD has had that since the first Zen generation).

And of course you also need to account for #4, which is storage speed. We don't know what you got there.

But if your goal is 10 GBit/s, you will run into multiple bottlenecks, both on PVE and PBS. As soon as you get the PBS chunk verification speed out of the way (which will in turn fix your PVE TLS speed), your next bottlenecks will be the SHA256 checksum computation speed and the ZStd level 1 compression speed on PVE.
My storage is a Pure X50 NVMe all-flash array on both ends, connected over 16 Gb FC.
Making stuff parallel on the same hosts will not change any of that.
That's good input. My assumption was that this is some kind of single-core limitation, made worse by my very old CPU and its missing instruction sets, and that I could work around it by running multiple backups at the same time.

Just to mention this: IF we decide to switch to PBS, it will get new hardware. We are currently planning with a Xeon® 6515P as CPU. But PVE will stay on Xeon Gold 6226R CPUs for now, which don't support SHA-NI.

I don't have the means to do a better POC with newer hardware, so if you can make an estimate: if neither storage nor network is limiting, what can I expect with the given setup?
It would probably be limited by the SHA256 checksum performance, so ~450 MB/s in my case?
 
My assumption was that this is some kind of single core limitation
AFAIK you can easily test this. Just start two backups at the same time. Then run them again. Now it should be 100% duplicates, and you are only benchmarking SHA256.

It would probably be limited by the SHA256 Checksum performance, so ~450MB/s in my case?
That is exactly my guess.
3.6 Gbit/s is pretty far away from your goal of 10 Gbit/s.

Alternatively, you could get a faster setup by offloading files to some kind of file-based backup instead of block storage,
so that the total amount of data you have to back up at "only" 3.6 Gbit/s is lower.