Abysmally slow restore from backup

damarrin

Hello,

My woes with my HP Microserver Gen8 continue (here's my previous thread, now thankfully solved... mostly), as I'm now getting terrible performance when restoring backups.

I'm restoring backups to lvm-data, which lives on a 7200 rpm HDD. The SATA controller is this:

Code:
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 6 port Desktop SATA AHCI Controller (rev 05)

The backups are stored on a PBS connected over gigabit Ethernet. The restore starts fine, with write performance around 20-30 MB/s, which is what I'd expect. After a few minutes, the write speed falls to single-digit MB/s, at times as low as a few KB/s, as measured by iotop.

I have IPFire in a VM (with virtio-scsi-single, iothread and aio=threads set as per the other thread) as a router, and Pi-hole in a container as a DNS server, running on this machine as well. IPFire keeps working, but Pi-hole in the container becomes unresponsive and I basically get no internet.

As a result, a backup that took 5 minutes to make (my PBS runs on an ancient Core 2 Quad machine with a couple of WD 7200 rpm HDDs and everything is perfectly fine there) needs an hour to restore on this machine. To boot, after it reports 100% restored, it sits frozen for another 10 minutes before TASK OK appears. During all this time, IO delay is very high (60-90%). Presumably it continues writing to the HDD from cache. Once that's done, the VMs and containers become responsive again.
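
For reference, the writeback pile-up can be watched (and, if needed, bounded) roughly like this; the sysctl values below are purely illustrative assumptions, not recommendations:

Code:
# watch how much dirty data is queued for writeback during a restore (values in kB)
watch -n 2 'grep -E "^(Dirty|Writeback):" /proc/meminfo'

# optionally cap dirty page cache so the slow HDD isn't handed gigabytes at once
# (example values only; setting them back to 0 restores the ratio-based defaults)
sysctl -w vm.dirty_background_bytes=67108864   # start background writeback at 64 MiB
sysctl -w vm.dirty_bytes=268435456             # throttle writers once 256 MiB is dirty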

I guess it must be a problem with how the SATA controller is configured by Proxmox. I will be very grateful for ideas I could try to make it perform well. Thanks.
 
Hello, this is my post on this subject: https://forum.proxmox.com/threads/backup-speed-vs-restore-speed.106873/#post-466581
My general conclusion was:
- Backup speed is very satisfactory: ~90% saturation of the 1G network, which is about what you'd expect, no matter whether I back up one VM or several VMs from several hosts.
- Restore speed of a single VM drops to around 20% of the link (roughly 200 Mb/s+), BUT if we restore several VMs at the same time (whether to the same host or to several hosts in the cluster), the network link gets saturated up to 80-90% and no restore slows down any other.
Conclusion: PBS can serve four concurrent restores, occupying the 1G link at 90%, BUT when restoring a single VM the restore speed won't go above roughly 200 Mb/s.
Literally, when restoring 4 VMs, PBS reads 4 times as many chunks in the same time as when restoring 1 VM, from the same SATA 7200 rpm disks.

How come a RAID10 (4 x 2 TB SATA 7200 rpm) behaves very slowly when restoring one VM, but suddenly very fast when restoring 4 VMs at the same time, saturating the whole bandwidth, so that the bottleneck is the network itself and not the slow SATA disks reading many little chunks?
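
A rough way to see the same effect outside PBS, assuming fio is installed and /dev/md0 stands in for the datastore array (or point it at a large scratch file); read-only test, placeholder device path:

Code:
# 4 MiB random reads, first with a single job, then with four jobs in parallel
fio --name=single   --filename=/dev/md0 --rw=randread --bs=4M --direct=1 --readonly \
    --runtime=30 --time_based --numjobs=1 --group_reporting
fio --name=parallel --filename=/dev/md0 --rw=randread --bs=4M --direct=1 --readonly \
    --runtime=30 --time_based --numjobs=4 --group_reporting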

Adding a metadata special device (extra disks) did not help much.

In the end, we are facing a system that is (much) faster at writing than at reading, which is not that usual.

I think there is still room for design improvement... or am I very wrong about this experience and description?
 
I use the latest versions, Proxmox VE 8.1.4 and Proxmox Backup Server 3.1, and I've run into a similar problem.
On the machine with the virtual machines I use 6 NVMe drives in mdadm RAID 6; the backup server has 36 SAS drives in RAIDZ2.
Here too the biggest problem is the low restore speed of a single virtual machine (network throughput is approximately 40 Mb/s).
But if you restore 6 virtual machines simultaneously, you can get around 200 MB/s. The network between the backup server and the cluster is 10G, and network latency does not exceed 3 ms.
Is there a solution to the low restore speed of a single VM?
 
I believe there are some serious issues in how PBS works. I was testing a restore from PBS to PVE today (10 Gbps LAN link, low latency, NVMe on both sides) and the restore speed is barely 60 MB/s. Backup is faster than this. Verify is faster than this. Tested with proxmox-backup-client benchmark on both sides, everything should be faster: storage is not the bottleneck, CPU is not the bottleneck, and neither is the network. So I have no explanation besides: something is wrong with PBS. PBS restore speed is not acceptable. I will try Veeam B&R on the same hardware and see what I get.
 
I believe there are some serious issues in how PBS works. I was testing a restore from PBS to PVE today (10 Gbps LAN link, low latency, NVMe on both sides) and the restore speed is barely 60 MB/s. Backup is faster than this. Verify is faster than this. Tested with proxmox-backup-client benchmark on both sides, everything should be faster: storage is not the bottleneck, CPU is not the bottleneck, and neither is the network. So I have no explanation besides: something is wrong with PBS. PBS restore speed is not acceptable. I will try Veeam B&R on the same hardware and see what I get.
Could you maybe provide more details? What kind of backup is it, and how are you restoring it? If possible, also post the restore log/output (client/PVE side) and the reader task log (server/PBS side).
 
Hi @fabian,
I'll see if I can make some time to collect all the details for you. I lost a couple of days (plus a whole night yesterday) testing PBS performance... and I currently have many other things to tend to. I will come back and report details, but what I know so far about "the bad" in PBS - things the team knows of and so far hasn't done anything about - I'll post here:

- PBS uses HTTP/2 for chunk transfer, which means a single TCP connection for transmission. There are more cases where this is counterproductive than cases where it is beneficial: congested links (where multiple streams help a lot), LACP bonds with layer3+4 hashing (a single TCP stream can't fully utilize the bond), or any network situation that benefits from parallel/multiple streams (in real life there are many such) - and I don't really know where it is beneficial. Is it to avoid TCP slow start? When using multiple TCP connections, don't open a new one for every chunk transfer (individual chunks are rather small); stream new chunks over the existing connections instead - that avoids any issues with TCP slow start. With that out of the way, is there any other real benefit to a single connection? At the very least make it configurable, so people can type in the number of connections they want for transfers. Then whoever wants 1 TCP stream can have it. I suspect most will not want that. (See the quick iperf3 sketch after this list.)

- using SHA-256 checksums for chunks -> xxHash-128 would be a much better technical choice. Besides being robust at detecting bit flips (it's designed for that), it is also MUCH faster to calculate. That would speed up all operations - backup, verify and restore. Using a cryptographic algorithm here is unnecessary; it's slow and inefficient. I'd suggest xxHash because I recently did a study on which one to use and xxHash came out as the winner. Of course there are other non-cryptographic hashing algorithms for this as well. To keep the new code compatible with existing chunks, simply add a field to the index file that defines which hashing algorithm is used (then PBS can support both). (See the quick hashing benchmark sketch after this list.)

- there are virtually no "configuration knobs" in PBS. I can't choose the compression algorithm or the checksum algorithm, can't configure the number of threads PBS will use, can't configure the number of TCP streams for transfers, etc. PBS is way too "fixed" to its defaults; everything is hardcoded, with no options. There is no such thing as one-size-fits-all. Options are needed.

- in my testing, NVMe/SSD vs HDD doesn't really matter either. The default suggestion all over the place is to use all-flash storage for PBS. Such suggestions are too generic and superficial. That kind of information can be very misleading and cause a lot of customer grief after they spend a lot of money on flash, only to find out that things are not going faster. The PBS client benchmark tool is good but not perfect; it's also a bit too generic and not clear on the details.
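
Two quick ways to put rough numbers on the first two points (the hostname is a placeholder, and iperf3 / xxhsum are separate packages, not something PBS ships):

Code:
# single TCP stream vs. several parallel streams between PVE and PBS
# (needs an iperf3 server running on the PBS side)
iperf3 -c pbs.example.lan -P 1
iperf3 -c pbs.example.lan -P 4

# raw hashing throughput: SHA-256 (OpenSSL's built-in benchmark) vs. xxHash
# (xxhsum -b is the benchmark mode of the xxHash command-line tool)
openssl speed -evp sha256
xxhsum -b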

Please note I'm not making these things up from some random reading on the web - I am a developer with 25+ years of experience, owning a company whose core business is software development. Some of the above points are very real design changes that WILL make a difference in performance. PBS has bottlenecks in places where there shouldn't be any. Your team should acknowledge this, instead of the "throw more hardware at it" answer I've seen on the forums. This is open source - so yes, I could help with development directly. The problem is I work solely in .NET, so I can't contribute code. I can help with design suggestions only. And so far I've spotted a few obvious ones. I'm not alone, and unfortunately there hasn't been much movement around these issues.

I will send you numbers around restore operations as soon as I can make some more time, so we can continue the journey of "getting to the bottom" of this, so you can fix the issues and improve the end product - if you are willing to go that route, that is. The moment I see another "your CPU is too old" or "you should be using all-flash", I'm leaving the discussion and putting my efforts back into Veeam instead - at least Gostev there (the product manager) has listened to my input, agreed and implemented changes (on more than one occasion) with a very positive impact. My point is: they listened, and you should do so as well - don't dismiss suggestions coming from power users, work with them to make the product better. There are open items in Bugzilla regarding PBS performance that are being ignored - that's not good.
 
Could you maybe provide more details? What kind of backup is it, and how are you restoring it? If possible, also post the restore log/output (client/PVE side) and the reader task log (server/PBS side).
This is a VM backup. I'm restoring to a PVE server located in the same LAN as the PBS server, connected via 10G (through a switch).

The PVE server is a newer machine, PBS is an older machine. Both use NVMe storage (the PBS datastore is on NVMe, currently XFS; I previously tried ZFS with the same effect. On PVE, the NVMe storage is a ZFS mirror over two NVMe PCIe cards). During the restore, the CPU and network are nowhere near saturation on either side. Storage is NVMe, so there are plenty of IOPS left.

root@pve-main:~# proxmox-backup-client benchmark
SHA256 speed: 1231.86 MB/s
Compression speed: 503.83 MB/s
Decompress speed: 801.84 MB/s
AES256/GCM speed: 1847.84 MB/s
Verify speed: 485.58 MB/s

root@kom1:~# proxmox-backup-client benchmark
SHA256 speed: 215.00 MB/s
Compression speed: 245.70 MB/s
Decompress speed: 314.62 MB/s
AES256/GCM speed: 533.46 MB/s
Verify speed: 125.45 MB/s

new volume ID is 'nvme_pool:vm-905-disk-0'
new volume ID is 'nvme_pool:vm-905-disk-1'
new volume ID is 'nvme_pool:vm-905-disk-2'
restore proxmox backup image: /usr/bin/pbs-restore --repository test@pbs@10.10.40.31:bazen_pbsstore --ns PVE-DELL vm/123/2024-11-05T22:09:22Z drive-efidisk0.img.fidx /dev/zvol/nvme_pool/vm-905-disk-0 --verbose --format raw --skip-zero
connecting to repository 'test@pbs@10.10.40.31:bazen_pbsstore'
open block backend for target '/dev/zvol/nvme_pool/vm-905-disk-0'
starting to restore snapshot 'vm/123/2024-11-05T22:09:22Z'
download and verify backup index
progress 100% (read 540672 bytes, zeroes = 0% (0 bytes), duration 0 sec)
restore image complete (bytes=540672, duration=0.00s, speed=226.00MB/s)
restore proxmox backup image: /usr/bin/pbs-restore --repository test@pbs@10.10.40.31:bazen_pbsstore --ns PVE-DELL vm/123/2024-11-05T22:09:22Z drive-tpmstate0-backup.img.fidx /dev/zvol/nvme_pool/vm-905-disk-1 --verbose --format raw --skip-zero
connecting to repository 'test@pbs@10.10.40.31:bazen_pbsstore'
open block backend for target '/dev/zvol/nvme_pool/vm-905-disk-1'
starting to restore snapshot 'vm/123/2024-11-05T22:09:22Z'
download and verify backup index
progress 100% (read 4194304 bytes, zeroes = 0% (0 bytes), duration 0 sec)
restore image complete (bytes=4194304, duration=0.01s, speed=442.37MB/s)
restore proxmox backup image: /usr/bin/pbs-restore --repository test@pbs@10.10.40.31:bazen_pbsstore --ns PVE-DELL vm/123/2024-11-05T22:09:22Z drive-virtio0.img.fidx /dev/zvol/nvme_pool/vm-905-disk-2 --verbose --format raw --skip-zero
connecting to repository 'test@pbs@10.10.40.31:bazen_pbsstore'
open block backend for target '/dev/zvol/nvme_pool/vm-905-disk-2'
starting to restore snapshot 'vm/123/2024-11-05T22:09:22Z'
download and verify backup index
progress 1% (read 1073741824 bytes, zeroes = 6% (67108864 bytes), duration 7 sec)
progress 2% (read 2147483648 bytes, zeroes = 3% (83886080 bytes), duration 14 sec)
progress 3% (read 3221225472 bytes, zeroes = 2% (92274688 bytes), duration 22 sec)
progress 4% (read 4294967296 bytes, zeroes = 2% (104857600 bytes), duration 28 sec)
progress 5% (read 5368709120 bytes, zeroes = 2% (142606336 bytes), duration 35 sec)
progress 6% (read 6442450944 bytes, zeroes = 3% (197132288 bytes), duration 42 sec)
progress 7% (read 7516192768 bytes, zeroes = 3% (268435456 bytes), duration 49 sec)
progress 8% (read 8589934592 bytes, zeroes = 3% (289406976 bytes), duration 56 sec)
progress 9% (read 9663676416 bytes, zeroes = 3% (310378496 bytes), duration 64 sec)
progress 10% (read 10737418240 bytes, zeroes = 3% (390070272 bytes), duration 73 sec)
progress 11% (read 11811160064 bytes, zeroes = 3% (452984832 bytes), duration 82 sec)
progress 12% (read 12884901888 bytes, zeroes = 9% (1249902592 bytes), duration 84 sec)
progress 13% (read 13958643712 bytes, zeroes = 16% (2323644416 bytes), duration 84 sec)
progress 14% (read 15032385536 bytes, zeroes = 22% (3397386240 bytes), duration 84 sec)
progress 15% (read 16106127360 bytes, zeroes = 25% (4102029312 bytes), duration 86 sec)
progress 16% (read 17179869184 bytes, zeroes = 24% (4131389440 bytes), duration 94 sec)
progress 17% (read 18253611008 bytes, zeroes = 22% (4169138176 bytes), duration 101 sec)
progress 18% (read 19327352832 bytes, zeroes = 21% (4227858432 bytes), duration 108 sec)
progress 19% (read 20401094656 bytes, zeroes = 20% (4282384384 bytes), duration 115 sec)
progress 20% (read 21474836480 bytes, zeroes = 20% (4299161600 bytes), duration 123 sec)
progress 21% (read 22548578304 bytes, zeroes = 19% (4345298944 bytes), duration 130 sec)
progress 22% (read 23622320128 bytes, zeroes = 18% (4353687552 bytes), duration 139 sec)
progress 23% (read 24696061952 bytes, zeroes = 17% (4366270464 bytes), duration 146 sec)
progress 24% (read 25769803776 bytes, zeroes = 16% (4366270464 bytes), duration 154 sec)
progress 25% (read 26843545600 bytes, zeroes = 16% (4504682496 bytes), duration 160 sec)
progress 26% (read 27917287424 bytes, zeroes = 16% (4647288832 bytes), duration 167 sec)
progress 27% (read 28991029248 bytes, zeroes = 16% (4894752768 bytes), duration 173 sec)
progress 28% (read 30064771072 bytes, zeroes = 16% (5108662272 bytes), duration 180 sec)
progress 29% (read 31138512896 bytes, zeroes = 17% (5440012288 bytes), duration 185 sec)
progress 30% (read 32212254720 bytes, zeroes = 17% (5700059136 bytes), duration 190 sec)
progress 31% (read 33285996544 bytes, zeroes = 17% (5762973696 bytes), duration 198 sec)
progress 32% (read 34359738368 bytes, zeroes = 16% (5800722432 bytes), duration 207 sec)
progress 33% (read 35433480192 bytes, zeroes = 16% (5876219904 bytes), duration 211 sec)
progress 34% (read 36507222016 bytes, zeroes = 16% (6035603456 bytes), duration 213 sec)
progress 35% (read 37580963840 bytes, zeroes = 16% (6262095872 bytes), duration 219 sec)
progress 36% (read 38654705664 bytes, zeroes = 17% (6631194624 bytes), duration 224 sec)
progress 37% (read 39728447488 bytes, zeroes = 16% (6631194624 bytes), duration 233 sec)
progress 38% (read 40802189312 bytes, zeroes = 16% (6698303488 bytes), duration 242 sec)
progress 39% (read 41875931136 bytes, zeroes = 16% (6941573120 bytes), duration 249 sec)
progress 40% (read 42949672960 bytes, zeroes = 16% (6987710464 bytes), duration 258 sec)
progress 41% (read 44023414784 bytes, zeroes = 16% (7285506048 bytes), duration 264 sec)
progress 42% (read 45097156608 bytes, zeroes = 16% (7373586432 bytes), duration 273 sec)
progress 43% (read 46170898432 bytes, zeroes = 16% (7470055424 bytes), duration 280 sec)
progress 44% (read 47244640256 bytes, zeroes = 15% (7507804160 bytes), duration 288 sec)
progress 45% (read 48318382080 bytes, zeroes = 15% (7511998464 bytes), duration 296 sec)
progress 46% (read 49392123904 bytes, zeroes = 15% (7532969984 bytes), duration 304 sec)
progress 47% (read 50465865728 bytes, zeroes = 14% (7562330112 bytes), duration 312 sec)
progress 48% (read 51539607552 bytes, zeroes = 14% (7562330112 bytes), duration 321 sec)
progress 49% (read 52613349376 bytes, zeroes = 14% (7562330112 bytes), duration 334 sec)
progress 50% (read 53687091200 bytes, zeroes = 14% (7650410496 bytes), duration 343 sec)
progress 51% (read 54760833024 bytes, zeroes = 13% (7650410496 bytes), duration 347 sec)
progress 52% (read 55834574848 bytes, zeroes = 13% (7667187712 bytes), duration 355 sec)
progress 53% (read 56908316672 bytes, zeroes = 13% (7725907968 bytes), duration 363 sec)
progress 54% (read 57982058496 bytes, zeroes = 13% (7876902912 bytes), duration 369 sec)
progress 55% (read 59055800320 bytes, zeroes = 13% (7910457344 bytes), duration 375 sec)
progress 56% (read 60129542144 bytes, zeroes = 13% (8229224448 bytes), duration 380 sec)
progress 57% (read 61203283968 bytes, zeroes = 13% (8430551040 bytes), duration 388 sec)
progress 58% (read 62277025792 bytes, zeroes = 13% (8610906112 bytes), duration 396 sec)
progress 59% (read 63350767616 bytes, zeroes = 13% (8610906112 bytes), duration 406 sec)
progress 60% (read 64424509440 bytes, zeroes = 13% (8610906112 bytes), duration 416 sec)
progress 61% (read 65498251264 bytes, zeroes = 13% (8682209280 bytes), duration 424 sec)
progress 62% (read 66571993088 bytes, zeroes = 13% (8917090304 bytes), duration 430 sec)
progress 63% (read 67645734912 bytes, zeroes = 13% (9357492224 bytes), duration 433 sec)
progress 64% (read 68719476736 bytes, zeroes = 13% (9382658048 bytes), duration 443 sec)
progress 65% (read 69793218560 bytes, zeroes = 13% (9386852352 bytes), duration 451 sec)
progress 66% (read 70866960384 bytes, zeroes = 13% (9386852352 bytes), duration 460 sec)
progress 67% (read 71940702208 bytes, zeroes = 13% (9386852352 bytes), duration 468 sec)
progress 68% (read 73014444032 bytes, zeroes = 12% (9386852352 bytes), duration 479 sec)
progress 69% (read 74088185856 bytes, zeroes = 12% (9403629568 bytes), duration 488 sec)
progress 70% (read 75161927680 bytes, zeroes = 12% (9407823872 bytes), duration 496 sec)
progress 71% (read 76235669504 bytes, zeroes = 12% (9542041600 bytes), duration 503 sec)
progress 72% (read 77309411328 bytes, zeroes = 12% (9764339712 bytes), duration 509 sec)
progress 73% (read 78383153152 bytes, zeroes = 12% (9848225792 bytes), duration 516 sec)
progress 74% (read 79456894976 bytes, zeroes = 12% (10041163776 bytes), duration 523 sec)
progress 75% (read 80530636800 bytes, zeroes = 12% (10313793536 bytes), duration 528 sec)
progress 76% (read 81604378624 bytes, zeroes = 12% (10586423296 bytes), duration 534 sec)
progress 77% (read 82678120448 bytes, zeroes = 12% (10611589120 bytes), duration 540 sec)
progress 78% (read 83751862272 bytes, zeroes = 13% (11169431552 bytes), duration 542 sec)
progress 79% (read 84825604096 bytes, zeroes = 13% (11525947392 bytes), duration 548 sec)
progress 80% (read 85899345920 bytes, zeroes = 13% (12008292352 bytes), duration 551 sec)
progress 81% (read 86973087744 bytes, zeroes = 14% (12670992384 bytes), duration 553 sec)
progress 82% (read 88046829568 bytes, zeroes = 15% (13304332288 bytes), duration 556 sec)
progress 83% (read 89120571392 bytes, zeroes = 15% (13514047488 bytes), duration 563 sec)
progress 84% (read 90194313216 bytes, zeroes = 15% (13941866496 bytes), duration 569 sec)
progress 85% (read 91268055040 bytes, zeroes = 15% (14558429184 bytes), duration 571 sec)
progress 86% (read 92341796864 bytes, zeroes = 16% (15468593152 bytes), duration 572 sec)
progress 87% (read 93415538688 bytes, zeroes = 17% (16508780544 bytes), duration 572 sec)
progress 88% (read 94489280512 bytes, zeroes = 18% (17062428672 bytes), duration 574 sec)
progress 89% (read 95563022336 bytes, zeroes = 17% (17075011584 bytes), duration 578 sec)
progress 90% (read 96636764160 bytes, zeroes = 18% (17641242624 bytes), duration 580 sec)
progress 91% (read 97710505984 bytes, zeroes = 19% (18714984448 bytes), duration 580 sec)
progress 92% (read 98784247808 bytes, zeroes = 19% (19709034496 bytes), duration 580 sec)
progress 93% (read 99857989632 bytes, zeroes = 20% (20761804800 bytes), duration 580 sec)
progress 94% (read 100931731456 bytes, zeroes = 21% (21713911808 bytes), duration 581 sec)
progress 95% (read 102005473280 bytes, zeroes = 22% (22741516288 bytes), duration 581 sec)
progress 96% (read 103079215104 bytes, zeroes = 23% (23815258112 bytes), duration 581 sec)
progress 97% (read 104152956928 bytes, zeroes = 23% (24582815744 bytes), duration 584 sec)
progress 98% (read 105226698752 bytes, zeroes = 24% (25652363264 bytes), duration 584 sec)
progress 99% (read 106300440576 bytes, zeroes = 24% (26340229120 bytes), duration 587 sec)
progress 100% (read 107374182400 bytes, zeroes = 24% (26398949376 bytes), duration 599 sec)
restore image complete (bytes=107374182400, duration=599.15s, speed=170.91MB/s)
rescan volumes...
TASK OK

The logs from PBS side are attached in the ZIP file.
 

Test from PVE to PBS:

Code:
root@pve-main:~# proxmox-backup-client benchmark --repository test@pbs@10.10.40.31:8007:bazen_pbsstore
Uploaded 148 chunks in 5 seconds.
Time per request: 35468 microseconds.
TLS speed: 118.25 MB/s
SHA256 speed: 1242.12 MB/s
Compression speed: 502.76 MB/s
Decompress speed: 798.85 MB/s
AES256/GCM speed: 1860.20 MB/s
Verify speed: 489.31 MB/s
┌───────────────────────────────────┬────────────────────┐
│ Name                              │ Value              │
╞═══════════════════════════════════╪════════════════════╡
│ TLS (maximal backup upload speed) │ 118.25 MB/s (10%)  │
├───────────────────────────────────┼────────────────────┤
│ SHA256 checksum computation speed │ 1242.12 MB/s (61%) │
├───────────────────────────────────┼────────────────────┤
│ ZStd level 1 compression speed    │ 502.76 MB/s (67%)  │
├───────────────────────────────────┼────────────────────┤
│ ZStd level 1 decompression speed  │ 798.85 MB/s (67%)  │
├───────────────────────────────────┼────────────────────┤
│ Chunk verification speed          │ 489.31 MB/s (65%)  │
├───────────────────────────────────┼────────────────────┤
│ AES256 GCM encryption speed       │ 1860.20 MB/s (51%) │
└───────────────────────────────────┴────────────────────┘

This is showing unrealistically low TLS speeds - TLS is not nearly this slow, so there is something more going on in this test, but I don't know what it involves. This is what I previously meant by the benchmark not being very informative / clear on details. Also, these percentages - "10%", "61%" - what are they relative to?
 
The PVE server is a newer machine, PBS is an older machine.
How old is your PBS machine? Maybe it uses a CPU that lacks AES-NI?
Even on my old Xeon X5570 PBS I get a TLS speed of 140 MB/s. Without AES support...
 
How old is your PBS machine? Maybe it uses a CPU that lacks AES-NI?
Even on my old Xeon X5570 PBS I get a TLS speed of 140 MB/s. Without AES support...
It's old, has an E5-2620. The CPU has AES support though...

That doesn't explain the slow restore speed, though. I did (just now) discover that I wasn't actually using the 10G interface :( (tested with iperf) - the wrong interface port was selected. Now that I'm actually using 10G, these are my results:

Code:
root@pve-main:~# proxmox-backup-client benchmark --repository test@pbs@10.10.40.31:8007:bazen_pbsstore
Uploaded 382 chunks in 5 seconds.
Time per request: 13256 microseconds.
TLS speed: 316.39 MB/s
SHA256 speed: 1239.57 MB/s
Compression speed: 498.69 MB/s
Decompress speed: 792.66 MB/s
AES256/GCM speed: 1835.27 MB/s
Verify speed: 475.49 MB/s
┌───────────────────────────────────┬────────────────────┐
│ Name                              │ Value              │
╞═══════════════════════════════════╪════════════════════╡
│ TLS (maximal backup upload speed) │ 316.39 MB/s (26%)  │
├───────────────────────────────────┼────────────────────┤
│ SHA256 checksum computation speed │ 1239.57 MB/s (61%) │
├───────────────────────────────────┼────────────────────┤
│ ZStd level 1 compression speed    │ 498.69 MB/s (66%)  │
├───────────────────────────────────┼────────────────────┤
│ ZStd level 1 decompression speed  │ 792.66 MB/s (66%)  │
├───────────────────────────────────┼────────────────────┤
│ Chunk verification speed          │ 475.49 MB/s (63%)  │
├───────────────────────────────────┼────────────────────┤
│ AES256 GCM encryption speed       │ 1835.27 MB/s (50%) │
└───────────────────────────────────┴────────────────────┘

Ok, that was my fault.
But that doesn't explain the 60 MB/s wire transfer speed when restoring. Even if it was using the 1G link (yes, it turns out it actually was using the 1G adapter instead of the 10G one), it still had half the bandwidth unused. I will test the restore again and post the results here.
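
For the record, this is roughly how the wrong-port mistake shows up (the address is the PBS repository IP from the logs above; iperf3 needs a server running on the PBS side):

Code:
# which local interface does the kernel use to reach the PBS address?
ip route get 10.10.40.31
# raw TCP throughput on that path
iperf3 -c 10.10.40.31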
 
progress 100% (read 107374182400 bytes, zeroes = 24% (26398949376 bytes), duration 599 sec)
restore image complete (bytes=107374182400, duration=599.15s, speed=170.91MB/s)
Without the zeroes it is 135 MB/s. Exactly 1 Gb/s network speed, I would say.
 
Restore (now over 10G link) results:

new volume ID is 'nvme_pool:vm-905-disk-0'
new volume ID is 'nvme_pool:vm-905-disk-1'
new volume ID is 'nvme_pool:vm-905-disk-2'
restore proxmox backup image: /usr/bin/pbs-restore --repository test@pbs@10.10.40.31:bazen_pbsstore --ns PVE-DELL vm/123/2024-11-05T22:09:22Z drive-efidisk0.img.fidx /dev/zvol/nvme_pool/vm-905-disk-0 --verbose --format raw --skip-zero
connecting to repository 'test@pbs@10.10.40.31:bazen_pbsstore'
open block backend for target '/dev/zvol/nvme_pool/vm-905-disk-0'
starting to restore snapshot 'vm/123/2024-11-05T22:09:22Z'
download and verify backup index
progress 100% (read 540672 bytes, zeroes = 0% (0 bytes), duration 0 sec)
restore image complete (bytes=540672, duration=0.00s, speed=223.88MB/s)
restore proxmox backup image: /usr/bin/pbs-restore --repository test@pbs@10.10.40.31:bazen_pbsstore --ns PVE-DELL vm/123/2024-11-05T22:09:22Z drive-tpmstate0-backup.img.fidx /dev/zvol/nvme_pool/vm-905-disk-1 --verbose --format raw --skip-zero
connecting to repository 'test@pbs@10.10.40.31:bazen_pbsstore'
open block backend for target '/dev/zvol/nvme_pool/vm-905-disk-1'
starting to restore snapshot 'vm/123/2024-11-05T22:09:22Z'
download and verify backup index
progress 100% (read 4194304 bytes, zeroes = 0% (0 bytes), duration 0 sec)
restore image complete (bytes=4194304, duration=0.01s, speed=436.42MB/s)
restore proxmox backup image: /usr/bin/pbs-restore --repository test@pbs@10.10.40.31:bazen_pbsstore --ns PVE-DELL vm/123/2024-11-05T22:09:22Z drive-virtio0.img.fidx /dev/zvol/nvme_pool/vm-905-disk-2 --verbose --format raw --skip-zero
connecting to repository 'test@pbs@10.10.40.31:bazen_pbsstore'
open block backend for target '/dev/zvol/nvme_pool/vm-905-disk-2'
starting to restore snapshot 'vm/123/2024-11-05T22:09:22Z'
download and verify backup index
progress 1% (read 1073741824 bytes, zeroes = 6% (67108864 bytes), duration 4 sec)
progress 2% (read 2147483648 bytes, zeroes = 3% (83886080 bytes), duration 9 sec)
progress 3% (read 3221225472 bytes, zeroes = 2% (92274688 bytes), duration 14 sec)
progress 4% (read 4294967296 bytes, zeroes = 2% (104857600 bytes), duration 18 sec)
progress 5% (read 5368709120 bytes, zeroes = 2% (142606336 bytes), duration 22 sec)
progress 6% (read 6442450944 bytes, zeroes = 3% (197132288 bytes), duration 26 sec)
progress 7% (read 7516192768 bytes, zeroes = 3% (268435456 bytes), duration 30 sec)
progress 8% (read 8589934592 bytes, zeroes = 3% (289406976 bytes), duration 35 sec)
progress 9% (read 9663676416 bytes, zeroes = 3% (310378496 bytes), duration 40 sec)
progress 10% (read 10737418240 bytes, zeroes = 3% (390070272 bytes), duration 46 sec)
progress 11% (read 11811160064 bytes, zeroes = 3% (452984832 bytes), duration 51 sec)
progress 12% (read 12884901888 bytes, zeroes = 9% (1249902592 bytes), duration 52 sec)
progress 13% (read 13958643712 bytes, zeroes = 16% (2323644416 bytes), duration 52 sec)
progress 14% (read 15032385536 bytes, zeroes = 22% (3397386240 bytes), duration 52 sec)
progress 15% (read 16106127360 bytes, zeroes = 25% (4102029312 bytes), duration 53 sec)
progress 16% (read 17179869184 bytes, zeroes = 24% (4131389440 bytes), duration 58 sec)
progress 17% (read 18253611008 bytes, zeroes = 22% (4169138176 bytes), duration 62 sec)
progress 18% (read 19327352832 bytes, zeroes = 21% (4227858432 bytes), duration 67 sec)
progress 19% (read 20401094656 bytes, zeroes = 20% (4282384384 bytes), duration 71 sec)
progress 20% (read 21474836480 bytes, zeroes = 20% (4299161600 bytes), duration 76 sec)
progress 21% (read 22548578304 bytes, zeroes = 19% (4345298944 bytes), duration 81 sec)
progress 22% (read 23622320128 bytes, zeroes = 18% (4353687552 bytes), duration 85 sec)
progress 23% (read 24696061952 bytes, zeroes = 17% (4366270464 bytes), duration 90 sec)
progress 24% (read 25769803776 bytes, zeroes = 16% (4366270464 bytes), duration 96 sec)
progress 25% (read 26843545600 bytes, zeroes = 16% (4504682496 bytes), duration 100 sec)
progress 26% (read 27917287424 bytes, zeroes = 16% (4647288832 bytes), duration 104 sec)
progress 27% (read 28991029248 bytes, zeroes = 16% (4894752768 bytes), duration 107 sec)
progress 28% (read 30064771072 bytes, zeroes = 16% (5108662272 bytes), duration 111 sec)
progress 29% (read 31138512896 bytes, zeroes = 17% (5440012288 bytes), duration 114 sec)
progress 30% (read 32212254720 bytes, zeroes = 17% (5700059136 bytes), duration 117 sec)
progress 31% (read 33285996544 bytes, zeroes = 17% (5762973696 bytes), duration 122 sec)
progress 32% (read 34359738368 bytes, zeroes = 16% (5800722432 bytes), duration 127 sec)
progress 33% (read 35433480192 bytes, zeroes = 16% (5876219904 bytes), duration 131 sec)
progress 34% (read 36507222016 bytes, zeroes = 16% (6035603456 bytes), duration 132 sec)
progress 35% (read 37580963840 bytes, zeroes = 16% (6262095872 bytes), duration 136 sec)
progress 36% (read 38654705664 bytes, zeroes = 17% (6631194624 bytes), duration 139 sec)
progress 37% (read 39728447488 bytes, zeroes = 16% (6631194624 bytes), duration 144 sec)
progress 38% (read 40802189312 bytes, zeroes = 16% (6698303488 bytes), duration 150 sec)
progress 39% (read 41875931136 bytes, zeroes = 16% (6941573120 bytes), duration 154 sec)
progress 40% (read 42949672960 bytes, zeroes = 16% (6987710464 bytes), duration 159 sec)
progress 41% (read 44023414784 bytes, zeroes = 16% (7285506048 bytes), duration 163 sec)
progress 42% (read 45097156608 bytes, zeroes = 16% (7373586432 bytes), duration 168 sec)
progress 43% (read 46170898432 bytes, zeroes = 16% (7470055424 bytes), duration 172 sec)
progress 44% (read 47244640256 bytes, zeroes = 15% (7507804160 bytes), duration 176 sec)
progress 45% (read 48318382080 bytes, zeroes = 15% (7511998464 bytes), duration 181 sec)
progress 46% (read 49392123904 bytes, zeroes = 15% (7532969984 bytes), duration 185 sec)
progress 47% (read 50465865728 bytes, zeroes = 14% (7562330112 bytes), duration 189 sec)
progress 48% (read 51539607552 bytes, zeroes = 14% (7562330112 bytes), duration 194 sec)
progress 49% (read 52613349376 bytes, zeroes = 14% (7562330112 bytes), duration 200 sec)
progress 50% (read 53687091200 bytes, zeroes = 14% (7650410496 bytes), duration 204 sec)
progress 51% (read 54760833024 bytes, zeroes = 13% (7650410496 bytes), duration 207 sec)
progress 52% (read 55834574848 bytes, zeroes = 13% (7667187712 bytes), duration 211 sec)
progress 53% (read 56908316672 bytes, zeroes = 13% (7725907968 bytes), duration 216 sec)
progress 54% (read 57982058496 bytes, zeroes = 13% (7876902912 bytes), duration 219 sec)
progress 55% (read 59055800320 bytes, zeroes = 13% (7910457344 bytes), duration 222 sec)
progress 56% (read 60129542144 bytes, zeroes = 13% (8229224448 bytes), duration 225 sec)
progress 57% (read 61203283968 bytes, zeroes = 13% (8430551040 bytes), duration 228 sec)
progress 58% (read 62277025792 bytes, zeroes = 13% (8610906112 bytes), duration 233 sec)
progress 59% (read 63350767616 bytes, zeroes = 13% (8610906112 bytes), duration 238 sec)
progress 60% (read 64424509440 bytes, zeroes = 13% (8610906112 bytes), duration 243 sec)
progress 61% (read 65498251264 bytes, zeroes = 13% (8682209280 bytes), duration 247 sec)
progress 62% (read 66571993088 bytes, zeroes = 13% (8917090304 bytes), duration 251 sec)
progress 63% (read 67645734912 bytes, zeroes = 13% (9357492224 bytes), duration 252 sec)
progress 64% (read 68719476736 bytes, zeroes = 13% (9382658048 bytes), duration 257 sec)
progress 65% (read 69793218560 bytes, zeroes = 13% (9386852352 bytes), duration 261 sec)
progress 66% (read 70866960384 bytes, zeroes = 13% (9386852352 bytes), duration 266 sec)
progress 67% (read 71940702208 bytes, zeroes = 13% (9386852352 bytes), duration 271 sec)
progress 68% (read 73014444032 bytes, zeroes = 12% (9386852352 bytes), duration 276 sec)
progress 69% (read 74088185856 bytes, zeroes = 12% (9403629568 bytes), duration 280 sec)
progress 70% (read 75161927680 bytes, zeroes = 12% (9407823872 bytes), duration 284 sec)
progress 71% (read 76235669504 bytes, zeroes = 12% (9542041600 bytes), duration 288 sec)
progress 72% (read 77309411328 bytes, zeroes = 12% (9764339712 bytes), duration 291 sec)
progress 73% (read 78383153152 bytes, zeroes = 12% (9848225792 bytes), duration 295 sec)
progress 74% (read 79456894976 bytes, zeroes = 12% (10041163776 bytes), duration 299 sec)
progress 75% (read 80530636800 bytes, zeroes = 12% (10313793536 bytes), duration 302 sec)
progress 76% (read 81604378624 bytes, zeroes = 12% (10586423296 bytes), duration 305 sec)
progress 77% (read 82678120448 bytes, zeroes = 12% (10611589120 bytes), duration 309 sec)
progress 78% (read 83751862272 bytes, zeroes = 13% (11169431552 bytes), duration 310 sec)
progress 79% (read 84825604096 bytes, zeroes = 13% (11525947392 bytes), duration 313 sec)
progress 80% (read 85899345920 bytes, zeroes = 13% (12008292352 bytes), duration 315 sec)
progress 81% (read 86973087744 bytes, zeroes = 14% (12670992384 bytes), duration 316 sec)
progress 82% (read 88046829568 bytes, zeroes = 15% (13304332288 bytes), duration 318 sec)
progress 83% (read 89120571392 bytes, zeroes = 15% (13514047488 bytes), duration 322 sec)
progress 84% (read 90194313216 bytes, zeroes = 15% (13941866496 bytes), duration 325 sec)
progress 85% (read 91268055040 bytes, zeroes = 15% (14558429184 bytes), duration 326 sec)
progress 86% (read 92341796864 bytes, zeroes = 16% (15468593152 bytes), duration 327 sec)
progress 87% (read 93415538688 bytes, zeroes = 17% (16508780544 bytes), duration 327 sec)
progress 88% (read 94489280512 bytes, zeroes = 18% (17062428672 bytes), duration 328 sec)
progress 89% (read 95563022336 bytes, zeroes = 17% (17075011584 bytes), duration 331 sec)
progress 90% (read 96636764160 bytes, zeroes = 18% (17641242624 bytes), duration 332 sec)
progress 91% (read 97710505984 bytes, zeroes = 19% (18714984448 bytes), duration 332 sec)
progress 92% (read 98784247808 bytes, zeroes = 19% (19709034496 bytes), duration 332 sec)
progress 93% (read 99857989632 bytes, zeroes = 20% (20761804800 bytes), duration 332 sec)
progress 94% (read 100931731456 bytes, zeroes = 21% (21713911808 bytes), duration 333 sec)
progress 95% (read 102005473280 bytes, zeroes = 22% (22741516288 bytes), duration 333 sec)
progress 96% (read 103079215104 bytes, zeroes = 23% (23815258112 bytes), duration 333 sec)
progress 97% (read 104152956928 bytes, zeroes = 23% (24582815744 bytes), duration 334 sec)
progress 98% (read 105226698752 bytes, zeroes = 24% (25652363264 bytes), duration 334 sec)
progress 99% (read 106300440576 bytes, zeroes = 24% (26340229120 bytes), duration 336 sec)
progress 100% (read 107374182400 bytes, zeroes = 24% (26398949376 bytes), duration 341 sec)
restore image complete (bytes=107374182400, duration=341.81s, speed=299.58MB/s)
rescan volumes...
TASK OK

Restore speed is now around 110-130 MB/s (I'm taking the measured Ethernet transfer speed, not the numbers PBS puts out after dedup, decompression, etc.). So it has doubled by going from 1G to 10G? That makes no sense. 1G is able to reach 120 MB/s, so why doesn't it?

Why doesn't it go any faster? CPU usage on both sides is negligible. And I seriously doubt my NVMe card can't provide more.
 
I guess it is the slow encryption speed of your old CPU.
It would be great to have the option to deactivate transport encryption, but no, there is no such option.
 
Without the zeroes it is 135 MB/s. Exactly 1 Gb/s network speed, I would say.
Not really. That's after dedup, decompression, etc. That number doesn't show the Ethernet speed. It was actually around 60 MB/s (I was monitoring it with bmon).

Also, on the second test the wire speed was about 130 MB/s, while the restore job reports 299 MB/s. That's not network speed, that's the calculated "restore speed" after the data reaches the disk on the destination. Somewhat misleading (like the numbers in PBS that show only the "snapshot size", not the actual size of all referenced chunks in the PBS datastore, i.e. the size of the actual backup copy - something I'd be interested in knowing).

I'm not really interested in the numbers the restore job is reporting. I am interested in identifying where the bottleneck is during a restore. If the source and destination disks can provide higher speeds, if the CPUs on both sides have time to spare, and if the network is not touching its limits - then why is this not going faster? This hardware can certainly do better.

The question is not "is this speed good enough?". The questions are "why is this not faster?" and "what can we do to make it faster?". There are bottlenecks that shouldn't be there.

So... TLS could be the reason. Why do I need TLS-encrypted transfer in my isolated network segment, between two hosts in the same room, 1 m apart from each other? Why can't I decide when to use it and when not to? These are the "knobs" I want to see in PBS, so I can tune them. SHA-256 could be another reason (xxHash would make things faster without sacrificing data integrity). All of these things combined are probably the reason for the slow restore speeds.

I am pretty certain this could go much faster if we could tune these things. But we cannot. Someone decided that having all of that, with no option to tune anything, is the right thing for everyone... I (strongly) disagree with that - and hope I can help make a change.
 
I guess it is the slow encryption speed of your old CPU.
It would be great to have the option to deactivate transport encryption, but no, there is no such option.

That's a guess. To rule it out, I was monitoring CPU usage, and no core was ever near 100% (on either side, PBS or PVE).
So there are plenty of CPU resources - why aren't they being utilized?
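
For anyone wanting to double-check this themselves, a per-thread view of the restore process can be had with standard tools (process name taken from the restore log above; nothing PBS-specific about the flags):

Code:
# per-thread CPU usage of the running restore process
top -H -p "$(pgrep -n pbs-restore)"
# or, with the sysstat package installed, sample per-thread usage once a second
pidstat -t -p "$(pgrep -n pbs-restore)" 1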
 
Why do I need TLS-encrypted transfer in my isolated network segment, between two hosts in the same room...
Same for me. My PBS is in the same rack, and all network transfers for backups and restores happen on a separate network.

PBS doesn't use all cores, neither for backups nor for restores. Another "knob" to adjust...
 
I know. That's why I'm trying to raise awareness here and get the development or product management team to acknowledge that there is quite some optimization potential left in PBS that no one seems to care much about - so it makes better use of the hardware available to it, before anyone is told to change the server, go all-flash, and similar suggestions. Some of these optimizations are "low-hanging fruit", really. We'll see if they start listening. This should get some momentum.

I don't want anyone to get me wrong - I love PBS. What they did with dedup especially is smart, efficient and worthy of every applause. The product is well thought out; there are just these... early / childhood issues still plaguing it. I don't understand why more energy is not put into that. PBS has the potential to truly rock the scene.

Now Veeam has gotten in and will lap them in the race unless they start moving ASAP. A pity. I think PBS has the potential to be a better backup solution than Veeam (for PVE hosts at least) for many customers who don't need all the advanced features Veeam relentlessly pushes you to buy. Something like 95% of all the core work on PBS is already done - and they stopped investing 5% from the finish line. That last 5% needs to go in, to polish the few issues it has and make it truly amazing - and to get many companies paying for PBS support instead of Veeam. They are so close... but they seem to be giving up instead of finishing the 5% that's left. I don't understand it.
 
I'll answer the initial points and fold in some of the things you posted afterwards.

- PBS uses HTTP/2 for chunk transfer, which means a single TCP connection for transmission. There are more cases where this is counterproductive than cases where it is beneficial: congested links (where multiple streams help a lot), LACP bonds with layer3+4 hashing (a single TCP stream can't fully utilize the bond), or any network situation that benefits from parallel/multiple streams (in real life there are many such) - and I don't really know where it is beneficial. Is it to avoid TCP slow start? When using multiple TCP connections, don't open a new one for every chunk transfer (individual chunks are rather small); stream new chunks over the existing connections instead - that avoids any issues with TCP slow start. With that out of the way, is there any other real benefit to a single connection? At the very least make it configurable, so people can type in the number of connections they want for transfers. Then whoever wants 1 TCP stream can have it. I suspect most will not want that.

HTTP/2 as the only transport mechanism for the backup and reader sessions is not set in stone. But any replacement/additional transport must have clear benefits without blowing up code complexity. We are aware that HTTP/2 performs (a lot) worse over higher-latency connections, because handling multiple streams over a single connection requires round trips to manage stream capacity. Why your TLS benchmark speed is so low I cannot say - it might be that your CPU just can't handle the throughput for the chosen cipher/crypto?
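
One way to sanity-check the cipher side of that on both machines is OpenSSL's own benchmark (assuming the OpenSSL CLI is at hand; these are single-core numbers):

Code:
# raw throughput of the AEAD ciphers typically negotiated for TLS 1.2/1.3
openssl speed -evp aes-256-gcm
openssl speed -evp chacha20-poly1305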

- using SHA-256 checksums for chunks -> xxHash-128 would be a much better technical choice. Besides being robust at detecting bit flips (it's designed for that), it is also MUCH faster to calculate. That would speed up all operations - backup, verify and restore. Using a cryptographic algorithm here is unnecessary; it's slow and inefficient. I'd suggest xxHash because I recently did a study on which one to use and xxHash came out as the winner. Of course there are other non-cryptographic hashing algorithms for this as well. To keep the new code compatible with existing chunks, simply add a field to the index file that defines which hashing algorithm is used (then PBS can support both).

xxHash is not a valid choice for this application - we do actually really need a cryptographic hash, because the digest is not just there to protect against bit rot. There might be ways to improve performance, but all of them require a format break which is not something we do lightly. See https://bugzilla.proxmox.com/show_bug.cgi?id=5859 to discuss this further ;)

- there are virtually no "configuration knobs" in PBS. I can't choose the compression algorithm or the checksum algorithm, can't configure the number of threads PBS will use, can't configure the number of TCP streams for transfers, etc. PBS is way too "fixed" to its defaults; everything is hardcoded, with no options. There is no such thing as one-size-fits-all. Options are needed.

Options for some of these are planned - e.g., PBS under the hood already has the option to disable compressing chunks, it's just not wired up to the "outside". Changes to the compression or checksum algorithms mean breaking existing clients (and servers), so those will only happen if we find something that offers substantial improvements.

- in my testing, NVMe/SSD vs HDD doesn't really matter either. The default suggestion all over the place is to use all-flash storage for PBS. Such suggestions are too generic and superficial. That kind of information can be very misleading and cause a lot of customer grief after they spend a lot of money on flash, only to find out that things are not going faster. The PBS client benchmark tool is good but not perfect; it's also a bit too generic and not clear on the details.

That's not true at all. For many of the tasks PBS does, spinning rust is orders of magnitude slower than flash. You might not notice the difference for small incremental backup runs, where very little actual I/O happens, but PBS basically does random I/O in 1-4 MB pieces when writing/accessing backup contents, and random I/O on tons of small files for things like GC and verify.
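
To get a feel for the "tons of small files" part, counting and sizing the chunk files of an existing datastore is illustrative (the path is a placeholder):

Code:
# number of chunk files in the datastore's chunk store
find /path/to/datastore/.chunks -type f | wc -l
# total size of the chunk store
du -sh /path/to/datastore/.chunks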

Please note I'm not making these things up from some random reading on the web - I am a developer with 25+ years of experience, owning a company whose core business is software development. Some of the above points are very real design changes that WILL make a difference in performance. PBS has bottlenecks in places where there shouldn't be any. Your team should acknowledge this, instead of the "throw more hardware at it" answer I've seen on the forums. This is open source - so yes, I could help with development directly. The problem is I work solely in .NET, so I can't contribute code. I can help with design suggestions only. And so far I've spotted a few obvious ones. I'm not alone, and unfortunately there hasn't been much movement around these issues.

I just want to point out - what might seem "obvious" to you might actually be factually wrong (see the hashsum algorithm point above). If things seem suboptimal, it's usually not because we are too lazy to do a simple change, but because fixing them is more complicated than it looks ;)

That being said - feedback is always welcome as long as it's brought forward in a constructive manner.

I will send you numbers around restore operations as soon as I can make some more time, so we can continue the journey of "getting to the bottom" of this, so you can fix the issues and improve the end product - if you are willing to go that route, that is. The moment I see another "your CPU is too old" or "you should be using all-flash", I'm leaving the discussion and putting my efforts back into Veeam instead - at least Gostev there (the product manager) has listened to my input, agreed and implemented changes (on more than one occasion) with a very positive impact. My point is: they listened, and you should do so as well - don't dismiss suggestions coming from power users, work with them to make the product better. There are open items in Bugzilla regarding PBS performance that are being ignored - that's not good.

I am not saying this to make you angry - but your CPU is actually rather old and, I suspect, the bottleneck in this case.

I can give you some more detail on why the "TLS benchmark" results are faster than your real-world restore "line speed":

The TLS benchmark only uploads the same blob of semi-random data over and over and measures throughput - there is no processing done on either end. An actual restore chunk request has to find and load the chunk from disk, send its contents over the wire, then parse, decode and verify the chunk on the client side and write it to the target disk.

While our code tries to do things concurrently, there is always some overhead involved. In particular with restoring VMs, there was a limitation within QEMU that forced us to process one chunk after the other, because writing from multiple threads to the same disk wasn't possible - I am not sure whether that has been lifted in the meantime. You could try figuring out whether a plain proxmox-backup-client restore of the big disk from your example is faster.
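
Something along these lines, reusing the repository and snapshot from your logs (the target is just a scratch file; exact archive-name handling may differ slightly between versions):

Code:
proxmox-backup-client restore \
    vm/123/2024-11-05T22:09:22Z drive-virtio0.img /tmp/drive-virtio0.raw \
    --repository test@pbs@10.10.40.31:bazen_pbsstore --ns PVE-DELL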

The numbers in the benchmark output refer to a baseline from 2020: an AMD Ryzen 7 2700X workstation running the benchmark against itself. My current workstation beats it almost across the board:

Code:
Uploaded 1703 chunks in 5 seconds.
Time per request: 2939 microseconds.
TLS speed: 1427.05 MB/s
SHA256 speed: 2636.03 MB/s
Compression speed: 804.75 MB/s
Decompress speed: 1062.70 MB/s
AES256/GCM speed: 5793.18 MB/s
Verify speed: 757.18 MB/s
┌───────────────────────────────────┬─────────────────────┐
│ Name                              │ Value               │
╞═══════════════════════════════════╪═════════════════════╡
│ TLS (maximal backup upload speed) │ 1427.05 MB/s (116%) │
├───────────────────────────────────┼─────────────────────┤
│ SHA256 checksum computation speed │ 2636.03 MB/s (130%) │
├───────────────────────────────────┼─────────────────────┤
│ ZStd level 1 compression speed    │ 804.75 MB/s (107%)  │
├───────────────────────────────────┼─────────────────────┤
│ ZStd level 1 decompression speed  │ 1062.70 MB/s (89%)  │
├───────────────────────────────────┼─────────────────────┤
│ Chunk verification speed          │ 757.18 MB/s (100%)  │
├───────────────────────────────────┼─────────────────────┤
│ AES256 GCM encryption speed       │ 5793.18 MB/s (159%) │
└───────────────────────────────────┴─────────────────────┘

We did find and fix a bug in the AES benchmark a while back, so the numbers for that are not really comparable before and after.
 
I am not saying this to make you angry - but your CPU is actually rather old and, I suspect, the bottleneck in this case.
I really don't like this argument. Even if single-thread performance is low, which is true in this case, parallelizing should improve things (though I know that encryption is seldom parallelizable), and as reported the CPU utilization / load is very low and far from CPU-bound.

I have to say that my newer, yet perhaps still "rather old", CPU (E5-2667 v4) shows very similar numbers, roughly a third of what you shared from your workstation.

Code:
┌───────────────────────────────────┬────────────────────┐
│ Name                              │ Value              │
╞═══════════════════════════════════╪════════════════════╡
│ TLS (maximal backup upload speed) │ 118.15 MB/s (10%)  │
├───────────────────────────────────┼────────────────────┤
│ SHA256 checksum computation speed │ 407.15 MB/s (20%)  │
├───────────────────────────────────┼────────────────────┤
│ ZStd level 1 compression speed    │ 391.26 MB/s (52%)  │
├───────────────────────────────────┼────────────────────┤
│ ZStd level 1 decompression speed  │ 586.81 MB/s (49%)  │
├───────────────────────────────────┼────────────────────┤
│ Chunk verification speed          │ 235.18 MB/s (31%)  │
├───────────────────────────────────┼────────────────────┤
│ AES256 GCM encryption speed       │ 1163.03 MB/s (32%) │
└───────────────────────────────────┴────────────────────┘

Will have to check on newer hardware.
 
Even if single-thread performance is low, which is true in this case, parallelizing should improve things (though I know that encryption is seldom parallelizable) ...
My PBS running in a container on PVE (Ryzen 5950X) gives a TLS speed of 348 MB/s (28%) with 1 core, 726 MB/s (59%) with 2 cores and 839 MB/s (68%) with 3 cores or more. I guess there is some parallelization going on.
The other parts of the benchmark don't appear to depend on the container's cpulimit setting. I always assumed that my datastore's 2.5" laptop drives were the cause of the slow restores.
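
For anyone repeating that quick test, the container's CPU limits can be changed on the fly and the benchmark re-run (CT ID 101 is a placeholder):

Code:
pct set 101 --cores 2      # limit the container to 2 cores
pct set 101 --cpulimit 2   # or cap CPU time instead of core count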
 
