PBS Performance improvements (enterprise all-flash)

thanks for the additional numbers! that does indeed look like there is some severe bottleneck happening that we should get to the bottom of..
I suspect that the bottleneck is somewhere in the compression pipeline.
Because, like I mentioned in the bug report, even though backups to PBS work differently from local-storage backups, the limitations are exactly the same.

In other words: the speeds are identical between local and PBS as soon as compression is enabled (no matter which compression, or how many zstd worker threads).
With compression disabled, it feels like every limitation is removed; backup speeds go up to 5 GB/s here.
With compression enabled (no matter which algorithm, multithreaded or not), local or PBS, I can't pass 1 GB/s.
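For what it's worth, single-thread compression throughput is easy to sanity-check outside of PBS. A minimal sketch in Python (stdlib zlib standing in for zstd, so absolute numbers won't match PBS, but it shows that one compression stream is capped by one core no matter how many others sit idle):

```python
import time
import zlib

# Build ~64 MiB of moderately compressible data.
block = b"some moderately compressible payload "
data = block * ((64 << 20) // len(block))

# Compress it on a single core at level 1 and report throughput;
# a single compression stream can never exceed this figure.
t0 = time.perf_counter()
compressed = zlib.compress(data, 1)
elapsed = time.perf_counter() - t0
mib_per_s = len(data) / (1 << 20) / elapsed
print(f"compressed {len(data) >> 20} MiB at {mib_per_s:.0f} MiB/s on one core")
```

If the zstd CLI is installed, its built-in benchmark (`zstd -b1`) gives comparable single-stream numbers for the actual algorithm.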

And in my opinion it has nothing to do with clock speeds either, because I monitored CPU utilization during backups (local and PBS) and I don't see any cores reaching 100%.
But I'm not entirely sure, since on the other hand clock speeds are definitely important: servers that reach higher clock speeds give me faster backup speeds.

It took me three days of trying to find the bottleneck when I created my bug report...
I tried PBS VM instances and even reinstalled one of the two Genoa servers with PBS, to have the fastest possible PVE+PBS on the planet; tried local storage and every possible tuning option for compression, etc...
Monitored CPU and IO etc...
And couldn't find the bottleneck, so I gave up.

On the flip side, it seems I'm still achieving the fastest PBS backup speeds at 1 GB/s even with SAS (HDD) drives on PBS, while others use NVMe drives and struggle to reach 500 MB/s.

Cheers
 
Just as bad, or worse.

Code:
Upload image '/dev/mapper/pve-vm--100--disk--0' to 'root@pam@10.226.10.10:8007:pbs-primary' as tomtest.img.fidx
tomtest.img: had to backup 62.832 GiB of 80 GiB (compressed 42.404 GiB) in 673.35s
tomtest.img: average backup speed: 95.552 MiB/s
tomtest.img: backup was done incrementally, reused 17.168 GiB (21.5%)
Duration: 676.53s
End Time: Tue Jul  9 12:59:22 2024
FWIW, I very likely found the issue with this particular invocation, and might have some test package (if you are willing!) to try for image backups using proxmox-backup-client.
 
Any news here? We also have a performance issue with NVMe disks on Proxmox Backup Server.
And what will you tell us?
Only 10 Mb/s write throughput?
"Use other enterprise NVMe disks in a ZFS RAID-10-like setup on fast new hardware."
 
There was some change in git regarding the "input buffer size".
The question is whether this was changed generally in later PBS versions, or whether it is something we must change manually to see if it solves our problem.
 
There was some change in git regarding the "input buffer size"
I don't know what you are referring to. Do you have a link to the change you mention? Also, resurrecting a year+ old post without providing details isn't that useful. In the meantime there have been improvements for both restores and verification tasks, and you don't mention what kind of "performance issue" you have.

The changes mentioned above [1] were released with PBS 3.3 [2]

[1] https://forum.proxmox.com/threads/p...ments-enterprise-all-flash.150514/post-688915
[2] https://pbs.proxmox.com/wiki/Roadmap#Proxmox_Backup_Server_3.3
 
For my part, or as far as the bug report I mentioned goes...

Nothing has been fixed since this thread was opened. PBS backups are still single-core TLS limited.

That was an eternity ago and it's still the same issue today.
9374F -> 1 GB/s limit
Xeon 4210R -> 200 MB/s limit

So absolutely nothing in this thread has been fixed. I think people just gave up on this.
 
We're in the process of implementing PVE and PBS in our environment and have been stuck on this issue for a few days. Here's a quick writeup of our experience and testing. TLDR at the end.

The setup

PVE hosts are Dell R640's with these specs:
  • 2x Xeon Gold 6132
  • 512GB RAM
  • 4x 10G interfaces (2 for data in a bond, 2 for multipathed storage traffic)

PBS is run as a VM on one of the PVE hosts with this setup:
  • 8 vCPU
  • 200GB RAM
  • Multiple network interfaces so backup & storage traffic bypasses firewalls
Storage:
  • VM uses all flash block storage via NVMe-TCP (Pure Storage FlashArray)
  • PBS datastore is backed by all flash object/file storage via NFS (Pure Storage FlashBlade)

Diagram:
Code:
     +--------> PVE (SRC)
     |NVMe-TCP    ^
     V            |
FlashArray        |PBS Backup
     ^            |
     |NVMe-TCP    V            NFS
     +--------> PVE [PBS VM] <-----> FlashBlade

Backup Performance
  • from a separate PVE host to PBS was running at ~130MB/s
  • from the same PVE host that PBS runs on was running at ~190MB/s

Troubleshooting & changes
fio from PVE to the FlashArray was getting ~2GB/s as expected
fio from PBS to the FlashBlade was getting ~1GB/s as expected for a single thread
iperf from PVE source host to PBS was getting ~1GB/s as expected for a single thread

PBS VM changes:
  • went from 2 sockets/4 cores to 1 socket/8 cores -> no change
  • disabled NUMA -> no change
  • Spectre mitigations -> no change
  • reverted kernel from 6.17 to 6.14 -> no change

PBS benchmark
Code:
root@pbs-01:~# proxmox-backup-client benchmark --repository purefb-02-pbs-nfs
Uploaded 319 chunks in 5 seconds.
Time per request: 15778 microseconds.
TLS speed: 265.83 MB/s   
SHA256 speed: 459.42 MB/s   
Compression speed: 380.41 MB/s   
Decompress speed: 540.43 MB/s   
AES256/GCM speed: 446.25 MB/s   
Verify speed: 246.51 MB/s   
┌───────────────────────────────────┬───────────────────┐
│ Name                              │ Value             │
╞═══════════════════════════════════╪═══════════════════╡
│ TLS (maximal backup upload speed) │ 265.83 MB/s (22%) │
├───────────────────────────────────┼───────────────────┤
│ SHA256 checksum computation speed │ 459.42 MB/s (23%) │
├───────────────────────────────────┼───────────────────┤
│ ZStd level 1 compression speed    │ 380.41 MB/s (51%) │
├───────────────────────────────────┼───────────────────┤
│ ZStd level 1 decompression speed  │ 540.43 MB/s (45%) │
├───────────────────────────────────┼───────────────────┤
│ Chunk verification speed          │ 246.51 MB/s (33%) │
├───────────────────────────────────┼───────────────────┤
│ AES256 GCM encryption speed       │ 446.25 MB/s (12%) │
└───────────────────────────────────┴───────────────────┘
Compared the numbers here to the benchmark wiki page and was surprised our ~5-year-old CPUs were performing on par with ~10-year-old CPUs.

That realisation sent me looking into CPU instruction sets.
Changed the PBS CPU type to host -> AES256 GCM speed increased from 446 MB/s to ~3300 MB/s! Great! No change to SHA256 speeds, though, which are running at about 20% of what I would expect...

Looked into SHA instruction sets and found that our Intel Cascade Lake CPUs don't have the SHA extensions... Apparently they only became generally available with Ice Lake.

So now I'm on the hunt for a physical host with new CPUs that support SHA instruction sets...


I think this is a very important piece of information that should be added to the PBS system requirements documentation, but I'm not sure how to get it in there?


TLDR: The CPUs in the physical servers we run PBS on don't have the instruction sets to accelerate SHA calculations, so hashing is done entirely in software. This bottlenecks the entire backup traffic path, since the SHA256 calculations are used for deduplication.
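If you want to check this on your own hosts: on Linux the relevant CPU flag is sha_ni, and a short hashlib loop gives a rough single-core SHA256 number. A hedged sketch (OpenSSL's assembly paths and PBS's chunk sizes mean real figures will differ somewhat):

```python
import hashlib
import time

# 1) Does this CPU advertise the SHA extensions? (Linux-only check)
try:
    with open("/proc/cpuinfo") as f:
        print("sha_ni present:", " sha_ni" in f.read())
except OSError:
    print("could not read /proc/cpuinfo")

# 2) Rough single-core SHA256 throughput: hash 128 MiB in 4 MiB updates.
buf = bytes(4 << 20)
h = hashlib.sha256()
t0 = time.perf_counter()
for _ in range(32):
    h.update(buf)
elapsed = time.perf_counter() - t0
mib_per_s = (32 * len(buf)) / (1 << 20) / elapsed
print(f"SHA256: {mib_per_s:.0f} MiB/s on one core")
```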
 
So it seems I was only looking at half the picture. I happened to run a backup of another VM in another cluster and suddenly the backup was peaking at ~500MB/s and averaging about 300MB/s.

This got me looking closer at the source PVE node, and I found it's running a Xeon Gold 6526Y, which has Intel's SHA extensions.

PBS benchmark numbers for reference:



Code:
Host                      | PBS VM               | Physical node where PBS VM runs | Physical node where source VM runs
CPU                       | Intel Xeon Gold 6132 | Intel Xeon Gold 6132            | Intel Xeon Gold 6526Y
Chunks uploaded in 5s     | 922                  | 232                             | 252
Time per request (us)     | 5433                 | 21894                           | 20523
TLS speed (MB/s)          | 772                  | 192                             | 204
SHA256 speed (MB/s)       | 459                  | 462                             | 1688
Compression speed (MB/s)  | 440                  | 424                             | 519
Decompress speed (MB/s)   | 702                  | 627                             | 814
AES256/GCM speed (MB/s)   | 3324                 | 3370                            | 10131
Verify speed (MB/s)       | 275                  | 268                             | 548


You can see (6132 vs 6526Y):
SHA256 increased by ~3.6x
AES increased by ~3x
TLS was basically unchanged

More digging with my good friend Mr GPT suggests:

For backups, the PVE node with the source VM is responsible for:
  • chunking the VM data
  • SHA256 hashing of chunks
  • compression (zstd)
  • encryption (AES-GCM, if enabled)
  • sending only chunks that don't already exist on PBS over TLS
and the PBS server is responsible for:
  • storing incoming chunks
  • updating indexes
  • verifying chunks (SHA256)

During restores, the roles flip. The PBS server is responsible for:
  • reads the chunk from the datastore
  • decrypts the chunk
  • decompresses the chunk
  • verifies the chunk
  • reassembles the data
  • send data to the target PVE node over TLS
and the PVE node is responsible for:
  • receiving the data stream
  • writing the data to VM disk

Again, this was AI generated, but it all seems to make sense. I haven't found any documentation confirming it, and I'm not inclined to scour the source code at the moment to validate it.

Happy to be corrected by anyone who knows better.

TLDR: During backup, the CPU on the source PVE node is more important than CPU on PBS for throughput and vice versa during restores.
 
Hello,
this is exactly the problem we had with our PBS. The only solution is to run PBS on a dedicated server with no hardware RAID; you must use ZFS with the physical disks. PBS must be able to handle/address the physical disks directly. You can use a ZFS "raid". This is the only way PBS can run with good performance. It fully solved our issue and we never had backup performance problems again.

Best Regards
Marco
 
So it seems I was only looking at half the picture. [...] TLDR: During backup, the CPU on the source PVE node is more important than CPU on PBS for throughput and vice versa during restores.
Yes, hardware acceleration helps. The TLS speed is probably affected by latency a lot -> that is solved by buffering/ doing more work in parallel if you can. The benchmark is only a small part of the whole picture, don't rely on those numbers too much.
 
During restores, the roles flip. The PBS server is responsible for:

  • reads the chunk from the datastore
  • decrypts the chunk
  • decompresses the chunk
  • verifies the chunk
  • reassembles the data
  • send data to the target PVE node over TLS
and the PVE node is responsible for:

  • receiving the data stream
  • writing the data to VM disk

that's not quite true, it really looks more like this:

The PBS server is responsible for:
  • reads the chunk from the datastore
  • send data to the target PVE node over TLS
The client/PVE node is responsible for:

  • receiving the data stream
  • decrypts the chunk
  • decompresses the chunk
  • verifies the chunk
  • reassembles the data
  • writing the data to VM disk
the PBS server does parse each index (the fidx/didx files) once when restoring (to know which chunks are part of this backup snapshot and thus allowed to be accessed), but that is fairly cheap.

when doing a backup, both sides will construct the indices in parallel and verify the results match. and the server will calculate the CRC and, if unencrypted, verify the digest of the uploaded chunks, so yes, for a backup the PBS server has to do a bit more compute work.
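To make that division of labour concrete, here is a hypothetical per-chunk sketch of the backup path as described above (illustration only, not the actual PBS code; zlib stands in for zstd and the wire protocol is heavily simplified):

```python
import hashlib
import zlib

def client_prepare(chunk: bytes, known_digests: set) -> tuple:
    """Client side: hash every chunk, but compress and upload only
    chunks the server does not already have (deduplication)."""
    digest = hashlib.sha256(chunk).hexdigest()
    if digest in known_digests:
        return digest, None  # dedup hit: only the digest enters the index
    return digest, zlib.compress(chunk, 1)

def server_accept(digest: str, payload: bytes, crc: int, encrypted: bool) -> bool:
    """Server side: always check the CRC of the uploaded blob; re-verify
    the SHA256 digest only when the chunk is unencrypted."""
    if zlib.crc32(payload) != crc:
        return False
    if not encrypted:
        data = zlib.decompress(payload)
        if hashlib.sha256(data).hexdigest() != digest:
            return False
    return True
```

For example, uploading the same chunk twice would hash it twice but compress and transfer it only once, while the server-side checks catch corrupted uploads.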
 
Yes, hardware acceleration helps. The TLS speed is probably affected by latency a lot -> that is solved by buffering/ doing more work in parallel if you can. The benchmark is only a small part of the whole picture, don't rely on those numbers too much.

Yes, I appreciate that the benchmark targets very specific stages of the backup data pipeline; however, I believe the underperformance of backups in my environment lies in these stages, so I'm using it to help guide my investigation.


that's not quite true, it really looks more like this: [...] for a backup the PBS server has to do a bit more compute work.

Thanks for the insight Fabian!

Because I'm not 100% clear from your post, could you help shed light on which host (the source PVE or the target PBS) performs the SHA256 hashing during a backup of VM data from the source PVE node to the target PBS?

As in my previous post, backup transfer rates improved drastically when backing up a VM from a PVE node whose CPU has the SHA instruction sets vs a PVE node without (which therefore calculates SHA hashes in software).
That, along with the difference in benchmark numbers, indicates to me that SHA performance is the primary bottleneck in my environment. Therefore I'm trying to understand whether there would likely be any benefit to SHA hashing performance from moving PBS to a server whose CPUs do hardware-accelerated SHA hashing.