Abysmally slow restore from backup

New binaries with even more stats for debugging and a choice of concurrency:
8-way concurrency in fetching chunks:

8a4ab1a531f2508aad764f3fa5cf93a62cdf96c3552a5b428a9706e88879b63d libproxmox_backup_qemu.so_19062025_1402_8concur
https://notnullmakers.com/public/media/libproxmox_backup_qemu.so_19062025_1402_8concur

4-way concurrency in fetching chunks:

2fa480f4f030c7dd3318292494e5e0a0ddedc8629931645fa32f2ed94402ce09 libproxmox_backup_qemu.so_19062025_1406_4concur
https://notnullmakers.com/public/media/libproxmox_backup_qemu.so_19062025_1406_4concur

16-way concurrency in fetching chunks:
c5ada6ef32af6463f68a0018eea89aa83709d89f893f9332f6d5191543e45899 libproxmox_backup_qemu.so_19062025_1441_16concur
https://notnullmakers.com/public/media/libproxmox_backup_qemu.so_19062025_1441_16concur

Again, of course, I recommend you make a backup of the original /usr/lib/libproxmox_backup_qemu.so.0 binary and check the download, e.g. with VirusTotal or other tools.

You can check the current state of changes here: https://github.com/NOT-NULL-Makers/...mmit/af01a18e5672b3e72a8b2f876fddd10edf71a975
 
Ok, as requested on the mailing list, I have made some effort to make the number of threads configurable:

A thread count between 1 and 32 makes sense; 32 is the max_blocking_threads limit of the BackupSetup/Tokio thread pool.

The SHA256 is: 2f88110f07cdf3dd7e70b937629adda29e1728146f7b1ec1171fbac4cecbe59f
https://notnullmakers.com/public/media/libproxmox_backup_qemu.so_27062025_configurable-concur

You can run this like
Code:
PBS_RESTORE_CONCURRENCY=16 qmrestore hel2:backup/vm/101/2025-06-09T10:04:25Z 101 --force
for instance. If you don't set the environment variable, 4 threads are currently used by default.
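For reference, parsing and bounding such a variable could look roughly like this (a std-only Rust sketch; the function name, the default of 4, and the exact clamping are my assumptions, not the actual patch code):

```rust
use std::env;

/// Hypothetical helper: read PBS_RESTORE_CONCURRENCY, falling back to 4
/// threads and clamping to the 1..=32 range (32 being the
/// max_blocking_threads limit of the Tokio blocking pool).
fn restore_concurrency() -> usize {
    env::var("PBS_RESTORE_CONCURRENCY")
        .ok()
        .and_then(|v| v.parse::<usize>().ok())
        .map(|n| n.clamp(1, 32))
        .unwrap_or(4)
}

fn main() {
    println!("using {} restore threads", restore_concurrency());
}
```

Anything outside 1..=32 (or a non-numeric value) silently falls back into range here; the real code may instead reject bad input, so treat this as an illustration of the bounds only.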

Again, of course I recommend you do a backup of the original /usr/lib/libproxmox_backup_qemu.so.0 binary and check e.g. with VirusTotal or other tools.
 
@kaliszad In your testing have you been able to achieve more than 1GB/s on restores without a lot of zeros in them? So far the highest I've seen is around 1.1GB/s. Is that the API bottleneck you were referring to?
 
@kaliszad In your testing have you been able to achieve more than 1GB/s on restores without a lot of zeros in them? So far the highest I've seen is around 1.1GB/s. Is that the API bottleneck you were referring to?
Yes, this is really bothering me. I am able to achieve ~3 GB/s using a simple mbuffer transfer of a huge random file with a large buffer of about 100 MB, and that is still a single, just really well saturated, TCP connection. So the hardware can definitely sustain a much larger load. Encryption shouldn't slow things down too much at these speeds, and even if it does, the ~2 GB/s gap between what the Proxmox restore achieves and that transfer is something I would like to see explained.

If I take a CPU flamegraph of the new version, which looks like it saturates some resource, about 33.9 % of the stack traces fall in libcrypto.so.3.

That being said, our client is happy with the speedup for now, so we are looking for customers that would need further optimizations to fulfil their SLAs or similar. There might also be areas in other parts of the project where work could help speed up Proxmox backup, if that is what is taking too long for you.
 
Please tell me more about this. What's the problem on what setup?
There were requests for pulling (and pushing) groups in parallel for sync jobs from/to remote PBS instances. See the discussion on https://bugzilla.proxmox.com/show_bug.cgi?id=4182 and the latest patch series https://lore.proxmox.com/pbs-devel/20250404134936.425392-1-c.ebner@proxmox.com/T/

Unfortunately, I haven't had the time to look into the influence of congestion control yet, and the argument against exposing config flags for parallelization of each task still holds.
 
The current state of things seems promising: @dcsapak rewrote the patch to use more of the async infrastructure already present.

Here is the most recent form of the patch, compiled into a binary:

SHA256
7442911f2f890f3cdb9c5aaef08911eb7b731689ce9ec3d17083cf82bc95f83b

https://notnullmakers.com/public/media/libproxmox_backup_qemu.so_14072025_async-v3

You can change the parallelism by setting these environment variables before the qmrestore command:

Code:
PBS_RESTORE_FETCH_CONCURRENCY=16 PBS_RESTORE_MAX_THREADS=4
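Conceptually, a bounded fetch concurrency like PBS_RESTORE_FETCH_CONCURRENCY behaves like a small worker pool pulling chunk jobs from a shared queue. Here is a std-only Rust sketch of just the fetch side, with OS threads standing in for the Tokio tasks the actual patch uses; every name below is my assumption, not code from restore.rs:

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Sketch: fetch all chunks with at most `concurrency` fetches in flight.
/// The real code drives async Tokio tasks instead of OS threads.
fn fetch_all(chunk_ids: Vec<u32>, concurrency: usize) -> Vec<(u32, Vec<u8>)> {
    let (job_tx, job_rx) = mpsc::channel::<u32>();
    for id in &chunk_ids {
        job_tx.send(*id).unwrap();
    }
    drop(job_tx); // close the queue so workers exit once it is drained

    let job_rx = Arc::new(Mutex::new(job_rx));
    let (res_tx, res_rx) = mpsc::channel();

    thread::scope(|s| {
        for _ in 0..concurrency {
            let job_rx = Arc::clone(&job_rx);
            let res_tx = res_tx.clone();
            s.spawn(move || loop {
                // take the next chunk id, or stop when the queue is closed
                let id = match job_rx.lock().unwrap().recv() {
                    Ok(id) => id,
                    Err(_) => break,
                };
                // stand-in for downloading + decrypting one chunk
                let data = vec![id as u8; 8];
                res_tx.send((id, data)).unwrap();
            });
        }
        drop(res_tx); // the scope waits for all workers before returning
    });

    let mut out: Vec<_> = res_rx.into_iter().collect();
    out.sort_by_key(|(id, _)| *id); // writes must land in chunk order
    out
}
```

Raising the worker count only helps while the server and network can keep all fetches busy; beyond that, the workers just wait on the same connection, which would match the diminishing returns people report above.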

Again, of course, I recommend you make a backup of the original /usr/lib/libproxmox_backup_qemu.so.0 binary and check the download, e.g. with VirusTotal or other tools.

The code is here: https://github.com/NOT-NULL-Makers/proxmox-backup-qemu/blob/async-chunk-reader-v3/src/restore.rs
 
Can we hope that proxmox-backup-client (for restoring an entire disk) will get this improvement in the future, too?
 
This code looks pretty similar to the code we have already changed:

https://github.com/proxmox/proxmox-...1b537/proxmox-backup-client/src/main.rs#L1109

So without any further analysis I would say it is likely it could help.

In what context do you restore the entire disk? What numbers on what hardware do you see?
We use PBS with proxmox-backup-client for backups of OpenStack instances / physical hardware. We back up the entire disk, so it is very big, and the backup speed is good (thanks to dedup/compression), but the restore speed is horrible.

I think the culprit is the same in proxmox-backup-client.
 
We use PBS with proxmox-backup-client for backups of OpenStack instances / physical hardware. We back up the entire disk, so it is very big, and the backup speed is good (thanks to dedup/compression), but the restore speed is horrible.

I think the culprit is the same in proxmox-backup-client.
Hi, I have sent you a direct message.
 