Abysmally slow restore from backup

Using environment variables probably means running the restore command in a shell, right?
That would be a letdown. This should be easily adjustable, e.g. in datacenter.cfg, so that a normal restore from the web UI applies them (i.e. for non-root users with enough privileges to restore their VMs). I would be OK with having to use the CLI to set those values, although IMO they should be available in the web UI too.
I would love to see this backported to PVE 8.4.x.
 
Using environment variables probably means running the restore command in a shell, right?

Not if you set the variable system-wide; then, when the actual command is run by the system, it'll just use those values.

Basically, say $threadcount=14,

and then, instead of having a hard-coded number, it references $threadcount in the system.
 
Not if you set the variable system-wide; then, when the actual command is run by the system, it'll just use those values.

Basically, say $threadcount=14,

and then, instead of having a hard-coded number, it references $threadcount in the system.
Right. That's down to my (somewhat limited) Linux knowledge... I thought env vars were per-user, but forgot about the existence of system-wide env variables ;) so that should do it. Thanks!

So basically I put them in /etc/environment.
 
Nope, I put this in /etc/environment:
PBS_RESTORE_FETCH_CONCURRENCY=8
PBS_RESTORE_MAX_THREADS=2

I did that both on the PBS side and on the PVE side, just to be sure (although I think this should only be needed on the PVE side). But it's still using 16 and 4 respectively when I try to restore via the GUI... I must be missing something.

Edit: after some digging, it seems the restore session, where the restore command runs, doesn't take this system-wide env var into account. I guess this can only be set when running the restore via the command line for now. Unless I'm completely missing something.
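One way to check where the variables actually end up is to inspect the environment of the running restore process via /proc. A sketch (assumes a Linux host and that the process matches the name `qmrestore`; needs root to read other users' processes):

```shell
# For each process matching 'qmrestore', print the PBS_RESTORE_* variables
# it was actually started with (/proc/<pid>/environ is NUL-separated).
for pid in $(pgrep -f qmrestore); do
  echo "pid $pid:"
  tr '\0' '\n' < "/proc/$pid/environ" | grep '^PBS_RESTORE' || echo "  (vars not set)"
done
```

If the variables don't show up here, the process never inherited them, regardless of what /etc/environment says.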
 
Oh, I found a way that works. In case anyone is interested, on the PVE host doing the restore:
> systemctl edit pvedaemon

Add the lines:

[Service]
Environment="PBS_RESTORE_FETCH_CONCURRENCY=8"
Environment="PBS_RESTORE_MAX_THREADS=2"

(or whatever numbers you want)

Save & exit, then:
> systemctl daemon-reexec
> systemctl restart pvedaemon

Now, when doing a restore via GUI, it takes these variables into account.

Explanation (as far as I understand it): the pvedaemon service is responsible for running the restore commands when restoring from the GUI, so it needs to have these env vars set. The above creates a systemd unit override that adds the required env vars on top of the existing unit definition. Why it doesn't work when the env vars are just put in /etc/environment I'm not sure; perhaps that file is not consulted by systemd. But when the env vars are set on this service, it works.
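For anyone scripting this across several nodes: `systemctl edit` just writes a drop-in file under /etc/systemd/system/, so the same override can be created non-interactively. A sketch (the file name `restore-tuning.conf` is arbitrary; adjust the values to taste):

```shell
# Non-interactive equivalent of `systemctl edit pvedaemon`: a drop-in
# directory whose .conf files extend the shipped unit definition.
d=/etc/systemd/system/pvedaemon.service.d
mkdir -p "$d"
cat > "$d/restore-tuning.conf" <<'EOF'
[Service]
Environment="PBS_RESTORE_FETCH_CONCURRENCY=8"
Environment="PBS_RESTORE_MAX_THREADS=2"
EOF
systemctl daemon-reload
systemctl restart pvedaemon
```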
 
Glad you found it. I had a half-written post suggesting you check the pvedaemon systemd unit, as any process started by a systemd service inherits the environment of the service itself, plus any other env vars set when starting the process. This is where pvedaemon could read a setting from datacenter.cfg and pass the variables along to proxmox-backup-client to process the recovery.


> systemctl daemon-reexec
Use systemctl daemon-reload instead; you don't need to re-execute the systemd manager itself, just reload the unit files.


Why it's not working when env vars are just put in /etc/environment I'm not sure
Because /etc/environment is only applied to login sessions (via pam_env), which a systemd service, by default, doesn't go through.
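You can see this effect without touching any service: a systemd unit starts from a clean environment rather than inheriting your login shell's. `env -i` mimics that clean start (a minimal illustration, not PBS-specific):

```shell
# Exported in the login shell...
export PBS_RESTORE_MAX_THREADS=2
# ...but a process started with an empty environment (like a systemd
# service that never went through pam_env) does not see it:
env -i sh -c 'echo "${PBS_RESTORE_MAX_THREADS:-unset}"'
# prints: unset
```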
 
Thanks for the clarification @VictorSTS - new things learned ;)

So... great progress in PBS around VM restore speed. What remains:
  • verify speed (important if you have sync jobs, which may even be cascaded from the main PBS to other PBSes: if verifies could complete more quickly, sync jobs could use already-verified chunks and complete the whole backup dance within the backup window, without the need to verify chunks later, removing redundant processing from the other PBS instances that pull chunks from the main PBS)
  • LXC restore speed (not affected by this - and I use CTs a lot)
  • someone in this thread also mentioned host-based backup restore speed
We can hopefully now agree that the code can actually be improved - and that CPU, slow storage and whatnot were not the first candidates to blame.
I am wondering if the PBS maintainers could now take a look to see if this code can be adapted to these other points of interest.

I would really be interested in making an effort on my side and producing a pull request; the problem is I have zero experience with the language this is written in, and such an effort would likely prove either very unproductive, or produce some substandard code at best.
 
FYI, I'll probably look into doing similar changes to other parts of the code.

E.g. I already sent a patch to try to improve verify speed: https://lore.proxmox.com/pbs-devel/20250707132706.2854973-1-d.csapak@proxmox.com/
It's not configurable yet (the biggest hurdle is that we don't simply want a large number of knobs the admin must use, so we'd like a better system for that), but I wanted to start by sending something that shows we can improve performance by increasing the amount of work done in parallel (here, the loading of chunks from disk).
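The idea of "loading N chunks in parallel" can be illustrated in plain shell terms, nothing to do with the actual PBS code: `xargs -P` runs up to N jobs at once instead of one after another, which is exactly the difference a fetch-concurrency knob makes (a generic sketch; the chunk IDs and the `echo` stand in for real fetch work):

```shell
# Process 8 "chunks" with at most 4 running concurrently.
# Sequentially this would take 8 * t; with -P 4 it takes roughly 2 * t.
seq 1 8 | xargs -P 4 -I{} sh -c 'echo "fetching chunk {}"'
```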
 
This is great news !
I've read through the lore thread; this looks very promising and shows double the throughput in your setup (I assume it could be more on different storage). Looking forward to trying it out when it's ready to be released.

Btw, don't run away from knobs... Most people will never change the defaults anyway, and those who need to will cherish the possibility.
 
I agree with @lucius_the regarding the "knobs", as it would allow getting the most out of any hardware configuration. We all know both PVE and PBS run on a very broad spectrum of hardware. At the very least, something that can be set in datacenter.cfg, even if it's CLI-only. To me, having to edit systemd services seems cumbersome, error-prone and easy to lose track of (i.e. when adding a new host to a cluster), and it feels as hard to support as any other system customization.
 
FYI, I'll probably look into doing similar changes to other parts of the code.

E.g. I already sent a patch to try to improve verify speed: https://lore.proxmox.com/pbs-devel/20250707132706.2854973-1-d.csapak@proxmox.com/
It's not configurable yet (the biggest hurdle is that we don't simply want a large number of knobs the admin must use, so we'd like a better system for that), but I wanted to start by sending something that shows we can improve performance by increasing the amount of work done in parallel (here, the loading of chunks from disk).
This would be highly appreciated. I have looked at the patches, and that approach looks like a clear improvement.
 
FYI, I'll probably look into doing similar changes to other parts of the code.

Taking a shot in the dark here: is the ESXi-to-PVE import done in a similar manner? I haven't tried it in a while, but I remember it being an incredibly long process for even small VMs over a 10GbE connection, flash to flash. It may have improved since then, but I wonder if there are some applicable changes that could be made?


____

Running an import from ESXi to PVE with the import tool over 10GbE; it's awfully slow.
15 mins in and still at 0%; it averages about 1 MB/s.

I'll run a test from another PVE host to verify this isn't an isolated thing, but I don't remember it being this slow.
 
15 mins in and still at 0%; it averages about 1 MB/s.

I'll run a test from another PVE host to verify this isn't an isolated thing, but I don't remember it being this slow.

As far as I remember, when I migrated off of ESXi, I only used the built-in procedure for very small VMs. For larger ones, I actually shut them down in ESXi, copied the VMDKs over to PVE via NFS, and then used the built-in PVE tools to import the VMDK disks into PVE storage (first configuring the new VMs on PVE manually, then importing/converting the disks - or something along those lines). That was much faster. Part of the reason could be that on the ESXi side the management interface is used for the transfer, and there was some bandwidth limit (max 1 Gbps, I think). Anyway, try using NFS to copy the VMDKs over.

Or use the free Veeam agent to back up the original VM and then restore it to the new Proxmox VM (just use the Veeam recovery ISO to boot the new VM and restore from there). That works well if your VMs are Windows (Linux should work too, but I didn't try it). You can store the Veeam backups on SMB or NFS. It's quite simple, and you're more likely to get a bootable system on the first try.

If you have more questions, perhaps open a new thread so we don't stray from the topic here.
 
Did a test right now with production-level hardware (EPYC Gen4, many cores, much RAM, a 5-node cluster + Ceph 3/2 pool on NVMe drives, 25G networking, and a PBS with an 8-HDD raid10 + special device, 74% full, nearly 15000 snapshots):

libproxmox-backup-qemu0 v1.5.1
Code:
progress 100% (read 80530636800 bytes, zeroes = 29% (23983030272 bytes), duration 442 sec)
restore image complete (bytes=80530636800, duration=442.68s, speed=173.49MB/s)

Same test, same hardware:

libproxmox-backup-qemu0 v1.5.2 (installed just this package from no-subscription repo):
Code:
progress 100% (read 80530636800 bytes, zeroes = 29% (23983030272 bytes), duration 45 sec)
restore image complete (bytes=80530636800, duration=45.22s, speed=1698.21MB/s)

I would call that a win! ;) Thanks to everyone involved! (and please add some knobs to fine-tune this :))
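Sanity-checking those numbers (the log's "MB/s" is evidently MiB/s, i.e. bytes divided by 2^20; the tiny difference from the logged 1698.21 comes from the log using the unrounded duration):

```shell
# 80530636800 bytes = exactly 76800 MiB restored in 442.68 s vs 45.22 s
awk 'BEGIN {
  mib = 80530636800 / 1048576
  printf "v1.5.1: %.2f MiB/s\n", mib / 442.68   # -> 173.49
  printf "v1.5.2: %.2f MiB/s\n", mib / 45.22    # -> 1698.36
  printf "speedup: %.1fx\n", 442.68 / 45.22     # -> 9.8x
}'
```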
 
Did a test right now with production-level hardware [...]: v1.5.1 restored at 173.49 MB/s (442 s), v1.5.2 at 1698.21 MB/s (45 s). I would call that a win!
Is that with the default configuration? Looks pretty great to me!

Have you tried adjusting the env vars for the restore command? e.g.
Code:
PBS_RESTORE_FETCH_CONCURRENCY=64 PBS_RESTORE_MAX_THREADS=8 qmrestore
or
Code:
PBS_RESTORE_FETCH_CONCURRENCY=128 PBS_RESTORE_MAX_THREADS=16 qmrestore
with the appropriate parameters.
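For clarity: variables prefixed to a command like that apply only to that one invocation, not to the shell or to GUI-triggered restores. A minimal demonstration using `printenv` as a stand-in for qmrestore:

```shell
# The prefixed variable is visible inside the invoked process...
PBS_RESTORE_FETCH_CONCURRENCY=64 PBS_RESTORE_MAX_THREADS=8 printenv PBS_RESTORE_MAX_THREADS
# prints: 8

# ...but the surrounding shell never had it set:
echo "${PBS_RESTORE_MAX_THREADS:-unset}"
# prints: unset
```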
 
Is that with the default configuration?
Yes, fully default settings. Install the package, do the restore from the web UI. This is the full log, which shows it used 4 restore threads and fetched 16 chunks in parallel:

Code:
new volume ID is 'Ceph_VMs:vm-5002-disk-0'
restore proxmox backup image: [REDACTED]
connecting to repository '[REDACTED]'
using up to 4 threads
open block backend for target 'rbd:Ceph_VMs/vm-5002-disk-0:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/Ceph_VMs.keyring'
starting to restore snapshot 'vm/5000/2025-07-29T13:18:46Z'
download and verify backup index
fetching up to 16 chunks in parallel
progress 1% (read 805306368 bytes, zeroes = 3% (29360128 bytes), duration 0 sec)
progress 2% (read 1610612736 bytes, zeroes = 1% (29360128 bytes), duration 1 sec)
progress 3% (read 2415919104 bytes, zeroes = 1% (37748736 bytes), duration 1 sec)
progress 4% (read 3221225472 bytes, zeroes = 1% (37748736 bytes), duration 2 sec)
progress 5% (read 4026531840 bytes, zeroes = 0% (37748736 bytes), duration 3 sec)
progress 6% (read 4831838208 bytes, zeroes = 1% (50331648 bytes), duration 4 sec)
progress 7% (read 5637144576 bytes, zeroes = 0% (50331648 bytes), duration 4 sec)
progress 8% (read 6442450944 bytes, zeroes = 0% (58720256 bytes), duration 5 sec)
progress 9% (read 7247757312 bytes, zeroes = 0% (58720256 bytes), duration 5 sec)
progress 10% (read 8053063680 bytes, zeroes = 0% (58720256 bytes), duration 6 sec)
progress 11% (read 8858370048 bytes, zeroes = 0% (75497472 bytes), duration 7 sec)
progress 12% (read 9663676416 bytes, zeroes = 0% (75497472 bytes), duration 8 sec)
progress 13% (read 10468982784 bytes, zeroes = 0% (75497472 bytes), duration 8 sec)
progress 14% (read 11274289152 bytes, zeroes = 0% (92274688 bytes), duration 9 sec)
progress 15% (read 12079595520 bytes, zeroes = 0% (92274688 bytes), duration 10 sec)
progress 16% (read 12884901888 bytes, zeroes = 0% (117440512 bytes), duration 10 sec)
progress 17% (read 13690208256 bytes, zeroes = 0% (121634816 bytes), duration 11 sec)
progress 18% (read 14495514624 bytes, zeroes = 0% (121634816 bytes), duration 11 sec)
progress 19% (read 15300820992 bytes, zeroes = 0% (146800640 bytes), duration 12 sec)
progress 20% (read 16106127360 bytes, zeroes = 0% (146800640 bytes), duration 13 sec)
progress 21% (read 16911433728 bytes, zeroes = 0% (146800640 bytes), duration 14 sec)
progress 22% (read 17716740096 bytes, zeroes = 0% (176160768 bytes), duration 14 sec)
progress 23% (read 18522046464 bytes, zeroes = 0% (176160768 bytes), duration 15 sec)
progress 24% (read 19327352832 bytes, zeroes = 1% (218103808 bytes), duration 15 sec)
progress 25% (read 20132659200 bytes, zeroes = 1% (222298112 bytes), duration 16 sec)
progress 26% (read 20937965568 bytes, zeroes = 1% (222298112 bytes), duration 17 sec)
progress 27% (read 21743271936 bytes, zeroes = 1% (255852544 bytes), duration 18 sec)
progress 28% (read 22548578304 bytes, zeroes = 1% (255852544 bytes), duration 18 sec)
progress 29% (read 23353884672 bytes, zeroes = 1% (255852544 bytes), duration 19 sec)
progress 30% (read 24159191040 bytes, zeroes = 1% (289406976 bytes), duration 20 sec)
progress 31% (read 24964497408 bytes, zeroes = 1% (318767104 bytes), duration 20 sec)
progress 32% (read 25769803776 bytes, zeroes = 1% (327155712 bytes), duration 21 sec)
progress 33% (read 26575110144 bytes, zeroes = 1% (331350016 bytes), duration 22 sec)
progress 34% (read 27380416512 bytes, zeroes = 1% (331350016 bytes), duration 22 sec)
progress 35% (read 28185722880 bytes, zeroes = 1% (343932928 bytes), duration 23 sec)
progress 36% (read 28991029248 bytes, zeroes = 1% (343932928 bytes), duration 24 sec)
progress 37% (read 29796335616 bytes, zeroes = 1% (343932928 bytes), duration 24 sec)
progress 38% (read 30601641984 bytes, zeroes = 1% (364904448 bytes), duration 25 sec)
progress 39% (read 31406948352 bytes, zeroes = 1% (364904448 bytes), duration 26 sec)
progress 40% (read 32212254720 bytes, zeroes = 1% (377487360 bytes), duration 26 sec)
progress 41% (read 33017561088 bytes, zeroes = 1% (377487360 bytes), duration 27 sec)
progress 42% (read 33822867456 bytes, zeroes = 1% (377487360 bytes), duration 28 sec)
progress 43% (read 34628173824 bytes, zeroes = 1% (394264576 bytes), duration 29 sec)
progress 44% (read 35433480192 bytes, zeroes = 1% (394264576 bytes), duration 29 sec)
progress 45% (read 36238786560 bytes, zeroes = 1% (394264576 bytes), duration 30 sec)
progress 46% (read 37044092928 bytes, zeroes = 1% (436207616 bytes), duration 30 sec)
progress 47% (read 37849399296 bytes, zeroes = 1% (436207616 bytes), duration 31 sec)
progress 48% (read 38654705664 bytes, zeroes = 1% (469762048 bytes), duration 32 sec)
progress 49% (read 39460012032 bytes, zeroes = 1% (469762048 bytes), duration 32 sec)
progress 50% (read 40265318400 bytes, zeroes = 1% (503316480 bytes), duration 33 sec)
progress 51% (read 41070624768 bytes, zeroes = 1% (553648128 bytes), duration 33 sec)
progress 52% (read 41875931136 bytes, zeroes = 1% (553648128 bytes), duration 34 sec)
progress 53% (read 42681237504 bytes, zeroes = 1% (553648128 bytes), duration 34 sec)
progress 54% (read 43486543872 bytes, zeroes = 1% (633339904 bytes), duration 35 sec)
progress 55% (read 44291850240 bytes, zeroes = 1% (633339904 bytes), duration 36 sec)
progress 56% (read 45097156608 bytes, zeroes = 1% (683671552 bytes), duration 36 sec)
progress 57% (read 45902462976 bytes, zeroes = 1% (763363328 bytes), duration 37 sec)
progress 58% (read 46707769344 bytes, zeroes = 1% (763363328 bytes), duration 37 sec)
progress 59% (read 47513075712 bytes, zeroes = 1% (893386752 bytes), duration 38 sec)
progress 60% (read 48318382080 bytes, zeroes = 2% (981467136 bytes), duration 38 sec)
progress 61% (read 49123688448 bytes, zeroes = 1% (981467136 bytes), duration 39 sec)
progress 62% (read 49928994816 bytes, zeroes = 2% (1111490560 bytes), duration 39 sec)
progress 63% (read 50734301184 bytes, zeroes = 2% (1111490560 bytes), duration 40 sec)
progress 64% (read 51539607552 bytes, zeroes = 2% (1166016512 bytes), duration 40 sec)
progress 65% (read 52344913920 bytes, zeroes = 2% (1275068416 bytes), duration 41 sec)
progress 66% (read 53150220288 bytes, zeroes = 2% (1367343104 bytes), duration 41 sec)
progress 67% (read 53955526656 bytes, zeroes = 3% (1824522240 bytes), duration 41 sec)
progress 68% (read 54760833024 bytes, zeroes = 4% (2445279232 bytes), duration 42 sec)
progress 69% (read 55566139392 bytes, zeroes = 5% (3183476736 bytes), duration 42 sec)
progress 70% (read 56371445760 bytes, zeroes = 6% (3531603968 bytes), duration 42 sec)
progress 71% (read 57176752128 bytes, zeroes = 6% (3917479936 bytes), duration 42 sec)
progress 72% (read 57982058496 bytes, zeroes = 7% (4617928704 bytes), duration 42 sec)
progress 73% (read 58787364864 bytes, zeroes = 9% (5385486336 bytes), duration 42 sec)
progress 74% (read 59592671232 bytes, zeroes = 10% (6111100928 bytes), duration 43 sec)
progress 75% (read 60397977600 bytes, zeroes = 11% (6916407296 bytes), duration 43 sec)
progress 76% (read 61203283968 bytes, zeroes = 12% (7562330112 bytes), duration 43 sec)
progress 77% (read 62008590336 bytes, zeroes = 13% (8350859264 bytes), duration 43 sec)
progress 78% (read 62813896704 bytes, zeroes = 14% (9151971328 bytes), duration 43 sec)
progress 79% (read 63619203072 bytes, zeroes = 15% (9944694784 bytes), duration 43 sec)
progress 80% (read 64424509440 bytes, zeroes = 16% (10611589120 bytes), duration 43 sec)
progress 81% (read 65229815808 bytes, zeroes = 16% (11085545472 bytes), duration 43 sec)
progress 82% (read 66035122176 bytes, zeroes = 17% (11312037888 bytes), duration 43 sec)
progress 83% (read 66840428544 bytes, zeroes = 17% (11576279040 bytes), duration 44 sec)
progress 84% (read 67645734912 bytes, zeroes = 18% (12222201856 bytes), duration 44 sec)
progress 85% (read 68451041280 bytes, zeroes = 18% (12603883520 bytes), duration 44 sec)
progress 86% (read 69256347648 bytes, zeroes = 18% (12872318976 bytes), duration 45 sec)
progress 87% (read 70061654016 bytes, zeroes = 19% (13602127872 bytes), duration 45 sec)
progress 88% (read 70866960384 bytes, zeroes = 20% (14403239936 bytes), duration 45 sec)
progress 89% (read 71672266752 bytes, zeroes = 21% (15208546304 bytes), duration 45 sec)
progress 90% (read 72477573120 bytes, zeroes = 22% (16013852672 bytes), duration 45 sec)
progress 91% (read 73282879488 bytes, zeroes = 22% (16810770432 bytes), duration 45 sec)
progress 92% (read 74088185856 bytes, zeroes = 23% (17616076800 bytes), duration 45 sec)
progress 93% (read 74893492224 bytes, zeroes = 24% (18421383168 bytes), duration 45 sec)
progress 94% (read 75698798592 bytes, zeroes = 25% (19226689536 bytes), duration 45 sec)
progress 95% (read 76504104960 bytes, zeroes = 26% (20031995904 bytes), duration 45 sec)
progress 96% (read 77309411328 bytes, zeroes = 26% (20833107968 bytes), duration 45 sec)
progress 97% (read 78114717696 bytes, zeroes = 27% (21575499776 bytes), duration 45 sec)
progress 98% (read 78920024064 bytes, zeroes = 28% (22380806144 bytes), duration 45 sec)
progress 99% (read 79725330432 bytes, zeroes = 29% (23186112512 bytes), duration 45 sec)
progress 100% (read 80530636800 bytes, zeroes = 29% (23983030272 bytes), duration 45 sec)
restore image complete (bytes=80530636800, duration=45.22s, speed=1698.21MB/s)
rescan volumes...
TASK OK
 