Abysmally slow restore from backup

I will try this on a bigger VM, but I'm not really having any issues with PBS restoring moderately sized VMs.

Code:
new volume ID is 'local-zfs:vm-170-disk-0'
new volume ID is 'local-zfs:vm-170-disk-1'
restore proxmox backup image: /usr/bin/pbs-restore --repository root@pam@10.60.110.155:truenas --ns DER vm/170/2025-03-03T20:21:36Z drive-sata0.img.fidx /dev/zvol/rpool/data/vm-170-disk-0 --verbose --format raw --skip-zero
connecting to repository 'root@pam@10.60.110.155:truenas'
open block backend for target '/dev/zvol/rpool/data/vm-170-disk-0'
starting to restore snapshot 'vm/170/2025-03-03T20:21:36Z'
download and verify backup index
progress 1% (read 12582912 bytes, zeroes = 0% (0 bytes), duration 0 sec)
progress 2% (read 25165824 bytes, zeroes = 0% (0 bytes), duration 0 sec)
progress 3% (read 33554432 bytes, zeroes = 0% (0 bytes), duration 0 sec)
progress 4% (read 46137344 bytes, zeroes = 18% (8388608 bytes), duration 0 sec)
progress 5% (read 54525952 bytes, zeroes = 23% (12582912 bytes), duration 0 sec)
...truncated...
progress 98% (read 1031798784 bytes, zeroes = 12% (130023424 bytes), duration 54 sec)
progress 99% (read 1044381696 bytes, zeroes = 12% (130023424 bytes), duration 54 sec)
progress 100% (read 1052770304 bytes, zeroes = 12% (134217728 bytes), duration 54 sec)
restore image complete (bytes=1052770304, duration=54.83s, speed=18.31MB/s)
restore proxmox backup image: /usr/bin/pbs-restore --repository root@pam@10.60.110.155:truenas --ns DER vm/170/2025-03-03T20:21:36Z drive-sata1.img.fidx /dev/zvol/rpool/data/vm-170-disk-1 --verbose --format raw --skip-zero
connecting to repository 'root@pam@10.60.110.155:truenas'
open block backend for target '/dev/zvol/rpool/data/vm-170-disk-1'
starting to restore snapshot 'vm/170/2025-03-03T20:21:36Z'
download and verify backup index
progress 1% (read 859832320 bytes, zeroes = 13% (117440512 bytes), duration 10 sec)
...truncated...
progress 97% (read 83328237568 bytes, zeroes = 95% (79666610176 bytes), duration 73 sec)
progress 98% (read 84188069888 bytes, zeroes = 95% (80526442496 bytes), duration 73 sec)
progress 99% (read 85047902208 bytes, zeroes = 95% (81386274816 bytes), duration 73 sec)
progress 100% (read 85902491648 bytes, zeroes = 95% (82237718528 bytes), duration 73 sec)
restore image complete (bytes=85902491648, duration=73.93s, speed=1108.06MB/s)
rescan volumes...
TASK OK

80 GB in about 74 seconds isn't terrible, but I could see this becoming quite an issue when you are pulling a couple of TBs.
 
Yeah, except that's not actually 80 GB of data read from the backup files, as most of it is probably unused space... and if it was backed up recently, it could have been read mostly from cache as well. I see that you're using TrueNAS via iSCSI, and we know nothing about your TrueNAS setup, so those numbers don't mean much on their own.
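For a rough sanity check, here are the second disk's numbers from the task log above, assuming the 95% of zero data contributes essentially nothing to the actual transfer:

Code:
total read      85,902,491,648 bytes  (~80 GiB)
zero data       82,237,718,528 bytes  (95%, not written thanks to --skip-zero)
non-zero data    3,664,773,120 bytes  (~3.4 GiB)
duration        73.93 s  ->  roughly 3.7 GB / 74 s, i.e. ~50 MB/s of real chunk data

So the reported 1108 MB/s is dominated by skipped zeroes; the real chunk traffic was closer to 50 MB/s.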
 
Check my older posts in this thread... Veeam reads about 10x faster than PBS when doing restores. That's on the same hardware, with PBS and Veeam both using a ZFS dataset as their backing store (configured with the same parameters, on the same pool, on the same server).
 
Please explain your setup too, especially what is being used to store the backups (which hard drives, which filesystem, which RAID level, ZFS details if applicable, etc.).
 
It is a TrueNAS 25.04 system with PBS running as a container that has access to all available system resources.

2x Intel(R) Xeon(R) CPU E5-2670 v3
160GB of DDR4 ECC
5x 1.92TB Samsung PM893 SATA SSD in RAIDz1
A 1.6TB NVMe SLOG

ZFS details on the dataset:
Type: FILESYSTEM
Sync: STANDARD
Compression Level: Inherit (LZ4)
Enable Atime: OFF
ZFS Deduplication: OFF
Case Sensitivity: ON
Path: ssd/pbs
Record Size: 128KiB
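If anyone wants to cross-check those properties, they can be read straight from the TrueNAS shell (dataset path taken from the list above):

Code:
zfs get recordsize,compression,atime,dedup,sync ssd/pbs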

The TrueNAS and the PVE are at totally different sites, connected by a site-to-site link that is used for backup and restore, so throughput will max out at 1 Gbps no matter what. I can try some local restores once there is some Proxmox infra at that site, but currently there isn't any.


The on-prem storage is a Pure X20 array attached over 2x 25GbE links via iSCSI. It can currently max out the combined 25GbE connections.

Currently running a PBS backup of a 300GB VM that is 100% full (for testing). While the backup was running I ran a fio test on the Pure storage to make sure it's not the bottleneck; I'd say it isn't.
[screenshot: fio results on the Pure storage]
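For reference, a sequential-read fio run along these lines will exercise the Pure volume; the target device, runtime and job count below are placeholders, not the exact command that was used:

Code:
fio --name=pure-seqread --filename=/dev/disk/by-id/<pure-lun> \
    --rw=read --bs=1M --ioengine=libaio --iodepth=32 --numjobs=4 \
    --direct=1 --runtime=60 --time_based --group_reporting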

Current status of the backup
Code:
INFO: starting new backup job: vzdump 103 --remove 0 --notes-template '{{guestname}}' --mode snapshot --node der-pve3 --notification-mode auto --storage gar-truenas
INFO: Starting Backup of VM 103 (qemu)
INFO: Backup started at 2025-03-10 08:24:15
INFO: status = running
INFO: VM Name: ubuntu
INFO: include disk 'scsi0' 'pure:vm-103-disk-0' 300G
Info :: Waiting (0s) for map volume "vm-103-disk-0"...
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/103/2025-03-10T17:24:15Z'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '62f959e2-e125-44f3-a36f-58ee0e7928ff'
INFO: resuming VM again
INFO: scsi0: dirty-bitmap status: created new
INFO:   0% (2.6 GiB of 300.0 GiB) in 3s, read: 873.3 MiB/s, write: 66.7 MiB/s
INFO:   2% (6.1 GiB of 300.0 GiB) in 7s, read: 910.0 MiB/s, write: 50.0 MiB/s
INFO:   3% (9.1 GiB of 300.0 GiB) in 13s, read: 504.7 MiB/s, write: 38.0 MiB/s
INFO:   4% (12.2 GiB of 300.0 GiB) in 22s, read: 353.8 MiB/s, write: 24.9 MiB/s
INFO:   5% (16.1 GiB of 300.0 GiB) in 42s, read: 202.6 MiB/s, write: 17.6 MiB/s
INFO:   6% (18.0 GiB of 300.0 GiB) in 1m 18s, read: 53.1 MiB/s, write: 52.3 MiB/s
INFO:   7% (21.0 GiB of 300.0 GiB) in 2m 42s, read: 36.7 MiB/s, write: 35.6 MiB/s
INFO:   8% (24.6 GiB of 300.0 GiB) in 4m 4s, read: 44.4 MiB/s, write: 18.5 MiB/s
...truncated...
INFO:  33% (99.0 GiB of 300.0 GiB) in 48m 17s, read: 12.5 MiB/s, write: 12.1 MiB/s
INFO:  34% (102.0 GiB of 300.0 GiB) in 50m 6s, read: 28.6 MiB/s, write: 28.3 MiB/s

Current load on the PBS and PVE
[screenshots: load graphs for PBS and PVE]

So there is no resource contention on the systems as a whole, but on the PVE server you can see one thread is completely maxed out:
[screenshot: per-core CPU usage on the PVE server, one thread maxed out]
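To pin down which thread that is, per-thread CPU usage of the VM's QEMU process can be watched on the PVE node with something like this (assuming VMID 103 and the standard PVE pidfile location):

Code:
top -H -p "$(cat /var/run/qemu-server/103.pid)"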
 
A very fast storage subsystem, all flash... I'd be interested in the final results.
You could also use recordsize=1M on the dataset used for the PBS chunk store.
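With the dataset path from the earlier post that would be something like the lines below; note that recordsize only affects chunks written after the change, existing chunks keep their current block size:

Code:
zfs set recordsize=1M ssd/pbs
zfs get recordsize ssd/pbs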

The TrueNAS and the PVE are at totally different sites, connected by a site-to-site link that is used for backup and restore, so throughput will max out at 1 Gbps no matter what. I can try some local restores once there is some Proxmox infra at that site, but currently there isn't any.
I suppose the 1 Gbps link will be the limiting factor here, but if restores don't reach its maximum speed, then software is the limit...
Please make sure to check the actual network speed, because the numbers PBS gives you are... ehm... calculated, to look more impressive. So check what you actually get on the wire. If you get >100 MB/s (120 MB/s actually), then the network was the limit, not PBS. And vice versa.
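One way to check the wire rather than the task log is an iperf3 run between the PVE node and the PBS host (IP taken from the restore log above), plus watching the NIC counters while a restore is actually running:

Code:
# on the PBS side
iperf3 -s
# on the PVE node
iperf3 -c 10.60.110.155 -P 4 -t 30
# during a restore, on the PVE node
ip -s link show <wan-interface>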
 
Info dump time. This is an incomplete backup test, but I just wanted to get it out there for now. I will do a separate test on restores.

The 300GB backup over the WAN with PBS has finished:
Code:
INFO: starting new backup job: vzdump 103 --remove 0 --notes-template '{{guestname}}' --mode snapshot --node der-pve3 --notification-mode auto --storage gar-truenas
INFO: Starting Backup of VM 103 (qemu)
INFO: Backup started at 2025-03-10 08:24:15
INFO: status = running
INFO: VM Name: ubuntu
INFO: include disk 'scsi0' 'pure:vm-103-disk-0' 300G
Info :: Waiting (0s) for map volume "vm-103-disk-0"...
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/103/2025-03-10T17:24:15Z'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '62f959e2-e125-44f3-a36f-58ee0e7928ff'
INFO: resuming VM again
INFO: scsi0: dirty-bitmap status: created new
INFO: 0% (2.6 GiB of 300.0 GiB) in 3s, read: 873.3 MiB/s, write: 66.7 MiB/s
INFO: 2% (6.1 GiB of 300.0 GiB) in 7s, read: 910.0 MiB/s, write: 50.0 MiB/s
...truncated...
INFO: 91% (273.0 GiB of 300.0 GiB) in 3h 26m 30s, read: 40.8 MiB/s, write: 39.1 MiB/s
INFO: 92% (276.2 GiB of 300.0 GiB) in 3h 28m 27s, read: 28.0 MiB/s, write: 25.9 MiB/s
INFO: 93% (279.1 GiB of 300.0 GiB) in 3h 31m 38s, read: 15.4 MiB/s, write: 14.8 MiB/s
INFO: 94% (282.1 GiB of 300.0 GiB) in 3h 33m 11s, read: 33.0 MiB/s, write: 30.6 MiB/s
INFO: 95% (285.1 GiB of 300.0 GiB) in 3h 35m 30s, read: 21.8 MiB/s, write: 20.8 MiB/s
INFO: 96% (288.1 GiB of 300.0 GiB) in 3h 37m 10s, read: 31.2 MiB/s, write: 28.8 MiB/s
INFO: 97% (291.1 GiB of 300.0 GiB) in 3h 38m 8s, read: 51.9 MiB/s, write: 49.4 MiB/s
INFO: 98% (294.0 GiB of 300.0 GiB) in 3h 39m 20s, read: 41.8 MiB/s, write: 40.1 MiB/s
INFO: 99% (297.0 GiB of 300.0 GiB) in 3h 40m 27s, read: 46.4 MiB/s, write: 42.7 MiB/s
INFO: 100% (300.0 GiB of 300.0 GiB) in 3h 41m 52s, read: 35.7 MiB/s, write: 34.2 MiB/s
INFO: backup is sparse: 15.54 GiB (5%) total zero data
INFO: backup was done incrementally, reused 47.12 GiB (15%)
INFO: transferred 300.00 GiB in 13317 seconds (23.1 MiB/s)
INFO: adding notes to backup
INFO: Finished Backup of VM 103 (03:41:58)
INFO: Backup finished at 2025-03-10 12:06:13
INFO: Backup job finished successfully
TASK OK

So 3.75 hours
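As a quick cross-check against the 1 Gbps WAN limit mentioned earlier:

Code:
1 Gbps link            ~112-119 MiB/s theoretical maximum
reported average rate   23.1 MiB/s (300 GiB / 13317 s)

The average rate is well below what the link can carry, even before PBS chunk compression is taken into account, so the link alone doesn't explain the duration.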

The same VM, over the same WAN connection, to the same TrueNAS, but using Veeam (the worker had 32 vCPUs and 32GB of RAM):
[screenshot: Veeam backup job result over the WAN]

It took longer for Veeam to do the backup.


Now, this is where I started a different test using an onsite TrueNAS system. This system is pretty bonkers, but we wanted an all flash backup repo. This is a PBS container running on the TrueNAS system with no limit on resources.
[screenshot: PBS container running on the TrueNAS system]

Specs:
2x AMD EPYC 9354
192GB of DDR5 ECC
5x 61.44TB Solidigm D5-P5536 NVMe in RAIDz1 (Tested read speeds over 30GB/s)

ZFS details on the dataset:
TrueNAS 25.04
Type: FILESYSTEM
Sync: STANDARD
Compression Level: Inherit (LZ4)
Enable Atime: OFF
ZFS Deduplication: YES
Case Sensitivity: ON
Path: ssd/pbs
Record Size: 128KiB
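Unlike the SATA pool in the earlier post, this dataset has ZFS deduplication enabled, which adds a DDT lookup for every block written; the achieved ratio and table size can be checked with (pool name taken from the path above):

Code:
zpool list -o name,size,alloc,dedupratio ssd
zpool status -D ssd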

So from PVE to PBS over a 10GbE link
Code:
INFO: starting new backup job: vzdump 103 --storage pbsnvme --notification-mode auto --mode snapshot --node der-pve3 --notes-template '{{guestname}}' --remove 0
INFO: Starting Backup of VM 103 (qemu)
INFO: Backup started at 2025-03-11 09:38:50
INFO: status = running
INFO: VM Name: ubuntu
INFO: include disk 'scsi0' 'pure:vm-103-disk-0' 300G
Info :: Volume "vm-103-disk-0" is already connected to host "der-pve3".
wwid '3624a937096622a35a07e4a33000124ac' added
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/103/2025-03-11T18:38:50Z'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '2833c208-9002-43a2-bc07-74fb081ca1d1'
INFO: resuming VM again
INFO: scsi0: dirty-bitmap status: existing bitmap was invalid and has been cleared
INFO: 0% (1.3 GiB of 300.0 GiB) in 3s, read: 438.7 MiB/s, write: 429.3 MiB/s
INFO: 1% (3.1 GiB of 300.0 GiB) in 7s, read: 477.0 MiB/s, write: 470.0 MiB/s
INFO: 2% (8.3 GiB of 300.0 GiB) in 10s, read: 1.7 GiB/s, write: 369.3 MiB/s
INFO: 3% (9.6 GiB of 300.0 GiB) in 13s, read: 434.7 MiB/s, write: 432.0 MiB/s
INFO: 4% (12.1 GiB of 300.0 GiB) in 18s, read: 524.8 MiB/s, write: 512.8 MiB/s
...truncated...
INFO: 96% (288.4 GiB of 300.0 GiB) in 6m 50s, read: 791.0 MiB/s, write: 764.0 MiB/s
INFO: 97% (291.5 GiB of 300.0 GiB) in 6m 54s, read: 786.0 MiB/s, write: 778.0 MiB/s
INFO: 98% (294.6 GiB of 300.0 GiB) in 6m 58s, read: 791.0 MiB/s, write: 773.0 MiB/s
INFO: 99% (297.6 GiB of 300.0 GiB) in 7m 3s, read: 633.6 MiB/s, write: 620.8 MiB/s
INFO: 100% (300.0 GiB of 300.0 GiB) in 7m 7s, read: 602.0 MiB/s, write: 592.0 MiB/s
INFO: Waiting for server to finish backup validation...
INFO: backup is sparse: 9.11 GiB (3%) total zero data
INFO: backup was done incrementally, reused 9.64 GiB (3%)
INFO: transferred 300.00 GiB in 431 seconds (712.8 MiB/s)
INFO: adding notes to backup
INFO: Finished Backup of VM 103 (00:07:12)
INFO: Backup finished at 2025-03-11 09:46:02
INFO: Backup job finished successfully
TASK OK

You can see a few cores on the host are getting pegged:
[screenshot: per-core CPU usage on the host during the backup]

Confirmed the speeds on the TrueNAS and host
[screenshots: network throughput confirmed on the TrueNAS and the host]

I'm trying to figure out how to get Veeam to use the 10GbE network; the Proxmox config is a little complicated, and I don't think Veeam was anticipating people tagging VLANs on the NICs of the VMs instead of on the bridges. So I'll post results of the backup test from PVE to Veeam to the stupid flash server when I can get it working.
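For context, in this setup the VLAN tag sits on the guest NIC line in the VM config rather than on the bridge; the bridge name, MAC address and VLAN ID below are placeholders:

Code:
# /etc/pve/qemu-server/<vmid>.conf
net0: virtio=BC:24:11:00:00:01,bridge=vmbr0,tag=110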

Here it is on the 1GbE though.
[screenshot: Veeam backup over the 1GbE link]
 