Slow restore performance

JustThat

New Member
Jun 3, 2024
We currently have Proxmox VE 8.2 and PBS 3.2; a backup of a 1.65 TiB VM takes around 1 hour (which is pretty fast).

But the restore took us 14 hours... which is very long.

Attached is an image of a PBS benchmark:
[screenshot: 1717405472105.png]

Machine specs:

i9-10940X CPU @ 3.30GHz
128G DDR4 3000MT/s
4TiB P2 NVMe PCIe SSD
Upload speed is ~60MB/s

Below you can find a fio random-read benchmark:

Code:
sudo fio --filename=/dev/nvme0n1 --direct=1 --rw=randread --bs=4k --ioengine=libaio --iodepth=256 --runtime=120 --numjobs=4 --time_based --group_reporting --name=iops-test-job --eta-newline=1 --readonly
...
iops-test-job: (groupid=0, jobs=4): err= 0: pid=9063: Mon Jun  3 10:59:15 2024
  read: IOPS=214k, BW=835MiB/s (875MB/s)(97.8GiB/120008msec)
    slat (nsec): min=1038, max=266104, avg=1849.36, stdev=703.19
    clat (usec): min=783, max=14359, avg=4788.89, stdev=1264.40
     lat (usec): min=785, max=14360, avg=4790.74, stdev=1264.49
    clat percentiles (usec):
     |  1.00th=[ 2507],  5.00th=[ 3228], 10.00th=[ 3425], 20.00th=[ 3687],
     | 30.00th=[ 4047], 40.00th=[ 4490], 50.00th=[ 4621], 60.00th=[ 4817],
     | 70.00th=[ 5014], 80.00th=[ 5342], 90.00th=[ 7046], 95.00th=[ 7439],
     | 99.00th=[ 7963], 99.50th=[ 8160], 99.90th=[ 8979], 99.95th=[ 9372],
     | 99.99th=[10290]
   bw (  KiB/s): min=566744, max=1359152, per=100.00%, avg=855669.28, stdev=34033.89, samples=956
   iops        : min=141686, max=339788, avg=213917.33, stdev=8508.48, samples=956
  lat (usec)   : 1000=0.01%
  lat (msec)   : 2=0.17%, 4=28.91%, 10=70.91%, 20=0.02%
  cpu          : usr=6.05%, sys=14.03%, ctx=15253024, majf=0, minf=1065
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=25645666,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256


Run status group 0 (all jobs):
   READ: bw=835MiB/s (875MB/s), 835MiB/s-835MiB/s (875MB/s-875MB/s), io=97.8GiB (105GB), run=120008-120008msec


Disk stats (read/write):
  nvme0n1: ios=25611281/202, merge=0/118, ticks=122611533/491, in_queue=122612180, util=99.97%


The remote Proxmox node has the specs shown in the picture:

[screenshot: 1717405287977.png]


When running iotop, the upload speed is around 6 MB/s and the disk read tops out at 28 MB/s at best. What's the issue here?
 
what kind of backup (VM, CT)? what kind of target store? over the network, or are PBS and PVE co-located?

a basic tool like "atop" might already give you a clue where the bottleneck is.
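
For example, something like this run on the PBS while a restore is in flight should show whether the disk, CPU, or network is the limiting factor (the 2-second interval is just an example):

Code:
# refresh every 2 seconds; press 'd' to sort processes by disk activity,
# or 'n' for per-process network usage (requires the netatop kernel module)
atop 2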
 
It's backing up a VM. As for the target store: on Proxmox it's ZFS (NVMe in RAID1), and on PBS it's a single NVMe (no RAID). It's over the network; both machines are in different locations.


I'm not very familiar with ATOP, but here is a screenshot:

[screenshot: 1717408516934.png]

I use btop instead:

[screenshot: 1717408502245.png]

Both are run on the PBS. Aside from the network upload capping at 6 MB/s, I don't see anything else; journalctl shows that it's constantly retrieving chunks via the GET /chunk endpoint.
 
and on the PVE side? can you try running a proxmox-backup-client benchmark with --repository set, so that the communication happens between them and not locally on the PBS (i.e., execute it on PVE and point it at the PBS)?
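
Roughly like this, run on the PVE node (the user, host, and datastore name below are placeholders for your setup):

Code:
# executed on PVE so the traffic actually crosses the WAN link to the PBS
proxmox-backup-client benchmark --repository backupuser@pbs@pbs.example.com:datastore1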
 
From PVE to PBS, the benchmark is as follows:

[screenshot: 1717424291212.png]

ATOP on PVE:

[screenshot: 1717424672758.png]

And BTOP on PVE:

[screenshot: 1717425311080.png]

iotop shows a write speed for pbs-restore of around 30 MB/s.
 
I think the network latency is the bottleneck here. Some time ago we gave up on running PBS with PVE and PBS located in different datacenters - east and west of the EU - even though we had stable 1 Gbit/s connectivity between the servers. PBS uses lots of small chunks, and even several milliseconds of network latency add up a lot.
 
yes, given the benchmark result I'd also say that is the case here. HTTP/2 really suffers when the latency goes up.
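
As a quick check, you can measure the round-trip time between the two sites with a plain ping from the PVE node (the hostname is a placeholder); every chunk request pays at least that latency before any data arrives:

Code:
ping -c 10 pbs.example.com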
 
I can confirm that putting PVE and PBS on the same LAN is way faster.

Just a quick question regarding deduplication: when I start a restore of the 1.65 TiB VM, does it send the whole 1.65 TiB over the network, or only the deduplicated data (which in my case is around 500 GB)?
 
I can confirm that putting PVE and PBS on the same LAN is way faster.

Just a quick question regarding deduplication: when I start a restore of the 1.65 TiB VM, does it send the whole 1.65 TiB over the network, or only the deduplicated data (which in my case is around 500 GB)?
My guess is that the dedupe is on the PBS side and so it goes over the network multiple times. PVE doesn't do deduplication natively so I assume that's how it's implemented.

What you probably want to do is have a PBS at both locations and a sync job between them. That should take advantage of the dedupe, and the restore wouldn't be over the WAN.
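
For reference, a rough sketch of that setup via the CLI, registering one PBS as a remote on the other and pulling from it on a schedule (all names, the host, the fingerprint, and the schedule are placeholders):

Code:
# on the PBS that should receive the copies: register the other PBS as a remote
proxmox-backup-manager remote create primary-pbs --host pbs1.example.com \
    --auth-id sync@pam --password 'SECRET' --fingerprint <primary-fingerprint>

# pull its datastore into a local store once a day
proxmox-backup-manager sync-job create pull-primary --remote primary-pbs \
    --remote-store datastore1 --store offsite-store --schedule 'daily'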
 
PBS really needs to be local to PVE, but PBS's network targets for backup storage can be across the WAN with the likes of NFS (this introduces other issues, but the protocol is much more forgiving). As was already mentioned, latency is why. Your throughput seems to be there, but having network latency above a tolerance level (unknown for PBS, but I would put my finger on 10 ms) will create the performance condition you are experiencing.

You can also have PBS fully built locally for backups (DAS/NAS local to PBS) and replicate the backups off-site; this is what we do and it has worked quite well.
 
I can confirm that putting PVE and PBS on the same LAN is way faster.

Just a quick question regarding deduplication: when I start a restore of the 1.65 TiB VM, does it send the whole 1.65 TiB over the network, or only the deduplicated data (which in my case is around 500 GB)?
it depends. if most of that deduplicated data is empty/zero chunks, those are special-cased in a lot of places. for other chunks, we do keep a chunk cache and handle the most-used chunks as well, so unless every chunk in your case occurs exactly three times, most of the deduplicated data should only be transferred once over the network.

also keep in mind that the chunks are compressed, which of course also reduces the amount of data actually transferred.
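
If you want a rough idea of how much data actually sits on disk after deduplication and compression, you can check the size of the datastore's chunk store on the PBS (the path below is just an example location, and the figure covers every backup in that datastore, not only this VM):

Code:
# on-disk size of all deduplicated, compressed chunks in the datastore
du -sh /mnt/datastore/datastore1/.chunks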
 
Ok, thank you for the clarifications. I still find it strange that LAN-level latency is required for PBS to be effective. It makes sense to me to have PBS in a different physical location than PVE in case of a major incident. Are there plans to improve the protocol used for transfer (HTTP/2)?

Also, I thought that having to make an HTTP request for every 2 MB chunk, combined with a large amount of data to restore, adds a big overhead overall. Am I wrong on this?
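
As a rough upper bound on that overhead (VM images are split into 4 MiB fixed-size chunks by default; the 20 ms round-trip time is an assumed example, and requests are pipelined over HTTP/2, so the real figure is lower):

Code:
# number of 4 MiB chunks in a 1.65 TiB image
echo '1.65 * 1024 * 1024 / 4' | bc      # ~432537 chunks
# pure round-trip waiting if every chunk paid a full 20 ms serially
echo '432537 * 0.020 / 3600' | bc -l    # ~2.4 hours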
 
we will probably evaluate QUIC at some point, but other than that there is no low-hanging fruit at the moment to improve the high-latency experience there, I'm afraid.
 
we will probably evaluate QUIC at some point, but other than that there is no low-hanging fruit at the moment to improve the high-latency experience there, I'm afraid.
IMHO it's in the deployment method. One of the biggest things Veeam did for this was introducing backup proxy servers, where the backup job's heavy lifting gets offloaded from the backup system to a proxy server, and then the database links the backup normally when complete. Maybe that is something that could be looked into for PBS, for those that want to have one PBS per cluster in a stretched cluster? Or redesign the deployment guide based on what is supported (PBS local to the target PVEs, PBS-to-PBS replication for offsites, etc.)? I don't think QUIC is the right answer due to network controls that can/will break it (for example, QUIC is blocked where we deploy).
 
IMHO it's in the deployment method. One of the biggest things Veeam did for this was introducing backup proxy servers, where the backup job's heavy lifting gets offloaded from the backup system to a proxy server, and then the database links the backup normally when complete. Maybe that is something that could be looked into for PBS, for those that want to have one PBS per cluster in a stretched cluster? Or redesign the deployment guide based on what is supported (PBS local to the target PVEs, PBS-to-PBS replication for offsites, etc.)? I don't think QUIC is the right answer due to network controls that can/will break it (for example, QUIC is blocked where we deploy).
you can already set this up - back up to a local instance that has aggressive pruning, and sync to a remote instance that keeps long-term archives. it doesn't solve the issue that the transfer between local and remote over HTTP/2 will be slower than expected over higher-latency links ;)
 
you can already set this up - back up to a local instance that has aggressive pruning, and sync to a remote instance that keeps long-term archives. it doesn't solve the issue that the transfer between local and remote over HTTP/2 will be slower than expected over higher-latency links ;)

What happens if the link is down and so the aggressive pruning kicks in prior to transfer?

Although it might not solve it completely, I suspect the dedupe would work better PBS to PBS compared to a backup/restore job... although it would require extra storage and ongoing traffic for the sync job instead of only on demand, but good chance the backup job will already be at the remote site before you know you need it, which could solve the local and remote over http 2.0 issue...
 
What happens if the link is down and so the aggressive pruning kicks in prior to transfer?
then you'll prune more than you want, but you can delegate the pruning to the PBS doing the sync in this case to avoid that.

Although it might not solve it completely, I suspect the dedupe would work better PBS to PBS compared to a backup/restore job... although it would require extra storage and ongoing traffic for the sync job instead of only on demand, but good chance the backup job will already be at the remote site before you know you need it, which could solve the local and remote over http 2.0 issue...

I am not sure you know how PBS works under the hood, but I can't really parse what you wrote above ;)
 
