[SOLVED] Slow Backup-Speed - find the Bottleneck

floh

Hello Proxmox Community!

We are testing PBS as (maybe) our next backup solution for PVE. Currently PBS is virtualized on the PVE cluster, and the datastore is an external CIFS share (for now).
The whole cluster has 20 GBit connectivity, both to the CIFS share and between the nodes.

When benchmarking PBS with the CIFS share as the repository, we get the following results:

Code:
Uploaded 334 chunks in 5 seconds.
Time per request: 15118 microseconds.
TLS speed: 277.43 MB/s
SHA256 speed: 314.79 MB/s
Compression speed: 578.94 MB/s
Decompress speed: 897.54 MB/s
AES256/GCM speed: 413.03 MB/s
Verify speed: 228.35 MB/s
┌───────────────────────────────────┬───────────────────┐
│ Name                              │ Value             │
╞═══════════════════════════════════╪═══════════════════╡
│ TLS (maximal backup upload speed) │ 277.43 MB/s (22%) │
├───────────────────────────────────┼───────────────────┤
│ SHA256 checksum computation speed │ 314.79 MB/s (16%) │
├───────────────────────────────────┼───────────────────┤
│ ZStd level 1 compression speed    │ 578.94 MB/s (77%) │
├───────────────────────────────────┼───────────────────┤
│ ZStd level 1 decompression speed  │ 897.54 MB/s (75%) │
├───────────────────────────────────┼───────────────────┤
│ Chunk verification speed          │ 228.35 MB/s (30%) │
├───────────────────────────────────┼───────────────────┤
│ AES256 GCM encryption speed       │ 413.03 MB/s (11%) │
└───────────────────────────────────┴───────────────────┘
Note that this benchmark ran while backup jobs were active (e.g. one like the job shown further down in this post).
So it should theoretically be possible to back up at up to ~270 MB/s, maybe 220 MB/s.
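For reference, the numbers above come from the client's built-in benchmark, invoked roughly like this ([user], [pbs-host] and [datastore] are placeholders):

Code:
# runs the TLS/SHA256/compression/AES tests against the given repository
proxmox-backup-client benchmark --repository [user]@pbs@[pbs-host]:[datastore]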


Sometimes we get acceptable performance of about ~50 MB/s on average.
But most of the time the performance looks like this:
Code:
INFO: starting new backup job: vzdump 224 --storage PBS --node [nodename] --mode snapshot --remove 0
INFO: Starting Backup of VM 224 (qemu)
INFO: Backup started at 2020-11-06 08:13:42
INFO: status = running
INFO: VM Name: [vm-name]
INFO: include disk 'virtio0' 'SSD-Storage2:vm-224-disk-1' 128G
INFO: include disk 'virtio1' 'SSD-Storage2:vm-224-disk-0' 1T
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/224/2020-11-06T07:13:42Z'
INFO: issuing guest-agent 'fs-freeze' command
INFO: enabling encryption
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task 'f98aa810-cc19-4946-bb45-bdb452af1282'
INFO: resuming VM again
INFO: virtio0: dirty-bitmap status: created new
INFO: virtio1: dirty-bitmap status: created new
INFO: 0% (404.0 MiB of 1.1 TiB) in 3s, read: 134.7 MiB/s, write: 97.3 MiB/s
INFO: 1% (11.5 GiB of 1.1 TiB) in 21m 14s, read: 9.0 MiB/s, write: 9.0 MiB/s
INFO: 2% (23.0 GiB of 1.1 TiB) in 42m 30s, read: 9.2 MiB/s, write: 6.6 MiB/s
INFO: 3% (34.6 GiB of 1.1 TiB) in 1h 18m 56s, read: 5.4 MiB/s, write: 5.4 MiB/s
INFO: 4% (46.1 GiB of 1.1 TiB) in 1h 40m 5s, read: 9.3 MiB/s, write: 8.0 MiB/s
INFO: 5% (57.6 GiB of 1.1 TiB) in 2h 21m 7s, read: 4.8 MiB/s, write: 4.8 MiB/s
INFO: 6% (69.1 GiB of 1.1 TiB) in 2h 49m 57s, read: 6.8 MiB/s, write: 6.8 MiB/s
INFO: 7% (80.6 GiB of 1.1 TiB) in 3h 22m 54s, read: 6.0 MiB/s, write: 5.9 MiB/s
INFO: 8% (92.3 GiB of 1.1 TiB) in 3h 54m 28s, read: 6.3 MiB/s, write: 5.6 MiB/s
INFO: 9% (103.7 GiB of 1.1 TiB) in 4h 12m 57s, read: 10.5 MiB/s, write: 4.8 MiB/s

The source storage (where the VM disks live) is performant SSD storage (about 1 GB/s read and write).
The PBS VM's hardware is:
* memory: 64 GB
* CPU: 31 cores
* disk: 64 GB, SSD performance


Is there a way to figure out why the backup read and write speeds are sometimes that low?
Is there a way to figure out where the bottleneck is (PVE-->PBS-->CIFS-Share)?


best regards,
Flo
 
One thing you should check (probably not the cause of the issue, but it wouldn't hurt) is whether you have the AES flag set for the PBS VM. Under Hardware -> CPU -> Advanced you should be able to enable hardware acceleration for AES. This will improve your score on the encryption benchmark. A quick way to verify it is sketched below.
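A sketch (vmid and CPU type in the qm line are just examples, and it assumes a PVE version that supports the aes CPU flag; CPU type "host" passes through all host flags anyway):

Code:
# inside the PBS VM: prints "aes" if the guest CPU exposes AES-NI
grep -m1 -wo aes /proc/cpuinfo

# on the PVE host: one way to set the flag from the CLI
qm set 224 --cpu kvm64,flags=+aes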
Since PBS works with a lot of small files (~4 MiB chunks), IOPS are an important factor. Could you try to benchmark the mounted CIFS share? Something like the run below should do.
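A sketch, assuming the share is mounted under /mnt/[datastore] and the mount allows O_DIRECT (e.g. mounted with cache=none):

Code:
# random 4k writes with the page cache bypassed - stresses IOPS, not bandwidth
fio --name=cifs-iops --filename=/mnt/[datastore]/fio-test \
    --ioengine=libaio --direct=1 --rw=randwrite --bs=4k \
    --iodepth=32 --size=1G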
 
Thanks @Cookiefamily for the advice - the AES flag is set now. Without the AES flag, the encryption speed was around ~120 MB/s.

I'll start a benchmark of the mounted CIFS share and will provide the results as soon as the benchmark is done.
 
I'm not quite sure whether I've done the benchmark the right way - if not, please feel free to correct me.

I've done one test with fio with a file size of 10M and another one with 4M.


Code:
root@srv-v-backup:/mnt/[datastore]/benchmark_test# fio --rw=write --name=test --size=10M --filename=file1
test: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.12
Starting 1 process
test: Laying out IO file (1 file / 10MiB)

test: (groupid=0, jobs=1): err= 0: pid=16126: Fri Nov  6 13:24:43 2020
  write: IOPS=51.2k, BW=200MiB/s (210MB/s)(10.0MiB/50msec); 0 zone resets
    clat (nsec): min=2780, max=23351, avg=3426.84, stdev=1169.46
     lat (nsec): min=2851, max=23422, avg=3509.63, stdev=1189.25
    clat percentiles (nsec):
     |  1.00th=[ 2832],  5.00th=[ 2896], 10.00th=[ 2960], 20.00th=[ 3056],
     | 30.00th=[ 3120], 40.00th=[ 3152], 50.00th=[ 3216], 60.00th=[ 3280],
     | 70.00th=[ 3344], 80.00th=[ 3440], 90.00th=[ 3952], 95.00th=[ 4448],
     | 99.00th=[ 6752], 99.50th=[13632], 99.90th=[19584], 99.95th=[20608],
     | 99.99th=[23424]
  lat (usec)   : 4=90.04%, 10=9.30%, 20=0.59%, 50=0.08%
  cpu          : usr=6.12%, sys=6.12%, ctx=2, majf=0, minf=10
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,2560,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=200MiB/s (210MB/s), 200MiB/s-200MiB/s (210MB/s-210MB/s), io=10.0MiB (10.5MB), run=50-50msec
root@srv-v-backup:/mnt/[datastore]/benchmark_test# fio --rw=write --name=test --size=4M --filename=file2
test: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.12
Starting 1 process
test: Laying out IO file (1 file / 4MiB)

test: (groupid=0, jobs=1): err= 0: pid=16129: Fri Nov  6 13:25:02 2020
  write: IOPS=26.3k, BW=103MiB/s (108MB/s)(4096KiB/39msec); 0 zone resets
    clat (nsec): min=2836, max=97021, avg=3645.00, stdev=3371.48
     lat (nsec): min=2926, max=97102, avg=3727.65, stdev=3385.34
    clat percentiles (nsec):
     |  1.00th=[ 2864],  5.00th=[ 2928], 10.00th=[ 2960], 20.00th=[ 3024],
     | 30.00th=[ 3088], 40.00th=[ 3152], 50.00th=[ 3184], 60.00th=[ 3280],
     | 70.00th=[ 3376], 80.00th=[ 3504], 90.00th=[ 4256], 95.00th=[ 4832],
     | 99.00th=[11584], 99.50th=[18048], 99.90th=[26240], 99.95th=[96768],
     | 99.99th=[96768]
  lat (usec)   : 4=87.99%, 10=10.74%, 20=0.78%, 50=0.39%, 100=0.10%
  cpu          : usr=2.63%, sys=0.00%, ctx=2, majf=0, minf=10
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,1024,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=103MiB/s (108MB/s), 103MiB/s-103MiB/s (108MB/s-108MB/s), io=4096KiB (4194kB), run=39-39msec

best regards
 
I've done one test with fio with the filesize of 10M and another one with 4M.
In my experiments I got impossibly high results with small test files - such tests basically measure the performance of the page cache in RAM. So the test file size should be larger than the available RAM. You have 64 GiB of RAM, so try running fio with 128G. It might take a long time, but the results are much more trustworthy...
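If waiting for a 128G file is not an option, the page cache can also be dropped right before a read test, or bypassed entirely with fio's --direct=1 (a sketch; run as root):

Code:
# flush dirty pages, then drop page cache, dentries and inodes
sync && echo 3 > /proc/sys/vm/drop_caches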

Just my 2€¢
 
Exactly, Linux will have cached that file, that's why you get high speeds :) You need to write enough data to spill past your available RAM.
 
Thanks Cookiefamily and UdoB for your tips!

That was the problem - I started the fio benchmark again with the following parameters:

Code:
fio --ioengine=libaio --direct=1 --name=test --filename=test --bs=4m --iodepth=64 --size=128G --readwrite=write

In that case the performance is:
Bandwidth: between 40 and 50 MB/s
IOPS: between 8 and 12

So the bottleneck is most likely the storage itself and I'm currently testing another storage server.

For me it's solved - if someone wants to provide a "better" benchmark method, please feel free to document it here. One candidate is sketched below.
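This one stays honest even with smaller sizes, because --direct=1 bypasses the page cache (a sketch; the mount path and sizes are placeholders to adapt):

Code:
# 1) sequential bandwidth with 4M blocks, roughly the PBS chunk size
fio --name=seq-bw --filename=/mnt/[datastore]/fio-test --ioengine=libaio \
    --direct=1 --rw=write --bs=4M --iodepth=16 --size=16G

# 2) random 4k IOPS - usually the painful part on network shares
fio --name=rand-iops --filename=/mnt/[datastore]/fio-test --ioengine=libaio \
    --direct=1 --rw=randwrite --bs=4k --iodepth=32 --size=4G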
 