[SOLVED] Slow Backup Speed - Finding the Bottleneck

floh

Hello Proxmox Community!

We are testing PBS as (maybe) our next backup solution for PVE. Currently PBS is virtualized on the PVE cluster. The datastore is an external CIFS share (for now).
The whole cluster is connected to the CIFS share and between the nodes with 20 Gbit.

When benchmarking PBS with the CIFS share as the repository, we get the following results:

Code:
Uploaded 334 chunks in 5 seconds.
Time per request: 15118 microseconds.
TLS speed: 277.43 MB/s
SHA256 speed: 314.79 MB/s
Compression speed: 578.94 MB/s
Decompress speed: 897.54 MB/s
AES256/GCM speed: 413.03 MB/s
Verify speed: 228.35 MB/s
┌───────────────────────────────────┬───────────────────┐
│ Name                              │ Value             │
╞═══════════════════════════════════╪═══════════════════╡
│ TLS (maximal backup upload speed) │ 277.43 MB/s (22%) │
├───────────────────────────────────┼───────────────────┤
│ SHA256 checksum computation speed │ 314.79 MB/s (16%) │
├───────────────────────────────────┼───────────────────┤
│ ZStd level 1 compression speed    │ 578.94 MB/s (77%) │
├───────────────────────────────────┼───────────────────┤
│ ZStd level 1 decompression speed  │ 897.54 MB/s (75%) │
├───────────────────────────────────┼───────────────────┤
│ Chunk verification speed          │ 228.35 MB/s (30%) │
├───────────────────────────────────┼───────────────────┤
│ AES256 GCM encryption speed       │ 413.03 MB/s (11%) │
└───────────────────────────────────┴───────────────────┘
Note that this benchmark was run while backup jobs were active (e.g. one like the job posted further down in this post).
So it should theoretically be possible to back up at up to ~270 MB/s, maybe 220 MB/s.
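For reference, the figures above are the output of the PBS client benchmark; the invocation looks roughly like this (the repository string is a placeholder):
Code:
# run on the PBS host or any client; repository is a placeholder
proxmox-backup-client benchmark --repository root@pam@localhost:[datastore]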


Sometimes we get acceptable performance of about ~50 MB/s on average.
But most of the time the performance looks like this:
Code:
INFO: starting new backup job: vzdump 224 --storage PBS --node [nodename] --mode snapshot --remove 0
INFO: Starting Backup of VM 224 (qemu)
INFO: Backup started at 2020-11-06 08:13:42
INFO: status = running
INFO: VM Name: [vm-name]
INFO: include disk 'virtio0' 'SSD-Storage2:vm-224-disk-1' 128G
INFO: include disk 'virtio1' 'SSD-Storage2:vm-224-disk-0' 1T
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/224/2020-11-06T07:13:42Z'
INFO: issuing guest-agent 'fs-freeze' command
INFO: enabling encryption
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task 'f98aa810-cc19-4946-bb45-bdb452af1282'
INFO: resuming VM again
INFO: virtio0: dirty-bitmap status: created new
INFO: virtio1: dirty-bitmap status: created new
INFO: 0% (404.0 MiB of 1.1 TiB) in 3s, read: 134.7 MiB/s, write: 97.3 MiB/s
INFO: 1% (11.5 GiB of 1.1 TiB) in 21m 14s, read: 9.0 MiB/s, write: 9.0 MiB/s
INFO: 2% (23.0 GiB of 1.1 TiB) in 42m 30s, read: 9.2 MiB/s, write: 6.6 MiB/s
INFO: 3% (34.6 GiB of 1.1 TiB) in 1h 18m 56s, read: 5.4 MiB/s, write: 5.4 MiB/s
INFO: 4% (46.1 GiB of 1.1 TiB) in 1h 40m 5s, read: 9.3 MiB/s, write: 8.0 MiB/s
INFO: 5% (57.6 GiB of 1.1 TiB) in 2h 21m 7s, read: 4.8 MiB/s, write: 4.8 MiB/s
INFO: 6% (69.1 GiB of 1.1 TiB) in 2h 49m 57s, read: 6.8 MiB/s, write: 6.8 MiB/s
INFO: 7% (80.6 GiB of 1.1 TiB) in 3h 22m 54s, read: 6.0 MiB/s, write: 5.9 MiB/s
INFO: 8% (92.3 GiB of 1.1 TiB) in 3h 54m 28s, read: 6.3 MiB/s, write: 5.6 MiB/s
INFO: 9% (103.7 GiB of 1.1 TiB) in 4h 12m 57s, read: 10.5 MiB/s, write: 4.8 MiB/s

The source storage (where the VM disks are located) is a performant SSD storage (about 1 GB/s read and write).
The PBS VM's hardware is:
* memory: 64 GB
* CPU: 31 cores
* disk: 64 GB (SSD-class performance)


Is there a way to figure out why the backup read and write speeds are sometimes that low?
Is there a way to figure out where the bottleneck is (PVE-->PBS-->CIFS-Share)?
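One way to narrow it down might be to measure each hop separately, for example raw network throughput between a PVE node and the PBS VM with iperf3 (hostnames are placeholders):
Code:
# on the PBS VM
iperf3 -s
# on a PVE node, testing against the PBS VM
iperf3 -c [pbs-hostname]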


best regards,
Flo
 
One thing you should check (probably not the cause of the issue, but it wouldn't hurt) is whether you have the AES flag set for the PBS VM. Under Hardware -> CPU -> Advanced you should be able to enable hardware acceleration for AES. This will improve your score on the encryption benchmark.
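You can also verify from inside the PBS VM that the flag actually reached the guest, for example:
Code:
# inside the PBS VM: a non-zero count means AES-NI is exposed to the guest
grep -c aes /proc/cpuinfo
(Alternatively, setting the VM's CPU type to "host" passes all host CPU flags, including AES, through to the guest.)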
Since PBS works with a lot of small files (~4 MiB chunks), IOPS is an important factor. Could you try to benchmark the mounted CIFS share?
 
Thanks @Cookiefamily for the advice - the AES flag is set. Without the AES flag the encryption speed was around ~120 MB/s.

I'll start a benchmark of the mounted CIFS share and will provide the results as soon as the benchmark is done.
 
I'm not quite sure if I've done the benchmark the right way - if not, please feel free to correct me.

I've done one test with fio with a file size of 10M and another one with 4M.


Code:
root@srv-v-backup:/mnt/[datastore]/benchmark_test# fio --rw=write --name=test --size=10M --filename=file1
test: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.12
Starting 1 process
test: Laying out IO file (1 file / 10MiB)

test: (groupid=0, jobs=1): err= 0: pid=16126: Fri Nov  6 13:24:43 2020
  write: IOPS=51.2k, BW=200MiB/s (210MB/s)(10.0MiB/50msec); 0 zone resets
    clat (nsec): min=2780, max=23351, avg=3426.84, stdev=1169.46
     lat (nsec): min=2851, max=23422, avg=3509.63, stdev=1189.25
    clat percentiles (nsec):
     |  1.00th=[ 2832],  5.00th=[ 2896], 10.00th=[ 2960], 20.00th=[ 3056],
     | 30.00th=[ 3120], 40.00th=[ 3152], 50.00th=[ 3216], 60.00th=[ 3280],
     | 70.00th=[ 3344], 80.00th=[ 3440], 90.00th=[ 3952], 95.00th=[ 4448],
     | 99.00th=[ 6752], 99.50th=[13632], 99.90th=[19584], 99.95th=[20608],
     | 99.99th=[23424]
  lat (usec)   : 4=90.04%, 10=9.30%, 20=0.59%, 50=0.08%
  cpu          : usr=6.12%, sys=6.12%, ctx=2, majf=0, minf=10
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,2560,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=200MiB/s (210MB/s), 200MiB/s-200MiB/s (210MB/s-210MB/s), io=10.0MiB (10.5MB), run=50-50msec
root@srv-v-backup:/mnt/[datastore]/benchmark_test# fio --rw=write --name=test --size=4M --filename=file2
test: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.12
Starting 1 process
test: Laying out IO file (1 file / 4MiB)

test: (groupid=0, jobs=1): err= 0: pid=16129: Fri Nov  6 13:25:02 2020
  write: IOPS=26.3k, BW=103MiB/s (108MB/s)(4096KiB/39msec); 0 zone resets
    clat (nsec): min=2836, max=97021, avg=3645.00, stdev=3371.48
     lat (nsec): min=2926, max=97102, avg=3727.65, stdev=3385.34
    clat percentiles (nsec):
     |  1.00th=[ 2864],  5.00th=[ 2928], 10.00th=[ 2960], 20.00th=[ 3024],
     | 30.00th=[ 3088], 40.00th=[ 3152], 50.00th=[ 3184], 60.00th=[ 3280],
     | 70.00th=[ 3376], 80.00th=[ 3504], 90.00th=[ 4256], 95.00th=[ 4832],
     | 99.00th=[11584], 99.50th=[18048], 99.90th=[26240], 99.95th=[96768],
     | 99.99th=[96768]
  lat (usec)   : 4=87.99%, 10=10.74%, 20=0.78%, 50=0.39%, 100=0.10%
  cpu          : usr=2.63%, sys=0.00%, ctx=2, majf=0, minf=10
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,1024,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=103MiB/s (108MB/s), 103MiB/s-103MiB/s (108MB/s-108MB/s), io=4096KiB (4194kB), run=39-39msec

best regards
 
I've done one test with fio with a file size of 10M and another one with 4M.
In my experiments I got impossibly high results with small test files - the tests basically reflect the performance of the RAM. So the test file size should be larger than the available RAM. You have 64 GiB RAM, so try to run fio with 128G. It might take a long time, but the results are much more reliable...

Just my 2€¢
 
In my experiments I got impossibly high results with small test files - the tests basically reflect the performance of the RAM. So the test file size should be larger than the available RAM. You have 64 GiB RAM, so try to run fio with 128G. It might take a long time, but the results are much more reliable...

Just my 2€¢
Exactly, Linux will have cached that file - that's why you get high speeds :) You need to test with files larger than your available RAM.
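If writing a 128G test file takes too long, fio's --direct=1 flag (O_DIRECT) can be used to bypass the page cache, assuming the CIFS mount accepts direct I/O; for example:
Code:
# direct I/O avoids the page cache, so a smaller test file already gives realistic numbers
fio --ioengine=libaio --direct=1 --rw=write --bs=4M --iodepth=16 --size=4G --name=cifs-test --filename=/mnt/[datastore]/benchmark_test/fio-direct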
 
Thanks cookiefamily and UdoB for your tips!

That was the problem - I started the fio benchmark again with the following parameters:

Code:
fio --ioengine=libaio --direct=1 --name=test --filename=test --bs=4m --iodepth=64 --size=128G --readwrite=write

In that case the performance is:
Bandwidth between 40 and 50 MB/s
IOPS between 8 and 12

So the bottleneck is most likely the storage itself, and I'm currently testing another storage server.

For me it's solved - if someone wants to provide a better benchmark method, please feel free to document it here.
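For anyone who wants a test closer to the PBS chunk pattern (many ~4 MiB objects written in parallel), something like the following might fit better; paths and sizes are only examples:
Code:
# several parallel writers with 4M blocks and direct I/O
fio --ioengine=libaio --direct=1 --rw=write --bs=4M --numjobs=4 --iodepth=16 --size=8G --directory=/mnt/[datastore]/benchmark_test --name=pbs-chunk-sim --group_reporting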
 