slow speed on NFS backups

Rico29

Hello,
I recently added an NFS server to my Proxmox architecture, for NFS backups.
Proxmox v6.1-8.
Proxmox uses Ceph storage for the VMs (dedicated 2x10G network, mtu=9000, all high-performance SSDs).
The NFS storage is attached via a 10Gb link too, and its MTU is also set to 9000.


The Proxmox host is 172.18.2.191, the NFS storage is 172.18.2.199. A ping with the don't-fragment flag confirms jumbo frames pass end to end:
Code:
proxmox# ping -M do 172.18.2.199 -s $((9000-28)) -c1
PING 172.18.2.199 (172.18.2.199) 8972(9000) bytes of data.
8980 bytes from 172.18.2.199: icmp_seq=1 ttl=64 time=0.234 ms

--- 172.18.2.199 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.234/0.234/0.234/0.000 ms
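To rule out the link itself, a raw throughput test with iperf3 between the two hosts should show close to 10Gb line rate. I haven't pasted a run here; this is just the suggested check:
Code:
# on the NFS server (172.18.2.199):
iperf3 -s
# on the Proxmox host:
iperf3 -c 172.18.2.199 -t 30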
The NFS share is mounted (via the Proxmox web GUI) like this:
Code:
# mount | grep -i nfs
172.18.2.199:/mnt/iscsi-c7000-pa2-pxmx-BACKUPS on /mnt/pve/NFS_backups type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=172.18.2.191,local_lock=none,addr=172.18.2.199)
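For reference, the corresponding storage definition in /etc/pve/storage.cfg looks roughly like this (illustrative, not my exact file; the options line is where additional NFS mount options can be passed):
Code:
nfs: NFS_backups
        export /mnt/iscsi-c7000-pa2-pxmx-BACKUPS
        path /mnt/pve/NFS_backups
        server 172.18.2.199
        content backup
        options vers=4.2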


Backups are slow:
Code:
INFO: starting new backup job: vzdump 162 117 --node c7000-pa2-pxmx1 --mailnotification always --storage NFS_backups --all 0 --compress lzo --mailto exploit@dom.tld --mode snapshot --quiet 1
INFO: Starting Backup of VM 117 (qemu)
INFO: Backup started at 2020-07-13 09:55:58
INFO: status = running
INFO: update VM 117: -lock backup
INFO: VM Name: seniorMedia
INFO: include disk 'virtio0' 'ceph_storage:vm-117-disk-0' 50G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating archive '/mnt/pve/NFS_backups/dump/vzdump-qemu-117-2020_07_13-09_55_58.vma.lzo'
INFO: started backup task '7592b581-2809-4215-9aa9-eada14ab06de'
INFO: status: 0% (364904448/53687091200), sparse 0% (38543360), duration 3, read/write 121/108 MB/s

...

INFO: status: 100% (53687091200/53687091200), sparse 2% (1565765632), duration 560, read/write 84/84 MB/s
INFO: transferred 53687 MB in 560 seconds (95 MB/s)
INFO: archive file size: 22.38GB
INFO: Finished Backup of VM 117 (00:09:22)
INFO: Backup finished at 2020-07-13 10:05:20




With fio and a test file of 5G (4k blocks) or 50G (512k and 4M blocks), I get these results:

With 4k blocks:
Code:
#:/mnt/pve/NFS_backups# fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=5g --readwrite=randwrite
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.12
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][w=55.6MiB/s][w=14.2k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=62888: Mon Jul 13 10:09:25 2020
  write: IOPS=8347, BW=32.6MiB/s (34.2MB/s)(5120MiB/157026msec); 0 zone resets
   bw (  KiB/s): min=10376, max=85584, per=99.99%, avg=33384.69, stdev=15963.83, samples=314
   iops        : min= 2594, max=21396, avg=8346.14, stdev=3990.97, samples=314
  cpu          : usr=4.65%, sys=13.75%, ctx=631567, majf=0, minf=2360
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=0,1310720,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: bw=32.6MiB/s (34.2MB/s), 32.6MiB/s-32.6MiB/s (34.2MB/s-34.2MB/s), io=5120MiB (5369MB), run=157026-157026msec

With 512k blocks:
Code:
#:/mnt/pve/NFS_backups# fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=512k --iodepth=64 --size=50g --readwrite=randwrite
test: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=64
fio-3.12
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][w=353MiB/s][w=705 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=64837: Mon Jul 13 10:15:12 2020
  write: IOPS=589, BW=295MiB/s (309MB/s)(50.0GiB/173662msec); 0 zone resets
   bw (  KiB/s): min=148480, max=780288, per=99.85%, avg=301443.44, stdev=57886.30, samples=347
   iops        : min=  290, max= 1524, avg=588.69, stdev=113.06, samples=347
  cpu          : usr=2.10%, sys=6.41%, ctx=70250, majf=0, minf=106907
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=99.9%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=0,102400,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: bw=295MiB/s (309MB/s), 295MiB/s-295MiB/s (309MB/s-309MB/s), io=50.0GiB (53.7GB), run=173662-173662msec

With 4M blocks:
Code:
#:/mnt/pve/NFS_backups# fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4m --iodepth=64 --size=50g --readwrite=randwrite
test: (g=0): rw=randwrite, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=libaio, iodepth=64
fio-3.12
Starting 1 process
Jobs: 1 (f=0): [f(1)][100.0%][w=717MiB/s][w=179 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=835: Mon Jul 13 10:16:56 2020
  write: IOPS=189, BW=758MiB/s (794MB/s)(50.0GiB/67586msec); 0 zone resets
   bw (  KiB/s): min=434176, max=917504, per=99.63%, avg=772845.51, stdev=66402.77, samples=135
   iops        : min=  106, max=  224, avg=188.66, stdev=16.23, samples=135
  cpu          : usr=3.62%, sys=15.04%, ctx=3432, majf=0, minf=192417
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.2%, >=64=99.5%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=0,12800,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: bw=758MiB/s (794MB/s), 758MiB/s-758MiB/s (794MB/s-794MB/s), io=50.0GiB (53.7GB), run=67586-67586msec
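One caveat on these numbers: vzdump writes the archive as a single sequential stream, so a sequential test at low queue depth is probably closer to the backup workload than randwrite at iodepth=64. Something like this (suggested command, not a run I've done):
Code:
fio --ioengine=libaio --direct=1 --name=seqtest --filename=seqtest --bs=4m --iodepth=1 --size=50g --readwrite=write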


The question behind these tests: the backups do not seem to be limited by hardware or network (I checked; CPU and memory are not saturated at all).
Is there a way to improve NFS backup speed?
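One knob I know of but haven't tried: exporting the share with async on the NFS server side, so writes are acknowledged before they reach disk (faster, at the cost of losing data if the server crashes mid-backup). On the server that would look something like this (untested suggestion):
Code:
# /etc/exports -- 'async' trades crash safety for speed
/mnt/iscsi-c7000-pa2-pxmx-BACKUPS 172.18.2.0/24(rw,async,no_subtree_check)
# then reload the export table:
exportfs -ra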
 
New tests:
To local storage:
no compression: INFO: transferred 34359 MB in 180 seconds (190 MB/s)
using "pigz: 8": INFO: transferred 34359 MB in 302 seconds (113 MB/s)
using "pigz: 12": INFO: transferred 34359 MB in 276 seconds (124 MB/s)

To NFS storage:
no compression: INFO: transferred 34359 MB in 194 seconds (177 MB/s)
using "pigz: 8": INFO: transferred 34359 MB in 315 seconds (109 MB/s)
using "pigz: 12": INFO: transferred 34359 MB in 292 seconds (117 MB/s)

Playing with ionice does not change anything.
So it seems it's not related to the NFS storage; vzdump is just damn slow.
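For completeness, the compression and ionice settings above go in /etc/vzdump.conf, roughly like this (illustrative values; bwlimit is in KiB/s, 0 means unlimited):
Code:
# /etc/vzdump.conf
bwlimit: 0
ionice: 7
pigz: 8     # >0 switches gzip to pigz; N>1 sets the thread count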
 
