slow speed on NFS backups

Rico29

Hello,
I recently added an NFS server to my Proxmox architecture, for NFS backups.
Proxmox v6.1-8.
Proxmox uses Ceph storage for the VMs (full 2x10G network, mtu=9000, all high-performance SSDs).
The NFS storage is also attached via a 10Gb link, and its MTU is set to 9000 as well.


Proxmox is 172.18.2.191, NFS storage is 172.18.2.199.
Code:
proxmox# ping -M do 172.18.2.199 -s $((9000-28)) -c1
PING 172.18.2.199 (172.18.2.199) 8972(9000) bytes of data.
8980 bytes from 172.18.2.199: icmp_seq=1 ttl=64 time=0.234 ms

--- 172.18.2.199 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.234/0.234/0.234/0.000 ms
The NFS share is mounted (via the Proxmox web GUI) like this:
Code:
# mount | grep -i nfs
172.18.2.199:/mnt/iscsi-c7000-pa2-pxmx-BACKUPS on /mnt/pve/NFS_backups type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=172.18.2.191,local_lock=none,addr=172.18.2.199)
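If the mount options ever need adjusting, on Proxmox the place to do it is the storage definition rather than a manual mount, since the GUI-created mount is regenerated from it. A sketch of what the corresponding entry in /etc/pve/storage.cfg might look like (the `options` line is an illustrative assumption, not a recommendation from this thread):

```
# /etc/pve/storage.cfg (fragment) -- hypothetical example
nfs: NFS_backups
        server 172.18.2.199
        export /mnt/iscsi-c7000-pa2-pxmx-BACKUPS
        path /mnt/pve/NFS_backups
        content backup
        options vers=4.2
```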


Backups are slow:
Code:
INFO: starting new backup job: vzdump 162 117 --node c7000-pa2-pxmx1 --mailnotification always --storage NFS_backups --all 0 --compress lzo --mailto exploit@dom.tld --mode snapshot --quiet 1
INFO: Starting Backup of VM 117 (qemu)
INFO: Backup started at 2020-07-13 09:55:58
INFO: status = running
INFO: update VM 117: -lock backup
INFO: VM Name: seniorMedia
INFO: include disk 'virtio0' 'ceph_storage:vm-117-disk-0' 50G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating archive '/mnt/pve/NFS_backups/dump/vzdump-qemu-117-2020_07_13-09_55_58.vma.lzo'
INFO: started backup task '7592b581-2809-4215-9aa9-eada14ab06de'
INFO: status: 0% (364904448/53687091200), sparse 0% (38543360), duration 3, read/write 121/108 MB/s

...

INFO: status: 100% (53687091200/53687091200), sparse 2% (1565765632), duration 560, read/write 84/84 MB/s
INFO: transferred 53687 MB in 560 seconds (95 MB/s)
INFO: archive file size: 22.38GB
INFO: Finished Backup of VM 117 (00:09:22)
INFO: Backup finished at 2020-07-13 10:05:20
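As a quick sanity check, the 95 MB/s reported in the summary line matches the raw numbers from the transfer line (shell integer arithmetic, rounding down):

```shell
# 53687 MB transferred in 560 seconds, as reported by vzdump above
echo $((53687 / 560))   # -> 95, i.e. the 95 MB/s from the log
```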




With fio and a test file of 5G (4k blocks) / 50G (512k and 4M blocks), I get these results:

With 4k blocks:
Code:
#:/mnt/pve/NFS_backups# fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=5g --readwrite=randwrite
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.12
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][w=55.6MiB/s][w=14.2k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=62888: Mon Jul 13 10:09:25 2020
  write: IOPS=8347, BW=32.6MiB/s (34.2MB/s)(5120MiB/157026msec); 0 zone resets
   bw (  KiB/s): min=10376, max=85584, per=99.99%, avg=33384.69, stdev=15963.83, samples=314
   iops        : min= 2594, max=21396, avg=8346.14, stdev=3990.97, samples=314
  cpu          : usr=4.65%, sys=13.75%, ctx=631567, majf=0, minf=2360
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=0,1310720,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: bw=32.6MiB/s (34.2MB/s), 32.6MiB/s-32.6MiB/s (34.2MB/s-34.2MB/s), io=5120MiB (5369MB), run=157026-157026msec

With 512k blocks:
Code:
#:/mnt/pve/NFS_backups# fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=512k --iodepth=64 --size=50g --readwrite=randwrite
test: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=64
fio-3.12
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][w=353MiB/s][w=705 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=64837: Mon Jul 13 10:15:12 2020
  write: IOPS=589, BW=295MiB/s (309MB/s)(50.0GiB/173662msec); 0 zone resets
   bw (  KiB/s): min=148480, max=780288, per=99.85%, avg=301443.44, stdev=57886.30, samples=347
   iops        : min=  290, max= 1524, avg=588.69, stdev=113.06, samples=347
  cpu          : usr=2.10%, sys=6.41%, ctx=70250, majf=0, minf=106907
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=99.9%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=0,102400,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: bw=295MiB/s (309MB/s), 295MiB/s-295MiB/s (309MB/s-309MB/s), io=50.0GiB (53.7GB), run=173662-173662msec

With 4M blocks:
Code:
#:/mnt/pve/NFS_backups# fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4m --iodepth=64 --size=50g --readwrite=randwrite
test: (g=0): rw=randwrite, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=libaio, iodepth=64
fio-3.12
Starting 1 process
Jobs: 1 (f=0): [f(1)][100.0%][w=717MiB/s][w=179 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=835: Mon Jul 13 10:16:56 2020
  write: IOPS=189, BW=758MiB/s (794MB/s)(50.0GiB/67586msec); 0 zone resets
   bw (  KiB/s): min=434176, max=917504, per=99.63%, avg=772845.51, stdev=66402.77, samples=135
   iops        : min=  106, max=  224, avg=188.66, stdev=16.23, samples=135
  cpu          : usr=3.62%, sys=15.04%, ctx=3432, majf=0, minf=192417
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.2%, >=64=99.5%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=0,12800,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: bw=758MiB/s (794MB/s), 758MiB/s-758MiB/s (794MB/s-794MB/s), io=50.0GiB (53.7GB), run=67586-67586msec


The question behind these tests: the backups do not seem to be limited by hardware or the network (I checked; CPU and memory are not saturated at all).
Is there a way to improve NFS backup speed?
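For tuning, vzdump reads /etc/vzdump.conf. A sketch of the knobs that bear on backup speed (the values below are illustrative assumptions, not recommendations from this thread):

```
# /etc/vzdump.conf -- hypothetical tuning sketch
# use pigz instead of gzip when "compress: gzip" is set (value = thread count)
pigz: 8
# bandwidth limit in KB/s; 0 = unlimited
bwlimit: 0
# I/O priority of the backup process (0 = highest, 7 = lowest)
ionice: 7
```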
 
New tests:
To local storage:
no compression: INFO: transferred 34359 MB in 180 seconds (190 MB/s)
using "pigz: 8": INFO: transferred 34359 MB in 302 seconds (113 MB/s)
using "pigz: 12": INFO: transferred 34359 MB in 276 seconds (124 MB/s)

To NFS storage:
no compression: INFO: transferred 34359 MB in 194 seconds (177 MB/s)
using "pigz: 8": INFO: transferred 34359 MB in 315 seconds (109 MB/s)
using "pigz: 12": INFO: transferred 34359 MB in 292 seconds (117 MB/s)

Playing with ionice does not change anything.
So it seems it's not related to the NFS storage; vzdump is just damn slow.