High IO delay on restore

andy77

Hello @ all,

I am wondering why a restore from an NFS storage (1 Gbit/s) causes an IO delay of over 60%, even though the disks on the node are NVMe SSDs that should be at least 10 times faster than the maximum transfer rate of the 1 Gbit/s NFS network.

It also caused a load of 6 out of 8, which is also very high. OK, that may be due to the compression, but I cannot explain the IO delay.

Regards
Andy
 
Hi,

IO wait means a process is waiting for IO to complete; in your case, the NVMe SSDs are waiting for the slow NFS data.
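
If you want to see where that wait actually accrues, something like the following helps (a minimal sketch, assuming the sysstat package is installed):

Code:
# overall CPU iowait is the "wa" column
vmstat 1

# per-device utilisation, queue size and wait times, refreshed every second
iostat -x 1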
 
Can you be more specific about where it gets slow?
What filesystem do you use?
 
I also see these high IO delays while restoring to a ZFS pool. I tried with and without dedup and/or compression. Regardless of which settings I choose, I get IO delays of about 50%. The test system uses consumer-grade SSDs with SATA connections, but I see similar results with NVMe SSDs. Maybe something in the restore process causes high IO load on the filesystem, independent of the underlying filesystem.

Code:
zpool list
NAME    SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
r1_00  1.80T   904G   936G         -    90%    49%  1.42x  ONLINE  -
r1_01  1.86T   737G  1.14T         -    77%    38%  1.23x  ONLINE  -

zpool status
  pool: r1_00
 state: ONLINE
  scan: scrub repaired 0 in 4h42m with 0 errors on Sun Apr  9 05:06:49 2017
config:

        NAME                                                   STATE     READ WRITE CKSUM
        r1_00                                                  ONLINE       0     0     0
          mirror-0                                             ONLINE       0     0     0
            ata-SanDisk_SDSSDXPS960G_xxx-part1        ONLINE       0     0     0
            ata-SanDisk_Ultra_II_960GB_xxx-part1      ONLINE       0     0     0
          mirror-1                                             ONLINE       0     0     0
            ata-Samsung_SSD_850_PRO_1TB_xxx-part1  ONLINE       0     0     0
            ata-Samsung_SSD_850_PRO_1TB_xxx-part1  ONLINE       0     0     0
        logs
          ata-INTEL_SSDSC2BB160G4_xxx-part1     ONLINE       0     0     0

errors: No known data errors

  pool: r1_01
 state: ONLINE
  scan: scrub repaired 0 in 1h56m with 0 errors on Sun Apr  9 02:20:56 2017
config:

        NAME                                                   STATE     READ WRITE CKSUM
        r1_01                                                  ONLINE       0     0     0
          mirror-0                                             ONLINE       0     0     0
            ata-Samsung_SSD_850_PRO_2TB_xxx-part2  ONLINE       0     0     0
            ata-Samsung_SSD_850_PRO_2TB_xxx-part2  ONLINE       0     0     0
        logs
          ata-INTEL_SSDSC2BB160G4_xxx-part2     ONLINE       0     0     0

errors: No known data errors

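To see which part of a pool is actually busy while a restore runs, zpool can report per-vdev activity (a minimal sketch; r1_00 is just the pool from above, and the latency columns need a reasonably recent ZFS version):

Code:
# per-vdev read/write operations and bandwidth, refreshed every second
zpool iostat -v r1_00 1

# on newer ZFS releases, -l adds per-vdev wait/latency columns
zpool iostat -vl r1_00 1
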
PS: What is needed to get a response to a bug report like this: https://bugzilla.proxmox.com/show_bug.cgi?id=1344
 
I/O delay during unthrottled, I/O-intensive operations is normal - I/O delay just means that there are processes waiting for I/O to complete. If your system has high I/O delay constantly, then you have a problem, because your system is bottlenecked by I/O performance. If you see high I/O delay while you are actually trying to do more I/O than your system can handle (like when you are restoring a backup, which is almost always limited by either the source's read or the target's write performance, or when you are benchmarking ;)), then it is just a result of what you are doing and not an indicator of a problem.

What can be a problem is a restore swamping some disk / storage / ... with so many requests that it cannot handle other requests in parallel in a timely manner.
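
If a restore does starve other guests like that, one workaround is to throttle it. Depending on the PVE version, qmrestore accepts a bandwidth limit (a sketch; the archive path and VMID below are placeholders):

Code:
# restore with the I/O rate capped at roughly 50 MB/s (value is in KiB/s)
qmrestore /mnt/pve/backup-nfs/dump/vzdump-qemu-100.vma.lzo 100 --bwlimit 51200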