Poor performance of ZFS pool on SSD

docent

I have five SSDs connected to a P420i in HBA mode on a DL380 Gen8.

Code:
Smart Array P420i in Slot 0 (Embedded)
   Bus Interface: PCI
   Slot: 0
   RAID 6 (ADG) Status: Enabled
   Controller Status: OK
   Hardware Revision: B
   Firmware Version: 8.32-0
   Cache Board Present: True
   Cache Status: Not Configured
   Total Cache Size: 1.0
   Total Cache Memory Available: 0.8
   Driver Name: hpsa
   Driver Version: 3.4.20
   HBA Mode Enabled: True
   Port Max Phy Rate Limiting Supported: False
   Sanitize Erase Supported: False
   Primary Boot Volume: None
   Secondary Boot Volume: None
Code:
Smart Array P420i in Slot 0 (Embedded)
   HBA Drives
      physicaldrive 1I:2:1 (port 1I:box 2:bay 1, SATA SSD, 2 TB, OK)
      physicaldrive 1I:2:2 (port 1I:box 2:bay 2, SATA SSD, 2 TB, OK)
      physicaldrive 1I:2:3 (port 1I:box 2:bay 3, SATA SSD, 2 TB, OK)
      physicaldrive 1I:2:4 (port 1I:box 2:bay 4, SATA SSD, 2 TB, OK)
      physicaldrive 2I:2:5 (port 2I:box 2:bay 5, SATA SSD, 2 TB, OK)
Code:
Smart Array P420i in Slot 0 (Embedded)
   HBA Drives
      physicaldrive 1I:2:4
         Port: 1I
         Box: 2
         Bay: 4
         Status: OK
         Drive Type: HBA Mode Drive
         Interface Type: Solid State SATA
         Size: 2 TB
         Drive exposed to OS: False
         Logical/Physical Block Size: 512/512
         Model: ATA     Samsung SSD 860
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         SSD Smart Trip Wearout: Not Supported
         PHY Count: 1
         PHY Transfer Rate: 6.0Gbps
         Disk Name: /dev/sde
         Mount Points: None
         Sanitize Erase Supported: False
         Shingled Magnetic Recording Support: None
Code:
Device Model:     Samsung SSD 860 QVO 2TB
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
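
For reference, output like the above would typically be gathered with HPE's ssacli utility and smartctl; a rough sketch of the invocations (the exact commands aren't shown in the post, so the controller slot and device name are assumptions):

Code:
# Controller details, HBA drive list, and a single-drive report (controller in slot 0 assumed)
ssacli ctrl slot=0 show detail
ssacli ctrl slot=0 pd all show
ssacli ctrl slot=0 pd 1I:2:4 show detail

# SMART identity information for one of the SSDs
smartctl -i /dev/sde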

Each disk delivers about 420-450 IOPS for 4K synchronous writes at queue depth 1.

Code:
root@vmc1-3:/var/log# fio --filename=/dev/sdd --ioengine=libaio --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=10 --time_based --group_reporting --name=journal-test
journal-test: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.12
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][w=1673KiB/s][w=418 IOPS][eta 00m:00s]
journal-test: (groupid=0, jobs=1): err= 0: pid=2637528: Wed Oct 28 17:57:32 2020
  write: IOPS=454, BW=1820KiB/s (1863kB/s)(17.8MiB/10002msec); 0 zone resets
    slat (usec): min=43, max=473, avg=47.68, stdev= 6.74
    clat (usec): min=1869, max=12936, avg=2141.43, stdev=538.19
     lat (usec): min=1917, max=13059, avg=2190.08, stdev=538.69
    clat percentiles (usec):
     |  1.00th=[ 1942],  5.00th=[ 2008], 10.00th=[ 2024], 20.00th=[ 2040],
     | 30.00th=[ 2057], 40.00th=[ 2073], 50.00th=[ 2089], 60.00th=[ 2114],
     | 70.00th=[ 2147], 80.00th=[ 2147], 90.00th=[ 2147], 95.00th=[ 2147],
     | 99.00th=[ 2573], 99.50th=[ 7635], 99.90th=[ 8848], 99.95th=[10028],
     | 99.99th=[12911]
   bw (  KiB/s): min= 1496, max= 1872, per=100.00%, avg=1819.60, stdev=93.10, samples=20
   iops        : min=  374, max=  468, avg=454.90, stdev=23.27, samples=20
  lat (msec)   : 2=4.37%, 4=94.81%, 10=0.75%, 20=0.07%
  cpu          : usr=0.84%, sys=2.30%, ctx=9100, majf=0, minf=11
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,4550,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=1820KiB/s (1863kB/s), 1820KiB/s-1820KiB/s (1863kB/s-1863kB/s), io=17.8MiB (18.6MB), run=10002-10002msec

Disk stats (read/write):
  sdd: ios=100/4500, merge=0/0, ticks=20/9679, in_queue=192, util=97.97%

I created a RAIDZ1 ZFS pool on them.

Code:
NAME                                                 SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
pool1                                               9.09T  2.37G  9.09T        -         -     3%     0%  1.00x    ONLINE  -
  raidz1                                            9.09T  2.37G  9.09T        -         -     3%  0.02%      -  ONLINE
    scsi-35002538e90130b1f                              -      -      -        -         -      -      -      -  ONLINE
    scsi-35002538e90130b22                              -      -      -        -         -      -      -      -  ONLINE
    scsi-35002538e90130b1b                              -      -      -        -         -      -      -      -  ONLINE
    scsi-35002538e90130b15                              -      -      -        -         -      -      -      -  ONLINE
    scsi-35002538e90130b1a                              -      -      -        -         -      -      -      -  ONLINE
Code:
              total        used        free      shared  buff/cache   available
Mem:          125Gi       3.4Gi       121Gi        60Mi       466Mi       121Gi
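
For context, a pool laid out like the one in the zpool list output above could have been created along these lines; the device paths are taken from that output, while the ashift value is an assumption that is not shown in the post:

Code:
# Hypothetical recreation of pool1: a single RAIDZ1 vdev spanning the five SSDs
zpool create -o ashift=12 pool1 raidz1 \
    /dev/disk/by-id/scsi-35002538e90130b1f \
    /dev/disk/by-id/scsi-35002538e90130b22 \
    /dev/disk/by-id/scsi-35002538e90130b1b \
    /dev/disk/by-id/scsi-35002538e90130b15 \
    /dev/disk/by-id/scsi-35002538e90130b1a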

But I don't understand why the performance of the ZFS pool is so poor.

Code:
root@vmc1-3:~# fio --ioengine=libaio --filename=/pool1/fiotank --size=8G --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test
journal-test: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.12
Starting 1 process
journal-test: Laying out IO file (1 file / 8192MiB)
Jobs: 1 (f=1): [W(1)][100.0%][w=2450KiB/s][w=612 IOPS][eta 00m:00s]
journal-test: (groupid=0, jobs=1): err= 0: pid=78825: Thu Oct 29 12:20:35 2020
  write: IOPS=577, BW=2308KiB/s (2364kB/s)(135MiB/60002msec); 0 zone resets
    slat (usec): min=1333, max=12376, avg=1718.62, stdev=857.33
    clat (usec): min=2, max=330, avg= 5.89, stdev= 1.91
     lat (usec): min=1341, max=12386, avg=1726.18, stdev=857.35
    clat percentiles (nsec):
     |  1.00th=[ 2608],  5.00th=[ 5664], 10.00th=[ 5792], 20.00th=[ 5856],
     | 30.00th=[ 5856], 40.00th=[ 5920], 50.00th=[ 5920], 60.00th=[ 5920],
     | 70.00th=[ 5984], 80.00th=[ 5984], 90.00th=[ 6048], 95.00th=[ 6048],
     | 99.00th=[ 6560], 99.50th=[ 7200], 99.90th=[21632], 99.95th=[23936],
     | 99.99th=[28032]
   bw (  KiB/s): min= 1704, max= 2576, per=99.99%, avg=2307.86, stdev=210.69, samples=120
   iops        : min=  426, max=  644, avg=576.92, stdev=52.65, samples=120
  lat (usec)   : 4=2.22%, 10=97.66%, 20=0.01%, 50=0.11%, 500=0.01%
  cpu          : usr=1.16%, sys=8.08%, ctx=69324, majf=0, minf=22
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,34625,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Why is each disk loaded the same as the entire ZFS pool?
And what do the ~500 r/s at 0 rkB/s mean?

Code:
Linux 5.4.65-1-pve (vmc1-3)     10/29/2020      _x86_64_        (40 CPU)

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sdb            478.20  390.50      0.00   1826.00     0.00     0.10   0.00   0.03    0.89    0.07   0.01     0.00     4.68   0.91  79.24
sdc            478.30  389.90      0.00   1824.80     0.00     0.00   0.00   0.00    0.87    0.07   0.01     0.00     4.68   0.91  79.16
sdd            478.20  389.40      0.00   1824.00     0.00     0.10   0.00   0.03    0.86    0.07   0.01     0.00     4.68   0.91  79.36
sde            478.20  390.20      0.00   1823.20     0.00     0.10   0.00   0.03    0.85    0.07   0.01     0.00     4.67   0.91  79.36
sdf            478.20  389.80      0.00   1827.60     0.00     0.20   0.00   0.05    0.85    0.07   0.01     0.00     4.69   0.91  79.12

sdb            584.00  484.80      0.00   2544.80     0.00     0.00   0.00   0.00    0.88    0.08   0.01     0.00     5.25   0.90  96.64
sdc            583.90  482.70      0.00   2543.60     0.00     0.10   0.00   0.02    0.86    0.08   0.01     0.00     5.27   0.91  96.76
sdd            584.00  483.60      0.00   2551.20     0.00     0.10   0.00   0.02    0.85    0.08   0.01     0.00     5.28   0.91  96.88
sde            584.10  484.30      0.00   2551.60     0.00     0.10   0.00   0.02    0.87    0.08   0.01     0.00     5.27   0.91  97.08
sdf            584.00  483.40      0.00   2544.80     0.00     0.20   0.00   0.04    0.84    0.08   0.01     0.00     5.26   0.91  96.96

sdb            569.60  472.70      0.00   2492.80     0.00     0.10   0.00   0.02    0.89    0.07   0.02     0.00     5.27   0.92  96.40
sdc            569.60  471.40      0.00   2494.80     0.00     0.30   0.00   0.06    0.86    0.08   0.01     0.00     5.29   0.93  96.60
sdd            569.60  471.80      0.00   2491.20     0.00     0.20   0.00   0.04    0.85    0.07   0.01     0.00     5.28   0.93  96.40
sde            569.50  472.00      0.00   2490.80     0.00     0.00   0.00   0.00    0.84    0.08   0.01     0.00     5.28   0.93  96.60
sdf            569.70  471.90      0.00   2489.20     0.00     0.00   0.00   0.00    0.84    0.08   0.01     0.00     5.27   0.93  96.76

sdb            577.00  478.50      0.00   2513.60     0.00     0.20   0.00   0.04    0.86    0.08   0.00     0.00     5.25   0.92  97.52
sdc            577.10  478.60      0.00   2511.60     0.00     0.00   0.00   0.00    0.86    0.08   0.01     0.00     5.25   0.92  97.44
sdd            577.10  479.20      0.00   2511.20     0.00     0.20   0.00   0.04    0.85    0.08   0.01     0.00     5.24   0.92  97.52
sde            577.10  477.70      0.00   2510.00     0.00     0.00   0.00   0.00    0.84    0.07   0.01     0.00     5.25   0.93  97.72
sdf            577.00  478.90      0.00   2512.40     0.00     0.30   0.00   0.06    0.84    0.08   0.01     0.00     5.25   0.92  97.56

sdb            572.30  475.80      0.00   2497.60     0.00     0.10   0.00   0.02    0.88    0.07   0.01     0.00     5.25   0.93  97.00
sdc            572.20  477.70      0.00   2498.80     0.00     0.00   0.00   0.00    0.85    0.07   0.00     0.00     5.23   0.92  96.96
sdd            572.20  474.40      0.00   2499.20     0.00     0.10   0.00   0.02    0.87    0.07   0.02     0.00     5.27   0.93  96.92
sde            572.30  474.30      0.00   2498.80     0.00     0.20   0.00   0.04    0.85    0.07   0.01     0.00     5.27   0.93  97.00
sdf            572.30  474.40      0.00   2497.60     0.00     0.00   0.00   0.00    0.84    0.07   0.01     0.00     5.26   0.93  97.08

sdb            576.20  478.70      0.00   2522.40     0.00     0.00   0.00   0.00    0.88    0.07   0.01     0.00     5.27   0.92  97.04
sdc            576.20  479.80      0.00   2522.40     0.00     0.10   0.00   0.02    0.88    0.07   0.02     0.00     5.26   0.92  97.40
sdd            576.20  477.90      0.00   2525.20     0.00     0.10   0.00   0.02    0.86    0.07   0.01     0.00     5.28   0.92  97.12
sde            576.20  477.30      0.00   2525.60     0.00     0.10   0.00   0.02    0.83    0.08   0.00     0.00     5.29   0.92  97.04
sdf            576.20  477.40      0.00   2524.40     0.00     0.00   0.00   0.00    0.84    0.07   0.01     0.00     5.29   0.92  97.28
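
The per-device statistics above look like extended iostat output sampled at an interval; the exact invocation isn't shown, but it would be something like:

Code:
# Extended device statistics, repeated at a fixed interval (10 seconds is an assumption)
iostat -x 10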
 
Samsung SSD 860 QVO 2TB
These are QLC SSDs with two layers of caching, and they are slow. Additionally, the RAIDZ1 layout will reduce performance as well.

EDIT: run the fio job you used on a single disk with 50% of the SSD's size to get the IOPS of a single SSD.
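
A sketch of what that suggested run could look like, reusing the fio job from the first post; --size is set to roughly 50% of the 2 TB drive, and the longer --runtime and the /dev/sdd device are assumptions for illustration:

Code:
fio --filename=/dev/sdd --size=1000G --ioengine=libaio --direct=1 --sync=1 --rw=write \
    --bs=4k --numjobs=1 --iodepth=1 --runtime=600 --time_based --group_reporting \
    --name=journal-test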
 