Only increasing zfs_dirty_data_max (4294967296 -> 10737418240 -> 21474836480 -> 42949672960) compensates for the performance penalty, but the background writeback stays just as slow per NVMe device, at ~10k IOPS per device, as the fio run and iostat output below show.
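For reference, zfs_dirty_data_max is a runtime-writable OpenZFS module parameter; a minimal sketch of raising it, where the 40 GiB value is simply the last step of the progression above:
Bash:
# Raise the dirty data cap to 40 GiB at runtime (not persistent across reboots)
echo 42949672960 > /sys/module/zfs/parameters/zfs_dirty_data_max
# Confirm the new value
cat /sys/module/zfs/parameters/zfs_dirty_data_max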
Bash:
# fio --time_based --name=benchmark --size=15G --runtime=30 --filename=/mnt/zfs/g-fio.test --ioengine=libaio --randrepeat=0 --iodepth=32 --direct=1 --invalidate=1 --verify=0 --verify_fatal=0 --numjobs=4 --rw=randwrite --blocksize=4k --group_reporting
benchmark: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
...
fio-3.25
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][w=871MiB/s][w=223k IOPS][eta 00m:00s]
benchmark: (groupid=0, jobs=4): err= 0: pid=13035: Thu Nov 25 18:19:06 2021
write: IOPS=166k, BW=650MiB/s (682MB/s)(19.0GiB/30001msec); 0 zone resets
# iostat -x 1 | awk '{print $1"\t"$8"\t"$9}'
Device w/s wkB/s
loop0 1.00 4.00
loop1 1.00 4.00
nvme0n1 0.00 0.00
nvme1n1 7963.00 398872.00
nvme2n1 6197.00 393752.00
nvme3n1 8052.00 403096.00
nvme4n1 7933.00 398872.00
nvme5n1 0.00 0.00
nvme6n1 0.00 0.00
At the defaults, the async write queue depth caps how quickly dirty data drains to the vdevs. We can increase IOPS by raising:
zfs_vdev_async_write_min_active
zfs_vdev_async_write_max_active
A sketch of setting them follows the list.
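A minimal sketch of raising both at runtime and persisting the change; the values 8 and 32 are illustrative assumptions rather than the exact values used in this test (OpenZFS defaults are 2 and 10):
Bash:
# Allow more concurrent async writes per vdev (example values; defaults: min=2, max=10)
echo 8 > /sys/module/zfs/parameters/zfs_vdev_async_write_min_active
echo 32 > /sys/module/zfs/parameters/zfs_vdev_async_write_max_active
# Persist across reboots / module reloads
cat >> /etc/modprobe.d/zfs.conf <<'EOF'
options zfs zfs_vdev_async_write_min_active=8
options zfs zfs_vdev_async_write_max_active=32
EOF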
With these raised, the same workload drives ~48k writes/s per device:
Bash:
# iostat -x 1 | awk '{print $1"\t"$8"\t"$9}'
Device w/s wkB/s
loop0 0.00 0.00
loop1 0.00 0.00
nvme0n1 0.00 0.00
nvme1n1 48071.00 1595316.00
nvme2n1 47334.00 1496244.00
nvme3n1 48044.00 1595120.00
nvme4n1 47676.00 1549908.00
nvme5n1 0.00 0.00
nvme6n1 0.00 0.00
For comparison, a single NVMe device has a raw speed of ~700k IOPS:
Bash:
# fio --time_based --name=benchmark --size=15G --runtime=30 --filename=/dev/nvme6n1 --ioengine=libaio --randrepeat=0 --iodepth=128 --direct=1 --invalidate=1 --verify=0 --verify_fatal=0 --numjobs=4 --rw=randwrite --blocksize=4k --group_reporting
benchmark: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
...
fio-3.25
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][w=2761MiB/s][w=707k IOPS][eta 00m:00s]
benchmark: (groupid=0, jobs=4): err= 0: pid=3828468: Thu Nov 25 21:30:06 2021
write: IOPS=706k, BW=2758MiB/s (2892MB/s)(80.8GiB/30001msec); 0 zone resets
slat (nsec): min=1300, max=258439, avg=2391.93, stdev=1159.86
clat (usec): min=314, max=2934, avg=722.11, stdev=111.84
lat (usec): min=316, max=2936, avg=724.57, stdev=111.82
clat percentiles (usec):
| 1.00th=[ 502], 5.00th=[ 553], 10.00th=[ 586], 20.00th=[ 627],
| 30.00th=[ 660], 40.00th=[ 685], 50.00th=[ 717], 60.00th=[ 750],
| 70.00th=[ 783], 80.00th=[ 816], 90.00th=[ 865], 95.00th=[ 906],
| 99.00th=[ 988], 99.50th=[ 1020], 99.90th=[ 1106], 99.95th=[ 1205],
| 99.99th=[ 1942]
bw ( MiB/s): min= 2721, max= 2801, per=100.00%, avg=2760.62, stdev= 3.03, samples=236
iops : min=696742, max=717228, avg=706717.86, stdev=775.87, samples=236
lat (usec) : 500=0.97%, 750=59.81%, 1000=38.47%
lat (msec) : 2=0.74%, 4=0.01%
cpu : usr=24.41%, sys=44.96%, ctx=5787827, majf=0, minf=70
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued rwts: total=0,21184205,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=128
Run status group 0 (all jobs):
WRITE: bw=2758MiB/s (2892MB/s), 2758MiB/s-2758MiB/s (2892MB/s-2892MB/s), io=80.8GiB (86.8GB), run=30001-30001msec
Disk stats (read/write):
nvme6n1: ios=50/21098501, merge=0/0, ticks=4/15168112, in_queue=15168116, util=99.77%