Hi All,
Hoping to gain a better understanding of why I'm seeing a difference in performance running the same fio command in different contexts.
A little background: my pool originally consisted of 4 vdevs, each a mirror of 2x 3TB drives. However, due to a failing disk on mirror-3, and wanting to increase my storage capacity for the future, I ended up replacing both of its 3TB drives with 10TB drives instead.
Recently I've been trying to test whether my storage is performing as expected for spinning rust. I've come to learn that while I've increased my total capacity, the pool is currently imbalanced. I understand that ZFS distributes new writes across the vdevs, favouring the one with the most free space, in a ratio intended to even out the pool's allocation over time.
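The imbalance shows up clearly in the per-vdev numbers; a command along these lines should list size, allocation, free space and fragmentation for each mirror:

Code:
# per-vdev size, allocation, free space and fragmentation for the pool
zpool list -v tank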
But this has taken me down a rabbit hole of trying to understand why I'm seeing the numbers I am. Each of the tests I've run is below. The numbers I get from my first LXC, which was created on the original vdev configuration, are nearly identical to those from the root of the pool, yet running the same command in a newly created LXC gives vastly different performance.
I've also run each fio test both inside the LXC and on its subvol folder on my ZFS pool, with the results being within a margin of error of each other.
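To be concrete about what I mean by the two contexts, the runs looked roughly like this (the container ID and subvol path here are placeholders, not my actual ones):

Code:
# on the host, inside the container's subvol directory on the pool (placeholder path)
cd /tank/subvol-100-disk-0 && fio --name=test --size=5g --rw=write --ioengine=posixaio --direct=1 --bs=1m

# the same command run from inside the container itself (placeholder CT ID)
pct exec 100 -- fio --name=test --size=5g --rw=write --ioengine=posixaio --direct=1 --bs=1m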
So in summary, the questions I'm hoping to have answered are as follows:
- Is the imbalance in my pool the cause of the differing fio results?
- Why is only one of my original LXC containers able to match the performance of the root of the pool?
- What can be done to improve my spinning rust performance?
fio command:
Code:
fio --name=test --size=5g --rw=write --ioengine=posixaio --direct=1 --bs=1m
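(Side note: I realise a single 5G sequential write can be absorbed largely by RAM before it ever reaches the disks, so a longer, time-based variant that forces a final fsync might be a fairer test, something like the command below; all of the results in this post are from the plain command above, though.)

Code:
# time-based variant that flushes buffered writes before the run is scored (not used for the results below)
fio --name=test --size=5g --rw=write --ioengine=posixaio --direct=1 --bs=1m --runtime=60 --time_based --end_fsync=1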
zfs pool:
Running 4 mirrored vdevs.
Code:
  pool: tank
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 12:50:55 with 0 errors on Sun Feb 6 12:50:57 2022
config:

        NAME                                      STATE     READ WRITE CKSUM
        tank                                      ONLINE       0     0     0
          mirror-0                                ONLINE       0     0     0
            ata-WDC_WD30EFRX-REDACTED             ONLINE       0     0     0
            ata-WDC_WD30EFRX-REDACTED             ONLINE       0     0     0
          mirror-1                                ONLINE       0     0     0
            ata-WDC_WD30EFRX-REDACTED             ONLINE       0     0     0
            ata-WDC_WD30EFRX-REDACTED             ONLINE       0     0     0
          mirror-2                                ONLINE       0     0     0
            ata-WDC_WD30EFRX-REDACTED             ONLINE       0     0     0
            ata-WDC_WD30EFRX-REDACTED             ONLINE       0     0     0
          mirror-3                                ONLINE       0     0     0
            ata-WDC_WD100EFAX-REDACTED            ONLINE       0     0     0
            ata-WDC_WD100EFAX-REDACTED            ONLINE       0     0     0

errors: No known data errors
fio test #1:
Run on the root of my pool.
Code:
test: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=posixaio, iodepth=1
fio-3.25
Starting 1 process
test: Laying out IO file (1 file / 5120MiB)
Jobs: 1 (f=1): [W(1)][100.0%][w=221MiB/s][w=221 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=974661: Tue Feb 8 17:10:41 2022
write: IOPS=609, BW=609MiB/s (639MB/s)(5120MiB/8404msec); 0 zone resets
slat (usec): min=7, max=160, avg=22.84, stdev= 9.90
clat (usec): min=134, max=33390, avg=1616.04, stdev=1687.71
lat (usec): min=152, max=33434, avg=1638.87, stdev=1691.40
clat percentiles (usec):
| 1.00th=[ 147], 5.00th=[ 163], 10.00th=[ 239], 20.00th=[ 251],
| 30.00th=[ 265], 40.00th=[ 289], 50.00th=[ 396], 60.00th=[ 2114],
| 70.00th=[ 2933], 80.00th=[ 3261], 90.00th=[ 3884], 95.00th=[ 4146],
| 99.00th=[ 4752], 99.50th=[ 5538], 99.90th=[11994], 99.95th=[15139],
| 99.99th=[33424]
bw ( KiB/s): min=231424, max=3084288, per=100.00%, avg=642176.00, stdev=905215.68, samples=16
iops : min= 226, max= 3012, avg=627.12, stdev=884.00, samples=16
lat (usec) : 250=19.02%, 500=33.24%, 750=1.19%, 1000=0.18%
lat (msec) : 2=5.86%, 4=33.05%, 10=7.34%, 20=0.08%, 50=0.04%
cpu : usr=1.81%, sys=0.25%, ctx=5377, majf=0, minf=48
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,5120,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=609MiB/s (639MB/s), 609MiB/s-609MiB/s (639MB/s-639MB/s), io=5120MiB (5369MB), run=8404-8404msec
Code:
root@:/tank# zpool iostat -vy 15 1
                                                capacity     operations    bandwidth
pool                                          alloc   free   read  write   read  write
--------------------------------------------  -----  -----  -----  -----  -----  -----
tank                                          7.93T  9.29T      4  1.23K   155K   493M
  mirror                                      2.23T   499G      1    315  34.7K   100M
    ata-WDC_WD30EFRX-REDACTED                     -      -      0    170  20.0K  50.4M
    ata-WDC_WD30EFRX-REDACTED                     -      -      0    144  14.7K  49.8M
  mirror                                      2.26T   474G      1    334  50.9K   119M
    ata-WDC_WD30EFRX-REDACTED                     -      -      0    150  23.5K  59.4M
    ata-WDC_WD30EFRX-REDACTED                     -      -      0    183  27.5K  59.7M
  mirror                                      2.27T   459G      1    274  36.8K   119M
    ata-WDC_WD30EFRX-REDACTED                     -      -      0    135  24.3K  59.1M
    ata-WDC_WD30EFRX-REDACTED                     -      -      0    139  12.5K  59.8M
  mirror                                      1.17T  7.89T      0    333  33.1K   155M
    ata-WDC_WD100EFAX-REDACTED                    -      -      0    165  19.7K  77.8M
    ata-WDC_WD100EFAX-REDACTED                    -      -      0    167  13.3K  77.4M
--------------------------------------------  -----  -----  -----  -----  -----  -----
fio test #2:
Run on my first lxc from the original pool configuration.
Code:
test: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=posixaio, iodepth=1
fio-3.25
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][w=276MiB/s][w=276 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=3976819: Wed Feb 9 01:55:53 2022
write: IOPS=651, BW=651MiB/s (683MB/s)(5120MiB/7859msec); 0 zone resets
slat (usec): min=7, max=578, avg=22.18, stdev=12.62
clat (usec): min=145, max=59141, avg=1510.28, stdev=2204.29
lat (usec): min=155, max=59191, avg=1532.46, stdev=2207.01
clat percentiles (usec):
| 1.00th=[ 149], 5.00th=[ 151], 10.00th=[ 153], 20.00th=[ 198],
| 30.00th=[ 265], 40.00th=[ 318], 50.00th=[ 424], 60.00th=[ 1532],
| 70.00th=[ 2606], 80.00th=[ 3097], 90.00th=[ 3458], 95.00th=[ 3621],
| 99.00th=[ 4490], 99.50th=[ 6783], 99.90th=[27919], 99.95th=[45876],
| 99.99th=[58983]
bw ( KiB/s): min=206435, max=3033088, per=100.00%, avg=684380.53, stdev=888845.19, samples=15
iops : min= 201, max= 2962, avg=668.27, stdev=868.05, samples=15
lat (usec) : 250=25.96%, 500=25.18%, 750=1.17%, 1000=0.37%
lat (msec) : 2=13.52%, 4=32.15%, 10=1.27%, 20=0.20%, 50=0.16%
lat (msec) : 100=0.04%
cpu : usr=1.68%, sys=0.43%, ctx=5742, majf=0, minf=49
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,5120,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=651MiB/s (683MB/s), 651MiB/s-651MiB/s (683MB/s-683MB/s), io=5120MiB (5369MB), run=7859-7859msec
Code:
root@:/tank# zpool iostat -vy 15 1
                                                capacity     operations    bandwidth
pool                                          alloc   free   read  write   read  write
--------------------------------------------  -----  -----  -----  -----  -----  -----
tank                                          7.93T  9.29T      1  2.15K  8.80K   258M
  mirror                                      2.23T   499G      0    526  3.20K  61.0M
    ata-WDC_WD30EFRX-REDACTED                     -      -      0    272  1.87K  30.7M
    ata-WDC_WD30EFRX-REDACTED                     -      -      0    253  1.33K  30.3M
  mirror                                      2.26T   474G      0    597    819  44.5M
    ata-WDC_WD30EFRX-REDACTED                     -      -      0    292    273  22.2M
    ata-WDC_WD30EFRX-REDACTED                     -      -      0    304    546  22.4M
  mirror                                      2.27T   459G      0    360  3.73K   100M
    ata-WDC_WD30EFRX-REDACTED                     -      -      0    184  2.93K  50.2M
    ata-WDC_WD30EFRX-REDACTED                     -      -      0    175    819  50.0M
  mirror                                      1.17T  7.89T      0    713  1.07K  52.1M
    ata-WDC_WD100EFAX-REDACTED                    -      -      0    349      0  25.9M
    ata-WDC_WD100EFAX-REDACTED                    -      -      0    363  1.07K  26.3M
--------------------------------------------  -----  -----  -----  -----  -----  -----
fio test #3:
Run on a newly created lxc.
Code:
test: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=posixaio, iodepth=1
fio-3.25
clock setaffinity failed: Invalid argument
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=4104: Tue Feb 8 14:56:26 2022
write: IOPS=246, BW=246MiB/s (258MB/s)(5120MiB/20793msec); 0 zone resets
slat (usec): min=6, max=192, avg=12.78, stdev= 3.84
clat (usec): min=144, max=2677.3k, avg=4047.24, stdev=97625.45
lat (usec): min=155, max=2677.3k, avg=4060.02, stdev=97625.43
clat percentiles (usec):
| 1.00th=[ 151], 5.00th=[ 151], 10.00th=[ 153],
| 20.00th=[ 153], 30.00th=[ 153], 40.00th=[ 155],
| 50.00th=[ 155], 60.00th=[ 161], 70.00th=[ 176],
| 80.00th=[ 198], 90.00th=[ 210], 95.00th=[ 219],
| 99.00th=[ 355], 99.50th=[ 408], 99.90th=[2399142],
| 99.95th=[2634023], 99.99th=[2667578]
bw ( KiB/s): min=225280, max=1265664, per=100.00%, avg=837599.00, stdev=400357.30, samples=12
iops : min= 220, max= 1236, avg=817.92, stdev=390.92, samples=12
lat (usec) : 250=97.36%, 500=2.23%, 750=0.02%, 1000=0.02%
lat (msec) : 4=0.04%, 10=0.06%, 20=0.08%, 50=0.04%, >=2000=0.16%
cpu : usr=0.42%, sys=0.06%, ctx=10243, majf=0, minf=51
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,5120,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=246MiB/s (258MB/s), 246MiB/s-246MiB/s (258MB/s-258MB/s), io=5120MiB (5369MB), run=20793-20793msec
Code:
root@:/tank# zpool iostat -vy 30 1
                                                capacity     operations    bandwidth
pool                                          alloc   free   read  write   read  write
--------------------------------------------  -----  -----  -----  -----  -----  -----
tank                                          7.93T  9.29T      1    921  14.3K   346M
  mirror                                      2.23T   499G      0    203  2.27K  75.2M
    ata-WDC_WD30EFRX-REDACTED                     -      -      0    101  1.20K  37.6M
    ata-WDC_WD30EFRX-REDACTED                     -      -      0    101  1.07K  37.6M
  mirror                                      2.26T   474G      0    266  2.80K  77.6M
    ata-WDC_WD30EFRX-REDACTED                     -      -      0    133    546  38.8M
    ata-WDC_WD30EFRX-REDACTED                     -      -      0    132  2.27K  38.8M
  mirror                                      2.27T   459G      0    192  2.93K  81.5M
    ata-WDC_WD30EFRX-REDACTED                     -      -      0     94  2.93K  40.7M
    ata-WDC_WD30EFRX-REDACTED                     -      -      0     97      0  40.7M
  mirror                                      1.17T  7.89T      0    259  6.27K   111M
    ata-WDC_WD100EFAX-REDACTED                    -      -      0    127  5.87K  55.6M
    ata-WDC_WD100EFAX-REDACTED                    -      -      0    132    409  55.6M
--------------------------------------------  -----  -----  -----  -----  -----  -----