Hello all,
I'm running a single-node Proxmox server with ZFS in a mirror:
Code:
  pool: rpool
 state: ONLINE
config:

        NAME                                 STATE     READ WRITE CKSUM
        rpool                                ONLINE       0     0     0
          mirror-0                           ONLINE       0     0     0
            nvme-eui.002538bc31a32330-part3  ONLINE       0     0     0
            nvme-eui.002538bc31a32315-part3  ONLINE       0     0     0
With the following drives:
Code:
SAMSUNG MZVL22T0HBLB-00B00
The host has an AMD EPYC 7272 12-core processor and 256 GB of RAM.
I am trying to verify disk performance before installing customer VMs on it.
I first ran fio in a Windows VM with a 1M sequential read, and the results are acceptable:
Code:
C:\Users\Administrator>fio --name=mytest --filename=\\.\PhysicalDrive0 --rw=read --bs=1M --ioengine=windowsaio --direct=1 --time_based --runtime=30 --group_reporting --iodepth=16 --thread=1
mytest: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=windowsaio, iodepth=16
fio-3.38
Starting 1 thread
Jobs: 1 (f=0): [f(1)][100.0%][r=5496MiB/s][r=5495 IOPS][eta 00m:00s]
mytest: (groupid=0, jobs=1): err= 0: pid=7188: Fri Dec 13 21:08:32 2024
read: IOPS=5382, BW=5383MiB/s (5644MB/s)(158GiB/30003msec)
slat (nsec): min=900, max=1747.7k, avg=61804.56, stdev=35938.80
clat (usec): min=776, max=12086, avg=2874.11, stdev=418.40
lat (usec): min=822, max=12172, avg=2935.91, stdev=416.70
clat percentiles (usec):
| 1.00th=[ 2114], 5.00th=[ 2311], 10.00th=[ 2409], 20.00th=[ 2573],
| 30.00th=[ 2737], 40.00th=[ 2835], 50.00th=[ 2900], 60.00th=[ 2966],
| 70.00th=[ 2999], 80.00th=[ 3064], 90.00th=[ 3195], 95.00th=[ 3326],
| 99.00th=[ 3884], 99.50th=[ 5014], 99.90th=[ 7635], 99.95th=[ 8586],
| 99.99th=[10028]
bw ( MiB/s): min= 4959, max= 6282, per=100.00%, avg=5386.79, stdev=337.54, samples=59
iops : min= 4959, max= 6282, avg=5386.42, stdev=337.55, samples=59
lat (usec) : 1000=0.01%
lat (msec) : 2=0.52%, 4=98.62%, 10=0.84%, 20=0.01%
cpu : usr=0.00%, sys=36.66%, ctx=0, majf=0, minf=0
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=12.0%, 16=87.9%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=99.9%, 8=0.1%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=161494,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16
Run status group 0 (all jobs):
READ: bw=5383MiB/s (5644MB/s), 4096MiB/s-5383MiB/s (4295MB/s-5644MB/s), io=158GiB (169GB), run=30003-30003msec
But as soon as I test with 4K random reads, the results get much worse:
Code:
C:\Users\Administrator>fio --name=mytest --filename=\\.\PhysicalDrive0 --rw=read --bs=1M --ioengine=windowsaio --direct=1 --time_based --runtime=30 --group_reporting --iodepth=16 --thread=1
mytest: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=windowsaio, iodepth=16
fio-3.38
Starting 1 thread
Jobs: 1 (f=0): [f(1)][100.0%][r=5496MiB/s][r=5495 IOPS][eta 00m:00s]
mytest: (groupid=0, jobs=1): err= 0: pid=7188: Fri Dec 13 21:08:32 2024
read: IOPS=5382, BW=5383MiB/s (5644MB/s)(158GiB/30003msec)
slat (nsec): min=900, max=1747.7k, avg=61804.56, stdev=35938.80
clat (usec): min=776, max=12086, avg=2874.11, stdev=418.40
lat (usec): min=822, max=12172, avg=2935.91, stdev=416.70
clat percentiles (usec):
| 1.00th=[ 2114], 5.00th=[ 2311], 10.00th=[ 2409], 20.00th=[ 2573],
| 30.00th=[ 2737], 40.00th=[ 2835], 50.00th=[ 2900], 60.00th=[ 2966],
| 70.00th=[ 2999], 80.00th=[ 3064], 90.00th=[ 3195], 95.00th=[ 3326],
| 99.00th=[ 3884], 99.50th=[ 5014], 99.90th=[ 7635], 99.95th=[ 8586],
| 99.99th=[10028]
bw ( MiB/s): min= 4959, max= 6282, per=100.00%, avg=5386.79, stdev=337.54, samples=59
iops : min= 4959, max= 6282, avg=5386.42, stdev=337.55, samples=59
lat (usec) : 1000=0.01%
lat (msec) : 2=0.52%, 4=98.62%, 10=0.84%, 20=0.01%
cpu : usr=0.00%, sys=36.66%, ctx=0, majf=0, minf=0
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=12.0%, 16=87.9%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=99.9%, 8=0.1%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=161494,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16
Run status group 0 (all jobs):
READ: bw=5383MiB/s (5644MB/s), 4096MiB/s-5383MiB/s (4295MB/s-5644MB/s), io=158GiB (169GB), run=30003-30003msec
I also tried benchmarking from the host directly (using the /dev device that ZFS created, maybe that's not correct?) with the same settings, only changing the device and the ioengine, obviously. The results are not great either, but they are still better:
Code:
fio --name=mytest --filename=/dev/zd64 --rw=randread --bs=4K --ioengine=libaio --direct=1 --time_based --runtime=30 --group_reporting --iodepth=16 --thread=1
mytest: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
fio-3.33
Starting 1 thread
Jobs: 1 (f=1): [r(1)][100.0%][r=474MiB/s][r=121k IOPS][eta 00m:00s]
mytest: (groupid=0, jobs=1): err= 0: pid=1067235: Sat Dec 14 06:18:36 2024
read: IOPS=54.2k, BW=212MiB/s (222MB/s)(6351MiB/30001msec)
slat (usec): min=2, max=138, avg= 6.32, stdev= 1.39
clat (usec): min=13, max=8556, avg=126.55, stdev=68.26
lat (usec): min=17, max=8611, avg=132.86, stdev=68.33
clat percentiles (usec):
| 1.00th=[ 50], 5.00th=[ 66], 10.00th=[ 75], 20.00th=[ 85],
| 30.00th=[ 93], 40.00th=[ 100], 50.00th=[ 108], 60.00th=[ 115],
| 70.00th=[ 126], 80.00th=[ 147], 90.00th=[ 243], 95.00th=[ 269],
| 99.00th=[ 310], 99.50th=[ 326], 99.90th=[ 408], 99.95th=[ 562],
| 99.99th=[ 947]
bw ( KiB/s): min=59872, max=501528, per=100.00%, avg=464471.11, stdev=81291.09, samples=27
iops : min=14970, max=125382, avg=116117.85, stdev=20322.39, samples=27
lat (usec) : 20=0.01%, 50=1.06%, 100=39.72%, 250=50.51%, 500=8.65%
lat (usec) : 750=0.04%, 1000=0.02%
lat (msec) : 2=0.01%, 10=0.01%
cpu : usr=6.88%, sys=91.77%, ctx=35080, majf=7, minf=17
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=1625957,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16
Run status group 0 (all jobs):
READ: bw=212MiB/s (222MB/s), 212MiB/s-212MiB/s (222MB/s-222MB/s), io=6351MiB (6660MB), run=30001-30001msec
Disk stats (read/write):
zd64: ios=1612016/128, merge=0/0, ticks=144074/234, in_queue=144308, util=44.91%
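In case it matters, this is how I map the zd device back to a zvol name, since ZFS keeps symlinks under /dev/zvol (the dataset path below is a guess based on the default Proxmox layout, so adjust it if yours differs):
Code:
# Each zvol is exposed as a symlink /dev/zvol/<pool>/<dataset> pointing at its zd* node.
# rpool/data is where Proxmox usually puts VM disks; the path is an assumption here.
ls -l /dev/zvol/rpool/data/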
For the record, here is the same test against a mirror member directly (bypassing ZFS altogether):
Code:
fio --name=mytest --filename=/dev/nvme0n1 --rw=randread --bs=4K --ioengine=libaio --direct=1 --time_based --runtime=30 --group_reporting --iodepth=16 --thread=1
mytest: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
fio-3.33
Starting 1 thread
Jobs: 1 (f=1): [r(1)][100.0%][r=546MiB/s][r=140k IOPS][eta 00m:00s]
mytest: (groupid=0, jobs=1): err= 0: pid=1075286: Sat Dec 14 06:48:24 2024
read: IOPS=137k, BW=536MiB/s (562MB/s)(15.7GiB/30001msec)
slat (usec): min=3, max=195, avg= 5.37, stdev= 2.73
clat (usec): min=12, max=8323, avg=110.16, stdev=31.05
lat (usec): min=15, max=8465, avg=115.53, stdev=31.17
clat percentiles (usec):
| 1.00th=[ 77], 5.00th=[ 84], 10.00th=[ 88], 20.00th=[ 93],
| 30.00th=[ 96], 40.00th=[ 98], 50.00th=[ 101], 60.00th=[ 104],
| 70.00th=[ 109], 80.00th=[ 129], 90.00th=[ 157], 95.00th=[ 167],
| 99.00th=[ 182], 99.50th=[ 190], 99.90th=[ 210], 99.95th=[ 221],
| 99.99th=[ 302]
bw ( KiB/s): min=532136, max=561104, per=100.00%, avg=549006.78, stdev=10284.35, samples=59
iops : min=133034, max=140276, avg=137251.69, stdev=2571.08, samples=59
lat (usec) : 20=0.01%, 50=0.01%, 100=46.25%, 250=53.73%, 500=0.01%
lat (usec) : 750=0.01%, 1000=0.01%
lat (msec) : 10=0.01%
cpu : usr=25.20%, sys=74.71%, ctx=394, majf=0, minf=17
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=4117503,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16
Run status group 0 (all jobs):
READ: bw=536MiB/s (562MB/s), 536MiB/s-536MiB/s (562MB/s-562MB/s), io=15.7GiB (16.9GB), run=30001-30001msec
Disk stats (read/write):
nvme0n1: ios=4098031/882, merge=0/0, ticks=99420/54, in_queue=99511, util=98.83%
Compression is enabled on the zpool.
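For reference, this is roughly how I read back the properties that seem relevant here; the zvol name in the last command is only a placeholder for the actual VM disk:
Code:
# pool-level sector size setting
zpool get ashift rpool
# pool/dataset defaults
zfs get compression,recordsize rpool
# per-VM-disk zvol properties (placeholder name, yours will differ)
zfs get compression,volblocksize,primarycache rpool/data/vm-100-disk-0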
I am not that experienced with ZFS, so if you have any pointers for me, that would be awesome!
Thanks a lot in advance.
Best,
BTW: there is also a noticeable difference between the raw device and the ZFS zvol for the 1M sequential read:
Code:
Raw device:
READ: bw=6767MiB/s (7096MB/s), 6767MiB/s-6767MiB/s (7096MB/s-7096MB/s), io=198GiB (213GB), run=30002-30002msec
ZFS device:
READ: bw=4394MiB/s (4608MB/s), 4394MiB/s-4394MiB/s (4608MB/s-4608MB/s), io=129GiB (138GB), run=30006-30006msec
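If it is useful, this is the kind of multi-job variant I plan to try next on the zvol, to see whether the single-threaded 4K test is simply CPU-bound (the zvol run above showed sys at ~92%); the job count and queue depth are just a first guess, not a recommendation:
Code:
# same 4K random read against the zvol, spread over several jobs
# to check whether aggregate IOPS scale with parallelism
fio --name=mytest --filename=/dev/zd64 --rw=randread --bs=4K \
    --ioengine=libaio --direct=1 --time_based --runtime=30 \
    --group_reporting --iodepth=32 --numjobs=4 --thread=1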