Poor ZFS performance on Supermicro vs random ASUS board

Disks of all types often lie about their physical layout and sector size. I'd suggest keeping your recordsize the same as your DB page size in any case. atime=off is a good suggestion, but I don't expect it to change the performance of this test or the amount written, since metadata gets updated for each sync write anyway.
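For example, for a MySQL/InnoDB workload with its default 16K page size, matching the recordsize would look roughly like this (the "nvmepool/db" dataset name is just a placeholder, not from this thread):
Code:
# hypothetical dataset; match recordsize to the DB page size (16K for InnoDB, 8K for PostgreSQL)
zfs set recordsize=16k nvmepool/db
zfs get recordsize nvmepool/db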
 
But it is much better compared with raidz1, isn't it?

Try checking the physical sector size of each NVMe disk:

Code:
cat /sys/block/nvmeXn1/queue/physical_block_size

And set atime=off on the zpool! Then run the fio test again!
Code:
root@vmc3-1:~# cat /sys/block/nvme0n1/queue/physical_block_size
512
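Since the drive reports 512-byte physical sectors, it is also worth double-checking that the pool's ashift matches. One way to do that (just a suggestion, not something run in this thread) is to dump the cached pool configuration:
Code:
# prints the pool configuration, including the ashift of each vdev
zdb -C nvmepool | grep ashift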
Code:
root@vmc3-1:~# zfs set atime=off nvmepool/VMs
Code:
root@vmc3-1:/nvmepool/VMs# fio --randrepeat=1 --ioengine=libaio --direct=0 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randwrite
test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.16
Starting 1 process
Jobs: 1 (f=1): [w(1)] [100.0% done] [0KB/132.2MB/0KB /s] [0/33.9K/0 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=1360: Thu Dec 28 16:16:51 2017
  write: io=4096.0MB, bw=110548KB/s, iops=27637, runt= 37941msec
  cpu          : usr=7.90%, sys=78.36%, ctx=96749, majf=0, minf=498
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=0/w=1048576/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: io=4096.0MB, aggrb=110548KB/s, minb=110548KB/s, maxb=110548KB/s, mint=37941msec, maxt=37941msec
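Since the earlier point was about sync writes and the amount written, a sync variant of the same test (adding an fsync after every write) would be closer to database behaviour. This is only a suggestion; it is not what was run above:
Code:
# same test, but with an fsync after every write to force sync behaviour
fio --randrepeat=1 --ioengine=libaio --direct=0 --gtod_reduce=1 --name=synctest --filename=synctest --bs=4k --iodepth=64 --size=4G --readwrite=randwrite --fsync=1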
Code:
root@vmc3-1:~# smartctl -a /dev/nvme0n1 | grep Written
Data Units Written:                 898,471 [460 GB]
root@vmc3-1:~# smartctl -a /dev/nvme1n1 | grep Written
Data Units Written:                 1,292,900 [661 GB]
root@vmc3-1:~# smartctl -a /dev/nvme2n1 | grep Written
Data Units Written:                 725,999 [371 GB]
root@vmc3-1:~# smartctl -a /dev/nvme3n1 | grep Written
Data Units Written:                 1,606,246 [822 GB]
57 GB instead of 60 GB. I think this is within measurement error; I do not see any change.
If ZFS writes 8 times more to the SSD than necessary, then the SSD will live 8 times less.
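For reference, a rough way to quantify the amplification of a single run is to record Data Units Written on every pool member before and after the test and convert the delta (one NVMe data unit is 512,000 bytes). A minimal sketch, assuming the four drives above:
Code:
# sum Data Units Written over all four NVMe drives (1 unit = 512,000 bytes)
total=0
for d in /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1; do
  u=$(smartctl -a $d | awk '/Data Units Written/ {gsub(",","",$4); print $4}')
  total=$((total + u))
done
echo "$total units (~$((total * 512000 / 1000000000)) GB)"
# run once before and once after fio, then subtract the two totals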
 
If ZFS writes 8 times more to the SSD than necessary, then the SSD will live 8 times less.

It is like you say... omg, I will not be able to use this NVMe for 10 years, but only 4 years ;) But if you believe what you say, you can go forward with any other non-ZFS system. Any system has good parts and bad parts. As a side note, I have a ZFS server with 2 x Intel DataCenter SSDs (PerconaDB in one of 3 containers), which was installed in Nov 2014 (SMART shows 27,001,056 MiB written). And of course I will replace these 2 SSDs with other models next year. A 3-4 year lifetime in an enterprise environment (24h/24) is more than sufficient! Please believe me that I do not regret it even for a moment (ZFS was smart enough to save my data on many occasions, including checksum errors here).
If you don't need to know that your DBs are OK (via checksums), then do not use ZFS.
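For anyone wondering where those checksum errors show up: the CKSUM column of zpool status lists them per device, and a scrub forces every block to be verified against its checksum (pool name taken from earlier in the thread):
Code:
# CKSUM column shows checksum errors per device
zpool status -v nvmepool
# read and verify all data against its checksums
zpool scrub nvmepool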
 