Poor ZFS performance on Supermicro vs random ASUS board

Disks of all types often lie about their physical layout and sector size. I'd suggest keeping your recordsize the same as your DB page size in any case. atime=off is a good suggestion, but I don't expect it to change the performance of this test or the amount written, since metadata gets updated for each sync write anyway.
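For example, for a MySQL/InnoDB workload with its default 16K page size, matching the recordsize would look roughly like this (the "nvmepool/db" dataset name is just a placeholder, not from this thread):
Code:
# hypothetical dataset; match recordsize to the DB page size (16K for InnoDB, 8K for PostgreSQL)
zfs set recordsize=16k nvmepool/db
zfs get recordsize nvmepool/db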
 
But it is much better compared with raidz1, isn't it?

Try checking the physical sector size of each NVMe disk:

Code:
cat /sys/block/nvmeXn1/queue/physical_block_size

And set atime=off on the zpool! Then run the fio test again!
Code:
root@vmc3-1:~# cat /sys/block/nvme0n1/queue/physical_block_size
512
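Since the drive reports 512-byte physical sectors, it is also worth double-checking that the pool's ashift matches. One way to do that (just a suggestion, not something run in this thread) is to dump the cached pool configuration:
Code:
# prints the pool configuration, including the ashift of each vdev
zdb -C nvmepool | grep ashift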
Code:
root@vmc3-1:~# zfs set atime=off nvmepool/VMs
Code:
root@vmc3-1:/nvmepool/VMs# fio --randrepeat=1 --ioengine=libaio --direct=0 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randwrite
test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.16
Starting 1 process
Jobs: 1 (f=1): [w(1)] [100.0% done] [0KB/132.2MB/0KB /s] [0/33.9K/0 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=1360: Thu Dec 28 16:16:51 2017
  write: io=4096.0MB, bw=110548KB/s, iops=27637, runt= 37941msec
  cpu          : usr=7.90%, sys=78.36%, ctx=96749, majf=0, minf=498
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=0/w=1048576/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: io=4096.0MB, aggrb=110548KB/s, minb=110548KB/s, maxb=110548KB/s, mint=37941msec, maxt=37941msec
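Since the earlier point was about sync writes and the amount written, a sync variant of the same test (adding an fsync after every write) would be closer to database behaviour. This is only a suggestion; it is not what was run above:
Code:
# same test, but with an fsync after every write to force sync behaviour
fio --randrepeat=1 --ioengine=libaio --direct=0 --gtod_reduce=1 --name=synctest --filename=synctest --bs=4k --iodepth=64 --size=4G --readwrite=randwrite --fsync=1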
Code:
root@vmc3-1:~# smartctl -a /dev/nvme0n1 | grep Written
Data Units Written:                 898,471 [460 GB]
root@vmc3-1:~# smartctl -a /dev/nvme1n1 | grep Written
Data Units Written:                 1,292,900 [661 GB]
root@vmc3-1:~# smartctl -a /dev/nvme2n1 | grep Written
Data Units Written:                 725,999 [371 GB]
root@vmc3-1:~# smartctl -a /dev/nvme3n1 | grep Written
Data Units Written:                 1,606,246 [822 GB]
57 GB instead of 60 GB. I think this is within measurement error; I do not see any change.
If ZFS writes 8 times more to the SSD than necessary, then the SSD will live 8 times less.
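For reference, a rough way to quantify the amplification of a single run is to record Data Units Written on every pool member before and after the test and convert the delta (one NVMe data unit is 512,000 bytes). A minimal sketch, assuming the four drives above:
Code:
# sum Data Units Written over all four NVMe drives (1 unit = 512,000 bytes)
total=0
for d in /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1; do
  u=$(smartctl -a $d | awk '/Data Units Written/ {gsub(",","",$4); print $4}')
  total=$((total + u))
done
echo "$total units (~$((total * 512000 / 1000000000)) GB)"
# run once before and once after fio, then subtract the two totals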
 
If ZFS writes 8 times more to the SSD than necessary, then the SSD will live 8 times less.

It is like you say... omg, I will not be able to use this NVMe for 10 years, but only 4 years ;) But if you believe what you say, you can go forward with any other non-ZFS system. Any system has good parts and bad parts. As a side note, I have a ZFS server with 2 x Intel DataCenter SSDs (PerconaDB in one of 3 containers), which was installed in Nov 2014 (SMART shows 27,001,056 MiB written). And of course I will replace these 2 SSDs with other models next year. A 3-4 year lifetime in an enterprise environment (24h/24) is more than sufficient! Please believe me that I do not regret it even for a moment (ZFS was smart enough to save my data on many occasions, including checksum errors here).
If you don't need to know that your DBs are OK (via checksums), then do not use ZFS.
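For anyone wondering where those checksum errors show up: the CKSUM column of zpool status lists them per device, and a scrub forces every block to be verified against its checksum (pool name taken from earlier in the thread):
Code:
# CKSUM column shows checksum errors per device
zpool status -v nvmepool
# read and verify all data against its checksums
zpool scrub nvmepool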
 