Strange ZFS behaviour

gfreeman

Member
Mar 25, 2019
Hi!

I have a single server:
DELL PowerEdge R630
2x Intel(R) Xeon(R) CPU E5-2697 v4
256 GB RAM
2x INTEL DC P3605 HHHL NVME

PVE: pve-manager/6.4-15/af7986e6 (running kernel: 5.4.203-1-pve)

I created a ZFS mirror over the two NVMe drives using:
zpool create -o ashift=12 nvtank mirror /dev/nvme0n1 /dev/nvme1n1
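(To double-check the pool layout and ashift after creation, something like this should work:)

Bash:
# sanity checks on the new pool
zpool status nvtank        # expect one mirror vdev with both NVMe devices
zpool get ashift nvtank    # expect ashift = 12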

The dataset properties (this looks like `zfs get all nvtank` output):

Bash:
NAME    PROPERTY              VALUE                  SOURCE
nvtank  type                  filesystem             -
nvtank  creation              Sun Aug 27  0:55 2023  -
nvtank  used                  8.80G                  -
nvtank  available             1.40T                  -
nvtank  referenced            96K                    -
nvtank  compressratio         1.14x                  -
nvtank  mounted               yes                    -
nvtank  quota                 none                   default
nvtank  reservation           none                   default
nvtank  recordsize            128K                   default
nvtank  mountpoint            /nvtank                default
nvtank  sharenfs              off                    default
nvtank  checksum              on                     default
nvtank  compression           lz4                    local
nvtank  atime                 on                     default
nvtank  devices               on                     default
nvtank  exec                  on                     default
nvtank  setuid                on                     default
nvtank  readonly              off                    default
nvtank  zoned                 off                    default
nvtank  snapdir               hidden                 default
nvtank  aclmode               discard                default
nvtank  aclinherit            restricted             default
nvtank  createtxg             1                      -
nvtank  canmount              on                     default
nvtank  xattr                 on                     default
nvtank  copies                1                      default
nvtank  version               5                      -
nvtank  utf8only              off                    -
nvtank  normalization         none                   -
nvtank  casesensitivity       sensitive              -
nvtank  vscan                 off                    default
nvtank  nbmand                off                    default
nvtank  sharesmb              off                    default
nvtank  refquota              none                   default
nvtank  refreservation        none                   default
nvtank  guid                  11814277247284247473   -
nvtank  primarycache          all                    default
nvtank  secondarycache        all                    default
nvtank  usedbysnapshots       0B                     -
nvtank  usedbydataset         96K                    -
nvtank  usedbychildren        8.80G                  -
nvtank  usedbyrefreservation  0B                     -
nvtank  logbias               latency                default
nvtank  objsetid              51                     -
nvtank  dedup                 on                     local
nvtank  mlslabel              none                   default
nvtank  sync                  standard               default
nvtank  dnodesize             legacy                 default
nvtank  refcompressratio      1.00x                  -
nvtank  written               96K                    -
nvtank  logicalused           9.62G                  -
nvtank  logicalreferenced     40K                    -
nvtank  volmode               default                default
nvtank  filesystem_limit      none                   default
nvtank  snapshot_limit        none                   default
nvtank  filesystem_count      none                   default
nvtank  snapshot_count        none                   default
nvtank  snapdev               hidden                 default
nvtank  acltype               off                    default
nvtank  context               none                   default
nvtank  fscontext             none                   default
nvtank  defcontext            none                   default
nvtank  rootcontext           none                   default
nvtank  relatime              off                    default
nvtank  redundant_metadata    all                    default
nvtank  overlay               on                     default
nvtank  encryption            off                    default
nvtank  keylocation           none                   default
nvtank  keyformat             none                   default
nvtank  pbkdf2iters           0                      default
nvtank  special_small_blocks  0                      default

Then I created a Windows guest using VirtIO SCSI:
Code:
bootdisk: scsi0
cores: 16
memory: 8192
net0: virtio=02:D3:BC:03:0C:2D,bridge=vmbr0,firewall=1
numa: 1
ostype: l26
scsi0: zpool:vm-100-disk-0,cache=writethrough,discard=on,size=32G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=7c30a65a-3b7a-471a-bd99-ea0a148ff576
sockets: 2
vmgenid: 8c740fb9-4d67-41b7-a10e-d122107e8a75
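(For context: the `zpool:` storage ID referenced by scsi0 would correspond to a zfspool definition in /etc/pve/storage.cfg roughly like the sketch below; the storage ID and options are assumptions, since only the VM config was posted.)

Code:
zfspool: zpool
        pool nvtank
        content images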

Then I ran CrystalDiskMark in the guest VM:

Code:
-----------------------------------------------------------------------
CrystalDiskMark 6.0.2 x64 (C) 2007-2018 hiyohiyo
                          Crystal Dew World : https://crystalmark.info/
-----------------------------------------------------------------------
* MB/s = 1,000,000 bytes/s [SATA/600 = 600,000,000 bytes/s]
* KB = 1000 bytes, KiB = 1024 bytes

   Sequential Read (Q= 32,T= 1) :  2429.333 MB/s
  Sequential Write (Q= 32,T= 1) :   241.827 MB/s
  Random Read 4KiB (Q=  8,T= 8) :   103.838 MB/s [  25351.1 IOPS]
 Random Write 4KiB (Q=  8,T= 8) :    28.708 MB/s [   7008.8 IOPS]
  Random Read 4KiB (Q= 32,T= 1) :    96.823 MB/s [  23638.4 IOPS]
 Random Write 4KiB (Q= 32,T= 1) :    25.029 MB/s [   6110.6 IOPS]
  Random Read 4KiB (Q=  1,T= 1) :    28.862 MB/s [   7046.4 IOPS]
 Random Write 4KiB (Q=  1,T= 1) :     7.819 MB/s [   1908.9 IOPS]

  Test : 16384 MiB [C: 33.3% (10.5/31.5 GiB)] (x5)  [Interval=5 sec]
  Date : 2023/08/27 1:35:39
    OS : Windows Server 2016 Server Standard (full installation) [10.0 Build 17763] (x64)

These are very, very, very bad results.

OK. Next I destroyed the ZFS pool and created LVM storage on the NVMe instead (the rough commands are sketched after the results), created a VM, and ran CDM again:

Code:
-----------------------------------------------------------------------
CrystalDiskMark 6.0.2 x64 (C) 2007-2018 hiyohiyo
                          Crystal Dew World : https://crystalmark.info/
-----------------------------------------------------------------------
* MB/s = 1,000,000 bytes/s [SATA/600 = 600,000,000 bytes/s]
* KB = 1000 bytes, KiB = 1024 bytes

   Sequential Read (Q= 32,T= 1) :  8407.806 MB/s
  Sequential Write (Q= 32,T= 1) :  9090.689 MB/s
  Random Read 4KiB (Q=  8,T= 8) :   650.442 MB/s [ 158799.3 IOPS]
 Random Write 4KiB (Q=  8,T= 8) :   656.345 MB/s [ 160240.5 IOPS]
  Random Read 4KiB (Q= 32,T= 1) :   433.450 MB/s [ 105822.8 IOPS]
 Random Write 4KiB (Q= 32,T= 1) :   433.612 MB/s [ 105862.3 IOPS]
  Random Read 4KiB (Q=  1,T= 1) :    49.328 MB/s [  12043.0 IOPS]
 Random Write 4KiB (Q=  1,T= 1) :    45.037 MB/s [  10995.4 IOPS]

  Test : 16384 MiB [D: 0.2% (0.1/32.0 GiB)] (x5)  [Interval=5 sec]
  Date : 2023/08/25 23:06:59
    OS : Windows Server 2016 Server Standard (full installation) [10.0 Build 17763] (x64)
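(The LVM setup itself was not captured; it was roughly the following. The device name, VG name, and storage ID here are placeholders:)

Bash:
# rough reconstruction of the LVM storage setup; names are assumptions
pvcreate /dev/nvme0n1
vgcreate nvme_vg /dev/nvme0n1
pvesm add lvm nvme_lvm --vgname nvme_vg --content images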

That is a very GOOD result, but LVM in PVE does not support mirrors.

What am I doing wrong with this ZFS???
It's breaking my brain...

PS: I also ran fio directly on the raw ZFS dataset.
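(The exact fio invocation is not shown; judging from the log below — randrw, 4 KiB blocks, libaio, iodepth=1, a 60-second timed run with an fsync after each write — it was presumably something like this. The target path and --size are guesses:)

Bash:
# presumed invocation, reconstructed from the output that follows;
# --directory and --size are assumptions
fio --name=TEST --directory=/nvtank --size=4G \
    --rw=randrw --bs=4k --ioengine=libaio --iodepth=1 \
    --fsync=1 --runtime=60 --time_based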
The output:

Bash:
TEST: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.12
Starting 1 process
Jobs: 1 (f=1): [m(1)][11.7%][r=57.5MiB/s,w=57.1MiB/s][r=14.7k,w=14.6k IOPS][eta 00m:53s]
Jobs: 1 (f=1): [m(1)][21.7%][r=50.8MiB/s,w=50.5MiB/s][r=12.0k,w=12.9k IOPS][eta 00m:47s]
Jobs: 1 (f=1): [m(1)][31.7%][r=28.9MiB/s,w=28.0MiB/s][r=7394,w=7412 IOPS][eta 00m:41s] 
Jobs: 1 (f=1): [m(1)][41.7%][r=53.3MiB/s,w=52.2MiB/s][r=13.7k,w=13.4k IOPS][eta 00m:35s]
Jobs: 1 (f=1): [m(1)][51.7%][r=52.6MiB/s,w=51.2MiB/s][r=13.5k,w=13.1k IOPS][eta 00m:29s]
Jobs: 1 (f=1): [m(1)][61.7%][r=46.2MiB/s,w=46.5MiB/s][r=11.8k,w=11.9k IOPS][eta 00m:23s]
Jobs: 1 (f=1): [m(1)][71.7%][r=49.5MiB/s,w=50.2MiB/s][r=12.7k,w=12.8k IOPS][eta 00m:17s]
Jobs: 1 (f=1): [m(1)][81.7%][r=44.2MiB/s,w=43.8MiB/s][r=11.3k,w=11.2k IOPS][eta 00m:11s]
Jobs: 1 (f=1): [m(1)][91.7%][r=29.0MiB/s,w=28.9MiB/s][r=7435,w=7387 IOPS][eta 00m:05s] 
Jobs: 1 (f=1): [m(1)][100.0%][r=30.1MiB/s,w=30.2MiB/s][r=7707,w=7744 IOPS][eta 00m:00s] 
TEST: (groupid=0, jobs=1): err= 0: pid=35709: Mon Aug 28 14:12:09 2023
  read: IOPS=11.8k, BW=45.9MiB/s (48.2MB/s)(2755MiB/60000msec)
    slat (usec): min=2, max=562, avg= 5.67, stdev= 9.84
    clat (nsec): min=531, max=165951, avg=775.82, stdev=821.75
     lat (usec): min=2, max=586, avg= 6.53, stdev= 9.95
    clat percentiles (nsec):
     |  1.00th=[  548],  5.00th=[  580], 10.00th=[  612], 20.00th=[  652],
     | 30.00th=[  684], 40.00th=[  692], 50.00th=[  708], 60.00th=[  732],
     | 70.00th=[  788], 80.00th=[  820], 90.00th=[  860], 95.00th=[  940],
     | 99.00th=[ 1304], 99.50th=[ 1592], 99.90th=[17280], 99.95th=[17792],
     | 99.99th=[18816]
   bw (  KiB/s): min=14496, max=63784, per=100.00%, avg=47039.32, stdev=11106.04, samples=119
   iops        : min= 3624, max=15946, avg=11759.81, stdev=2776.50, samples=119
  write: IOPS=11.7k, BW=45.7MiB/s (47.9MB/s)(2742MiB/60000msec); 0 zone resets
    slat (usec): min=5, max=1581, avg= 9.99, stdev=10.66
    clat (nsec): min=556, max=216367, avg=822.57, stdev=856.10
     lat (usec): min=5, max=1603, avg=10.92, stdev=10.80
    clat percentiles (nsec):
     |  1.00th=[  588],  5.00th=[  628], 10.00th=[  652], 20.00th=[  692],
     | 30.00th=[  724], 40.00th=[  740], 50.00th=[  756], 60.00th=[  780],
     | 70.00th=[  820], 80.00th=[  852], 90.00th=[  900], 95.00th=[ 1004],
     | 99.00th=[ 1400], 99.50th=[ 1720], 99.90th=[17536], 99.95th=[18048],
     | 99.99th=[18816]
   bw (  KiB/s): min=14184, max=61936, per=100.00%, avg=46806.80, stdev=10850.90, samples=119
   iops        : min= 3546, max=15484, avg=11701.66, stdev=2712.72, samples=119
  lat (nsec)   : 750=55.05%, 1000=40.63%
  lat (usec)   : 2=3.96%, 4=0.11%, 10=0.02%, 20=0.23%, 50=0.01%
  lat (usec)   : 100=0.01%, 250=0.01%
  fsync/fdatasync/sync_file_range:
    sync (nsec): min=20, max=30565, avg=71.31, stdev=200.41
    sync percentiles (nsec):
     |  1.00th=[   22],  5.00th=[   29], 10.00th=[   32], 20.00th=[   34],
     | 30.00th=[   35], 40.00th=[   37], 50.00th=[   69], 60.00th=[   80],
     | 70.00th=[   86], 80.00th=[   95], 90.00th=[  118], 95.00th=[  139],
     | 99.00th=[  201], 99.50th=[  258], 99.90th=[  386], 99.95th=[  470],
     | 99.99th=[15680]
  cpu          : usr=6.84%, sys=42.06%, ctx=1404063, majf=0, minf=298
  IO depths    : 1=200.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=705388,701854,0,1407238 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=45.9MiB/s (48.2MB/s), 45.9MiB/s-45.9MiB/s (48.2MB/s-48.2MB/s), io=2755MiB (2889MB), run=60000-60000msec
  WRITE: bw=45.7MiB/s (47.9MB/s), 45.7MiB/s-45.7MiB/s (47.9MB/s-47.9MB/s), io=2742MiB (2875MB), run=60000-60000msec
 