Testing ZFS performance

HBO · Sep 13, 2017

Good morning,

I have the feeling that our ZFS SAS pool delivers hardly any performance (read/write). How should I set up a benchmark with "fio" to get a meaningful result? I have already created a subvol for it.

wolfgang · Sep 13, 2017

Hi,

I personally always run worst-case scenarios with 4 KB and 4 MB blocks. That tells you roughly where the lower limit is.
It can only get better from there.
4 KB tests the cache, if there is one.
4 MB tests the SAS disks.

write

Code:

fio --filename=/dev/zvol/tank02/test --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=120 --time_based --group_reporting --name=journal-test

read

Code:

fio --filename=/dev/zvol/tank02/test --sync=1 --rw=read --bs=4k --numjobs=1 --iodepth=1 --runtime=120 --time_based --group_reporting --name=journal-test
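
The post mentions both 4 KB and 4 MB block sizes, while the two commands above use --bs=4k. A minimal sketch that prints all four variants, assuming the same test zvol path and otherwise unchanged options. `fio_cmd` is a hypothetical helper; it only prints the command lines instead of running them, because the write jobs overwrite the target.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a fio command line in the style used above.
# Arguments: access mode, block size, optional target (defaults to the test zvol from the post).
fio_cmd() {
    mode="$1"
    size="$2"
    target="${3:-/dev/zvol/tank02/test}"
    printf 'fio --filename=%s --sync=1 --rw=%s --bs=%s --numjobs=1 --iodepth=1 --runtime=120 --time_based --group_reporting --name=journal-test\n' \
        "$target" "$mode" "$size"
}

# 4k is meant to exercise the cache, 4M the disks, as described in the post.
for bs in 4k 4M; do
    for rw in write read; do
        fio_cmd "$rw" "$bs"
    done
done
```

Passing a third argument, for example `fio_cmd randwrite 4k /dev/zvol/sata/test`, would produce the variant used later in the thread.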

HBO · Sep 13, 2017

I have now tested both the SAS 7.2k rpm pool and the SSD pool, and the differences are huge. The difference is most noticeable when booting a Linux VM: on an old system with a software RAID 1 and SATA enterprise hard disks, starting a database, for example, completes much faster.
Both pools were created the same way, including ZIL and L2ARC.

Code:

fio --filename=/dev/zvol/sata/test --sync=1 --rw=randwrite --bs=4k --numjobs=1 --iodepth=1 --runtime=120 --time_based --group_reporting --name=journal-test
journal-test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
fio-2.16
Starting 1 process
Jobs: 1 (f=1): [f(1)] [100.0% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 00m:00s]
journal-test: (groupid=0, jobs=1): err= 0: pid=50229: Wed Sep 13 10:09:48 2017
  write: io=22512KB, bw=192078B/s, iops=46, runt=120015msec
    clat (usec): min=165, max=1472.2K, avg=21319.90, stdev=59420.45
     lat (usec): min=165, max=1472.2K, avg=21320.30, stdev=59420.47
    clat percentiles (usec):
     |  1.00th=[  223],  5.00th=[  366], 10.00th=[  426], 20.00th=[ 3664],
     | 30.00th=[ 7456], 40.00th=[10176], 50.00th=[12864], 60.00th=[15424],
     | 70.00th=[18304], 80.00th=[22144], 90.00th=[29056], 95.00th=[45824],
     | 99.00th=[252928], 99.50th=[440320], 99.90th=[897024], 99.95th=[1044480],
     | 99.99th=[1466368]
    lat (usec) : 250=1.53%, 500=11.51%, 750=3.46%, 1000=0.16%
    lat (msec) : 2=0.57%, 4=3.45%, 10=18.57%, 20=35.38%, 50=20.74%
    lat (msec) : 100=2.03%, 250=1.58%, 500=0.66%, 750=0.23%, 1000=0.09%
    lat (msec) : 2000=0.05%
  cpu          : usr=0.03%, sys=0.79%, ctx=21159, majf=0, minf=10
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=5628/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: io=22512KB, aggrb=187KB/s, minb=187KB/s, maxb=187KB/s, mint=120015msec, maxt=120015msec
 
 
fio --filename=/dev/zvol/sata/test --sync=1 --rw=randread --bs=4k --numjobs=1 --iodepth=1 --runtime=120 --time_based --group_reporting --name=journal-test
journal-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
fio-2.16
Starting 1 process
Jobs: 1 (f=1): [r(1)] [100.0% done] [412KB/0KB/0KB /s] [103/0/0 iops] [eta 00m:00s]
journal-test: (groupid=0, jobs=1): err= 0: pid=51264: Wed Sep 13 10:12:59 2017
  read : io=51816KB, bw=442137B/s, iops=107, runt=120007msec
    clat (usec): min=7, max=1459.3K, avg=9261.55, stdev=37367.30
     lat (usec): min=7, max=1459.3K, avg=9261.71, stdev=37367.33
    clat percentiles (usec):
     |  1.00th=[    9],  5.00th=[   11], 10.00th=[   12], 20.00th=[   13],
     | 30.00th=[   14], 40.00th=[   16], 50.00th=[  211], 60.00th=[ 4448],
     | 70.00th=[ 9152], 80.00th=[13760], 90.00th=[19840], 95.00th=[25728],
     | 99.00th=[90624], 99.50th=[252928], 99.90th=[536576], 99.95th=[675840],
     | 99.99th=[1044480]
    lat (usec) : 10=1.65%, 20=40.57%, 50=3.93%, 100=0.08%, 250=6.73%
    lat (usec) : 500=2.60%, 750=0.05%, 1000=0.02%
    lat (msec) : 2=0.36%, 4=3.06%, 10=13.68%, 20=17.46%, 50=8.14%
    lat (msec) : 100=0.73%, 250=0.44%, 500=0.38%, 750=0.08%, 1000=0.02%
    lat (msec) : 2000=0.02%
  cpu          : usr=0.03%, sys=0.64%, ctx=11051, majf=0, minf=6
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=12954/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: io=51816KB, aggrb=431KB/s, minb=431KB/s, maxb=431KB/s, mint=120007msec, maxt=120007msec

Code:

fio --filename=/dev/zvol/ssd/test --sync=1 --rw=randwrite --bs=4k --numjobs=1 --iodepth=1 --runtime=120 --time_based --group_reporting --name=journal-test
journal-test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
fio-2.16
Starting 1 process
Jobs: 1 (f=1): [f(1)] [100.0% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 00m:00s]
journal-test: (groupid=0, jobs=1): err= 0: pid=70302: Wed Sep 13 11:20:19 2017
  write: io=1606.9MB, bw=13703KB/s, iops=3425, runt=120078msec
    clat (usec): min=136, max=1146.4K, avg=289.92, stdev=3287.29
     lat (usec): min=136, max=1146.4K, avg=290.14, stdev=3287.29
    clat percentiles (usec):
     |  1.00th=[  157],  5.00th=[  169], 10.00th=[  177], 20.00th=[  187],
     | 30.00th=[  197], 40.00th=[  205], 50.00th=[  213], 60.00th=[  221],
     | 70.00th=[  231], 80.00th=[  247], 90.00th=[  286], 95.00th=[  322],
     | 99.00th=[  556], 99.50th=[ 1080], 99.90th=[ 3120], 99.95th=[12736],
     | 99.99th=[166912]
    lat (usec) : 250=81.71%, 500=17.07%, 750=0.53%, 1000=0.16%
    lat (msec) : 2=0.24%, 4=0.22%, 10=0.02%, 20=0.01%, 50=0.01%
    lat (msec) : 100=0.01%, 250=0.03%, 500=0.01%, 2000=0.01%
  cpu          : usr=0.94%, sys=20.31%, ctx=823949, majf=0, minf=1393
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=411355/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: io=1606.9MB, aggrb=13702KB/s, minb=13702KB/s, maxb=13702KB/s, mint=120078msec, maxt=120078msec

 
fio --filename=/dev/zvol/ssd/test --sync=1 --rw=randread --bs=4k --numjobs=1 --iodepth=1 --runtime=120 --time_based --group_reporting --name=journal-test
journal-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
fio-2.16
Starting 1 process
Jobs: 1 (f=1): [f(1)] [100.0% done] [2992KB/0KB/0KB /s] [748/0/0 iops] [eta 00m:00s]
journal-test: (groupid=0, jobs=1): err= 0: pid=72330: Wed Sep 13 11:26:15 2017
  read : io=1601.1MB, bw=13670KB/s, iops=3417, runt=120001msec
    clat (usec): min=7, max=336470, avg=290.48, stdev=3145.32
     lat (usec): min=7, max=336471, avg=290.65, stdev=3145.32
    clat percentiles (usec):
     |  1.00th=[   10],  5.00th=[  131], 10.00th=[  147], 20.00th=[  175],
     | 30.00th=[  195], 40.00th=[  203], 50.00th=[  215], 60.00th=[  227],
     | 70.00th=[  239], 80.00th=[  255], 90.00th=[  286], 95.00th=[  318],
     | 99.00th=[  498], 99.50th=[  788], 99.90th=[ 3088], 99.95th=[91648],
     | 99.99th=[185344]
    lat (usec) : 10=0.29%, 20=3.90%, 50=0.30%, 100=0.01%, 250=72.78%
    lat (usec) : 500=21.75%, 750=0.46%, 1000=0.16%
    lat (msec) : 2=0.15%, 4=0.14%, 10=0.03%, 20=0.01%, 50=0.01%
    lat (msec) : 100=0.03%, 250=0.02%, 500=0.01%
  cpu          : usr=1.02%, sys=12.31%, ctx=409240, majf=0, minf=1057
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=410089/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: io=1601.1MB, aggrb=13669KB/s, minb=13669KB/s, maxb=13669KB/s, mint=120001msec, maxt=120001msec

SMART info for one of the SAS disks:

Code:

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST2000NX0263
Revision:             K002
Compliance:           SPC-4
User Capacity:        2,000,398,934,016 bytes [2.00 TB]
Logical block size:   4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          2.5 inches
Logical Unit id:      0x5000c5008fba55ef
Serial number:        S4609ZK40000K6155MSP
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Wed Sep 13 11:29:20 2017 CEST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

wolfgang · Sep 13, 2017

Those are values I would expect from a mirror or a single HDD.

You write that you have a cache for the HDD system. Which SSD are you using for it?

HBO · Sep 13, 2017

An Intel DC from the S3500 series; the SMART info is below. Note, however, that the SAS pool consists of 6 of the hard disks in 3 mirrors.

Code:

Model Family:     Intel 730 and DC S35x0/3610/3700 Series SSDs
Device Model:     INTEL SSDSC2BX480G4
Serial Number:    BTHC71450CT9480MGN
LU WWN Device Id: 5 5cd2e4 14db8d54f
Firmware Version: G2010150
User Capacity:    480,103,981,056 bytes [480 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 2.6, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Sep 13 11:54:11 2017 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

wolfgang · Sep 13, 2017

That is very strange.
Do you have other partitions on the Intel SSD besides the ZIL and L2ARC for the SATA pool?

HBO · Sep 13, 2017

Yes, namely the ZIL and L2ARC for the SSD pool:

Code:

Device     Boot     Start       End   Sectors   Size Id Type
/dev/sdi1            2048  41945087  41943040    20G 83 Linux
/dev/sdi2        41945088  83888127  41943040    20G 83 Linux
/dev/sdi3        83888128 937703087 853814960 407.1G  5 Extended
/dev/sdi5        83890176 503320575 419430400   200G 83 Linux
/dev/sdi6       503322624 937703087 434380464 207.1G 83 Linux

Code:

NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
rpool   111G  8.19G   103G         -    16%     7%  1.00x  ONLINE  -
  mirror   111G  8.19G   103G         -    16%     7%
    sdg2      -      -      -         -      -      -
    sdh2      -      -      -         -      -      -
sata  5.44T  2.10T  3.34T         -    38%    38%  1.00x  ONLINE  -
  mirror  1.81T   717G  1.11T         -    38%    38%
    sda      -      -      -         -      -      -
    sdb      -      -      -         -      -      -
  mirror  1.81T   717G  1.11T         -    38%    38%
    sdc      -      -      -         -      -      -
    sdd      -      -      -         -      -      -
  mirror  1.81T   717G  1.11T         -    38%    38%
    sde      -      -      -         -      -      -
    sdf      -      -      -         -      -      -
log      -      -      -         -      -      -
  sdi2  19.9G  41.9M  19.8G         -    87%     0%
cache      -      -      -         -      -      -
  sdi6   207G   128G  78.8G         -     0%    61%
ssd   888G   356G   532G         -    44%    40%  1.00x  ONLINE  -
  mirror   444G   178G   266G         -    43%    40%
    sdj      -      -      -         -      -      -
    sdk      -      -      -         -      -      -
  mirror   444G   178G   266G         -    45%    40%
    sdl      -      -      -         -      -      -
    sdm      -      -      -         -      -      -
log      -      -      -         -      -      -
  sdi1  19.9G  16.5M  19.9G         -    13%     0%
cache      -      -      -         -      -      -
  sdi5   200G   108G  92.3G         -     0%    53%

wolfgang · Sep 13, 2017

An SSD should only ever be used for a single pool.

If I see this correctly, you are using MBR?
Please repartition it with GPT.

You can remove the ZIL and L2ARC from the pool at runtime.
Then test the whole thing again with one pool and the SSD.

There is also the question of whether ZIL and L2ARC bring any benefit for your SSD pool.
Run some benchmarks without the ZIL.

HBO · Sep 13, 2017

Okay, I will try that.
Remove ZIL and L2ARC with "zpool remove sata sdi1" and "zpool remove sata sdi5", and the same for the SSD pool? And that really works without problems during operation?

Unfortunately I made the mistake of using the "sd(x)" names. Is there a way to switch to the disks' UUIDs while the system is running, or before the host is rebooted? I could not find any information on this online.

wolfgang · Sep 13, 2017

HBO said:
Remove ZIL and L2ARC with "zpool remove sata sdi1" and "zpool remove sata sdi5", and the same for the SSD pool?

Yes.
Yes, I do it all the time and all the data is still there. ;-)
It can take a while, though, depending on what is in the ZIL.

HBO said:
Unfortunately I made the mistake of using the "sd(x)" names

With current udev it does not matter how you name them. There is no difference anymore.
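
A sketch of the removal, using the device names from the `zpool` listing earlier in the thread: there the sata pool's log and cache devices are sdi2 and sdi6, while sdi1 and sdi5 belong to the ssd pool. `run` is a hypothetical wrapper that only prints each command unless DRY_RUN=0 is set, so the sequence can be reviewed before anything is changed.

```shell
#!/bin/sh
# Hypothetical wrapper: print the command by default, execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Log (ZIL) and cache (L2ARC) devices per pool, taken from the zpool listing in this thread.
run zpool remove sata sdi2   # log device of pool "sata"
run zpool remove sata sdi6   # cache device of pool "sata"
run zpool remove ssd sdi1    # log device of pool "ssd"
run zpool remove ssd sdi5    # cache device of pool "ssd"

# Afterwards the "log" and "cache" sections should no longer appear.
run zpool status sata ssd
```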

HBO · Sep 13, 2017

Something has definitely changed. I removed ZIL and L2ARC from both pools, repartitioned the ZIL/L2ARC SSD with GPT, and added it only to the pool with the SAS disks. Read performance without ZIL and L2ARC seems identical, but write performance is considerably worse (would it not be worth adding another SSD for ZIL and L2ARC to the SSD pool after all?).

Here are the values and data:

Code:

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.
Disk /dev/sdi: 937703088 sectors, 447.1 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): 26045249-0418-414F-8BA8-485CAF20FA3C
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 937703054
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048       134219775   64.0 GiB    8300  Linux filesystem
   2       134219776       937703054   383.1 GiB   8300  Linux filesystem

Code:

NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
sata  5.44T  2.10T  3.34T         -    38%    38%  1.00x  ONLINE  -
  mirror  1.81T   717G  1.11T         -    38%    38%
    sda      -      -      -         -      -      -
    sdb      -      -      -         -      -      -
  mirror  1.81T   717G  1.11T         -    39%    38%
    sdc      -      -      -         -      -      -
    sdd      -      -      -         -      -      -
  mirror  1.81T   718G  1.11T         -    38%    38%
    sde      -      -      -         -      -      -
    sdf      -      -      -         -      -      -
log      -      -      -         -      -      -
  sdi1  63.5G  29.3M  63.5G         -     0%     0%
cache      -      -      -         -      -      -
  sdi2   383G   650M   382G         -     0%     0%

Code:

fio --filename=/dev/zvol/sata/test --sync=1 --rw=randwrite --bs=4k --numjobs=1 --iodepth=1 --runtime=120 --time_based --group_reporting --name=journal-test
journal-test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
fio-2.16
Starting 1 process
Jobs: 1 (f=1): [w(1)] [100.0% done] [0KB/272KB/0KB /s] [0/68/0 iops] [eta 00m:00s]
journal-test: (groupid=0, jobs=1): err= 0: pid=57347: Wed Sep 13 14:40:05 2017
  write: io=45544KB, bw=388603B/s, iops=94, runt=120012msec
    clat (usec): min=204, max=355303, avg=10535.55, stdev=10955.41
     lat (usec): min=204, max=355304, avg=10535.99, stdev=10955.44
    clat percentiles (usec):
     |  1.00th=[  370],  5.00th=[  494], 10.00th=[  556], 20.00th=[  684],
     | 30.00th=[ 4256], 40.00th=[ 6752], 50.00th=[ 8896], 60.00th=[10816],
     | 70.00th=[13760], 80.00th=[16768], 90.00th=[21888], 95.00th=[26496],
     | 99.00th=[46336], 99.50th=[58624], 99.90th=[95744], 99.95th=[150528],
     | 99.99th=[199680]
    lat (usec) : 250=0.06%, 500=5.35%, 750=16.12%, 1000=0.82%
    lat (msec) : 2=1.08%, 4=5.77%, 10=27.04%, 20=30.74%, 50=12.23%
    lat (msec) : 100=0.69%, 250=0.09%, 500=0.01%
  cpu          : usr=0.10%, sys=1.66%, ctx=43002, majf=0, minf=182
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=11386/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: io=45544KB, aggrb=379KB/s, minb=379KB/s, maxb=379KB/s, mint=120012msec, maxt=120012msec


fio --filename=/dev/zvol/sata/test --sync=1 --rw=randread --bs=4k --numjobs=1 --iodepth=1 --runtime=120 --time_based --group_reporting --name=journal-test
journal-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
fio-2.16
Starting 1 process
Jobs: 1 (f=1): [r(1)] [100.0% done] [368KB/0KB/0KB /s] [92/0/0 iops] [eta 00m:00s]
journal-test: (groupid=0, jobs=1): err= 0: pid=58018: Wed Sep 13 14:42:11 2017
  read : io=82220KB, bw=701487B/s, iops=171, runt=120021msec
    clat (usec): min=7, max=127074, avg=5836.63, stdev=8876.75
     lat (usec): min=7, max=127074, avg=5836.79, stdev=8876.85
    clat percentiles (usec):
     |  1.00th=[    9],  5.00th=[   10], 10.00th=[   11], 20.00th=[   12],
     | 30.00th=[   13], 40.00th=[   14], 50.00th=[   17], 60.00th=[ 4016],
     | 70.00th=[ 8512], 80.00th=[12352], 90.00th=[17536], 95.00th=[22400],
     | 99.00th=[35072], 99.50th=[47360], 99.90th=[63744], 99.95th=[80384],
     | 99.99th=[98816]
    lat (usec) : 10=4.06%, 20=50.02%, 50=1.44%, 100=0.01%, 250=0.03%
    lat (usec) : 500=0.01%, 750=0.01%, 1000=0.01%
    lat (msec) : 2=0.46%, 4=3.91%, 10=14.95%, 20=18.01%, 50=6.67%
    lat (msec) : 100=0.42%, 250=0.01%
  cpu          : usr=0.06%, sys=0.89%, ctx=13377, majf=0, minf=183
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=20555/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: io=82220KB, aggrb=685KB/s, minb=685KB/s, maxb=685KB/s, mint=120021msec, maxt=120021msec

Code:

fio --filename=/dev/zvol/ssd/test --sync=1 --rw=randwrite --bs=4k --numjobs=1 --iodepth=1 --runtime=120 --time_based --group_reporting --name=journal-test
journal-test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
fio-2.16
Starting 1 process
Jobs: 1 (f=1): [f(1)] [100.0% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 00m:00s]
journal-test: (groupid=0, jobs=1): err= 0: pid=55066: Wed Sep 13 14:32:50 2017
  write: io=181980KB, bw=1516.8KB/s, iops=379, runt=120034msec
    clat (usec): min=649, max=370862, avg=2635.12, stdev=16852.81
     lat (usec): min=649, max=370863, avg=2635.44, stdev=16852.83
    clat percentiles (usec):
     |  1.00th=[  852],  5.00th=[  900], 10.00th=[  924], 20.00th=[  956],
     | 30.00th=[  988], 40.00th=[ 1020], 50.00th=[ 1048], 60.00th=[ 1112],
     | 70.00th=[ 1192], 80.00th=[ 1256], 90.00th=[ 1320], 95.00th=[ 1400],
     | 99.00th=[10176], 99.50th=[183296], 99.90th=[276480], 99.95th=[276480],
     | 99.99th=[366592]
    lat (usec) : 750=0.27%, 1000=33.74%
    lat (msec) : 2=64.04%, 4=0.78%, 10=0.16%, 20=0.05%, 50=0.01%
    lat (msec) : 100=0.38%, 250=0.45%, 500=0.12%
  cpu          : usr=0.21%, sys=4.56%, ctx=154917, majf=0, minf=381
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=45495/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: io=181980KB, aggrb=1516KB/s, minb=1516KB/s, maxb=1516KB/s, mint=120034msec, maxt=120034msec
 
 
fio --filename=/dev/zvol/ssd/test --sync=1 --rw=randread --bs=4k --numjobs=1 --iodepth=1 --runtime=120 --time_based --group_reporting --name=journal-test
journal-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
fio-2.16
Starting 1 process
Jobs: 1 (f=1): [f(1)] [100.0% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 00m:00s]
journal-test: (groupid=0, jobs=1): err= 0: pid=55740: Wed Sep 13 14:34:57 2017
  read : io=15252MB, bw=130148KB/s, iops=32537, runt=120001msec
    clat (usec): min=6, max=168830, avg=30.00, stdev=261.33
     lat (usec): min=6, max=168830, avg=30.07, stdev=261.33
    clat percentiles (usec):
     |  1.00th=[    8],  5.00th=[    9], 10.00th=[    9], 20.00th=[   10],
     | 30.00th=[   10], 40.00th=[   10], 50.00th=[   11], 60.00th=[   11],
     | 70.00th=[   11], 80.00th=[   11], 90.00th=[   15], 95.00th=[  217],
     | 99.00th=[  262], 99.50th=[  282], 99.90th=[  346], 99.95th=[  450],
     | 99.99th=[  684]
    lat (usec) : 10=12.56%, 20=78.43%, 50=0.10%, 100=0.01%, 250=7.38%
    lat (usec) : 500=1.49%, 750=0.02%, 1000=0.01%
    lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 100=0.01%, 250=0.01%
  cpu          : usr=3.11%, sys=39.92%, ctx=350227, majf=0, minf=357
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=3904474/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: io=15252MB, aggrb=130148KB/s, minb=130148KB/s, maxb=130148KB/s, mint=120001msec, maxt=120001msec

Unfortunately the values in the fio output do not tell me much either. Is this now a good result for a pool with 6 SAS 7.2k rpm disks in 3 mirrors, or still too low?
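
One way to read these numbers: with --iodepth=1 and --numjobs=1 only one request is in flight at a time, so IOPS is roughly the inverse of the average latency. A rough sketch using `awk`; the helper names are made up for illustration, and the input values are copied from the runs above.

```shell
#!/bin/sh
# IOPS estimate for a single outstanding request: 1 second / average latency in usec.
iops_from_lat_us() {
    awk -v lat="$1" 'BEGIN { printf "%d\n", 1000000 / lat }'
}

# Average rotational latency of a spinning disk in ms: the time for half a revolution.
rot_latency_ms() {
    awk -v rpm="$1" 'BEGIN { printf "%.2f\n", 60000 / rpm / 2 }'
}

iops_from_lat_us 21320.30   # first sata randwrite run; fio reported iops=46
iops_from_lat_us 10535.99   # sata randwrite after repartitioning; fio reported iops=94
rot_latency_ms 7200         # the 7,200 rpm SAS disks
```

Requests that complete in a few microseconds, such as the lower half of the sata read percentiles, cannot have come from a 7,200 rpm platter and were most likely served from cache. The values in the range of several milliseconds are the ones that reflect the disks themselves.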

wolfgang · Sep 13, 2017

That is very poor (SATA pool) and should not be this bad.

The SSD result is not great either. Are those consumer SSDs?

Which disk controller are you using?

Please send the output of

Code:

pveversion -v

HBO · Sep 13, 2017

You really cannot expect much from the SSDs; they are SanDisk Extreme Pro.
They are connected directly to the board, which is this one: http://www.supermicro.com.tw/products/motherboard/Xeon/C600/X10DRU-i_.cfm

The SAS disks sit behind this HBA: http://www.supermicro.com/products/accessories/addon/AOC-S3008L-L8e.cfm

Code:

pveversion -v
proxmox-ve: 5.0-15 (running kernel: 4.10.15-1-pve)
pve-manager: 5.0-23 (running version: 5.0-23/af4267bf)
pve-kernel-4.10.15-1-pve: 4.10.15-15
libpve-http-server-perl: 2.0-5
lvm2: 2.02.168-pve2
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-10
qemu-server: 5.0-12
pve-firmware: 2.0-2
libpve-common-perl: 5.0-16
libpve-guest-common-perl: 2.0-11
libpve-access-control: 5.0-5
libpve-storage-perl: 5.0-12
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.0-8
pve-qemu-kvm: 2.9.0-2
pve-container: 2.0-15
pve-firewall: 3.0-1
pve-ha-manager: 2.0-2
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.0.8-3
lxcfs: 2.0.7-pve2
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.6.5.9-pve16~bpo90

wolfgang · Sep 14, 2017

Something else I noticed is the block size of your HDDs.
They are 4Kn, which means you should have an ashift of at least 12 on the HDD pool.

Code:

zpool get ashift sata
zfs get sync,volblocksize,secondarycache,logbias sata/test
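
For context, ashift is the base-2 logarithm of the sector size ZFS assumes for a vdev, so the 4096-byte logical block size reported by SMART earlier corresponds to ashift=12. A small sketch of that arithmetic; `ashift_for` is a hypothetical helper name.

```shell
#!/bin/sh
# Smallest ashift whose sector size (2^ashift bytes) covers the given block size.
ashift_for() {
    bytes="$1"
    shift_val=0
    size=1
    while [ "$size" -lt "$bytes" ]; do
        size=$((size * 2))
        shift_val=$((shift_val + 1))
    done
    echo "$shift_val"
}

ashift_for 512    # classic 512-byte sectors
ashift_for 4096   # 4Kn disks such as the ST2000NX0263 in this thread
```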

HBO · Sep 14, 2017

ashift is set to 12. Before buying the system I had already asked here in the forum whether it would work with the 4Kn disks.
What surprises me now is the block size of 8 KB on the zvol:

Code:

zfs get sync,volblocksize,secondarycache,logbias sata/test
NAME       PROPERTY        VALUE           SOURCE
sata/test  sync            standard        default
sata/test  volblocksize    8K              -
sata/test  secondarycache  all             default
sata/test  logbias         latency         default

*edit*
Every VM zvol has the same block size.

HBO · Sep 14, 2017

I recreated sata/test with "-b 4k", and look at that:

Code:

fio --filename=/dev/zvol/sata/test --sync=1 --rw=randwrite --bs=4k --numjobs=1 --iodepth=1 --runtime=120 --time_based --group_reporting --name=journal-test
journal-test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
fio-2.16
Starting 1 process
Jobs: 1 (f=1): [f(1)] [100.0% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 00m:00s]
journal-test: (groupid=0, jobs=1): err= 0: pid=6302: Thu Sep 14 08:43:44 2017
  write: io=2219.9MB, bw=18936KB/s, iops=4734, runt=120001msec
    clat (usec): min=134, max=269527, avg=209.36, stdev=507.08
     lat (usec): min=134, max=269528, avg=209.58, stdev=507.09
    clat percentiles (usec):
     |  1.00th=[  149],  5.00th=[  159], 10.00th=[  167], 20.00th=[  179],
     | 30.00th=[  189], 40.00th=[  197], 50.00th=[  205], 60.00th=[  211],
     | 70.00th=[  219], 80.00th=[  229], 90.00th=[  245], 95.00th=[  266],
     | 99.00th=[  322], 99.50th=[  354], 99.90th=[  740], 99.95th=[ 1736],
     | 99.99th=[ 4448]
    lat (usec) : 250=91.78%, 500=8.07%, 750=0.05%, 1000=0.02%
    lat (msec) : 2=0.03%, 4=0.03%, 10=0.01%, 20=0.01%, 250=0.01%
    lat (msec) : 500=0.01%
  cpu          : usr=1.34%, sys=24.35%, ctx=1136498, majf=11, minf=537
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=568086/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: io=2219.9MB, aggrb=18936KB/s, minb=18936KB/s, maxb=18936KB/s, mint=120001msec, maxt=120001msec

fio --filename=/dev/zvol/sata/test --sync=1 --rw=randread --bs=4k --numjobs=1 --iodepth=1 --runtime=120 --time_based --group_reporting --name=journal-test
journal-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
fio-2.16
Starting 1 process
Jobs: 1 (f=1): [f(1)] [100.0% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 00m:00s]
journal-test: (groupid=0, jobs=1): err= 0: pid=7037: Thu Sep 14 08:46:04 2017
  read : io=141903MB, bw=1182.6MB/s, iops=302724, runt=120001msec
    clat (usec): min=0, max=3691, avg= 2.85, stdev= 3.85
     lat (usec): min=0, max=3691, avg= 2.88, stdev= 3.85
    clat percentiles (usec):
     |  1.00th=[    1],  5.00th=[    1], 10.00th=[    1], 20.00th=[    1],
     | 30.00th=[    1], 40.00th=[    1], 50.00th=[    2], 60.00th=[    2],
     | 70.00th=[    2], 80.00th=[    2], 90.00th=[   10], 95.00th=[   11],
     | 99.00th=[   12], 99.50th=[   13], 99.90th=[   16], 99.95th=[   18],
     | 99.99th=[   22]
    lat (usec) : 2=40.49%, 4=44.00%, 10=3.07%, 20=12.41%, 50=0.02%
    lat (usec) : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%
    lat (msec) : 2=0.01%, 4=0.01%
  cpu          : usr=17.18%, sys=82.79%, ctx=452, majf=0, minf=356
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=36327251/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: io=141903MB, aggrb=1182.6MB/s, minb=1182.6MB/s, maxb=1182.6MB/s, mint=120001msec, maxt=120001msec

I was actually firmly convinced that I had set the sector size to 4k in the Proxmox GUI when adding the pool, but apparently not.
As far as I can tell, volblocksize is read-only and therefore only relevant when a volume is created. Can I now back up a VM, delete the VM, and restore the backup so that 4k is used?
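
Because volblocksize is fixed at creation time, the test zvol has to be destroyed and created again to change it. A sketch of what that could look like; the 10G size is an assumption (the thread does not state it), and `run` is a hypothetical wrapper that only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical wrapper: print the command by default, execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

run zfs destroy sata/test               # deletes the old 8K test zvol and its data
run zfs create -V 10G -b 4k sata/test   # -V size is assumed; -b sets volblocksize
run zfs get volblocksize sata/test      # should now report 4K
```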

wolfgang · Sep 14, 2017

That is very strange and should not happen.

Regarding the backup:
You can make a backup and then set the block size to 4k on the storage under Datacenter.
Then restore all the VMs.
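
After restoring, it may be worth checking that every zvol really ended up with the new block size. A sketch that filters `zfs get` output; `non_4k_volumes` is a hypothetical helper, and the VM disk names in the sample input are invented examples.

```shell
#!/bin/sh
# Read "name<TAB>volblocksize" lines on stdin and print the names that are not 4K.
non_4k_volumes() {
    awk -F '\t' '$2 != "4K" { print $1 }'
}

# Intended use on the host (commented out because it needs the pool):
# zfs get -H -r -t volume -o name,value volblocksize sata | non_4k_volumes

# Sample input with invented volume names: only the 8K volume is printed.
printf 'sata/vm-100-disk-1\t8K\nsata/vm-101-disk-1\t4K\n' | non_4k_volumes
```

An empty result from the real command would mean that all volumes on the pool use 4K.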

HBO · Sep 14, 2017

Thank you very much. It would probably have taken me days to think of the sector size.

HBO · Sep 25, 2017

I have to bring this thread back up after all.
The ZFS pool with the SAS disks now runs flawlessly. But when I put load on the SSD pool (installing Windows Server or restoring a VM backup is enough), various VMs (not all of the ones on that pool) crash with a kernel panic (screenshot).

The setup again:
Mainboard: Supermicro X10DRU-i+
SSD connection: directly on the board, no HBA
SSD type: SanDisk Extreme Pro SSD 480GB SATA III
ZIL/L2ARC: makes no difference whether present or not

Is it really down to the consumer SSDs, or perhaps to the connection directly to the onboard SATA ports without an HBA? The SAS disks are attached to an LSI Logic / Symbios Logic SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02) in HBA mode.
