Poor performance with ZFS

If I have poor read/write performance with ZFS RAID 10 on 4x 1 TB SATA2 disks, should I buy another 1 TB SSD and add it as a ZIL? :(
 
Yes, it is a fresh Proxmox 5 install with 4x 1 TB WD RED SATA2 disks and the performance is very poor. Copying from one HP SATA2 drive to rpool runs at 20 MB/s :( directly in PVE, not in a VM!!!

Code:
  pool: rpool
 state: ONLINE
  scan: none requested
config:

    NAME                                                STATE     READ WRITE CKSUM
    rpool                                               ONLINE       0     0     0
      mirror-0                                          ONLINE       0     0     0
        ata-WDC_WD10EFRX-68PJCN0_WD-WCC4J2021886-part2  ONLINE       0     0     0
        ata-WDC_WD10EFRX-68JCSN0_WD-WMC1U6546808-part2  ONLINE       0     0     0
      mirror-1                                          ONLINE       0     0     0
        ata-WDC_WD10EFRX-68FYTN0_WD-WCC4J2AK75T9        ONLINE       0     0     0
        ata-WDC_WD10EFRX-68FYTN0_WD-WCC4J1JE0SFR        ONLINE       0     0     0

errors: No known data errors
 
What does arcstat say?
Enable the drive write cache (see the sketch after the smartctl commands below).
Try to offline one drive after another.
Install smartmontools:

smartctl --test=short /dev/sda
smartctl --test=short /dev/sdb
smartctl --test=short /dev/sdc
smartctl --test=short /dev/sdd


# wait three minutes

clear
smartctl --all /dev/sda | grep Short
smartctl --all /dev/sdb | grep Short
smartctl --all /dev/sdc | grep Short
smartctl --all /dev/sdd | grep Short
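
A minimal sketch of the write-cache check mentioned above, assuming hdparm is installed (apt-get install hdparm) and using the /dev/sdX names from this system:

Code:
hdparm -W /dev/sda     # show whether the drive write cache is currently enabled
hdparm -W1 /dev/sda    # enable it (repeat for sdb, sdc and sdd)

arcstat 5              # watch ARC hit/miss activity every 5 seconds while copying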
 
Code:
root@pve-klenova:~# smartctl --all /dev/sda | grep Short
Short self-test routine
# 1  Short offline       Completed without error       00%     26667         -
root@pve-klenova:~# smartctl --all /dev/sdb | grep Short
Short self-test routine
# 1  Short offline       Completed without error       00%     43111         -
# 4  Short offline       Completed without error       00%     15243         -
root@pve-klenova:~# smartctl --all /dev/sdc | grep Short
Short self-test routine
# 1  Short offline       Completed without error       00%     26577         -
# 3  Short offline       Completed without error       00%     17443         -
root@pve-klenova:~# smartctl --all /dev/sdd | grep Short
Short self-test routine
# 1  Short offline       Completed without error       00%     26573         -
# 4  Short offline       Completed without error       00%     17443         -
root@pve-klenova:~# smartctl --all /dev/sde | grep Short
Short self-test routine
# 1  Short offline       Completed without error       00%       465         -
root@pve-klenova:~# smartctl --all /dev/sdf | grep Short
Short self-test routine
# 1  Short offline       Completed without error       00%       465         -
 
I'd also monitor the average I/O completion time with iostat under heavy load. Normally one bad (non-enterprise-grade) drive can slow down the whole pool.
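
If you want the breakdown per pool member rather than per block device, ZFS can show that itself; a quick sketch, using the rpool from the zpool status above:

Code:
# Per-vdev I/O statistics every 5 seconds; under load, a mirror member whose
# throughput or operation count stays far behind its partner is the suspect drive.
zpool iostat -v rpool 5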
 
The drives are 4x WD RED 1 TB NAS... how can I check every single HDD in the ZFS pool to find out which one is slow?
 

Stress your pool and look at the iostat times; here is an example:

Code:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0,06    0,00    0,09   17,38    0,00   82,47

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
...
sda               0,00     0,00    0,00    0,60     0,00     2,40     8,00     0,00    0,00    0,00    0,00   0,00   0,00
sdb               0,20     0,00    0,60    0,00   245,60     0,00   818,67     0,02   25,33   25,33    0,00   9,33   0,56
sdc               0,20     0,00    0,60    0,00   302,40     0,00  1008,00     0,01   16,00   16,00    0,00   9,33   0,56
sde               0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00    0,00    0,00   0,00   0,00
sdd               0,20     0,00    0,60    0,00   254,40     0,00   848,00     0,00    6,67    6,67    0,00   5,33   0,32
sdq               0,00     0,00    1,40    0,00     7,20     0,00    10,29     0,01    9,14    9,14    0,00   9,14   1,28
sdn               0,00     0,00    1,00    0,00     6,40     0,00    12,80     0,00    2,40    2,40    0,00   2,40   0,24
sdk               0,00     0,00    6,60    0,00    85,60     0,00    25,94     0,00    0,61    0,61    0,00   0,24   0,16
sdh               0,00     0,00    2,80    0,00    66,40     0,00    47,43     0,00    0,86    0,86    0,00   0,57   0,16
sdo               0,00     0,00    4,00    0,00    92,00     0,00    46,00     0,01    2,60    2,60    0,00   2,40   0,96
sdf               0,00     0,00    3,00    0,00    66,40     0,00    44,27     0,01    3,20    3,20    0,00   3,20   0,96
sdi               0,00     0,00    0,20    0,00     0,80     0,00     8,00     0,00    0,00    0,00    0,00   0,00   0,00
sdl               0,00     0,00    4,00    0,00    92,80     0,00    46,40     0,01    3,40    3,40    0,00   1,40   0,56
sdj               0,00     0,00    5,00    0,00    83,20     0,00    33,28     0,01    2,40    2,40    0,00   0,80   0,40
sdg               0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00    0,00    0,00   0,00   0,00
sdm               0,00     0,00    7,60    0,00   187,20     0,00    49,26     0,07    9,16    9,16    0,00   3,47   2,64
sdp               0,00     0,00    3,40   13,00   117,60   342,40    56,10    14,73 1005,85 1588,47  853,48  60,98 100,00
...

The drive sdp was the bottleneck and took far too much time to complete its operations. It still worked, but the performance was very, very bad.
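
A rough sketch of how such a stress test could be combined with the offline-one-drive-at-a-time idea from earlier in the thread, assuming default mountpoints on this rpool; /dev/urandom is used because compressed datasets would shrink zero writes to almost nothing:

Code:
# Create a throwaway dataset and generate sustained write load on it
zfs create rpool/iotest
dd if=/dev/urandom of=/rpool/iotest/testfile bs=1M count=2048 &
iostat -x 5                 # watch await / %util per disk (Ctrl+C to stop)

# Take one mirror member offline (the pool keeps running, degraded), rerun the
# dd test, compare the numbers, then bring the drive back:
zpool offline rpool ata-WDC_WD10EFRX-68FYTN0_WD-WCC4J2AK75T9
zpool online rpool ata-WDC_WD10EFRX-68FYTN0_WD-WCC4J2AK75T9

# Clean up afterwards
zfs destroy rpool/iotest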
 
iostat during VM cloning

Code:
Every 1.0s: iostat                         pve-klenova: Sun Oct 15 00:15:46 2017

Linux 4.10.17-2-pve (pve-klenova)       10/15/2017      _x86_64_        (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          15.32    0.00    3.69    1.68    0.00   79.30

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              47.76      1082.84       456.83  685286010  289110920
sdb              48.15      1082.20       456.83  684883290  289110920
sdc               1.36        21.45        79.39   13572168   50245208
sdd               1.39        21.41        79.55   13550700   50346472
sde              52.14      1085.24       477.21  686810800  302009308
sdf              51.87      1085.39       477.21  686903024  302009308
sdh               0.00         0.02         0.00      15176          0
zd0              50.54       237.90       145.61  150560533   92152132
zd16              0.00         0.00         0.00         20          0
zd32             46.41         1.92       181.67    1216540  114970932
zd48              1.92         1.36         6.58     862116    4164876
zd64              2.18         4.04         4.68    2558668    2959672
zd80             18.69        18.06        73.96   11427076   46808004

Is this OK?
 
Hi Chalan, you need to install the "sysstat" package first:
# apt-get install sysstat
Run a clone operation and then run:
# iostat -x 5
Then watch for a disk with unusually high values in these columns:
| await | r_await | w_await | %util |
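
One caveat: the plain iostat output posted above shows averages since boot, so short spikes during the clone are flattened out. With -x and an interval, each subsequent report covers only that interval, and -y skips the since-boot report entirely. For example:

Code:
# 5-second extended reports; -y omits the first report (averages since boot),
# so every line shown is live data. Watch await and %util on the pool disks.
iostat -x -y 5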
 
sda, sdb, sdf and sdg are the drives in the RAID 10; their values seem to be similar:

Code:
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00    0,00    0,00   0,00   0,00
sdb               1,40     0,00  210,60  153,80 13179,20  9620,80   125,14     4,44   12,74   20,03    2,75   2,58  94,00
sdc               1,20     0,00  119,40  173,80  8560,00 11289,60   135,40     5,51   19,56   43,46    3,13   3,38  99,12
sdd               0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00    0,00    0,00   0,00   0,00
sde               0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00    0,00    0,00   0,00   0,00
sdf               1,20     0,00  164,40  222,60 10978,40 18026,40   149,90     2,61    6,76   14,16    1,29   1,89  73,20
sdg               0,80     0,00  136,00  223,80  7923,20 18026,40   144,24     2,13    5,97   14,01    1,09   1,86  67,04
zd0               0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00    0,00    0,00   0,00   0,00
zd16              0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00    0,00    0,00   0,00   0,00
zd32              0,00     0,00    0,00  146,40     0,00   584,80     7,99     0,98    6,77    0,00    6,77   6,68  97,76
zd48              0,00     0,00    0,00    3,00     0,00     9,60     6,40     0,19   62,93    0,00   62,93  62,93  18,88
zd64              0,00     0,00    0,00    2,00     0,00     8,00     8,00     1,08  426,00    0,00  426,00 220,40  44,08
zd80              0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00    0,00    0,00   0,00   0,00

Code:
root@pve-klenova:~# arcstat
    time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c
14:15:05    30    14     46     1    5    13  100     1    8   4.0G  4.0G
 
So why do I have such slow performance?

Code:
root@pve-klenova:~# pveperf
CPU BOGOMIPS:      38401.52
REGEX/SECOND:      430906
HD SIZE:           654.48 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     53.92
DNS EXT:           196.53 ms
DNS INT:           18.91 ms (elson.sk)
 
1.) I have 4x WD RED 1 TB drives in RAID 10, do I really need another SSD drive? If so, how BIG?

2.) Is the problem in my server LOW RAM, or is THIS the standard performance?

3.) When I disable ZFS sync, can I expect an unstable system or data loss in case of a power failure and so on?

4.) If I disable sync, how can I enable it again? And how can I see if sync is enabled or disabled?
 
1.) I have 4x WD RED 1 TB drives in RAID 10, do I really need another SSD drive? If so, how BIG?

Can you post the output of # zpool status?

If you use an SSD as a ZFS ZIL (SLOG), it only needs to be as big as 5 seconds * the SSD's write speed,
e.g. 5 s * 500 MiB/s = 2.5 GiB.

Don't forget that ZFS writes data to the ZIL linearly from the beginning every time. It can wear out an SSD fast.
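
For reference, a minimal sketch of how a dedicated log device could be attached and removed later; the device name below is a hypothetical placeholder, not a disk from this system:

Code:
# Add an SSD (or a partition of it) as a separate ZIL/SLOG device.
# "ata-SomeSSD_serial" is a placeholder; use the real /dev/disk/by-id path.
zpool add rpool log /dev/disk/by-id/ata-SomeSSD_serial-part1

# A log device can also be removed again later:
zpool remove rpool /dev/disk/by-id/ata-SomeSSD_serial-part1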

2.) Is the problem in my server LOW RAM, or is THIS the standard performance?

It depends on what your server is doing with the data.

3.) When I disable ZFS sync, can I expect an unstable system or data loss in case of a power failure and so on?

ZFS is very stable and protects your data, but you can read more here:
https://forum.proxmox.com/threads/p...-ssd-drives-sync-parameter.31130/#post-155543
https://forum.proxmox.com/threads/high-io-load-txg_sync.35566/#post-174995

4.) If I disable sync, how can I enable it again? And how can I see if sync is enabled or disabled?

You can set the sync option on every zvol or file system.

To see the sync parameter everywhere -> # zfs get sync
For a specific zvol/file system -> # zfs get sync pool/filesystem
To change it -> # zfs set sync=[ standard | always | disabled ] pool/filesystem

Almost all ZFS parameters are tunable live. A short example of the round trip is sketched below.
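
For instance, a quick sketch against this pool; "rpool/data/vm-100-disk-1" is just a hypothetical zvol name, not one confirmed in this thread:

Code:
zfs get sync rpool                              # show the current setting (default: standard)
zfs set sync=disabled rpool/data/vm-100-disk-1  # hypothetical zvol: turn sync writes off
zfs get sync rpool/data/vm-100-disk-1           # verify the change
zfs set sync=standard rpool/data/vm-100-disk-1  # switch back to the default behaviour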
 
