ZFS 2x2TB HDD and SSD LOG + L2ARC - slow writes, high IO wait? Need your advice

Please re-do a file copy test (not dd /dev/zero because you are using compression) and capture the output of iostat during that operation.
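For example, something along these lines run in two terminals (the .iso path is just a placeholder):
Code:
# terminal 1: extended per-device stats, 1-second intervals, zero-activity devices hidden
iostat -kxz 1 | tee /tmp/iostat-copy.log

# terminal 2: copy a real file (not /dev/zero, which lz4 would compress away)
cp /root/some-large.iso /storagepool/testcopy.iso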

Copying .iso file
Code:
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.17    0.00   11.84   36.61    0.00   47.38


Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.07    0.00    1.51   26.72    0.00   66.70


Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.40    0.00    1.08   18.04    0.00   78.48


Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.85    0.00    1.59   27.12    0.00   66.43


Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          10.04    0.00    1.95   35.03    0.00   52.98




               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
storagepool   802G  1.03T     19  1.88K   152K   104M
  mirror     802G  1.03T     19  1.88K   152K   104M
    sdc         -      -      9    909   188K   107M
    sdd         -      -      9    887   356K   104M
logs            -      -      -      -      -      -
  mirror    5.07M  24.9G      0      1      0   136K
    sda3        -      -      0      1      0   136K
    sdb3        -      -      0      1      0   136K
cache           -      -      -      -      -      -
  sda4      22.1G  52.9G      3    394  6.00K  47.7M
  sdb4      22.5G  52.5G      5     65  11.0K  5.55M
----------  -----  -----  -----  -----  -----  -----

At this moment the VMs are not using the disks the way they do during the day. IO wait normally runs from 5 to 15%, and when I do this file copy it goes up to 35%.

1. On this server I have VMs running cPanel and other Linux VMs that use MySQL. Can MySQL cause such IO wait? It is hard to set a proper block size because MySQL now runs as a service in almost every Linux machine, so I can't separate it.
2. I read that setting atime=off improves reads and writes a bit. Can this cause any problems for the mail servers in my VMs, etc.?
3. From my stats, do you think the ZIL is not being used as it should be? Sometimes on Windows servers, when you copy-paste something, the machine's RAM is used as a buffer and the data is then written to the drive step by step. I am not seeing anything like that with the ZIL/SLOG, and I thought it should work something like that.

As you can see in the picture below, the copy inside the VM is very slow; sometimes it drops to 0 and peaks at 15-20 MB/s, whereas in Linux, as you saw, a copy-paste into /storagepool goes up to 112 MB/s. In the code block below you also have the stats of the ZIL drives etc. while this slow copy runs in the VM.

copying in vm.PNG

This is how it looks when I make such a copy:
Code:
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
storagepool   805G  1.03T      0  3.39K      0  26.2M
  mirror     805G  1.03T      0  3.39K      0  26.2M
    sdc         -      -      0    291      0  29.7M
    sdd         -      -      0    264      0  26.3M
logs            -      -      -      -      -      -
  mirror    4.41M  24.9G      0      0      0      0
    sda3        -      -      0      0      0      0
    sdb3        -      -      0      0      0      0
cache           -      -      -      -      -      -
  sda4      25.5G  49.5G  4.22K      0  33.7M      0
  sdb4      26.0G  49.0G  2.01K     12  16.1M  58.0K
----------  -----  -----  -----  -----  -----  -----
 
1. On this server I have VMs running cPanel and other Linux VMs that use MySQL. Can MySQL cause such IO wait? It is hard to set a proper block size because MySQL now runs as a service in almost every Linux machine, so I can't separate it.

I run MySQL in LXC with no problem, roughly 70% writes and 30% reads. I use ZFS with checksums, so I turn off some MySQL functions.

Code:
#
# * InnoDB
#

# file settings
innodb_file_per_table = 1
innodb_file_format = BARRACUDA
innodb_file_format_max = BARRACUDA


innodb_flush_method = O_DIRECT

innodb_buffer_pool_size = 2G
innodb_buffer_pool_instances = 1

innodb_doublewrite = 0    # ZFS is copy-on-write, so the InnoDB doublewrite buffer is redundant

innodb_log_buffer_size = 1G
innodb_log_file_size = 1G
innodb_checksum_algorithm=NONE    # ZFS already checksums every block

innodb_io_capacity = 2000 # def 200
innodb_use_native_aio = false

innodb_write_io_threads = 10 # def 4
innodb_read_io_threads = 10 # def 4

innodb_open_files = 3000

innodb_max_dirty_pages_pct = 75

innodb_flush_log_at_trx_commit = 2    # flush the log roughly once per second instead of at every commit


innodb_old_blocks_time = 2000
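
If the MySQL data lives on its own dataset, the block-size question from above is usually handled on the ZFS side by matching recordsize to InnoDB's 16K page; a minimal sketch (the dataset name tank/mysql is only an example):
Code:
zfs create tank/mysql
zfs set recordsize=16K tank/mysql        # match InnoDB's 16K page size
zfs set atime=off tank/mysql
zfs set compression=lz4 tank/mysql
zfs set logbias=throughput tank/mysql    # keep large InnoDB flushes off the SLOG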


2. I read that setting atime=off improves reads and writes a bit. Can this cause any problems for the mail servers in my VMs, etc.?

atime makes ZFS do extra writes, so turning it off reduces writes.
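It is a single property, so it is easy to test and to revert (pool name taken from this thread):
Code:
zfs get atime storagepool
zfs set atime=off storagepool
# relatime=on is a middle ground if some software still relies on access times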

3. From my stats, do you think the ZIL is not being used as it should be? Sometimes on Windows servers, when you copy-paste something, the machine's RAM is used as a buffer and the data is then written to the drive step by step. I am not seeing anything like that with the ZIL/SLOG, and I thought it should work something like that.

ZIL is used for sync writes only.
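You can verify that yourself by watching the log mirror while forcing sync vs. async writes; a rough sketch with your pool name (file names are just examples):
Code:
zpool iostat -v storagepool 1      # in another terminal, watch the "logs" mirror row

dd if=/dev/urandom of=/storagepool/sync-test.bin bs=1M count=512 oflag=sync   # should show SLOG writes
dd if=/dev/urandom of=/storagepool/async-test.bin bs=1M count=512             # SLOG should stay near zero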


Another step is to check hdparm ( -i, -t )
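For example:
Code:
hdparm -i /dev/sda    # identification data reported by the drive
hdparm -t /dev/sda    # buffered sequential read timing (SSD)
hdparm -t /dev/sdc    # same test on one of the HDDs, for comparison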
 
iostat still doesn't show /dev/sd? activity at all.

Please try with
Code:
iostat -kxz 1

Code:
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
storagepool   796G  1.04T  2.09K  3.49K  13.0M  24.6M
  mirror     796G  1.04T  2.09K  3.48K  13.0M  24.3M
    sdc         -      -  1.03K  1.36K  6.74M  31.2M
    sdd         -      -  1.06K  1.24K  7.29M  24.3M
logs            -      -      -      -      -      -
  mirror    77.1M  24.8G      0      3      0   332K
    sda3        -      -      0      3      0   332K
    sdb3        -      -      0      3      0   332K
cache           -      -      -      -      -      -
  sda4      28.5G  46.5G     17      0  29.5K      0
  sdb4      28.7G  46.3G     10     54  18.5K  4.42M
----------  -----  -----  -----  -----  -----  -----


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          10.91    0.00   10.79   20.05    0.00   58.25


Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda              57.00     0.00 1613.00   87.00 62411.00 10604.00    85.90     1.80    1.06    0.95    3.03   0.14  24.00
sdb              49.00     0.00 1404.00 2653.00 58479.50 19861.00    38.62     2.15    0.53    1.24    0.15   0.07  29.20
sdc               0.00     0.00  717.00  121.00  5112.00  1268.00    15.23     0.62    0.88    0.64    2.31   0.46  38.40
sdd               0.00     0.00  426.00  118.00  3248.00  1132.00    16.10     0.55    1.03    0.85    1.69   0.57  31.20
zd0               0.00     0.00    0.00    1.00     0.00     4.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00
zd48              0.00     0.00  798.00 2284.00 101496.00  9128.00    71.79     4.13    1.35    4.58    0.23   0.25  76.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          14.68    0.00   17.44   25.72    0.00   42.16


Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda             192.00     0.00 4722.00  710.00 171472.50  6888.00    65.67     3.75    0.69    0.78    0.12   0.09  51.20
sdb             166.00     0.00 4758.00  203.00 171901.00  2899.50    70.47     3.86    0.78    0.80    0.33   0.10  50.40
sdc               2.00     0.00  490.00    0.00  3328.00     0.00    13.58     0.16    0.33    0.33    0.00   0.29  14.40
sdd               1.00     0.00  701.00    0.00  4636.00     0.00    13.23     0.21    0.30    0.30    0.00   0.27  18.80
zd48              0.00     0.00 2082.00 3181.00 266464.00 12724.00   106.09     8.61    1.64    4.05    0.06   0.16  85.20
zd208             0.00     0.00    0.00  966.00     0.00  9312.00    19.28     0.01    0.01    0.00    0.01   0.01   0.80
zd224             0.00     0.00    0.00  562.00     0.00  2240.00     7.97     0.04    0.08    0.00    0.08   0.08   4.40
This is from a copy inside the guest machine, and as you can see in the picture again, the speed climbs for a while, then drops and stalls.
wr_speed.PNG

Below this picture you have the IO wait from copying a big file into /storagepool.

Copy of 10GB file
Code:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.26    0.00   13.15   32.24    0.00   53.35


Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               4.00     0.00  895.00  557.00 103509.00 57704.00   222.06     1.80    1.25    1.63    0.63   0.34  49.60
sdb               4.00     0.00  802.00  952.00 96722.00 98751.50   222.89     2.00    1.14    1.75    0.62   0.43  74.80
sdc               0.00     0.00   31.00  750.00  3656.00 76068.00   204.16     1.44    1.83   22.06    1.00   1.16  90.80
sdd               0.00     0.00   21.00  794.00  2304.00 80768.00   203.86     2.76    4.02  112.19    1.16   1.20  98.00
zd16              0.00     0.00    0.00  337.00     0.00  1348.00     8.00     0.16    0.49    0.00    0.49   0.49  16.40
zd80              0.00     0.00    0.00    3.00     0.00     4.00     2.67     0.00    1.33    0.00    1.33   1.33   0.40
zd208             0.00     0.00    0.00    8.00     0.00    72.00    18.00     0.00    0.00    0.00    0.00   0.00   0.00
zd224             0.00     0.00    0.00   10.00     0.00    32.00     6.40     0.02    2.00    0.00    2.00   2.00   2.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.63    0.00    4.18   28.14    0.00   67.05


Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00  210.00  764.00 25902.00 63207.00   182.98     0.37    0.38    0.90    0.24   0.25  24.80
sdb               0.00     0.00  321.00  372.00 40149.00 44843.50   245.29     1.50    2.17    3.34    1.16   0.62  43.20
sdc               3.00     0.00    6.00  847.00   672.00 75268.00   178.05     1.60    1.88  100.00    1.18   1.17  99.60
sdd               0.00     0.00    3.00  881.00   384.00 80424.00   182.82     3.98    3.05  569.33    1.12   1.13 100.00
zd208             0.00     0.00    1.00    0.00   128.00     0.00   256.00     0.27    0.00    0.00    0.00 272.00  27.20

These tests were done early in the morning when nobody was using the VMs; if I did this during the day I think IO wait would easily reach 50-70%.
 
I did some copy tests.

VM: Windows 2012 R2
pool1: 2 disk mirror
pool2: 2 disk mirror

Test #1

File size: 11.3G

Copied file from pool1 (samba share) to pool2 (VM disk)

file_copy.png

Result: pool1 does only reads and pool2 does only writes. Both pools are mirrors, and the speed in the graph is roughly the maximum read speed of pool1.
P.S. After the file was copied, pool2 kept flushing writes for about another 10 s.

Test #2

File size: 1.5G

Copied file from pool2 (samba share) to pool2 (VM disk)

file_copy2.png

Result: pool2 does reads and writes at the same time. Due to the ZFS flush strategy you can see a big dip whenever ZFS flushes writes and holds back the read operations.
 
When you copy a file to the storage pool there should be no activity on the log devices, because that is an async operation.
Therefore I assume the 60 MB/s and 45 MB/s writes to sd[ab] are ARC evictions to L2ARC, which is pretty high.
Did you adjust any ZFS parameters? By default ZFS writes to L2ARC at only ~8 MB/s.
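On ZFS on Linux these limits are module parameters, so they are easy to check and raise; a sketch (the 64 MiB/s value is only an example):
Code:
cat /sys/module/zfs/parameters/l2arc_write_max     # default 8388608 = 8 MiB/s
cat /sys/module/zfs/parameters/l2arc_write_boost

# runtime change
echo 67108864 > /sys/module/zfs/parameters/l2arc_write_max

# persistent, in /etc/modprobe.d/zfs.conf:
options zfs l2arc_write_max=67108864 l2arc_write_boost=67108864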
 
In the first code block, where you see the log device writing, that was a copy inside a VM, which is a sync operation.

The second code block is from a copy into /storagepool only.

Yes, I have adjusted some ZFS parameters; the IO wait dropped a bit, but not to a reasonable level.
 
It is okay for the ZFS flush strategy to drop the speed a bit, but not like in my case, down to 0 with big stalls... Anyway, can you send me your ZFS config?

The zfs set commands and the zfs.conf file, please.

I am going to compare mine with yours, guys.

I did some copy tests.

VM: Windows 2012 R2
pool1: 2 disk mirror
pool2: 2 disk mirror

Test #1

File size: 11.3G

Copied file from pool1 (samba share) to pool2 (VM disk)

View attachment 3092

Result: pool1 does only reads and pool2 does only writes. Both pools are mirrors, and the speed in the graph is roughly the maximum read speed of pool1.
P.S. After the file was copied, pool2 kept flushing writes for about another 10 s.

Test #2

File size: 1.5G

Copied file from pool2 (samba share) to pool2 (VM disk)

View attachment 3093

Result: pool2 does reads and writes at the same time. Due to the ZFS flush strategy you can see a big dip whenever ZFS flushes writes and holds back the read operations.
 
My ZFS pool settings

             pool1     pool2
ashift       9         12
compression  lz4       lz4
atime        off       off
sync         disabled  disabled

ARC size = 10GB
L2ARC = none
ZIL = none

everything else is default.


btw disk scheduler is set to noop
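For reference, that is set per block device; a quick sketch (device names as in this thread, persistence e.g. via a udev rule or the kernel command line):
Code:
cat /sys/block/sda/queue/scheduler           # shows e.g. [noop] deadline cfq
echo noop > /sys/block/sda/queue/scheduler   # runtime change only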
 
My ZFS pool settings

             pool1     pool2
ashift       9         12
compression  lz4       lz4
atime        off       off
sync         disabled  disabled

ARC size = 10GB
L2ARC = none
ZIL = none

everything else is default.


btw disk scheduler is set to noop

Sounds interesting, running with no L2ARC and no ZIL; I guess you have lots of RAM.

Is it safe to set atime=off?

I have mail servers, web servers, etc., and all of them run independently on their own zvols. Is it safe to turn atime off? I have seen that it improves performance a bit, but can it cause problems with mail servers, web servers, etc.?
 
I am sorry for writing so much, but this problem is pressing and needs more work until I fix it.

fdisk output for one of the SSDs looks like this:
Code:
Disk /dev/sda: 279.5 GiB, 300069052416 bytes, 586072368 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x0008e14d


Device     Boot     Start       End   Sectors  Size Id Type
/dev/sda1  *         4096  40962047  40957952 19.5G fd Linux raid autodetect
/dev/sda2        40962048  45154303   4192256    2G 82 Linux swap / Solaris
/dev/sda3        45154304  97583103  52428800   25G 83 Linux
/dev/sda4        97583104 586072367 488489264  233G  5 Extended
/dev/sda5        97585152 254871551 157286400   75G 83 Linux
/dev/sda6       254873600 586072367 331198768  158G 83 Linux


When I run hdparm -t on /dev/sda and /dev/sda1 (sda1 is the partition where Proxmox is installed, in md RAID 1), it reads close to 500 MB/s, usually 450-485, which is good because the drive is an Intel SSD DC S3500 300GB rated at 500 MB/s read, 75k read IOPS and 9k write IOPS.
Code:
/dev/sda:
 Timing buffered disk reads: 1334 MB in  3.00 seconds = 467.60 MB/sec

But when I test sda3, sda4, sda5 and sda6, they can't go faster than about 280 MB/s:
Code:
/dev/sda3:
 Timing buffered disk reads: 836 MB in  3.01 seconds = 278.05 MB/sec
/dev/sda5:
 Timing buffered disk reads: 732 MB in  3.01 seconds = 243.42 MB/sec

Then I was curious how it would perform if I made a RAID 0 from both SSD drives as a storage pool, so I created an extended partition, used the 158 GB partitions for a new RAID 0 ZFS pool and installed Windows 2008 R2 on it to run some file-copy tests.
The parameters of the new RAID 0 ZFS pool were the ZFS defaults; I only tried compression lz4 vs. off and sync=disabled vs. standard. The result was strange: write speeds from the two SSDs in the RAID 0 pool never exceeded about 27 MB/s each, so the combined write speed topped out at 62-63 MB/s.

I think the problem I opened this topic about has more to do with the partitions or some other configuration, because each SSD should manage at least 150 MB/s of writes. As you can see, reads are at full speed on /dev/sda and sda1 but not on the other partitions; maybe this is what slows down my 2x2TB pool, which uses these SSDs as log and cache devices.

What do you think is happening here, Sigxpu and Nemesiz? You two know the whole story because I have gone through every step with you, so you have a clearer picture of my problem from the many posts here, including the SMART data of the disks, which showed no problems.
 
Sounds interesting, running with no L2ARC and no ZIL; I guess you have lots of RAM.

10 GB he wrote, so not that much.

Is it safe to set atime=off?

I have mail servers, web servers, etc., and all of them run independently on their own zvols. Is it safe to turn atime off? I have seen that it improves performance a bit, but can it cause problems with mail servers, web servers, etc.?

If you're using only zvols, atime does not matter anyway. Access time is updated on access of a file (hence the name), so if you do not have files, there is nothing to update ;)
 
My server has 48 GB of RAM: 27 GB is assigned to the VMs and 10 GB to the ZFS ARC, so roughly 10 GB is free. In reality, though, the server sometimes starts to use swap; at that point I see only ~1-2 GB free, and later the free space rises back to ~8 GB.

I can suggest two things:

1. Use the arcstat tool to check your hit ratio (see the sketch below).
2. Do the same test on md0 with ext4.
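For point 1, a minimal sketch; the helper script names can differ slightly between ZoL versions:
Code:
arcstat.py 1                                   # per-second hit/miss columns
arc_summary.py                                 # the full report, like the one further below
grep -E '^(hits|misses) ' /proc/spl/kstat/zfs/arcstats   # raw kstat counters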
 
BTW, I have had a bad experience with L2ARC, or at least my server does not need it. When I tried to use L2ARC I saw only writes to it and no reads. Even now arcstats shows an 86% average read hit ratio.

Code:
ZFS Subsystem Report                Sat Nov 28 17:43:50 2015
ARC Summary: (HEALTHY)
    Memory Throttle Count:            0

ARC Misc:
    Deleted:                1.73m
    Mutex Misses:                7
    Evict Skips:                7

ARC Size:                99.99%    10.00    GiB
    Target Size: (Adaptive)        100.00%    10.00    GiB
    Min Size (Hard Limit):        0.31%    32.00    MiB
    Max Size (High Water):        320:1    10.00    GiB

ARC Size Breakdown:
    Recently Used Cache Size:    79.89%    7.99    GiB
    Frequently Used Cache Size:    20.11%    2.01    GiB

ARC Hash Breakdown:
    Elements Max:                1.05m
    Elements Current:        45.15%    472.49k
    Collisions:                714.69k
    Chain Max:                4
    Chains:                    12.33k

ARC Total accesses:                    16.35m
    Cache Hit Ratio:        86.34%    14.11m
    Cache Miss Ratio:        13.66%    2.23m
    Actual Hit Ratio:        84.83%    13.87m

    Data Demand Efficiency:        86.64%    9.09m
    Data Prefetch Efficiency:    13.23%    628.32k

    CACHE HITS BY CACHE LIST:
      Anonymously Used:        1.38%    195.20k
      Most Recently Used:        28.15%    3.97m
      Most Frequently Used:        70.10%    9.89m
      Most Recently Used Ghost:    0.23%    31.82k
      Most Frequently Used Ghost:    0.14%    19.63k

    CACHE HITS BY DATA TYPE:
      Demand Data:            55.81%    7.88m
      Prefetch Data:        0.59%    83.13k
      Demand Metadata:        42.44%    5.99m
      Prefetch Metadata:        1.16%    163.64k

    CACHE MISSES BY DATA TYPE:
      Demand Data:            54.41%    1.21m
      Prefetch Data:        24.42%    545.19k
      Demand Metadata:        19.24%    429.43k
      Prefetch Metadata:        1.94%    43.20k


File-Level Prefetch: (HEALTHY)
DMU Efficiency:                    47.36m
    Hit Ratio:            93.84%    44.44m
    Miss Ratio:            6.16%    2.92m

    Colinear:                2.92m
      Hit Ratio:            0.03%    1.02k
      Miss Ratio:            99.97%    2.92m

    Stride:                    44.42m
      Hit Ratio:            99.40%    44.16m
      Miss Ratio:            0.60%    266.78k

DMU Misc: 
    Reclaim:                2.92m
      Successes:            6.42%    187.18k
      Failures:            93.58%    2.73m

    Streams:                285.72k
      +Resets:            0.22%    641
      -Resets:            99.78%    285.08k
      Bogus:                0


ZFS Tunable:
    metaslab_debug_load                               0
    zfs_arc_min_prefetch_lifespan                     0
    zfetch_max_streams                                8
    zfs_nopwrite_enabled                              1
    zfetch_min_sec_reap                               2
    zfs_dbgmsg_enable                                 0
    zfs_dirty_data_max_max_percent                    25
    zfs_arc_p_aggressive_disable                      1
    spa_load_verify_data                              1
    zfs_zevent_cols                                   80
    zfs_dirty_data_max_percent                        10
    zfs_sync_pass_dont_compress                       5
    l2arc_write_max                                   104857600
    zfs_vdev_scrub_max_active                         2
    zfs_vdev_sync_write_min_active                    10
    zvol_prefetch_bytes                               131072
    metaslab_aliquot                                  524288
    zfs_no_scrub_prefetch                             0
    zfs_arc_shrink_shift                              0
    zfetch_block_cap                                  256
    zfs_txg_history                                   0
    zfs_delay_scale                                   500000
    zfs_vdev_async_write_active_min_dirty_percent     30
    metaslab_debug_unload                             0
    zfs_read_history                                  0
    zvol_max_discard_blocks                           16384
    zfs_recover                                       0
    l2arc_headroom                                    2
    zfs_deadman_synctime_ms                           1000000
    zfs_scan_idle                                     50
    zfs_free_min_time_ms                              1000
    zfs_dirty_data_max                                5062646169
    zfs_vdev_async_read_min_active                    1
    zfs_mg_noalloc_threshold                          0
    zfs_dedup_prefetch                                0
    zfs_vdev_max_active                               1000
    l2arc_write_boost                                 104857600
    zfs_resilver_min_time_ms                          3000
    zfs_vdev_async_write_max_active                   10
    zil_slog_limit                                    1048576
    zfs_prefetch_disable                              0
    zfs_resilver_delay                                2
    metaslab_lba_weighting_enabled                    1
    zfs_mg_fragmentation_threshold                    85
    l2arc_feed_again                                  1
    zfs_zevent_console                                0
    zfs_immediate_write_sz                            32768
    zfs_dbgmsg_maxsize                                4194304
    zfs_free_leak_on_eio                              0
    zfs_deadman_enabled                               1
    metaslab_bias_enabled                             1
    zfs_arc_p_dampener_disable                        1
    zfs_metaslab_fragmentation_threshold              70
    zfs_no_scrub_io                                   0
    metaslabs_per_vdev                                200
    zfs_dbuf_state_index                              0
    zfs_vdev_sync_read_min_active                     10
    metaslab_fragmentation_factor_enabled             1
    zvol_inhibit_dev                                  0
    zfs_vdev_async_write_active_max_dirty_percent     60
    zfs_vdev_cache_size                               0
    zfs_vdev_mirror_switch_us                         10000
    zfs_dirty_data_sync                               67108864
    spa_config_path                                   /etc/zfs/zpool.cache
    zfs_dirty_data_max_max                            12656615424
    zfs_arc_lotsfree_percent                          10
    zfs_zevent_len_max                                128
    zfs_scan_min_time_ms                              1000
    zfs_arc_sys_free                                  0
    zfs_arc_meta_strategy                             1
    zfs_vdev_cache_bshift                             16
    zfs_arc_meta_adjust_restarts                      4096
    zfs_max_recordsize                                1048576
    zfs_vdev_scrub_min_active                         1
    zfs_vdev_read_gap_limit                           32768
    zfs_arc_meta_limit                                5368709120
    zfs_vdev_sync_write_max_active                    10
    l2arc_norw                                        0
    zfs_arc_meta_prune                                10000
    metaslab_preload_enabled                          1
    l2arc_nocompress                                  0
    zvol_major                                        230
    zfs_vdev_aggregation_limit                        131072
    zfs_flags                                         0
    spa_asize_inflation                               24
    zfs_admin_snapshot                                0
    l2arc_feed_secs                                   1
    zfs_sync_pass_deferred_free                       2
    zfs_disable_dup_eviction                          0
    zfs_arc_grow_retry                                0
    zfs_read_history_hits                             0
    zfs_vdev_async_write_min_active                   1
    zfs_vdev_async_read_max_active                    3
    zfs_scrub_delay                                   4
    zfs_delay_min_dirty_percent                       60
    zfs_free_max_blocks                               100000
    zfs_vdev_cache_max                                16384
    zio_delay_max                                     30000
    zfs_top_maxinflight                               32
    spa_slop_shift                                    5
    zfs_vdev_write_gap_limit                          4096
    spa_load_verify_metadata                          1
    spa_load_verify_maxinflight                       10000
    l2arc_noprefetch                                  0
    zfs_vdev_scheduler                                noop
    zfs_expire_snapshot                               300
    zfs_sync_pass_rewrite                             2
    zil_replay_disable                                0
    zfs_nocacheflush                                  0
    zfs_arc_max                                       10737418240
    zfs_arc_min                                       0
    zfs_read_chunk_size                               1048576
    zfs_txg_timeout                                   5
    zfs_pd_bytes_max                                  52428800
    l2arc_headroom_boost                              200
    zfs_send_corrupt_data                             0
    l2arc_feed_min_ms                                 200
    zfs_arc_meta_min                                  0
    zfs_arc_average_blocksize                         8192
    zfetch_array_rd_sz                                1048576
    zfs_autoimport_disable                            1
    zfs_arc_p_min_shift                               0
    zio_requeue_io_start_cut_in_line                  1
    zfs_vdev_sync_read_max_active                     10
    zfs_mdcomp_disable                                0
    zfs_arc_num_sublists_per_state                    8
 
My ZFS is completely dying on writes, very, very slow... Today I tried to copy a file inside one virtual machine, and look at what it shows:
Code:
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
storagepool   793G  1.04T    162   1010   664K  14.7M
  mirror     793G  1.04T    162    920   664K  6.08M
    sdc         -      -    105    284   432K  6.20M
    sdd         -      -     56    286   288K  6.29M
logs            -      -      -      -      -      -
  mirror    46.8M  24.8G      0     89      0  8.66M
    sda3        -      -      0     89      0  8.66M
    sdb3        -      -      0     89      0  8.66M
----------  -----  -----  -----  -----  -----  -----



avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          39.29    0.00    2.66   27.63    0.00   30.42


Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    0.00   53.00     0.00  1664.00    62.79     0.00    0.08    0.00    0.08   0.08   0.40
sdb               0.00     0.00    0.00   53.00     0.00  1664.00    62.79     0.00    0.08    0.00    0.08   0.08   0.40
sdc               0.00     0.00   34.00  340.00   148.00  4056.00    22.48     3.33   18.35  172.59    2.93   2.67 100.00
sdd               3.00     0.00   62.00  298.00  1280.00  3876.00    28.64     3.02   16.99   86.84    2.46   2.78 100.00
zd48              0.00     0.00    0.00   30.00     0.00   120.00     8.00     1.00   62.80    0.00   62.80  33.33 100.00
zd128             0.00     0.00   12.00   12.00   792.00    44.00    69.67     0.12    7.83    0.00   15.67   4.83  11.60
zd144             0.00     0.00    0.00   49.00     0.00   100.00     4.08     0.00    0.08    0.00    0.08   0.08   0.40
zd208             0.00     0.00    0.00  101.00     0.00   388.00     7.68     0.80    8.48    0.00    8.48   7.96  80.40
zd224             0.00     0.00    0.00  309.00     0.00  1216.00     7.87     1.40   16.75    0.00   16.75   3.17  98.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          18.11    0.00    1.76   31.95    0.00   48.18


Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    0.00   33.00     0.00   752.00    45.58     0.01    0.36    0.00    0.36   0.12   0.40
sdb               0.00     0.00    0.00   33.00     0.00   752.00    45.58     0.01    0.36    0.00    0.36   0.12   0.40
sdc               0.00     0.00    3.00  314.00    12.00  1696.00    10.78     2.12    3.43   30.67    3.17   3.15 100.00
sdd               0.00     0.00    0.00  343.00     0.00  1860.00    10.85     3.00    2.92    0.00    2.92   2.92 100.00
zd0               0.00     0.00    0.00    5.00     0.00    16.00     6.40     0.00    0.00    0.00    0.00   0.00   0.00
zd80              0.00     0.00    0.00    7.00     0.00    20.00     5.71     0.02    2.29    0.00    2.29   2.29   1.60
zd128             0.00     0.00    5.00   81.00    20.00   324.00     8.00     0.04    0.00    0.00    0.00   0.42   3.60
zd144             0.00     0.00    0.00  180.00     0.00   624.00     6.93     0.08    0.44    0.00    0.44   0.44   8.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          16.94    0.00    1.01   37.67    0.00   44.37


Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    0.00   20.00     0.00   380.00    38.00     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00   20.00     0.00   380.00    38.00     0.00    0.00    0.00    0.00   0.00   0.00
sdc               0.00     0.00    4.00  373.00    28.00  2492.00    13.37     2.90    4.63  212.00    2.40   2.65 100.00
sdd               0.00     0.00   16.00  336.00    68.00  2292.00    13.41     2.50   13.43  256.00    1.88   2.84 100.00
zd48              0.00     0.00    0.00    2.00     0.00     8.00     8.00     1.00 1132.00    0.00 1132.00 500.00 100.00
zd64              0.00     0.00    0.00    9.00     0.00    28.00     6.22     0.02    1.78    0.00    1.78   1.78   1.60
zd128             0.00     0.00    0.00 2490.00     0.00  9960.00     8.00     1.00    0.38    0.00    0.38   0.40 100.00
zd144             0.00     0.00    0.00   29.00     0.00    68.00     4.69     0.00    0.00    0.00    0.00   0.00   0.00
zd208             0.00     0.00    0.00   12.00     0.00    40.00     6.67     0.68  145.00    0.00  145.00  56.33  67.60


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          36.83    0.00    1.92   19.44    0.00   41.82


Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    0.00  237.00     0.00 15452.00   130.40     0.37    1.57    0.00    1.57   0.25   6.00
sdb               0.00     0.00    0.00  237.00     0.00 15452.00   130.40     0.36    1.54    0.00    1.54   0.22   5.20
sdc               0.00     0.00   90.00    6.00   484.00    16.00    10.42     1.24   37.75   35.96   64.67   7.08  68.00
sdd               0.00     0.00  136.00    6.00   556.00    16.00     8.06     1.41   10.79    8.32   66.67   5.04  71.60
zd48              0.00     0.00    0.00  390.00     0.00  1560.00     8.00     1.14    2.94    0.00    2.94   2.58 100.80
zd128             0.00     0.00    2.00  257.00     8.00  1024.00     7.97     0.36    1.78    2.00    1.77   1.41  36.40
zd144             0.00     0.00    2.00   95.00     8.00   276.00     5.86     0.17    1.73    0.00    1.77   1.73  16.80
zd208             0.00     0.00    0.00  110.00     0.00   432.00     7.85     0.22    2.00    0.00    2.00   2.00  22.00
zd224             0.00     0.00    0.00  473.00     0.00  1692.00     7.15     0.41    5.49    0.00    5.49   0.87  41.20


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          25.86    0.00    3.31   20.25    0.00   50.57


Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     2.00    1.00  101.00     4.00  8024.00   157.41     0.18    1.76    0.00    1.78   0.20   2.00
sdb               0.00     2.00    0.00  101.00     0.00  8024.00   158.89     0.18    1.78    0.00    1.78   0.20   2.00
sdc               0.00     0.00  172.00  182.00   752.00  5216.00    33.72     0.85    2.40    3.23    1.60   1.86  66.00
md1               0.00     0.00    0.00    4.00     0.00    16.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd               0.00     0.00  165.00  184.00   724.00  5256.00    34.27     0.96    1.95    2.40    1.54   1.88  65.60
zd16              0.00     0.00    3.00    0.00    12.00     0.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00
zd48              0.00     0.00    0.00  635.00     0.00  2540.00     8.00     1.35    1.32    0.00    1.32   1.57  99.60
zd96              0.00     0.00    0.00    7.00     0.00    20.00     5.71     0.01    1.14    0.00    1.14   1.14   0.80
zd128             0.00     0.00    1.00  112.00     4.00   448.00     8.00     0.14    1.27    0.00    1.29   1.27  14.40
zd144             0.00     0.00    0.00  254.00     0.00   880.00     6.93     0.02    0.06    0.00    0.06   0.06   1.60
zd208             0.00     0.00    0.00 1648.00     0.00  6584.00     7.99     0.12    0.06    0.00    0.06   0.07  12.00
zfscopy.PNG

With all these problems I am thinking of switching to another storage setup with XFS or md RAID... I am tired of trying everything.
 
According to your iostat, your 5 VMs write asynchronously (very little is synchronous) at the same time. This cannot get faster unless you throw more hardware at it (or use the SSDs directly; ZIL and L2ARC do not increase your throughput here, au contraire).

Try LVM cache and no other filesystem: just LVM volumes with your SSDs as 'real' cache devices. This gives more performance for your very small storage system, but of course none of the neat features such as compression, CoW snapshots, etc. Please refer to this post for further information: https://rwmj.wordpress.com/2014/05/22/using-lvms-new-cache-feature/
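A rough sketch of what that could look like with your devices (VG/LV names and sizes are only placeholders; see the linked post for the real walkthrough):
Code:
pvcreate /dev/sdc /dev/sdd /dev/sda5
vgcreate vmdata /dev/sdc /dev/sdd /dev/sda5

lvcreate -m1 -n vmstore -L 1.8T vmdata /dev/sdc /dev/sdd    # mirrored LV on the two HDDs
lvcreate --type cache-pool -n ssdcache -L 70G vmdata /dev/sda5
lvconvert --type cache --cachepool vmdata/ssdcache vmdata/vmstore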

I'm still baffled that you expect a fast system from two SATA hard disks in a mirror in a storage and virtualization box of your size.
 
According to your iostat, your 5 VMs write asynchronously (very little is synchronous) at the same time. This cannot get faster unless you throw more hardware at it (or use the SSDs directly; ZIL and L2ARC do not increase your throughput here, au contraire).

Try LVM cache and no other filesystem: just LVM volumes with your SSDs as 'real' cache devices. This gives more performance for your very small storage system, but of course none of the neat features such as compression, CoW snapshots, etc. Please refer to this post for further information: https://rwmj.wordpress.com/2014/05/22/using-lvms-new-cache-feature/

I'm still baffled that you expect a fast system from two SATA hard disks in a mirror in a storage and virtualization box of your size.

I know that with the configuration I have it can't work miracles, but even without the SSDs, with just md RAID 1 drives, I had better writes with the same virtual machines... I feel something is not working well in my configuration.
I will shut down all the other VMs early in the morning, leave only one running, and try the same file copy; you will see that it does not get better even without "heavy usage"... Thank you for the link, I will plan something for the next days.
 
It looks like some VMs have disk type IDE, including the one where I tested writes. I changed that one to SATA and writes perform a little better, so should I change the other VMs to SATA or even VirtIO?

Should I use VirtIO? Will guests on ZFS storage work better with VirtIO, as they did before on ext4?
 
IDE is really slow and should not be used. This should be a general rule and applies to all virtualization environments; only use it if nothing else works (very old guest OS). Throughput and latency drastically improved for my machines after I upgraded from IDE to SATA, and then got "only" a little bit faster going from SATA to VirtIO. Normally the latency improves a lot because of a much shorter path from your guest IO to your backend IO.

Using VirtIO is always the best choice if there are drivers available for your virtual guest, of course. Normally you achieve better guest performance, but if the host is at its limit I do not know whether it will yield better performance. I'd suggest you try - it can't be worse. Please also make sure your guest partitions and filesystems are correctly aligned to the underlying storage and match the ZFS record size. This can also drastically improve performance (or, on the other hand, cost a lot of performance).
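For the alignment part, the zvol block size can be checked like this (it can only be set when the zvol is created; the dataset name is just an example):
Code:
zfs get volblocksize storagepool/vm-100-disk-1   # default 8K on most setups
zdb -C storagepool | grep ashift                 # pool sector alignment (ashift)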

You wrote that you use the suggested cache setting from the ZFS wiki article; please try other caching mechanisms as well. Also try direct-sync, so that the writes can (I don't know for sure, but the name suggests it) go to the ZIL due to the sync behavior (assuming you do not set sync=disabled).
 
IDE is really slow and should not be used. This should be a general rule and applies to all virtualization environments; only use it if nothing else works (very old guest OS). Throughput and latency drastically improved for my machines after I upgraded from IDE to SATA, and then got "only" a little bit faster going from SATA to VirtIO. Normally the latency improves a lot because of a much shorter path from your guest IO to your backend IO.

Using VirtIO is always the best choice if there are drivers available for your virtual guest, of course. Normally you achieve better guest performance, but if the host is at its limit I do not know whether it will yield better performance. I'd suggest you try - it can't be worse. Please also make sure your guest partitions and filesystems are correctly aligned to the underlying storage and match the ZFS record size. This can also drastically improve performance (or, on the other hand, cost a lot of performance).

You wrote that you use the suggested cache setting from the ZFS wiki article; please try other caching mechanisms as well. Also try direct-sync, so that the writes can (I don't know for sure, but the name suggests it) go to the ZIL due to the sync behavior (assuming you do not set sync=disabled).


Today I see a very good improvement after changing IDE to SATA... it seems this was the biggest problem. I haven't tested writes yet, but it looks 100% better.
today.PNG

Before installing Proxmox I read the Proxmox ZFS wiki quickly, without checking every step, and these words:
Code:
If you are experimenting with an installation of Proxmox inside a VM (Nested_Virtualization), don't use Virtio for disks of that VM, since are not supported by ZFS, use IDE or SCSI instead.
(https://pve.proxmox.com/wiki/Nested_Virtualization)

got me into trouble; you see where it says "since are not supported by ZFS, use IDE or SCSI instead".

Reading it again yesterday I saw that this applies to nested virtualization only, so after changing the drives to SATA it is better. Very soon I will convert everything to VirtIO with the drivers :)
 
