ZFS worth it? Tuning tips?

Hi,

as long as you use a RAID controller you will never see good performance with ZFS.
It does not matter whether you use the JBOD mode or the RAID mode of the HW RAID.
The problem is that a HW RAID controller has its own cache management to improve performance.
But this HW RAID cache will reduce the speed of ZFS massively and often ends in stuck I/O requests.
I have an LSI hardware RAID10 with ZFS on top of it, only because I needed it for replication. ZFS is the only filesystem that enables replication in Proxmox, correct?

The array locked up once, but I made some changes and now it seems OK. Fsyncs > 5500. But write speeds are not very high.

I'm considering a reinstall and offering the bare disks to ZFS, but would I benefit from disabling WT caching on my RAID controller in the meantime?
 
I guess hw raid is not even a consideration for me, even if someone managed to make it work. I only set it up because I didn't want to wait weeks for the hba to arrive before having my first zfs experiences...

I think pool and l2arc ashift mismatch can lead to poor performance.
Is this true? I read ashift settings are on a vdev level not zpool, but couldn't find any information about different ashift settings in the same pool affecting performance.

- start with the ZFS defaults and see whether things are OK most of the time (use a monitoring system like LibreNMS for this), taking into account not only the ZFS storage (CPU, RAM, load and so on can all have an impact on ZFS)
Cool! We already have a librenms running, I'll make sure to add the main host and all vms/containers :)

- focus your attention on VMs and less on CTs (with ZFS, VMs are where performance is mostly impacted)
- for VMs use the largest volblocksize that fits your use case (run tests with values larger than the default = 4k in PMX, like 16-64k)
- if you mostly work with large files, a bigger volblocksize is better
- use no cache for the VM, and primarycache=metadata for the zvol
- running a DB in a VM is tricky, because you need to adjust your volblocksize again to what the DB vendor recommends (16k for MySQL, for example)
So even if write back cache is recommended for windows vms you would use no cache when the host runs zfs? May be worth recommending an edit on the best practices articles?
I won't be running databases in a vm but I'll give your testing volblock numbers suggestion a try.
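
To make the volblocksize trade-off above a bit more concrete, here is a rough sketch of read amplification for small random reads. It assumes a very simplified model (aligned requests, whole blocks read from the pool, no ARC hits, no compression, no prefetch), and the function name is made up for illustration. The 16 KiB request size stands in for the MySQL page size mentioned in the list.

```python
def read_amplification(io_size_kib: int, volblocksize_kib: int) -> float:
    """Bytes read from the pool per byte requested, for one aligned read.

    Simplified model (an assumption, not how ZFS accounts for I/O): the
    request is rounded up to a whole number of volblocksize blocks.
    """
    if io_size_kib <= 0 or volblocksize_kib <= 0:
        raise ValueError("sizes must be positive")
    blocks_touched = -(-io_size_kib // volblocksize_kib)  # ceiling division
    return blocks_touched * volblocksize_kib / io_size_kib


if __name__ == "__main__":
    # 16 KiB random reads against zvols with different block sizes.
    for vbs in (4, 16, 64):
        print(f"volblocksize={vbs}k: {read_amplification(16, vbs):.1f}x")
    # A large sequential 1 MiB read is not amplified by a 64k block size.
    print(f"1 MiB read on 64k: {read_amplification(1024, 64):.1f}x")
```

In this model a 16 KiB read on a 64k zvol pulls four times the requested data, while large reads are unaffected, which is the reason for matching volblocksize to the workload. It says nothing about the per-block overhead of very small blocks, so real tests are still needed.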

- remember that ZFS is self-tuning, so many settings will change when certain triggers occur (for example if your free space drops under 10%, or your fragmentation rises, and so on)
- read as much documentation as you can; at the beginning you will not understand many things, but in time your brain will start to put all the strange info in order and make countless links ... and one day your cloudy thoughts will be clear like crystal ;)
This is interesting! When the HBA arrives I'll do a fresh install and probably only adjust the maximum ARC size so I can make sure there's enough ram for the vms and cts. Other than that I'll leave the defaults for a while before I start testing some adjustments.
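
For reference, a minimal sketch of how the maximum ARC size could be worked out, assuming ZFS on Linux where the ceiling is the `zfs_arc_max` module parameter given in bytes. The `options zfs ...` line is the usual modprobe syntax; where it goes (commonly `/etc/modprobe.d/zfs.conf`) and whether the initramfs has to be regenerated afterwards depend on the setup, so treat those as assumptions. The helper name is made up for illustration.

```python
GIB = 1024 ** 3  # bytes in one GiB


def arc_max_option(limit_gib: int) -> str:
    """Return a modprobe options line capping the ARC at limit_gib GiB."""
    if limit_gib <= 0:
        raise ValueError("limit must be positive")
    return f"options zfs zfs_arc_max={limit_gib * GIB}"


if __name__ == "__main__":
    # 48 GiB is the limit that shows up in the arc_summary output in this thread.
    print(arc_max_option(48))  # options zfs zfs_arc_max=51539607552
```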
 
Oh I see, so ARC space will be "wasted" to store L2ARC indexes when the original ARC size may be enough for my caching needs. Good tip, thanks!

And it can be huge: it's around 380 bytes of memory in ARC for each block. So if you create volumes with a 4k block size, do the math (25GB of memory for a 1TB L2ARC with 4k blocks).
Better to have volumes with 64k or 128k blocks.
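
"Do the math" can be sketched like this. The per-block header size is taken as a parameter because it differs between ZFS versions; 380 bytes is the figure quoted above, not a value checked against any particular release. Note that with 380 bytes a 1 TiB L2ARC full of 4k blocks comes out at roughly 95 GiB of headers, so the 25GB figure would correspond to a header of a bit under 100 bytes. Either way the conclusion is the same: small blocks make the L2ARC index expensive.

```python
KIB = 1024
GIB = 1024 ** 3
TIB = 1024 ** 4


def l2arc_header_bytes(l2arc_bytes: int, block_bytes: int, header_bytes: int = 380) -> int:
    """RAM used in ARC to index an L2ARC that is full of equally sized blocks.

    header_bytes is an assumption taken from the post above; it varies
    between ZFS versions.
    """
    if block_bytes <= 0:
        raise ValueError("block size must be positive")
    blocks = l2arc_bytes // block_bytes
    return blocks * header_bytes


if __name__ == "__main__":
    for block_kib in (4, 64, 128):
        ram = l2arc_header_bytes(1 * TIB, block_kib * KIB)
        print(f"{block_kib:>3}k blocks: {ram / GIB:.1f} GiB of ARC for 1 TiB of L2ARC")
```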
 
So the HBA arrived last week!! :D I removed the PERC H310 Mini that came with the server and installed a PCIe H310 flashed to IT mode.

I couldn't find any information about performance with mismatching ashift settings between the vdevs and the ZIL/L2ARC devices in the same pool, so I settled on ashift 12, which should work fine with all devices.
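
As a reminder of what the number means: ashift is the base-2 logarithm of the sector size ZFS assumes for a device, so ashift 12 means 4096-byte sectors. A small sketch, where picking the largest value across all devices is just the conservative choice described above and not an official rule, and the sector sizes in the example are hypothetical:

```python
from typing import Iterable


def ashift_for(sector_bytes: int) -> int:
    """Return log2 of a power-of-two sector size, e.g. 4096 -> 12."""
    if sector_bytes <= 0 or sector_bytes & (sector_bytes - 1):
        raise ValueError("sector size must be a positive power of two")
    return sector_bytes.bit_length() - 1


def conservative_ashift(sector_sizes: Iterable[int]) -> int:
    """Pick the largest ashift needed by any device in the list."""
    return max(ashift_for(size) for size in sector_sizes)


if __name__ == "__main__":
    # Hypothetical mix: 512-byte HDDs plus 4096-byte SSDs for log and cache.
    print(conservative_ashift([512, 512, 4096, 4096]))
```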

The server now is super responsive, feels as fast as it used to be when I tried it with Proxmox directly on one SSD.

I still haven't found an effective way of determining if I do need the L2ARC or not, so I set it up just to test it and after a couple of days I can see it's almost fully used! Should I take that as a sign that I will benefit from a L2ARC SSD or should I remove it and do some further testing to fully determine the impact?

I guess it's worth mentioning I'm only able to guarantee 52GB of RAM for ARC at the moment and we don't want to be upgrading the server's RAM anytime soon.

 
Code:
------------------------------------------------------------------------
ZFS Subsystem Report                Tue Jul 24 09:06:18 2018
ARC Summary: (HEALTHY)
    Memory Throttle Count:            0

ARC Misc:
    Deleted:                27.46M
    Mutex Misses:                47
    Evict Skips:                1.87k

ARC Size:                100.00%    48.00    GiB
    Target Size: (Adaptive)        100.00%    48.00    GiB
    Min Size (Hard Limit):        8.19%    3.93    GiB
    Max Size (High Water):        12:1    48.00    GiB

ARC Size Breakdown:
    Recently Used Cache Size:    94.78%    42.88    GiB
    Frequently Used Cache Size:    5.22%    2.36    GiB

ARC Hash Breakdown:
    Elements Max:                7.53M
    Elements Current:        63.54%    4.78M
    Collisions:                15.10M
    Chain Max:                7
    Chains:                    564.09k

ARC Total accesses:                    182.53M
    Cache Hit Ratio:        93.06%    169.85M
    Cache Miss Ratio:        6.94%    12.67M
    Actual Hit Ratio:        92.62%    169.05M

    Data Demand Efficiency:        89.39%    32.93M
    Data Prefetch Efficiency:    56.14%    1.53M

    CACHE HITS BY CACHE LIST:
      Anonymously Used:        0.34%    576.88k
      Most Recently Used:        11.06%    18.79M
      Most Frequently Used:        88.47%    150.26M
      Most Recently Used Ghost:    0.02%    27.06k
      Most Frequently Used Ghost:    0.12%    195.98k

    CACHE HITS BY DATA TYPE:
      Demand Data:            17.33%    29.43M
      Prefetch Data:        0.51%    857.94k
      Demand Metadata:        82.16%    139.54M
      Prefetch Metadata:        0.01%    17.32k

    CACHE MISSES BY DATA TYPE:
      Demand Data:            27.58%    3.50M
      Prefetch Data:        5.29%    670.36k
      Demand Metadata:        67.10%    8.50M
      Prefetch Metadata:        0.03%    4.24k

L2 ARC Summary: (HEALTHY)
    Low Memory Aborts:            0
    Free on Write:                141.15k
    R/W Clashes:                0
    Bad Checksums:                0
    IO Errors:                0

L2 ARC Size: (Adaptive)                492.45    GiB
    Compressed:            95.96%    472.55    GiB
    Header Size:            0.06%    327.58    MiB

L2 ARC Evicts:
    Lock Retries:                11
    Upon Reading:                0

L2 ARC Breakdown:                12.67M
    Hit Ratio:            1.92%    243.90k
    Miss Ratio:            98.08%    12.43M
    Feeds:                    410.66k

L2 ARC Writes:
    Writes Sent:            100.00%    230.43k

DMU Prefetch Efficiency:                    62.72M
    Hit Ratio:            2.25%    1.41M
    Miss Ratio:            97.75%    61.31M



ZFS Tunables:
    dbuf_cache_hiwater_pct                            10
    dbuf_cache_lowater_pct                            10
    dbuf_cache_max_bytes                              104857600
    dbuf_cache_max_shift                              5
    dmu_object_alloc_chunk_shift                      7
    ignore_hole_birth                                 1
    l2arc_feed_again                                  1
    l2arc_feed_min_ms                                 200
    l2arc_feed_secs                                   1
    l2arc_headroom                                    2
    l2arc_headroom_boost                              200
    l2arc_noprefetch                                  1
    l2arc_norw                                        0
    l2arc_write_boost                                 8388608
    l2arc_write_max                                   8388608
    metaslab_aliquot                                  524288
    metaslab_bias_enabled                             1
    metaslab_debug_load                               0
    metaslab_debug_unload                             0
    metaslab_fragmentation_factor_enabled             1
    metaslab_lba_weighting_enabled                    1
    metaslab_preload_enabled                          1
    metaslabs_per_vdev                                200
    send_holes_without_birth_time                     1
    spa_asize_inflation                               24
    spa_config_path                                   /etc/zfs/zpool.cache
    spa_load_verify_data                              1
    spa_load_verify_maxinflight                       10000
    spa_load_verify_metadata                          1
    spa_slop_shift                                    5
    zfetch_array_rd_sz                                1048576
    zfetch_max_distance                               8388608
    zfetch_max_streams                                8
    zfetch_min_sec_reap                               2
    zfs_abd_scatter_enabled                           1
    zfs_abd_scatter_max_order                         10
    zfs_admin_snapshot                                1
    zfs_arc_average_blocksize                         8192
    zfs_arc_dnode_limit                               0
    zfs_arc_dnode_limit_percent                       10
    zfs_arc_dnode_reduce_percent                      10
    zfs_arc_grow_retry                                0
    zfs_arc_lotsfree_percent                          10
    zfs_arc_max                                       51539607552
    zfs_arc_meta_adjust_restarts                      4096
    zfs_arc_meta_limit                                0
    zfs_arc_meta_limit_percent                        75
    zfs_arc_meta_min                                  0
    zfs_arc_meta_prune                                10000
    zfs_arc_meta_strategy                             1
    zfs_arc_min                                       0
    zfs_arc_min_prefetch_lifespan                     0
    zfs_arc_p_dampener_disable                        1
    zfs_arc_p_min_shift                               0
    zfs_arc_pc_percent                                0
    zfs_arc_shrink_shift                              0
    zfs_arc_sys_free                                  0
    zfs_autoimport_disable                            1
    zfs_checksums_per_second                          20
    zfs_compressed_arc_enabled                        1
    zfs_dbgmsg_enable                                 0
    zfs_dbgmsg_maxsize                                4194304
    zfs_dbuf_state_index                              0
    zfs_deadman_checktime_ms                          5000
    zfs_deadman_enabled                               1
    zfs_deadman_synctime_ms                           1000000
    zfs_dedup_prefetch                                0
    zfs_delay_min_dirty_percent                       60
    zfs_delay_scale                                   500000
    zfs_delays_per_second                             20
    zfs_delete_blocks                                 20480
    zfs_dirty_data_max                                4294967296
    zfs_dirty_data_max_max                            4294967296
    zfs_dirty_data_max_max_percent                    25
    zfs_dirty_data_max_percent                        10
    zfs_dirty_data_sync                               67108864
    zfs_dmu_offset_next_sync                          0
    zfs_expire_snapshot                               300
    zfs_flags                                         0
    zfs_free_bpobj_enabled                            1
    zfs_free_leak_on_eio                              0
    zfs_free_max_blocks                               100000
    zfs_free_min_time_ms                              1000
    zfs_immediate_write_sz                            32768
    zfs_max_recordsize                                1048576
    zfs_mdcomp_disable                                0
    zfs_metaslab_fragmentation_threshold              70
    zfs_metaslab_segment_weight_enabled               1
    zfs_metaslab_switch_threshold                     2
    zfs_mg_fragmentation_threshold                    85
    zfs_mg_noalloc_threshold                          0
    zfs_multihost_fail_intervals                      5
    zfs_multihost_history                             0
    zfs_multihost_import_intervals                    10
    zfs_multihost_interval                            1000
    zfs_multilist_num_sublists                        0
    zfs_no_scrub_io                                   0
    zfs_no_scrub_prefetch                             0
    zfs_nocacheflush                                  0
    zfs_nopwrite_enabled                              1
    zfs_object_mutex_size                             64
    zfs_pd_bytes_max                                  52428800
    zfs_per_txg_dirty_frees_percent                   30
    zfs_prefetch_disable                              0
    zfs_read_chunk_size                               1048576
    zfs_read_history                                  0
    zfs_read_history_hits                             0
    zfs_recover                                       0
    zfs_recv_queue_length                             16777216
    zfs_resilver_delay                                2
    zfs_resilver_min_time_ms                          3000
    zfs_scan_idle                                     50
    zfs_scan_ignore_errors                            0
    zfs_scan_min_time_ms                              1000
    zfs_scrub_delay                                   4
    zfs_send_corrupt_data                             0
    zfs_send_queue_length                             16777216
    zfs_sync_pass_deferred_free                       2
    zfs_sync_pass_dont_compress                       5
    zfs_sync_pass_rewrite                             2
    zfs_sync_taskq_batch_pct                          75
    zfs_top_maxinflight                               32
    zfs_txg_history                                   0
    zfs_txg_timeout                                   5
    zfs_vdev_aggregation_limit                        131072
    zfs_vdev_async_read_max_active                    3
    zfs_vdev_async_read_min_active                    1
    zfs_vdev_async_write_active_max_dirty_percent     60
    zfs_vdev_async_write_active_min_dirty_percent     30
    zfs_vdev_async_write_max_active                   10
    zfs_vdev_async_write_min_active                   2
    zfs_vdev_cache_bshift                             16
    zfs_vdev_cache_max                                16384
    zfs_vdev_cache_size                               0
    zfs_vdev_max_active                               1000
    zfs_vdev_mirror_non_rotating_inc                  0
    zfs_vdev_mirror_non_rotating_seek_inc             1
    zfs_vdev_mirror_rotating_inc                      0
    zfs_vdev_mirror_rotating_seek_inc                 5
    zfs_vdev_mirror_rotating_seek_offset              1048576
    zfs_vdev_queue_depth_pct                          1000
    zfs_vdev_raidz_impl                               [fastest] original scalar sse2 ssse3
    zfs_vdev_read_gap_limit                           32768
    zfs_vdev_scheduler                                noop
    zfs_vdev_scrub_max_active                         2
    zfs_vdev_scrub_min_active                         1
    zfs_vdev_sync_read_max_active                     10
    zfs_vdev_sync_read_min_active                     10
    zfs_vdev_sync_write_max_active                    10
    zfs_vdev_sync_write_min_active                    10
    zfs_vdev_write_gap_limit                          4096
    zfs_zevent_cols                                   80
    zfs_zevent_console                                0
    zfs_zevent_len_max                                768
    zil_replay_disable                                0
    zil_slog_bulk                                     786432
    zio_delay_max                                     30000
    zio_dva_throttle_enabled                          1
    zio_requeue_io_start_cut_in_line                  1
    zio_taskq_batch_pct                               75
    zvol_inhibit_dev                                  0
    zvol_major                                        230
    zvol_max_discard_blocks                           16384
    zvol_prefetch_bytes                               131072
    zvol_request_sync                                 0
    zvol_threads                                      32
    zvol_volmode                                      1

After my last post I also managed to add the pool on my LibreNMS so I can now track all those details. I will try removing the L2ARC today to see what happens here :D

I'm thinking the most important value to track would be "Frequently Used Cache Size" which is now sitting at 2.36 GiB. If that ever gets close to the max ARC size that's when I should probably consider introducing the L2ARC?
 
ZFS needs some time to make your caching "optimal". I suggest a few weeks, depending on your workload.

Code:
ARC Total accesses:                    182.53M
    Cache Hit Ratio:        93.06%    169.85M
    Cache Miss Ratio:        6.94%    12.67M
    Actual Hit Ratio:        92.62%    169.05M

as you can see, ARC has only stored about 200MB so far

Code:
L2 ARC Breakdown:                12.67M
   Hit Ratio:            1.92%    243.90k
   Miss Ratio:            98.08%    12.43M
   Feeds:                    410.66k

as you can see your L2ARC is useless :)
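
The ratios can be re-derived from the counters in the report. The L2ARC is only consulted on ARC misses, which is why the L2 ARC Breakdown total (12.67M) matches the ARC miss count. A rough sketch, assuming the `k` and `M` suffixes in arc_summary are decimal thousands and millions; since the report rounds the counters, the results can differ from the printed percentages in the last digit.

```python
SUFFIXES = {"k": 1e3, "M": 1e6}  # assumed decimal, as printed by arc_summary


def parse_count(text: str) -> float:
    """Turn a rounded counter such as '243.90k' or '12.67M' into a number."""
    text = text.strip()
    if not text:
        raise ValueError("empty counter")
    if text[-1] in SUFFIXES:
        return float(text[:-1]) * SUFFIXES[text[-1]]
    return float(text)


def hit_ratio(hits: str, total: str) -> float:
    """Hit ratio in percent from two counters in arc_summary notation."""
    return 100.0 * parse_count(hits) / parse_count(total)


if __name__ == "__main__":
    # Counters copied from the report in this thread.
    print(f"ARC hit ratio:   {hit_ratio('169.85M', '182.53M'):.2f}%")
    print(f"L2ARC hit ratio: {hit_ratio('243.90k', '12.67M'):.2f}%")
```

So out of roughly 12.67M reads that missed the ARC, only about 244k were served from the SSD; the rest went to the pool disks anyway.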

try starting without L2ARC and check arc_summary after some weeks
if these values are high
Code:
Most Recently Used Ghost:    0.02%    27.06k
Most Frequently Used Ghost:    0.12%    195.98k
then you can start thinking about L2ARC
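
If you want to track the ghost list hits over the coming weeks without rerunning arc_summary, one option is to read the raw counters. This is a minimal sketch assuming ZFS on Linux, where they are exposed in `/proc/spl/kstat/zfs/arcstats` as `name type data` rows with counters named `hits`, `mru_ghost_hits` and `mfu_ghost_hits`. The sample text below is hand-made from the rounded values in the report above, and no threshold for "high" is implied.

```python
from typing import Dict

# Hand-made sample in the arcstats layout, using rounded values from the report.
SAMPLE = """\
13 1 0x01 96 4608 1 1
name                            type data
hits                            4    169850000
misses                          4    12670000
mru_ghost_hits                  4    27060
mfu_ghost_hits                  4    195980
"""


def parse_arcstats(text: str) -> Dict[str, int]:
    """Parse 'name type data' rows, skipping the two header lines."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[2].isdigit():
            stats[parts[0]] = int(parts[2])
    return stats


def ghost_hit_share(stats: Dict[str, int]) -> float:
    """Ghost list hits as a percentage of all ARC hits."""
    ghost = stats.get("mru_ghost_hits", 0) + stats.get("mfu_ghost_hits", 0)
    hits = stats.get("hits", 0)
    return 100.0 * ghost / hits if hits else 0.0


if __name__ == "__main__":
    # On a live host, replace SAMPLE with the contents of
    # /proc/spl/kstat/zfs/arcstats (path assumed for ZFS on Linux).
    print(f"ghost hits: {ghost_hit_share(parse_arcstats(SAMPLE)):.2f}% of ARC hits")
```

A ghost hit means the block was recently evicted from ARC and then asked for again, so a growing share suggests the ARC is too small for the working set, which is the case where an L2ARC could help.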
 