Help interpreting ZFS stats (Grafana / Telegraf metrics)

Discussion in 'Proxmox VE: Installation and configuration' started by Denny Fuchs, Oct 10, 2018.

  1. Denny Fuchs

    Denny Fuchs Member
    Proxmox VE Subscriber

    Joined:
    Jan 21, 2016
    Messages:
    61
    Likes Received:
    4
    hi,

    we've been getting notifications from our monitoring (Icinga2), which is a VM on a PVE 5.2 host with 6 x 1 TB WD Red (WDC WD10JFCX-68N6GN0, 2.5") drives in RAIDZ2, because of timeouts (check_icmp).
    After a longer investigation we found out that these alerts were false positives, because the monitoring VM itself wasn't able to execute its checks.
    Our metrics from Icinga2 and Telegraf (both in InfluxDB) show that the I/O went up at exactly 6:25, which is when cron.daily runs.
    The CPU is an E3-1270 v5 @ 3.5 GHz and we have 64 GB of DDR4 ECC RAM. The ARC is limited to a minimum of 6 GB and a maximum of 12 GB.
    Fortunately we are also collecting ZFS stats via Telegraf, but I'm not sure how to interpret them. Maybe someone can help us out.
    We are using a Supermicro X11SSH-TF board and an LSI/Broadcom controller (SAS3008). The only thing we could still add: we have a single free M.2 slot .. maybe we can use it as a cache?
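    For reference, the ARC limits mentioned above are set via the ZFS module options. A minimal sketch of what that typically looks like on a Proxmox host (values in bytes, matching the 6 GB / 12 GB limits; adjust to your setup):

    Code:
    # /etc/modprobe.d/zfs.conf: limit the ARC to 6 GiB min / 12 GiB max
    options zfs zfs_arc_min=6442450944
    options zfs zfs_arc_max=12884901888
    # apply with: update-initramfs -u, then reboot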

    Any suggestions?
     

    Attached Files:

  2. 6uellerbpanda

    6uellerbpanda Member
    Proxmox VE Subscriber

    Joined:
    Sep 15, 2015
    Messages:
    37
    Likes Received:
    5
    what's the output of:
    Code:
    zpool status
    and
    Code:
    arc_summary.py
    I'm not sure if I understand it correctly, but is there actually a problem at all, or is it just about what the metrics mean?
     
  3. Denny Fuchs

    Denny Fuchs Member
    Proxmox VE Subscriber

    Joined:
    Jan 21, 2016
    Messages:
    61
    Likes Received:
    4
    hi,


    Code:
    zpool status
      pool: rpool
     state: ONLINE
      scan: scrub repaired 0B in 35h54m with 0 errors on Mon Sep 10 12:18:32 2018
    config:
    
        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            sda2    ONLINE       0     0     0
            sdb2    ONLINE       0     0     0
            sdc2    ONLINE       0     0     0
            sdd2    ONLINE       0     0     0
            sde2    ONLINE       0     0     0
            sdf2    ONLINE       0     0     0
            sdg2    ONLINE       0     0     0
            sdh2    ONLINE       0     0     0
    errors: No known data errors
    
    Code:
    arc_summary
    ------------------------------------------------------------------------
    ZFS Subsystem Report                Fri Oct 12 12:06:06 2018
    ARC Summary: (HEALTHY)
        Memory Throttle Count:            0
    
    ARC Misc:
        Deleted:                40.23M
        Mutex Misses:                1.57k
        Evict Skips:                24.43k
    
    ARC Size:                64.42%    7.73    GiB
        Target Size: (Adaptive)        64.77%    7.77    GiB
        Min Size (Hard Limit):        50.00%    6.00    GiB
        Max Size (High Water):        2:1    12.00    GiB
    
    ARC Size Breakdown:
        Recently Used Cache Size:    30.34%    2.16    GiB
        Frequently Used Cache Size:    69.66%    4.96    GiB
    
    ARC Hash Breakdown:
        Elements Max:                3.66M
        Elements Current:        49.55%    1.81M
        Collisions:                73.65M
        Chain Max:                7
        Chains:                    168.25k
    
    ARC Total accesses:                    5.57G
        Cache Hit Ratio:        98.88%    5.51G
        Cache Miss Ratio:        1.12%    62.29M
        Actual Hit Ratio:        98.35%    5.48G
    
        Data Demand Efficiency:        99.48%    4.62G
        Data Prefetch Efficiency:    58.03%    39.18M
    
        CACHE HITS BY CACHE LIST:
          Anonymously Used:        0.37%    20.12M
          Most Recently Used:        5.04%    277.45M
          Most Frequently Used:        94.43%    5.20G
          Most Recently Used Ghost:    0.09%    4.85M
          Most Frequently Used Ghost:    0.08%    4.64M
    
        CACHE HITS BY DATA TYPE:
          Demand Data:            83.47%    4.60G
          Prefetch Data:        0.41%    22.74M
          Demand Metadata:        15.72%    866.07M
          Prefetch Metadata:        0.40%    21.79M
    
        CACHE MISSES BY DATA TYPE:
          Demand Data:            38.41%    23.92M
          Prefetch Data:        26.40%    16.44M
          Demand Metadata:        33.98%    21.16M
          Prefetch Metadata:        1.21%    754.86k
    
    DMU Prefetch Efficiency:                    284.27M
        Hit Ratio:            19.50%    55.45M
        Miss Ratio:            80.50%    228.83M
    
    ZFS Tunables:
        dbuf_cache_hiwater_pct                            10
        dbuf_cache_lowater_pct                            10
        dbuf_cache_max_bytes                              104857600
        dbuf_cache_max_shift                              5
        dmu_object_alloc_chunk_shift                      7
        ignore_hole_birth                                 1
        l2arc_feed_again                                  1
        l2arc_feed_min_ms                                 200
        l2arc_feed_secs                                   1
        l2arc_headroom                                    2
        l2arc_headroom_boost                              200
        l2arc_noprefetch                                  1
        l2arc_norw                                        0
        l2arc_write_boost                                 8388608
        l2arc_write_max                                   8388608
        metaslab_aliquot                                  524288
        metaslab_bias_enabled                             1
        metaslab_debug_load                               0
        metaslab_debug_unload                             0
        metaslab_fragmentation_factor_enabled             1
        metaslab_lba_weighting_enabled                    1
        metaslab_preload_enabled                          1
        metaslabs_per_vdev                                200
        send_holes_without_birth_time                     1
        spa_asize_inflation                               24
        spa_config_path                                   /etc/zfs/zpool.cache
        spa_load_verify_data                              1
        spa_load_verify_maxinflight                       10000
        spa_load_verify_metadata                          1
        spa_slop_shift                                    5
        zfetch_array_rd_sz                                1048576
        zfetch_max_distance                               8388608
        zfetch_max_streams                                8
        zfetch_min_sec_reap                               2
        zfs_abd_scatter_enabled                           1
        zfs_abd_scatter_max_order                         10
        zfs_admin_snapshot                                1
        zfs_arc_average_blocksize                         8192
        zfs_arc_dnode_limit                               0
        zfs_arc_dnode_limit_percent                       10
        zfs_arc_dnode_reduce_percent                      10
        zfs_arc_grow_retry                                0
        zfs_arc_lotsfree_percent                          10
        zfs_arc_max                                       12884901888
        zfs_arc_meta_adjust_restarts                      4096
        zfs_arc_meta_limit                                0
        zfs_arc_meta_limit_percent                        75
        zfs_arc_meta_min                                  0
        zfs_arc_meta_prune                                10000
        zfs_arc_meta_strategy                             1
        zfs_arc_min                                       6442450944
        zfs_arc_min_prefetch_lifespan                     0
        zfs_arc_p_dampener_disable                        1
        zfs_arc_p_min_shift                               0
        zfs_arc_pc_percent                                0
        zfs_arc_shrink_shift                              0
        zfs_arc_sys_free                                  0
        zfs_autoimport_disable                            1
        zfs_checksums_per_second                          20
        zfs_compressed_arc_enabled                        1
        zfs_dbgmsg_enable                                 0
        zfs_dbgmsg_maxsize                                4194304
        zfs_dbuf_state_index                              0
        zfs_deadman_checktime_ms                          5000
        zfs_deadman_enabled                               1
        zfs_deadman_synctime_ms                           1000000
        zfs_dedup_prefetch                                0
        zfs_delay_min_dirty_percent                       60
        zfs_delay_scale                                   500000
        zfs_delays_per_second                             20
        zfs_delete_blocks                                 20480
        zfs_dirty_data_max                                4294967296
        zfs_dirty_data_max_max                            4294967296
        zfs_dirty_data_max_max_percent                    25
        zfs_dirty_data_max_percent                        10
        zfs_dirty_data_sync                               67108864
        zfs_dmu_offset_next_sync                          0
        zfs_expire_snapshot                               300
        zfs_flags                                         0
        zfs_free_bpobj_enabled                            1
        zfs_free_leak_on_eio                              0
        zfs_free_max_blocks                               100000
        zfs_free_min_time_ms                              1000
        zfs_immediate_write_sz                            32768
        zfs_max_recordsize                                1048576
        zfs_mdcomp_disable                                0
        zfs_metaslab_fragmentation_threshold              70
        zfs_metaslab_segment_weight_enabled               1
        zfs_metaslab_switch_threshold                     2
        zfs_mg_fragmentation_threshold                    85
        zfs_mg_noalloc_threshold                          0
        zfs_multihost_fail_intervals                      5
        zfs_multihost_history                             0
        zfs_multihost_import_intervals                    10
        zfs_multihost_interval                            1000
        zfs_multilist_num_sublists                        0
        zfs_no_scrub_io                                   0
        zfs_no_scrub_prefetch                             0
        zfs_nocacheflush                                  0
        zfs_nopwrite_enabled                              1
        zfs_object_mutex_size                             64
        zfs_pd_bytes_max                                  52428800
        zfs_per_txg_dirty_frees_percent                   30
        zfs_prefetch_disable                              0
        zfs_read_chunk_size                               1048576
        zfs_read_history                                  0
        zfs_read_history_hits                             0
        zfs_recover                                       0
        zfs_recv_queue_length                             16777216
        zfs_resilver_delay                                2
        zfs_resilver_min_time_ms                          3000
        zfs_scan_idle                                     50
        zfs_scan_ignore_errors                            0
        zfs_scan_min_time_ms                              1000
        zfs_scrub_delay                                   4
        zfs_send_corrupt_data                             0
        zfs_send_queue_length                             16777216
        zfs_sync_pass_deferred_free                       2
        zfs_sync_pass_dont_compress                       5
        zfs_sync_pass_rewrite                             2
        zfs_sync_taskq_batch_pct                          75
        zfs_top_maxinflight                               32
        zfs_txg_history                                   0
        zfs_txg_timeout                                   5
        zfs_vdev_aggregation_limit                        131072
        zfs_vdev_async_read_max_active                    3
        zfs_vdev_async_read_min_active                    1
        zfs_vdev_async_write_active_max_dirty_percent     60
        zfs_vdev_async_write_active_min_dirty_percent     30
        zfs_vdev_async_write_max_active                   10
        zfs_vdev_async_write_min_active                   2
        zfs_vdev_cache_bshift                             16
        zfs_vdev_cache_max                                16384
        zfs_vdev_cache_size                               0
        zfs_vdev_max_active                               1000
        zfs_vdev_mirror_non_rotating_inc                  0
        zfs_vdev_mirror_non_rotating_seek_inc             1
        zfs_vdev_mirror_rotating_inc                      0
        zfs_vdev_mirror_rotating_seek_inc                 5
        zfs_vdev_mirror_rotating_seek_offset              1048576
        zfs_vdev_queue_depth_pct                          1000
        zfs_vdev_raidz_impl                               [fastest] original scalar sse2 ssse3 avx2
        zfs_vdev_read_gap_limit                           32768
        zfs_vdev_scheduler                                noop
        zfs_vdev_scrub_max_active                         2
        zfs_vdev_scrub_min_active                         1
        zfs_vdev_sync_read_max_active                     10
        zfs_vdev_sync_read_min_active                     10
        zfs_vdev_sync_write_max_active                    10
        zfs_vdev_sync_write_min_active                    10
        zfs_vdev_write_gap_limit                          4096
        zfs_zevent_cols                                   80
        zfs_zevent_console                                0
        zfs_zevent_len_max                                128
        zil_replay_disable                                0
        zil_slog_bulk                                     786432
        zio_delay_max                                     30000
        zio_dva_throttle_enabled                          1
        zio_requeue_io_start_cut_in_line                  1
        zio_taskq_batch_pct                               75
        zvol_inhibit_dev                                  0
        zvol_major                                        230
        zvol_max_discard_blocks                           16384
        zvol_prefetch_bytes                               131072
        zvol_request_sync                                 0
        zvol_threads                                      32
        zvol_volmode                                      1
    

    cu denny
     
  4. 6uellerbpanda

    6uellerbpanda Member
    Proxmox VE Subscriber

    Joined:
    Sep 15, 2015
    Messages:
    37
    Likes Received:
    5
    OK, but is/was there an actual problem, apart from the false positives?
     
  5. Denny Fuchs

    Denny Fuchs Member
    Proxmox VE Subscriber

    Joined:
    Jan 21, 2016
    Messages:
    61
    Likes Received:
    4
    hi,

    The problem is that we have issues with the VMs and a lot of
    Code:
    [Sun Oct 14 02:06:07 2018] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 52s! [check_ ....
    
    nearly every day. Every check command that needs to be executed produces the same messages. I think it happens because the underlying ZFS isn't fast enough to serve all the requests ...
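    One thing we can still do is watch the pool directly while the checks run, to confirm it really is the ZFS pool that can't keep up. A quick sketch (standard zpool commands, nothing specific to our setup):

    Code:
    # per-vdev throughput and IOPS, refreshed every 5 seconds
    zpool iostat -v rpool 5
    # latency histograms (available on newer ZFS-on-Linux versions) to spot slow disks
    zpool iostat -w rpool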
     
  6. WhiteStarEOF

    WhiteStarEOF Member

    Joined:
    Mar 6, 2012
    Messages:
    89
    Likes Received:
    9
    I've gotten a lot of the soft lockups as well, along with atrocious performance from RAIDz2. RAIDz2/RAID6 just isn't very good at small block operations where performance matters.

    You might be able to skirt around the issue with a couple of mirrored SSDs for ZIL and L2ARC. But ultimately I would recommend moving away from RAIDz2 towards a pool of mirrored vdevs. Without adding any hardware, I was able to move some of my clients over to a pool of mirrors and resolve nearly all of their performance issues and soft lockup messages.
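    For illustration only, a pool of mirrored vdevs is laid out roughly like this (placeholder pool and device names, not taken from this thread; creating a pool wipes the disks involved):

    Code:
    # two mirrored vdevs striped together ("RAID10"-style)
    zpool create tank mirror sda sdb mirror sdc sdd
    # capacity grows later by adding more mirror pairs
    zpool add tank mirror sde sdf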
     
  7. 6uellerbpanda

    6uellerbpanda Member
    Proxmox VE Subscriber

    Joined:
    Sep 15, 2015
    Messages:
    37
    Likes Received:
    5
    As WhiteStarEOF already explained: for random I/O you need a zpool of mirrored vdevs.
     
  8. Denny Fuchs

    Denny Fuchs Member
    Proxmox VE Subscriber

    Joined:
    Jan 21, 2016
    Messages:
    61
    Likes Received:
    4
    hi,

    exactly that is what we are trying to do. The only sad thing is that we have to reinstall the hypervisor. On a different host I chose 2 x RAIDZ1 striped together, and it works well with the exact same hardware.

    But one question: we have a single M.2 slot; does it make sense to use it as a write cache? What happens if this drive dies?
     
  9. 6uellerbpanda

    6uellerbpanda Member
    Proxmox VE Subscriber

    Joined:
    Sep 15, 2015
    Messages:
    37
    Likes Received:
    5
    If you have a lot of sync writes, a SLOG can of course help.

    When the SLOG fails, ZFS falls back to the ZIL on the pool disks, so you won't lose any data, unless in that same time frame you also lose the whole storage before the txg has flushed the data to the pool, but that is very unlikely, I guess ;)
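    A SLOG can also be added to and removed from an existing pool, so trying it is fairly low-risk. A sketch with a placeholder device name (in practice use a /dev/disk/by-id/... path):

    Code:
    # add the free M.2/NVMe device as a dedicated log device (SLOG)
    zpool add rpool log /dev/nvme0n1
    # a log vdev can be removed again at any time
    zpool remove rpool nvme0n1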
     
    Denny Fuchs likes this.
  10. guletz

    guletz Active Member

    Joined:
    Apr 19, 2017
    Messages:
    662
    Likes Received:
    95
    The ZIL is a special storage area where most (but not all) of the sync write I/O lands. If you do not have a SLOG, the ZIL lives on the same disks as the ZFS pool. When you have a dedicated SLOG device, the ZIL is located ONLY on that SLOG device. Now, when a sync I/O is needed, the data goes first into RAM and onto the SLOG (and the application receives the message: OK, the data is on disk now). When the ZFS buffers are flushed to the pool (every 5 seconds by default), all data, including the sync data held there, is written to the pool disks (excluding the SLOG).
    So at any moment the sync data IS present in two locations: in RAM and on the SLOG. For this reason, IF you lose your SLOG it is not a problem, because the data is also in RAM and will be written to the pool disks on the next buffer flush. From the moment the SLOG is broken, all future sync writes go directly to the pool disks in sync mode (with a corresponding write speed degradation).
    SLOG data is only READ after a kernel crash or power problem, and only IF there is data PRESENT on the SLOG that is not yet present on the pool disks.
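    To tie this back to the tunables posted above: the "5 seconds by default" is the txg timeout (zfs_txg_timeout = 5 in the list), and whether a dataset issues sync writes at all is governed by its sync property. A quick way to check both, as a sketch:

    Code:
    # txg flush interval in seconds (default 5)
    cat /sys/module/zfs/parameters/zfs_txg_timeout
    # sync behaviour of the pool root dataset: standard | always | disabled
    zfs get sync rpool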
     
    Denny Fuchs likes this.
  11. Denny Fuchs

    Denny Fuchs Member
    Proxmox VE Subscriber

    Joined:
    Jan 21, 2016
    Messages:
    61
    Likes Received:
    4
    hi @guletz,

    Thanks for the great response. I think we will extend the hosts with the M.2 card and take the pressure off :)
     