Proxmox ZFS mirror SSD slow

Hi,

I've installed Proxmox on two Samsung 850 Pro 256 GB drives, set up as a ZFS mirror during the installation.
I used an ashift value of 12.

Both disks' SMART self-assessments report PASSED.
But I noticed that my Windows VM was quite slow, so slow that I converted everything to Linux so I could use containers.
The containers are quick and responsive.
But while copying some video files to what I thought was my media drive (I forgot to bind-mount the other ZFS volume), I was actually writing to the SSDs. The transfer timed out in FileZilla, and at the same time my other container was merely installing ffmpeg (apt-get install).

What is wrong?
 
Code:
zpool iostat -v
                                          capacity     operations     bandwidth
pool                                    alloc   free   read  write   read  write
--------------------------------------  -----  -----  -----  -----  -----  -----
rpool                                   11.4G   465G      0    191      0  23.9M
  mirror                                11.4G   465G      0    191      0  23.9M
    wwn-0x50025385a0282b11-part2            -      -      0     95      0  12.0M
    wwn-0x50025385a0282a87-part2            -      -      0     95      0  12.0M

atop
Code:
DSK |          sdc | busy 100% | read 46 | write 1128 | KiB/r 178 | KiB/w 126 | MBr/s 0.8 | MBw/s 13.9 | avq 2.24 | avio 8.52 ms |
DSK |          sdh | busy 100% | read 46 | write 1083 | KiB/r 178 | KiB/w 126 | MBr/s 0.8 | MBw/s 13.4 | avq 2.33 | avio 8.86 ms |
And this happened in FileZilla:
Code:
Status:    Starting upload of D:\Movies\Wonder Woman (2017)\Wonder Woman 2017.mkv
Command:    put "D:\Movies\Wonder Woman (2017)\Wonder Woman 2017.mkv" "Wonder Woman 2017.mkv"
Command:    local:D:\Movies\Wonder Woman (2017)\Wonder Woman 2017.mkv => remote:/var/lib/plexmediaserver/Library/Wonder Woman (2017)/Wonder Woman 2017.mkv
Error:    Connection timed out after 20 seconds of inactivity
Error:    File transfer failed after transferring 907,902,976 bytes in 32 seconds

Not nearly the performance it should be.
 
As you can see from atop, your disks are 100% busy.
Try setting sync=disabled on your ZFS dataset to see if anything changes.
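For reference, a minimal sketch of applying, verifying, and later reverting this, assuming the dataset is rpool/vm as in the next post. Note that sync=disabled acknowledges synchronous writes before they reach stable storage, so it risks data loss on power failure and is best treated as a diagnostic:
Code:
# disable sync writes on the dataset (diagnostic only)
zfs set sync=disabled rpool/vm
# confirm the current setting
zfs get sync rpool/vm
# revert to the inherited default afterwards
zfs inherit sync rpool/vm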
 
Did the following:
zfs set sync=disabled rpool/vm (where my LXC containers are)
Code:
DSK |          sdc | busy 100% | read 32 | write 304 | KiB/r 256 | KiB/w 128 | MBr/s 0.8 | MBw/s 3.8 | avq 2.28 | avio 29.8 ms |
DSK |          sdh | busy 100% | read 32 | write 310 | KiB/r 256 | KiB/w 128 | MBr/s 0.8 | MBw/s 3.9 | avq 2.19 | avio 29.2 ms |
 
Of course, here it is:
Code:
------------------------------------------------------------------------
ZFS Subsystem Report                            Sun Dec 03 00:06:03 2017
ARC Summary: (HEALTHY)
        Memory Throttle Count:                  0

ARC Misc:
        Deleted:                                432.81m
        Mutex Misses:                           90.18k
        Evict Skips:                            90.18k

ARC Size:                               100.01% 48.01   GiB
        Target Size: (Adaptive)         100.00% 48.00   GiB
        Min Size (Hard Limit):          6.15%   2.95    GiB
        Max Size (High Water):          16:1    48.00   GiB

ARC Size Breakdown:
        Recently Used Cache Size:       98.45%  47.26   GiB
        Frequently Used Cache Size:     1.55%   760.73  MiB

ARC Hash Breakdown:
        Elements Max:                           8.46m
        Elements Current:               6.87%   581.28k
        Collisions:                             82.82m
        Chain Max:                              8
        Chains:                                 9.31k

ARC Total accesses:                                     423.30m
        Cache Hit Ratio:                46.87%  198.38m
        Cache Miss Ratio:               53.13%  224.91m
        Actual Hit Ratio:               30.24%  128.00m

        Data Demand Efficiency:         49.35%  229.71m
        Data Prefetch Efficiency:       42.18%  173.49m

        CACHE HITS BY CACHE LIST:
          Anonymously Used:             32.37%  64.21m
          Most Recently Used:           46.73%  92.71m
          Most Frequently Used:         17.79%  35.29m
          Most Recently Used Ghost:     2.04%   4.04m
          Most Frequently Used Ghost:   1.07%   2.13m

        CACHE HITS BY DATA TYPE:
          Demand Data:                  57.14%  113.37m
          Prefetch Data:                36.88%  73.17m
          Demand Metadata:              2.44%   4.84m
          Prefetch Metadata:            3.53%   7.01m

        CACHE MISSES BY DATA TYPE:
          Demand Data:                  51.73%  116.35m
          Prefetch Data:                44.60%  100.32m
          Demand Metadata:              3.24%   7.28m
          Prefetch Metadata:            0.43%   973.07k


DMU Prefetch Efficiency:                                        1.45b
        Hit Ratio:                      5.61%   81.49m
        Miss Ratio:                     94.39%  1.37b



ZFS Tunable:
        zvol_volmode                                      1
        l2arc_headroom                                    2
        dbuf_cache_max_shift                              5
        zfs_free_leak_on_eio                              0
        zfs_free_max_blocks                               100000
        zfs_read_chunk_size                               1048576
        metaslab_preload_enabled                          1
        zfs_dedup_prefetch                                0
        zfs_txg_history                                   0
        zfs_scrub_delay                                   4
        zfs_vdev_async_read_max_active                    3
        zfs_read_history                                  0
        zfs_arc_sys_free                                  0
        l2arc_write_max                                   8388608
        zil_slog_bulk                                     786432
        zfs_dbuf_state_index                              0
        zfs_sync_taskq_batch_pct                          75
        metaslab_debug_unload                             0
        zvol_inhibit_dev                                  0
        zfs_abd_scatter_enabled                           1
        zfs_arc_pc_percent                                0
        zfetch_max_streams                                8
        zfs_recover                                       0
        metaslab_fragmentation_factor_enabled             1
        zfs_deadman_checktime_ms                          5000
        zfs_sync_pass_rewrite                             2
        zfs_object_mutex_size                             64
        zfs_arc_min_prefetch_lifespan                     0
        zfs_arc_meta_prune                                10000
        zfs_read_history_hits                             0
        zfetch_max_distance                               8388608
        l2arc_norw                                        0
        zfs_dirty_data_max_percent                        10
        zfs_per_txg_dirty_frees_percent                   30
        zfs_arc_meta_min                                  0
        metaslabs_per_vdev                                200
        zfs_arc_meta_adjust_restarts                      4096
        spa_load_verify_maxinflight                       10000
        spa_load_verify_metadata                          1
        zfs_multihost_history                             0
        zfs_send_corrupt_data                             0
        zfs_delay_min_dirty_percent                       60
        zfs_vdev_sync_read_max_active                     10
        zfs_dbgmsg_enable                                 0
        zfs_metaslab_segment_weight_enabled               1
        zio_requeue_io_start_cut_in_line                  1
        l2arc_headroom_boost                              200
        zfs_zevent_cols                                   80
        zfs_dmu_offset_next_sync                          0
        spa_config_path                                   /etc/zfs/zpool.cache
        zfs_vdev_cache_size                               0
        dbuf_cache_hiwater_pct                            10
        zfs_multihost_interval                            1000
        zfs_multihost_fail_intervals                      5
        zio_dva_throttle_enabled                          1
        zfs_vdev_sync_write_min_active                    10
        zfs_vdev_scrub_max_active                         2
        ignore_hole_birth                                 1
        zvol_major                                        230
        zil_replay_disable                                0
        zfs_dirty_data_max_max_percent                    25
        zfs_expire_snapshot                               300
        zfs_sync_pass_deferred_free                       2
        spa_asize_inflation                               24
        dmu_object_alloc_chunk_shift                      7
        zfs_vdev_mirror_rotating_seek_offset              1048576
        l2arc_feed_secs                                   1
        zfs_autoimport_disable                            1
        zfs_arc_p_aggressive_disable                      1
        zfs_zevent_len_max                                384
        zfs_arc_meta_limit_percent                        75
        l2arc_noprefetch                                  1
        zfs_vdev_raidz_impl                               [fastest] original scalar sse2 ssse3
        zfs_arc_meta_limit                                0
        zfs_flags                                         0
        zfs_dirty_data_max_max                            4294967296
        zfs_arc_average_blocksize                         8192
        zfs_vdev_cache_bshift                             16
        zfs_vdev_async_read_min_active                    1
        zfs_arc_dnode_reduce_percent                      10
        zfs_free_bpobj_enabled                            1
        zfs_arc_grow_retry                                0
        zfs_vdev_mirror_rotating_inc                      0
        l2arc_feed_again                                  1
        zfs_vdev_mirror_non_rotating_inc                  0
        zfs_arc_lotsfree_percent                          10
        zfs_zevent_console                                0
        zvol_prefetch_bytes                               131072
        zfs_free_min_time_ms                              1000
        zfs_arc_dnode_limit_percent                       10
        zio_taskq_batch_pct                               75
        dbuf_cache_max_bytes                              104857600
        spa_load_verify_data                              1
        zfs_delete_blocks                                 20480
        zfs_vdev_mirror_non_rotating_seek_inc             1
        zfs_multihost_import_intervals                    10
        zfs_dirty_data_max                                4294967296
        zfs_vdev_async_write_max_active                   10
        zfs_dbgmsg_maxsize                                4194304
        zfs_nocacheflush                                  0
        zfetch_array_rd_sz                                1048576
        zfs_arc_meta_strategy                             1
        zfs_dirty_data_sync                               67108864
        zvol_max_discard_blocks                           16384
        zvol_threads                                      32
        zfs_vdev_async_write_active_max_dirty_percent     60
        zfs_arc_p_dampener_disable                        1
        zfs_txg_timeout                                   5
        metaslab_aliquot                                  524288
        zfs_mdcomp_disable                                0
        zfs_vdev_sync_read_min_active                     10
        zfs_arc_dnode_limit                               0
        dbuf_cache_lowater_pct                            10
        zfs_abd_scatter_max_order                         10
        metaslab_debug_load                               0
        zfs_vdev_aggregation_limit                        131072
        metaslab_lba_weighting_enabled                    1
        zfs_vdev_scheduler                                noop
        zfs_vdev_scrub_min_active                         1
        zfs_no_scrub_io                                   0
        zfs_vdev_cache_max                                16384
        zfs_scan_idle                                     50
        zfs_arc_shrink_shift                              0
        spa_slop_shift                                    5
        zfs_vdev_mirror_rotating_seek_inc                 5
        zfs_deadman_synctime_ms                           1000000
        send_holes_without_birth_time                     1
        metaslab_bias_enabled                             1
        zvol_request_sync                                 0
        zfs_admin_snapshot                                1
        zfs_no_scrub_prefetch                             0
        zfs_metaslab_fragmentation_threshold              70
        zfs_max_recordsize                                1048576
        zfs_arc_min                                       0
        zfs_nopwrite_enabled                              1
        zfs_arc_p_min_shift                               0
        zfs_multilist_num_sublists                        0
        zfs_vdev_queue_depth_pct                          1000
        zfs_mg_fragmentation_threshold                    85
        l2arc_write_boost                                 8388608
        zfs_prefetch_disable                              0
        l2arc_feed_min_ms                                 200
        zio_delay_max                                     30000
        zfs_vdev_write_gap_limit                          4096
        zfs_pd_bytes_max                                  52428800
        zfs_scan_min_time_ms                              1000
        zfs_resilver_min_time_ms                          3000
        zfs_delay_scale                                   500000
        zfs_vdev_async_write_active_min_dirty_percent     30
        zfs_vdev_sync_write_max_active                    10
        zfs_mg_noalloc_threshold                          0
        zfs_deadman_enabled                               1
        zfs_resilver_delay                                2
        zfs_metaslab_switch_threshold                     2
        zfs_arc_max                                       51539607552
        zfs_top_maxinflight                               32
        zfetch_min_sec_reap                               2
        zfs_immediate_write_sz                            32768
        zfs_vdev_async_write_min_active                   2
        zfs_sync_pass_dont_compress                       5
        zfs_vdev_read_gap_limit                           32768
        zfs_compressed_arc_enabled                        1
        zfs_vdev_max_active                               1000
 
It is possible to tune the zfs_dirty_* parameters, but I think the problem is inside the SSDs. They are not enterprise SSDs, and they may have to erase blocks before new data can be written; that is what takes the time.
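For illustration, a hedged sketch of inspecting and adjusting the zfs_dirty_* tunables on Linux; the 1 GiB figure is purely an example value, not a recommendation:
Code:
# show the current limit on dirty (unwritten) data
cat /sys/module/zfs/parameters/zfs_dirty_data_max
# change it at runtime (example value: 1 GiB)
echo 1073741824 > /sys/module/zfs/parameters/zfs_dirty_data_max
# persist the change across reboots
echo "options zfs zfs_dirty_data_max=1073741824" >> /etc/modprobe.d/zfs.conf
update-initramfs -u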

What is the sector size of these SSDs?
 
I used ashift=12 for 4K sectors, but the drives report 512-byte sectors, and the internet isn't conclusive about which value is best.
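For reference, a quick way to check what the drives and the pool report; many consumer SSDs report 512 B logical/physical even though their internal flash pages are larger, which is one reason ashift=12 is commonly chosen anyway:
Code:
# sector sizes as reported to the kernel
lsblk -o NAME,LOG-SEC,PHY-SEC /dev/sdc /dev/sdh
# sector size as reported by SMART
smartctl -i /dev/sdc | grep -i 'sector size'
# ashift actually in use on the pool
zdb -C rpool | grep ashift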
 
smartctl /dev/sdc -a
Code:
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.10.17-5-pve] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     Samsung SSD 840 PRO Series
Serial Number:    S1AXNSAF714631P
LU WWN Device Id: 5 002538 5a0282b11
Firmware Version: DXM06B0Q
User Capacity:    512,110,190,592 bytes [512 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Dec 10 18:51:12 2017 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (53956) seconds.
Offline data collection
capabilities:                    (0x53) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  35) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       25787
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       103
177 Wear_Leveling_Count     0x0013   088   088   000    Pre-fail  Always       -       403
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   079   055   000    Old_age   Always       -       21
195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       102
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       60429360077

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
  255        0    65535  Read_scanning was never started
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

smartctl /dev/sdh -a
Code:
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.10.17-5-pve] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     Samsung SSD 840 PRO Series
Serial Number:    S1AXNSAF714493A
LU WWN Device Id: 5 002538 5a0282a87
Firmware Version: DXM06B0Q
User Capacity:    512,110,190,592 bytes [512 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Dec 10 18:52:20 2017 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (53956) seconds.
Offline data collection
capabilities:                    (0x53) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  35) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       25788
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       104
177 Wear_Leveling_Count     0x0013   087   087   000    Pre-fail  Always       -       447
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   080   054   000    Old_age   Always       -       20
195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       103
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       60789679172

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
  255        0    65535  Read_scanning was never started
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 
1. Recreate the ZFS pool with ashift=9, because your SSDs report Sector Size: 512 bytes logical/physical.
2. Your SSDs have more than 25k power-on hours, and the total data written is not small either. Check the wear level (see the smartctl sketch below).

These are two things that may be impacting your performance.
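A quick, hedged example of pulling just the wear-related attributes (attribute 177 on Samsung drives; the normalized VALUE column counts down from 100 as the flash wears):
Code:
smartctl -A /dev/sdc | grep -E 'Wear_Leveling_Count|Power_On_Hours|Total_LBAs_Written'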
 
The wear level is 18, which is not a lot.
Recreating the pool is more difficult because it hosts my Proxmox installation.
Does anyone have a good solution for how to do this?
 
Clone your filesystem to another disk; after a restart you can recreate the pool and clone it back.

SSDs drop in performance when new data has to be written over existing data: the drive must erase a block before it can write (which is why SSDs need TRIM and garbage collection).
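A rough sketch of the clone-out/clone-back approach with zfs send/receive; temppool is a hypothetical scratch pool, and since rpool hosts the Proxmox installation itself, the recreate step would have to be done from a live/rescue environment:
Code:
# snapshot everything and copy it to a scratch pool
zfs snapshot -r rpool@migrate
zfs send -R rpool@migrate | zfs receive -F temppool/backup
# ...boot a rescue system, recreate rpool with the desired ashift...
# then replicate everything back
zfs send -R temppool/backup@migrate | zfs receive -F rpool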
 
Performance is noticeably better: I'm getting write speeds of 110 MB/s instead of 30.
I've just reinstalled Proxmox after making backups of the CTs and VMs to another ZFS volume (rough command sketch below).
Installed Proxmox.
Imported the ZFS volumes.
Restored both the network interfaces file and storage.cfg.
Rebooted.
Restored the VMs, and everything is running.
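For anyone repeating this, a rough sketch of the backup and restore steps with the standard Proxmox tools; the VMIDs, storage path, and archive names below are placeholders:
Code:
# before reinstalling: back up each container/VM to the other ZFS volume
vzdump 100 --dumpdir /mnt/otherzfs --mode stop
# after reinstalling: import the surviving pool and restore
zpool import -f tank
pct restore 100 /mnt/otherzfs/vzdump-lxc-100.tar.lzo
qmrestore /mnt/otherzfs/vzdump-qemu-101.vma.lzo 101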

Will test in a week to see if the performance is still good.
 
