huge IO delay ZFS

coldzeek
Feb 4, 2020
Hi,

I recently migrated to Proxmox 6.1-5 from ESXi and have a problem with IO delay.
I have three ZFS pools:
datastore0: 2x NVMe Silicon Motion SM2263XT - mirror (ashift=12, atime=off), 1 TB
datastore1: 7x HDD Toshiba HDWD130 - raidz1 (ashift=12, compression=lz4, atime=off), 19 TB
datastore2: 3x HDD Toshiba DT01ACA200 - raidz1 (ashift=12, atime=off), 5.3 TB

CPU: AMD EPYC 7551
RAM: 128 GB ECC
swap: off
options zfs zfs_arc_max=25179869184
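(For reference, a minimal sketch of how this limit is applied, assuming it is set in /etc/modprobe.d/zfs.conf as is usual on Proxmox:)

Code:
# /etc/modprobe.d/zfs.conf -- cap the ARC at ~23.4 GiB (value is in bytes)
options zfs zfs_arc_max=25179869184

# rebuild the initramfs so the setting is picked up at boot, then reboot
update-initramfs -u -k all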

Problem:
If I move a KVM or LXC volume from datastore1 to datastore2, I get an IO delay of around 30%-35%.
During that time the VMs on datastore0 have very high response times.
The server is not overcommitted, average load is ~10%.
 
Hi,

What controller do you use to connect these drives?
Generally, only HBAs are recommended for Ceph and ZFS.
 
Hi,

Can you show the output of your:

arc_summary

Anyway, it is not recommended to use many pools on the same host. Also, your ARC size seems to be low. And not least, if you have a VM or CT with high IO, any move will result in high IO on whatever storage you use!

Good luck / Bafta !
 

Thank you for your answer.

ZFS Subsystem Report Thu Feb 06 08:29:06 2020
Linux 5.3.13-3-pve 0.8.3-pve1
Machine: pve0 (x86_64) 0.8.3-pve1

ARC status: HEALTHY
Memory throttle count: 0

ARC size (current): 23.6 % 15.1 GiB
Target size (adaptive): 23.6 % 15.1 GiB
Min size (hard limit): 6.1 % 3.9 GiB
Max size (high water): 16:1 64.0 GiB
Most Frequently Used (MFU) cache size: 16.7 % 437.6 MiB
Most Recently Used (MRU) cache size: 83.3 % 2.1 GiB
Metadata cache size (hard limit): 75.0 % 48.0 GiB
Metadata cache size (current): 17.2 % 8.2 GiB
Dnode cache size (hard limit): 10.0 % 4.8 GiB
Dnode cache size (current): 3.1 % 153.0 MiB

ARC hash breakdown:
Elements max: 80.6M
Elements current: 100.0 % 80.6M
Collisions: 183.9M
Chain max: 20
Chains: 16.0M

ARC misc:
Deleted: 116.7M
Mutex misses: 5.9k
Eviction skips: 14.1M

ARC total accesses (hits + misses): 391.7M
Cache hit ratio: 94.9 % 371.6M
Cache miss ratio: 5.1 % 20.1M
Actual hit ratio (MFU + MRU hits): 94.4 % 369.7M
Data demand efficiency: 91.5 % 121.6M
Data prefetch efficiency: 24.8 % 11.9M

Cache hits by cache type:
Most frequently used (MFU): 97.2 % 361.2M
Most recently used (MRU): 2.3 % 8.4M
Most frequently used (MFU) ghost: 0.4 % 1.3M
Most recently used (MRU) ghost: 0.3 % 948.3k

Cache hits by data type:
Demand data: 29.9 % 111.3M
Demand prefetch data: 0.8 % 3.0M
Demand metadata: 69.0 % 256.5M
Demand prefetch metadata: 0.2 % 841.4k

Cache misses by data type:
Demand data: 51.6 % 10.4M
Demand prefetch data: 44.5 % 9.0M
Demand metadata: 2.7 % 541.5k
Demand prefetch metadata: 1.2 % 241.9k

DMU prefetch efficiency: 370.0M
Hit ratio: 9.6 % 35.7M
Miss ratio: 90.4 % 334.3M

L2ARC status: HEALTHY
Low memory aborts: 146
Free on write: 71.9k
R/W clashes: 9
Bad checksums: 0
I/O errors: 0

L2ARC size (adaptive): 650.4 GiB
Compressed: 96.0 % 624.4 GiB
Header size: 1.1 % 7.1 GiB

L2ARC breakdown: 20.1M
Hit ratio: 5.9 % 1.2M
Miss ratio: 94.1 % 18.9M
Feeds: 34.6k

L2ARC writes:
Writes sent: 100 % 33.5 KiB

L2ARC evicts:
Lock retries: 0
Upon reading: 0

Solaris Porting Layer (SPL):
spl_hostid 0
spl_hostid_path /etc/hostid
spl_kmem_alloc_max 1048576
spl_kmem_alloc_warn 65536
spl_kmem_cache_expire 2
spl_kmem_cache_kmem_limit 2048
spl_kmem_cache_kmem_threads 4
spl_kmem_cache_magazine_size 0
spl_kmem_cache_max_size 32
spl_kmem_cache_obj_per_slab 8
spl_kmem_cache_obj_per_slab_min 1
spl_kmem_cache_reclaim 0
spl_kmem_cache_slab_limit 16384
spl_max_show_tasks 512
spl_panic_halt 0
spl_schedule_hrtimeout_slack_us 0
spl_taskq_kick 0
spl_taskq_thread_bind 0
spl_taskq_thread_dynamic 1
spl_taskq_thread_priority 1
spl_taskq_thread_sequential 4
 
Tunables:
dbuf_cache_hiwater_pct 10
dbuf_cache_lowater_pct 10
dbuf_cache_max_bytes 2111605056
dbuf_cache_shift 5
dbuf_metadata_cache_max_bytes 1055802528
dbuf_metadata_cache_shift 6
dmu_object_alloc_chunk_shift 7
dmu_prefetch_max 134217728
ignore_hole_birth 1
l2arc_feed_again 1
l2arc_feed_min_ms 50
l2arc_feed_secs 1
l2arc_headroom 2
l2arc_headroom_boost 200
l2arc_noprefetch 1
l2arc_norw 0
l2arc_write_boost 100000000
l2arc_write_max 100000000
metaslab_aliquot 524288
metaslab_bias_enabled 1
metaslab_debug_load 0
metaslab_debug_unload 0
metaslab_df_max_search 16777216
metaslab_df_use_largest_segment 0
metaslab_force_ganging 16777217
metaslab_fragmentation_factor_enabled 1
metaslab_lba_weighting_enabled 1
metaslab_preload_enabled 1
send_holes_without_birth_time 1
spa_asize_inflation 24
spa_config_path /etc/zfs/zpool.cache
spa_load_print_vdev_tree 0
spa_load_verify_data 1
spa_load_verify_metadata 1
spa_load_verify_shift 4
spa_slop_shift 5
vdev_removal_max_span 32768
vdev_validate_skip 0
zap_iterate_prefetch 1
zfetch_array_rd_sz 1048576
zfetch_max_distance 8388608
zfetch_max_streams 8
zfetch_min_sec_reap 2
zfs_abd_scatter_enabled 1
zfs_abd_scatter_max_order 10
zfs_abd_scatter_min_size 1536
zfs_admin_snapshot 0
zfs_arc_average_blocksize 8192
zfs_arc_dnode_limit 0
zfs_arc_dnode_limit_percent 10
zfs_arc_dnode_reduce_percent 10
zfs_arc_grow_retry 0
zfs_arc_lotsfree_percent 10
zfs_arc_max 68719476736
zfs_arc_meta_adjust_restarts 4096
zfs_arc_meta_limit 0
zfs_arc_meta_limit_percent 75
zfs_arc_meta_min 0
zfs_arc_meta_prune 10000
zfs_arc_meta_strategy 1
zfs_arc_min 0
zfs_arc_min_prefetch_ms 0
zfs_arc_min_prescient_prefetch_ms 0
zfs_arc_p_dampener_disable 1
zfs_arc_p_min_shift 0
zfs_arc_pc_percent 0
zfs_arc_shrink_shift 0
zfs_arc_sys_free 0
zfs_async_block_max_blocks 100000
zfs_autoimport_disable 1
zfs_checksum_events_per_second 20
zfs_commit_timeout_pct 5
zfs_compressed_arc_enabled 1
zfs_condense_indirect_commit_entry_delay_ms 0
zfs_condense_indirect_vdevs_enable 1
zfs_condense_max_obsolete_bytes 1073741824
zfs_condense_min_mapping_bytes 131072
zfs_dbgmsg_enable 1
zfs_dbgmsg_maxsize 4194304
zfs_dbuf_state_index 0
zfs_ddt_data_is_special 1
zfs_deadman_checktime_ms 60000
zfs_deadman_enabled 1
zfs_deadman_failmode wait
zfs_deadman_synctime_ms 600000
zfs_deadman_ziotime_ms 300000
zfs_dedup_prefetch 0
zfs_delay_min_dirty_percent 60
zfs_delay_scale 500000
zfs_delete_blocks 20480
zfs_dirty_data_max 4294967296
zfs_dirty_data_max_max 4294967296
zfs_dirty_data_max_max_percent 25
zfs_dirty_data_max_percent 10
zfs_dirty_data_sync_percent 20
zfs_disable_ivset_guid_check 0
zfs_dmu_offset_next_sync 0
zfs_expire_snapshot 300
zfs_flags 0
zfs_free_bpobj_enabled 1
zfs_free_leak_on_eio 0
zfs_free_min_time_ms 1000
zfs_immediate_write_sz 262144
zfs_initialize_value 16045690984833335022
zfs_key_max_salt_uses 400000000
zfs_lua_max_instrlimit 100000000
zfs_lua_max_memlimit 104857600
zfs_max_missing_tvds 0
zfs_max_recordsize 1048576
zfs_metaslab_fragmentation_threshold 70
zfs_metaslab_segment_weight_enabled 1
zfs_metaslab_switch_threshold 2
zfs_mg_fragmentation_threshold 95
zfs_mg_noalloc_threshold 0
zfs_multihost_fail_intervals 10
zfs_multihost_history 0
zfs_multihost_import_intervals 20
zfs_multihost_interval 1000
zfs_multilist_num_sublists 0
zfs_no_scrub_io 0
zfs_no_scrub_prefetch 0
zfs_nocacheflush 0
zfs_nopwrite_enabled 1
zfs_object_mutex_size 64
zfs_obsolete_min_time_ms 500
 
zfs_override_estimate_recordsize 0
zfs_pd_bytes_max 52428800
zfs_per_txg_dirty_frees_percent 5
zfs_prefetch_disable 0
zfs_read_chunk_size 1048576
zfs_read_history 0
zfs_read_history_hits 0
zfs_reconstruct_indirect_combinations_max 4096
zfs_recover 0
zfs_recv_queue_length 16777216
zfs_removal_ignore_errors 0
zfs_removal_suspend_progress 0
zfs_remove_max_segment 16777216
zfs_resilver_disable_defer 0
zfs_resilver_min_time_ms 3000
zfs_scan_checkpoint_intval 7200
zfs_scan_fill_weight 3
zfs_scan_ignore_errors 0
zfs_scan_issue_strategy 0
zfs_scan_legacy 0
zfs_scan_max_ext_gap 2097152
zfs_scan_mem_lim_fact 20
zfs_scan_mem_lim_soft_fact 20
zfs_scan_strict_mem_lim 0
zfs_scan_suspend_progress 0
zfs_scan_vdev_limit 4194304
zfs_scrub_min_time_ms 1000
zfs_send_corrupt_data 0
zfs_send_queue_length 16777216
zfs_send_unmodified_spill_blocks 1
zfs_slow_io_events_per_second 20
zfs_spa_discard_memory_limit 16777216
zfs_special_class_metadata_reserve_pct 25
zfs_sync_pass_deferred_free 2
zfs_sync_pass_dont_compress 8
zfs_sync_pass_rewrite 2
zfs_sync_taskq_batch_pct 75
zfs_trim_extent_bytes_max 134217728
zfs_trim_extent_bytes_min 32768
zfs_trim_metaslab_skip 0
zfs_trim_queue_limit 10
zfs_trim_txg_batch 32
zfs_txg_history 100
zfs_txg_timeout 1
zfs_unlink_suspend_progress 0
zfs_user_indirect_is_special 1
zfs_vdev_aggregate_trim 0
zfs_vdev_aggregation_limit 1048576
zfs_vdev_aggregation_limit_non_rotating 131072
zfs_vdev_async_read_max_active 3
zfs_vdev_async_read_min_active 1
zfs_vdev_async_write_active_max_dirty_percent 60
zfs_vdev_async_write_active_min_dirty_percent 30
zfs_vdev_async_write_max_active 10
zfs_vdev_async_write_min_active 2
zfs_vdev_cache_bshift 16
zfs_vdev_cache_max 16384
zfs_vdev_cache_size 0
zfs_vdev_default_ms_count 200
zfs_vdev_initializing_max_active 1
zfs_vdev_initializing_min_active 1
zfs_vdev_max_active 1000
zfs_vdev_min_ms_count 16
zfs_vdev_mirror_non_rotating_inc 0
zfs_vdev_mirror_non_rotating_seek_inc 1
zfs_vdev_mirror_rotating_inc 0
zfs_vdev_mirror_rotating_seek_inc 5
zfs_vdev_mirror_rotating_seek_offset 1048576
zfs_vdev_ms_count_limit 131072
zfs_vdev_queue_depth_pct 1000
zfs_vdev_raidz_impl cycle [fastest] original scalar sse2 ssse3 avx2
zfs_vdev_read_gap_limit 32768
zfs_vdev_removal_max_active 2
zfs_vdev_removal_min_active 1
zfs_vdev_scheduler unused
zfs_vdev_scrub_max_active 2
zfs_vdev_scrub_min_active 1
zfs_vdev_sync_read_max_active 10
zfs_vdev_sync_read_min_active 10
zfs_vdev_sync_write_max_active 10
zfs_vdev_sync_write_min_active 10
zfs_vdev_trim_max_active 2
zfs_vdev_trim_min_active 1
zfs_vdev_write_gap_limit 4096
zfs_zevent_cols 80
zfs_zevent_console 0
zfs_zevent_len_max 1024
zfs_zil_clean_taskq_maxalloc 1048576
zfs_zil_clean_taskq_minalloc 1024
zfs_zil_clean_taskq_nthr_pct 100
zil_maxblocksize 131072
zil_nocacheflush 0
zil_replay_disable 0
zil_slog_bulk 786432
zio_deadman_log_all 0
zio_dva_throttle_enabled 1
zio_requeue_io_start_cut_in_line 1
zio_slow_io_ms 30000
zio_taskq_batch_pct 75
zvol_inhibit_dev 0
zvol_major 230
zvol_max_discard_blocks 16384
zvol_prefetch_bytes 131072
zvol_request_sync 0
zvol_threads 32
zvol_volmode 1

VDEV cache disabled, skipping section

ZIL committed transactions: 2.7M
Commit requests: 137.6k
Flushes to stable storage: 137.6k
Transactions to SLOG storage pool: 4.0 GiB 93.9k
Transactions to non-SLOG storage pool: 0 Bytes 0
 
I changed the zpool configuration.
Destroyed datastore0 and datastore2.
Added a SATA3 250 GB SSD as ZIL (log) and a 1 TB NVMe as L2ARC (cache) to datastore1 (roughly as sketched below).
I also tried changing the tunable attributes, but it did not help.
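
(For reference, attaching the devices was done roughly like this; the device paths are placeholders, not the actual disk IDs:)

Code:
# add the SATA SSD as a separate log (ZIL) device
zpool add datastore1 log /dev/disk/by-id/<sata-ssd-250gb>
# add the NVMe SSD as an L2ARC (cache) device
zpool add datastore1 cache /dev/disk/by-id/<nvme-1tb>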
 
I guess you are running into a NUMA-related problem, but I am not sure.
Can you show the kernel module parameters?

Code:
perl -e 'my $dir="/sys/module/zfs/parameters"; opendir(DH, $dir); my @files = readdir(DH); closedir(DH); foreach my $file (@files){ my $filepath = "$dir/$file"; open(FD, "<:encoding(UTF-8)", $filepath) || die $@; my $para = <FD>; close(FD); print "$file: $para" };'
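
Also, to see how many NUMA nodes the host exposes, something like this works (just a sketch, either tool is fine):

Code:
# NUMA node count and CPU-to-node mapping
lscpu | grep -i numa
# more detail, if numactl is installed
numactl --hardware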
 
Thank you Wolfgang for looking into my problem.
This motherboard supports only one CPU.

zfs_arc_p_min_shift: 0
zvol_request_sync: 0
vdev_validate_skip: 0
zfs_object_mutex_size: 64
spa_slop_shift: 5
zfs_sync_taskq_batch_pct: 75
zfs_vdev_async_write_max_active: 10
zfs_multilist_num_sublists: 0
zil_nocacheflush: 0
zfs_trim_metaslab_skip: 0
zfs_trim_extent_bytes_min: 32768
zfs_checksum_events_per_second: 20
zfs_no_scrub_prefetch: 0
zfs_vdev_sync_read_min_active: 10
zfs_dmu_offset_next_sync: 0
metaslab_debug_load: 0
zio_deadman_log_all: 0
zfs_vdev_mirror_rotating_seek_inc: 5
zfs_vdev_mirror_non_rotating_inc: 0
zfs_read_history: 0
zfs_multihost_history: 0
zfs_metaslab_switch_threshold: 2
metaslab_fragmentation_factor_enabled: 1
zfs_admin_snapshot: 0
zfs_delete_blocks: 20480
zfs_arc_meta_prune: 10000
zfs_free_min_time_ms: 1000
zfs_removal_suspend_progress: 0
zfs_scrub_min_time_ms: 1000
zfs_vdev_default_ms_count: 200
zfs_dedup_prefetch: 0
zfs_txg_history: 100
zfs_vdev_max_active: 1000
zfs_vdev_sync_write_min_active: 10
spa_load_verify_data: 1
zfs_async_block_max_blocks: 100000
zfs_dirty_data_max_max: 4294967296
dbuf_cache_shift: 5
zfs_send_corrupt_data: 0
dbuf_cache_lowater_pct: 10
zfs_send_queue_length: 16777216
zfs_lua_max_instrlimit: 100000000
zfs_scan_fill_weight: 3
dmu_object_alloc_chunk_shift: 7
zfs_arc_shrink_shift: 0
zfs_resilver_min_time_ms: 3000
zfs_trim_extent_bytes_max: 134217728
zfs_free_bpobj_enabled: 1
zfs_vdev_mirror_non_rotating_seek_inc: 1
zfs_vdev_cache_max: 16384
zfs_condense_min_mapping_bytes: 131072
ignore_hole_birth: 1
zfs_multihost_fail_intervals: 10
zfs_arc_min_prefetch_ms: 0
zfs_arc_sys_free: 0
metaslab_df_use_largest_segment: 0
zfs_sync_pass_dont_compress: 8
zio_taskq_batch_pct: 75
zfs_remove_max_segment: 16777216
zfs_arc_meta_limit_percent: 75
zfs_arc_p_dampener_disable: 1
spa_load_verify_metadata: 1
dbuf_cache_hiwater_pct: 10
zfs_read_chunk_size: 1048576
zfs_arc_grow_retry: 0
zfs_vdev_trim_min_active: 1
metaslab_aliquot: 524288
zfs_vdev_async_read_min_active: 1
zfs_vdev_cache_bshift: 16
metaslab_preload_enabled: 1
zfs_deadman_failmode: wait
l2arc_feed_min_ms: 200
zfs_read_history_hits: 0
zfetch_max_distance: 8388608
send_holes_without_birth_time: 1
zfs_max_recordsize: 1048576
zfs_dbuf_state_index: 0
zio_slow_io_ms: 30000
dbuf_cache_max_bytes: 2111616064
zfs_zevent_cols: 80
zfs_scan_mem_lim_soft_fact: 20
zfs_no_scrub_io: 0
zil_slog_bulk: 786432
spa_asize_inflation: 24
l2arc_write_boost: 8388608
zfs_abd_scatter_min_size: 1536
zfs_arc_meta_limit: 0
zfs_deadman_enabled: 1
zfs_abd_scatter_enabled: 1
zfs_arc_min_prescient_prefetch_ms: 0
zfs_vdev_async_write_active_min_dirty_percent: 30
zfs_free_leak_on_eio: 0
zfs_vdev_cache_size: 0
zfs_vdev_write_gap_limit: 4096
zfs_scan_issue_strategy: 0
zfs_max_missing_tvds: 0
l2arc_headroom: 2
zfs_per_txg_dirty_frees_percent: 5
zfs_compressed_arc_enabled: 1
dbuf_metadata_cache_max_bytes: 1055808032
zfs_scan_ignore_errors: 0
zfs_vdev_removal_max_active: 2
zfs_condense_indirect_commit_entry_delay_ms: 0
zfs_metaslab_segment_weight_enabled: 1
zfs_dirty_data_max_max_percent: 25
metaslab_force_ganging: 16777217
zio_dva_throttle_enabled: 1
zfs_vdev_scrub_min_active: 1
zfs_arc_average_blocksize: 8192
zfs_scan_suspend_progress: 0
zfs_vdev_queue_depth_pct: 1000
zfs_multihost_interval: 1000
zfs_vdev_aggregate_trim: 0
zfs_condense_indirect_vdevs_enable: 1
zio_requeue_io_start_cut_in_line: 1
zfetch_max_streams: 8
zfs_multihost_import_intervals: 20
zfs_ddt_data_is_special: 1
zfs_zevent_console: 0
zfs_zil_clean_taskq_minalloc: 1024
zfs_sync_pass_deferred_free: 2
zfs_vdev_initializing_min_active: 1
zfs_nocacheflush: 0
zfs_arc_dnode_limit: 0
zfs_scan_legacy: 0
zfs_dbgmsg_enable: 1
zfs_scan_vdev_limit: 4194304
zfs_vdev_raidz_impl: cycle [fastest] original scalar sse2 ssse3 avx2
zvol_threads: 32
zfs_vdev_async_write_min_active: 2
zfs_removal_ignore_errors: 0
zfs_vdev_sync_read_max_active: 10
l2arc_headroom_boost: 200
zfs_reconstruct_indirect_combinations_max: 4096
zfs_sync_pass_rewrite: 2
spa_config_path: /etc/zfs/zpool.cache
zfs_pd_bytes_max: 52428800
metaslab_df_max_search: 16777216
zfs_flags: 0
zfs_deadman_checktime_ms: 60000
zap_iterate_prefetch: 1
spa_load_print_vdev_tree: 0
zfs_dirty_data_max_percent: 10
zfs_user_indirect_is_special: 1
zfs_scan_checkpoint_intval: 7200
dbuf_metadata_cache_shift: 6
zfetch_min_sec_reap: 2
zfs_zil_clean_taskq_nthr_pct: 100
zfs_key_max_salt_uses: 400000000
zfs_mg_noalloc_threshold: 0
zfs_deadman_ziotime_ms: 300000
zfs_special_class_metadata_reserve_pct: 25
zfs_arc_meta_min: 0
zvol_prefetch_bytes: 131072
zfs_deadman_synctime_ms: 600000
zfs_send_unmodified_spill_blocks: 1
zfs_autoimport_disable: 1
zfs_arc_min: 0
zfs_trim_queue_limit: 10
l2arc_noprefetch: 1
zfs_nopwrite_enabled: 1
l2arc_feed_again: 1
zfs_vdev_sync_write_max_active: 10
zfs_prefetch_disable: 0
zfetch_array_rd_sz: 1048576
zfs_metaslab_fragmentation_threshold: 70
l2arc_write_max: 8388608
zfs_scan_mem_lim_fact: 20
zfs_dbgmsg_maxsize: 4194304
zfs_override_estimate_recordsize: 0
zfs_vdev_read_gap_limit: 32768
zfs_dirty_data_sync_percent: 20
zfs_delay_min_dirty_percent: 60
zfs_recv_queue_length: 16777216
zfs_vdev_async_write_active_max_dirty_percent: 60
zfs_disable_ivset_guid_check: 0
zfs_arc_lotsfree_percent: 10
zfs_immediate_write_sz: 32768
zil_replay_disable: 0
zil_maxblocksize: 131072
zfs_vdev_mirror_rotating_inc: 0
zvol_volmode: 1
zfs_unlink_suspend_progress: 0
zfs_arc_meta_strategy: 1
zfs_obsolete_min_time_ms: 500
zfs_vdev_trim_max_active: 2
zfs_resilver_disable_defer: 0
metaslab_bias_enabled: 1
zfs_vdev_async_read_max_active: 3
l2arc_feed_secs: 1
zfs_commit_timeout_pct: 5
zfs_arc_max: 0
spa_load_verify_shift: 4
zfs_trim_txg_batch: 32
vdev_removal_max_span: 32768
zfs_zevent_len_max: 1024
zfs_scan_max_ext_gap: 2097152
zfs_scan_strict_mem_lim: 0
zfs_vdev_aggregation_limit_non_rotating: 131072
zfs_arc_meta_adjust_restarts: 4096
l2arc_norw: 0
zfs_recover: 0
zvol_inhibit_dev: 0
zfs_vdev_aggregation_limit: 1048576
zfs_condense_max_obsolete_bytes: 1073741824
dmu_prefetch_max: 134217728
zvol_major: 230
metaslab_debug_unload: 0
zfs_slow_io_events_per_second: 20
zfs_lua_max_memlimit: 104857600
metaslab_lba_weighting_enabled: 1
zfs_zil_clean_taskq_maxalloc: 1048576
zfs_txg_timeout: 5
zfs_vdev_removal_min_active: 1
zfs_vdev_min_ms_count: 16
zfs_vdev_scrub_max_active: 2
zfs_vdev_mirror_rotating_seek_offset: 1048576
zfs_arc_pc_percent: 0
zfs_vdev_scheduler: unused
zvol_max_discard_blocks: 16384
zfs_arc_dnode_reduce_percent: 10
zfs_vdev_ms_count_limit: 131072
zfs_dirty_data_max: 4294967296
zfs_abd_scatter_max_order: 10
zfs_spa_discard_memory_limit: 16777216
zfs_initialize_value: 16045690984833335022
zfs_expire_snapshot: 300
zfs_vdev_initializing_max_active: 1
zfs_arc_dnode_limit_percent: 10
zfs_delay_scale: 500000
zfs_mg_fragmentation_threshold: 95
 
Yesterday I updated Proxmox to 6.1-7.
HPET (high precision event timer) is enabled in the BIOS.

In syslog I see strange lines; maybe these problems are related.

Code:
Feb  5 07:28:01 pve0 systemd[1]: Started Proxmox VE replication runner.
Feb  5 07:28:42 pve0 kernel: [24250.826445] Uhhuh. NMI received for unknown reason 2c on CPU 63.
Feb  5 07:28:42 pve0 kernel: [24250.826445] Do you have a strange power saving mode enabled?
Feb  5 07:28:42 pve0 kernel: [24250.826446] Dazed and confused, but trying to continue
 
The first-gen EPYC is a NUMA architecture even on a single socket.
Wow, thanks, this shows that I need to read up on the basics some more.

And do you have all power savings disabled in the BIOS?

In the BIOS -> ACPI control I do not have performance settings, only HPET settings.

Power / Performance Determinism is set to auto (the motherboard manual states that auto = Performance).
 
What motherboard do you have?
 
Hello forum,
I changed my configuration, but the problem remains: if I put even a little load on ZFS (for example, backing up a large number of small files of 30-150 KB over SFTP, ~48,000 files copied in 1.5 hours, or starting a MySQL database), IO delay increases very quickly and the rest of the containers and VMs slow down heavily. IO delay is ~20%-35%.
I tried fine-tuning ZFS and MySQL and it gave some results, but any file operations still cause big slowdowns in the virtual environments.

I have one ZFS pool:
datastore1: 6x SATA HDD Toshiba HDWD130 - raidz1 (ashift=12, compression=lz4, atime=off)
+ logs: mSATA SSD 256 GB
+ cache: NVMe SSD 1024 GB
+ spare: 1x SATA HDD Toshiba HDWD130
running 11 LXC containers and 4 KVM VMs

CPU: AMD EPYC 7551
RAM: 128 GB ECC
swap: off
options zfs zfs_arc_max=0

proxmox-ve: 6.1-2 (running kernel: 5.3.18-3-pve)
pve-manager: 6.1-8 (running version: 6.1-8/806edfe1)
pve-kernel-helper: 6.1-7
pve-kernel-5.3: 6.1-6
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.18-2-pve: 5.3.18-2
pve-kernel-5.3.10-1-pve: 5.3.10-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 2.0.1-1+pve8
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libpve-access-control: 6.0-6
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.0-17
libpve-guest-common-perl: 3.0-5
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-5
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 3.2.1-1
lxcfs: 4.0.1-pve1
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-3
pve-cluster: 6.1-4
pve-container: 3.0-23
pve-docs: 6.1-6
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.0-10
pve-firmware: 3.0-6
pve-ha-manager: 3.0-9
pve-i18n: 2.0-4
pve-qemu-kvm: 4.1.1-4
pve-xtermjs: 4.3.0-1
pve-zsync: 2.0-2
qemu-server: 6.1-7
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1

Any suggestions?
 

Hi,

How fast is your ZIL (log) SSD at 4k sync writes?
If it is not fast, this can be the bottleneck.
Also, I would consider removing the cache device, because it needs memory that is then not available for the ARC.
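
(Removing the cache device would go something like this; the device name below is a placeholder:)

Code:
# find the cache device's name in the pool layout
zpool status datastore1
# remove the L2ARC (cache) device from the pool
zpool remove datastore1 <cache-device>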
 
The situation has not changed :(
iotop -a -P -o -d 5
shows that the problem is reading files, not writing.
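
(A per-vdev view of the same load can also be had with something like this:)

Code:
# pool-wide and per-vdev read/write ops, refreshed every 5 seconds
zpool iostat -v datastore1 5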
 
How fast are your ZIL devices with 4k sync writes?
You can benchmark it like this.

Code:
fio --size=20G --bs=4k --rw=write --direct=1 --sync=1 --runtime=60  --group_reporting --name=test --ramp_time=5s --filename=/dev/sd<x>
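
Careful: with --filename pointing at a raw /dev/sd<x> device this test overwrites whatever is on that disk, so only run it against an unused or spare device; alternatively (not part of the command above) you can point --filename at a test file on a mounted filesystem.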