Dell T340
32GB RAM
WD Blue 512GB SSDs
No RAID
With a clean install of 6.1 or 5.4, and ZFS either as the root filesystem or on a separate disk, this server has been slow and unstable, with reboots or lock-ups and out-of-memory messages on the console.
The surefire way to make the system reboot is to run a write-intensive workload in a VM. For testing I've been running a Windows 10 VM with the CrystalDiskMark 1GB sequential test, and the system reboots 100% of the time during the write portion of the test. The VM only has 4GB of RAM and nothing else is running on the system, yet right before it reboots I see free memory drop from about 21GB to nothing. Even if I add some swap, it fills the swap and eventually the system still crashes.
This only happens when the workload is on the ZFS storage, yet reducing c_max from the default 16GB to 8GB changed nothing, so it looks like the ARC is not at fault here, but I'm not sure how to prove that.
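For reference, this is roughly how I applied the lower limit, assuming the zfs_arc_max module parameter is the right knob (value is 8 GiB in bytes; the initramfs refresh is only because this is a Debian-based install with ZFS on root):
Code:
# takes effect immediately
echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max
# persist across reboots
echo "options zfs zfs_arc_max=8589934592" > /etc/modprobe.d/zfs.conf
update-initramfs -u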
I've used the same configuration, the same testing method, and the same model of drives on an old 3rd-gen i5 workstation with 20GB of RAM and can't reproduce the issue. I also have a number of other servers and repurposed workstations running the same general configurations and workloads, and I don't see it anywhere else. I'm having trouble imagining how this could be hardware related, but I'm not sure what to test to prove it either way.
I've tried, unsuccessfully, to trigger the same issue by running something like this directly on the server: dd if=/dev/urandom of=/rpool/data/output bs=4k count=1000k
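Since /dev/urandom is CPU-bound and dd's buffered 4k writes probably don't look much like the VM's 1MB sequential stream, a closer analogue might be something like the fio run below; the file path and parameters are just my guess at what CrystalDiskMark's sequential test roughly does:
Code:
fio --name=seqwrite --filename=/rpool/data/fio-test \
    --rw=write --bs=1M --size=4G \
    --ioengine=psync --end_fsync=1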
Any ideas and troubleshooting steps would be much appreciated.
Last "watch -n1 cat /proc/spl/kstat/zfs/arcstats" before ssh dies:
Code:
12 1 0x01 98 26656 3915857582 595305586369
name type data
hits 4 2740093
misses 4 11493
demand_data_hits 4 2665846
demand_data_misses 4 46
demand_metadata_hits 4 72653
demand_metadata_misses 4 11383
prefetch_data_hits 4 1592
prefetch_data_misses 4 38
prefetch_metadata_hits 4 2
prefetch_metadata_misses 4 26
mru_hits 4 61878
mru_ghost_hits 4 0
mfu_hits 4 2676621
mfu_ghost_hits 4 0
deleted 4 14377
mutex_miss 4 0
access_skip 4 0
evict_skip 4 3
evict_not_enough 4 0
evict_l2_cached 4 0
evict_l2_eligible 4 1175031808
evict_l2_ineligible 4 32768
evict_l2_skip 4 0
hash_elements 4 161092
hash_elements_max 4 175322
hash_collisions 4 5849
hash_chains 4 3053
hash_chain_max 4 3
p 4 127
c 4 1047373440
c_min 4 1047373440
c_max 4 16757975040
size 4 2708368432
compressed_size 4 1358569472
uncompressed_size 4 1364000768
overhead_size 4 1197100032
hdr_size 4 111337800
data_size 4 2525781504
metadata_size 4 30010880
dbuf_size 4 41038920
dnode_size 4 162336
bonus_size 4 28800
anon_size 4 2271887872
anon_evictable_data 4 0
anon_evictable_metadata 4 0
mru_size 4 281814528
mru_evictable_data 4 277028864
mru_evictable_metadata 4 0
mru_ghost_size 4 1057112064
mru_ghost_evictable_data 4 1039704064
mru_ghost_evictable_metadata 4 17408000
mfu_size 4 2106368
mfu_evictable_data 4 0
mfu_evictable_metadata 4 0
mfu_ghost_size 4 0
mfu_ghost_evictable_data 4 0
mfu_ghost_evictable_metadata 4 0
l2_hits 4 0
l2_misses 4 0
l2_feeds 4 0
l2_rw_clash 4 0
l2_read_bytes 4 0
l2_write_bytes 4 0
l2_writes_sent 4 0
l2_writes_done 4 0
l2_writes_error 4 0
l2_writes_lock_retry 4 0
l2_evict_lock_retry 4 0
l2_evict_reading 4 0
l2_evict_l1cached 4 0
l2_free_on_write 4 0
l2_abort_lowmem 4 0
l2_cksum_bad 4 0
l2_io_error 4 0
l2_size 4 0
l2_asize 4 0
l2_hdr_size 4 0
memory_throttle_count 4 0
memory_direct_count 4 39994
memory_indirect_count 4 7769
memory_all_bytes 4 33515950080
memory_free_bytes 4 2209509376
memory_available_bytes 3 -240123904
arc_no_grow 4 1
arc_tempreserve 4 0
arc_loaned_bytes 4 0
arc_prune 4 0
arc_meta_used 4 182578736
arc_meta_limit 4 12568481280
arc_dnode_limit 4 1256848128
arc_meta_max 4 1627767088
arc_meta_min 4 16777216
async_upgrade_sync 4 18
demand_hit_predictive_prefetch 4 12
demand_hit_prescient_prefetch 4 0
arc_need_free 4 240123904
arc_sys_free 4 523686720
arc_raw_size 4 0