I'm receiving the following from my root pool. It's a 3-drive mirror with lightly used drives. The error counters are reset after each scrub, and I never see more than a single cksum error between monthly scrubs. The data is in offsite and local backups, so I'm not worried about it. I looked at the SMART report for each drive and some of the numbers __feel__ bad. I've learned not to take a SMART report at face value, as the raw values often don't represent what they appear to. I'd appreciate some feedback on the reports below from someone who understands what they mean. The drives are 2 years 3 months old and should still be under warranty. If there's a failure, I'd like to catch it before the warranty runs out.
Thanks in advance!
Mike
Code:
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
  scan: scrub repaired 0B in 00:09:08 with 0 errors on Sun Jul  9 00:33:11 2023
config:

        NAME                                       STATE     READ WRITE CKSUM
        rpool                                      ONLINE       0     0     0
          mirror-0                                 ONLINE       0     0     0
            ata-ST4000VN008-2DR166_ZGY8YERE-part3  ONLINE       0     0     1
            ata-ST4000VN008-2DR166_ZGY8Y9JM-part3  ONLINE       0     0     0
            ata-ST4000VN008-2DR166_ZGY8VE98-part3  ONLINE       0     0     0

errors: No known data errors
Code:
>>>>>sda<<<<<
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.2.16-3-pve] (local build)
=== START OF INFORMATION SECTION ===
Model Family: Seagate IronWolf
Device Model: ST4000VN008-2DR166
Firmware Version: SC60
Device is: In smartctl database 7.3/5319
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 083 064 044 Pre-fail Always - 204360152
3 Spin_Up_Time 0x0003 094 093 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 75
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 083 060 045 Pre-fail Always - 192076093
9 Power_On_Hours 0x0032 078 078 000 Old_age Always - 20003 (179 221 0)
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 75
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 069 062 040 Old_age Always - 31 (Min/Max 26/33)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 83
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 217
194 Temperature_Celsius 0x0022 031 040 000 Old_age Always - 31 (0 21 0 0 0)
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 19999h+50m+13.821s
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 7447341988
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 21686710736
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 10 -
>>>>>sdb<<<<<
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.2.16-3-pve] (local build)
=== START OF INFORMATION SECTION ===
Model Family: Seagate IronWolf
Device Model: ST4000VN008-2DR166
Device is: In smartctl database 7.3/5319
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 082 064 044 Pre-fail Always - 166619920
3 Spin_Up_Time 0x0003 093 093 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 75
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 083 060 045 Pre-fail Always - 206699984
9 Power_On_Hours 0x0032 078 078 000 Old_age Always - 20003 (187 2 0)
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 75
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 070 062 040 Old_age Always - 30 (Min/Max 26/33)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 82
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 217
194 Temperature_Celsius 0x0022 030 040 000 Old_age Always - 30 (0 21 0 0 0)
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 20000h+27m+04.503s
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 7413541580
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 21675610634
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 10 -
>>>>>sdc<<<<<
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.2.16-3-pve] (local build)
=== START OF INFORMATION SECTION ===
Model Family: Seagate IronWolf
Device Model: ST4000VN008-2DR166
Device is: In smartctl database 7.3/5319
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 082 064 044 Pre-fail Always - 166652104
3 Spin_Up_Time 0x0003 093 093 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 75
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 085 060 045 Pre-fail Always - 349344955
9 Power_On_Hours 0x0032 078 078 000 Old_age Always - 20003 (120 5 0)
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 75
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 070 061 040 Old_age Always - 30 (Min/Max 26/32)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 80
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 216
194 Temperature_Celsius 0x0022 030 040 000 Old_age Always - 30 (0 21 0 0 0)
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 19999h+18m+21.956s
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 7383484330
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 21682821832
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 10 -
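The numbers that look worst in the reports above are the raw values for Raw_Read_Error_Rate and Seek_Error_Rate. Below is a minimal sketch of one way to read them, assuming the commonly described (but, as far as I know, not officially documented) Seagate convention that the raw value is a 48-bit field whose upper 16 bits count errors and whose lower 32 bits count operations. The function name `split_seagate_raw` is made up for illustration; the raw values are copied from the reports above.

```python
def split_seagate_raw(raw: int) -> tuple[int, int]:
    """Split a 48-bit raw value into (error_count, operation_count).

    Assumption: upper 16 bits = errors, lower 32 bits = operations.
    This is the commonly described Seagate layout, not a vendor-confirmed one.
    """
    errors = (raw >> 32) & 0xFFFF
    operations = raw & 0xFFFFFFFF
    return errors, operations


# Raw values taken from the smartctl output above.
readings = {
    "sda Raw_Read_Error_Rate": 204360152,
    "sda Seek_Error_Rate": 192076093,
    "sdb Raw_Read_Error_Rate": 166619920,
    "sdb Seek_Error_Rate": 206699984,
    "sdc Raw_Read_Error_Rate": 166652104,
    "sdc Seek_Error_Rate": 349344955,
}

for name, raw in readings.items():
    errors, operations = split_seagate_raw(raw)
    print(f"{name}: {errors} errors in {operations} operations")
```

Under that assumption, every value above is smaller than 2^32, so the error half is 0 for all three drives and the large numbers would just be operation counts.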