Hello,
I'm having an odd issue with one of our servers. I replaced one of the disks because it was bad, but the new one keeps reporting SMART errors as well.
When I run a short or extended manual test, I don't see these errors at all. Could they be cached from the failed drive (which was also sdb)?
Syslog:
Jul 16 08:20:32 intern smartd[18216]: Device: /dev/sdb [SAT], Failed SMART usage Attribute: 1 Raw_Read_Error_Rate.
Jul 16 08:20:32 intern smartd[18216]: Device: /dev/sdb [SAT], Failed SMART usage Attribute: 172 Erase_Fail_Count.
Jul 16 08:20:32 intern smartd[18216]: Device: /dev/sdb [SAT], Failed SMART usage Attribute: 173 Ave_Block-Erase_Count.
Jul 16 08:20:32 intern smartd[18216]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 68 to 69
Jul 16 08:20:32 intern smartd[18216]: Device: /dev/sdb [SAT], Failed SMART usage Attribute: 206 Write_Error_Rate.
Output after the extended SMART test:
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   000   100   000    Pre-fail  Always       -       0
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       70
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       1
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
173 Ave_Block-Erase_Count   0x0032   000   000   000    Old_age   Always       -       2
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       0
180 Unused_Reserve_NAND_Blk 0x0033   100   100   000    Pre-fail  Always       -       228
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   064   039   000    Old_age   Always       -       36 (Min/Max 24/61)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
202 Percent_Lifetime_Remain 0x0030   100   100   001    Old_age   Offline      -       0
206 Write_Error_Rate        0x000e   000   000   000    Old_age   Always       -       0
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0
246 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       775601670
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       24237552
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       10450944
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%        70         -
# 2  Short offline       Completed without error       00%        69         -
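One thing that stands out in the attribute table: the four attributes named in the syslog (1, 172, 173, 206) are exactly the ones whose normalized VALUE is 000 against a THRESH of 000. Below is a rough Python sketch that picks out such rows from smartctl-style attribute lines. It rests on an assumption I have not verified against the smartd source, namely that an attribute is reported as failed when VALUE <= THRESH. The helper name `at_or_below_threshold` and the `SAMPLE` string (a few rows copied from the table) are just for illustration.

```python
import re

# A few rows copied from the attribute table above (illustrative subset).
SAMPLE = """\
1 Raw_Read_Error_Rate 0x002f 000 100 000 Pre-fail Always - 0
5 Reallocate_NAND_Blk_Cnt 0x0032 100 100 010 Old_age Always - 0
172 Erase_Fail_Count 0x0032 000 000 000 Old_age Always - 0
173 Ave_Block-Erase_Count 0x0032 000 000 000 Old_age Always - 2
194 Temperature_Celsius 0x0022 064 039 000 Old_age Always - 36 (Min/Max 24/61)
206 Write_Error_Rate 0x000e 000 000 000 Old_age Always - 0
"""

# Columns: ID, name, flag, VALUE, WORST, THRESH, then the rest of the row.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+(\d{3})\s+(\d{3})\s+(\d{3})\s"
)


def at_or_below_threshold(text):
    """Return (id, name) for rows whose normalized VALUE <= THRESH.

    Assumption: this comparison is what triggers the "Failed SMART usage
    Attribute" message; it is not confirmed from the smartd source.
    """
    hits = []
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # skip headers and anything that is not an attribute row
        attr_id, name, value, _worst, thresh = m.groups()
        if int(value) <= int(thresh):
            hits.append((int(attr_id), name))
    return hits


if __name__ == "__main__":
    for attr_id, name in at_or_below_threshold(SAMPLE):
        print(attr_id, name)
```

If that assumption holds, the messages would reflect the new drive's current attribute values rather than anything left over from the old disk, but I can't confirm that.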
Does anyone have an idea?