Smartctl Offline_Uncorrectable in SSD

rlljorge

Active Member
Jun 2, 2020
Hello!

I am receiving many emails about Offline_Uncorrectable sectors on my SSDs, across different servers and multiple devices.

Code:
The following warning/error was logged by the smartd daemon:

Device: /dev/sdc [SAT], 16 Offline uncorrectable sectors

Device info:
HFS1T9G32FEH-BA10A, S/N:KJB6N8223I0707348, WWN:5-ace42e-025536fd7, FWD02, 1.92 TB
Full SMART output:
Code:
root@pve-prod-01:~# smartctl -a /dev/sdb
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.5.11-6-pve] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     HFS1T9G32FEH-BA10A
Serial Number:    KJB6N8223I150731P
LU WWN Device Id: 5 ace42e 025535ff0
Add. Product Id:  DELL(tm)
Firmware Version: DD02
User Capacity:    1,920,383,410,176 bytes [1.92 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic, zeroed
Device is:        Not in smartctl database 7.3/5577
ATA Version is:   ACS-4 (minor revision not indicated)
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Mar 28 12:07:53 2024 -03
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                ( 1730) seconds.
Offline data collection
capabilities:                    (0x19) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0002) Does not save SMART data before
                                        entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  30) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 0
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000e   100   100   006    Old_age   Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   002    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       4369
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       19
 13 Read_Soft_Error_Rate    0x002e   100   100   000    Old_age   Always       -       0
173 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       33
175 Program_Fail_Count_Chip 0x0032   100   100   000    Old_age   Always       -       11
179 Used_Rsvd_Blk_Cnt_Tot   0x0033   100   100   002    Pre-fail  Always       -       0
180 Unused_Rsvd_Blk_Cnt_Tot 0x0010   100   100   000    Old_age   Offline      -       4584
181 Program_Fail_Cnt_Total  0x0032   100   100   000    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   065   057   000    Old_age   Always       -       35 (Min/Max 23/43)
195 Hardware_ECC_Recovered  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       16
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
201 Unknown_SSD_Attribute   0x0033   100   100   050    Pre-fail  Always       -       0
202 Unknown_SSD_Attribute   0x0033   100   100   050    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       12124
235 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       12124
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       24202
245 Unknown_Attribute       0x0033   100   100   001    Pre-fail  Always       -       100

SMART Error Log not supported

SMART Self-test Log not supported

Selective Self-tests/Logging not supported

This looks like a bug: the SSDs are not very old, and it happens on multiple servers and devices.

Can I ignore these messages, or do I need to run some additional checks?

Regards,

Rodrigo
 
Device is: Not in smartctl database 7.3/5577

I would start by running update-smart-drivedb [1] on the host(s). Your disk model has since been added to the drive database, at least in the latest version [2].
I would also run a long SMART self-test in any case.
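
A rough sketch of those two steps, assuming the device is /dev/sdb as in the output above and that the commands are run as root. The function names start_long_test and offline_uncorrectable_raw are made up for illustration; only update-smart-drivedb and the smartctl options are part of smartmontools.

```shell
#!/bin/sh
# Hypothetical helper: refresh the drive database, then start an
# extended (long) self-test on the given device.
start_long_test() {
    dev="$1"
    update-smart-drivedb        # download the current drivedb.h
    smartctl -t long "$dev"     # the test runs in the background on the drive
}

# Hypothetical helper: read `smartctl -A` output on stdin and print the
# RAW_VALUE column (field 10) of attribute 198, Offline_Uncorrectable.
offline_uncorrectable_raw() {
    awk '$1 == 198 { print $10 }'
}

# Example usage (not executed here):
#   start_long_test /dev/sdb
#   smartctl -c /dev/sdb    # later: check "Self-test execution status"
#   smartctl -A /dev/sdb | offline_uncorrectable_raw
```

The drive above reports an extended self-test polling time of 30 minutes and no self-test log, so the result would probably have to be read from the "Self-test execution status" field rather than from a log. Comparing the raw value of attribute 198 before and after the database update and the test shows whether the count is actually changing.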

But none of this is specific to PVE or Proxmox products in general. So, if you suspect some kind of bug, I suggest reaching out to the smartmontools project [3] and/or the disk manufacturer's support channels.

[1] https://www.smartmontools.org/wiki/Download#Updatethedrivedatabase
[2] https://www.smartmontools.org/browser/trunk/smartmontools/drivedb.h#L3102
[3] https://www.smartmontools.org