SMART errors

leussink

Mar 16, 2022
Every day I get two emails from my Proxmox server, which has Home Assistant installed, with the following subject lines:

- SMART error (CurrentPendingSector) detected on host: pve

The following warning/error was logged by the smartd daemon:
Device: /dev/sda [SAT], 2 Currently unreadable (pending) sectors

- SMART error (OfflineUncorrectableSector) detected on host: pve

The following warning/error was logged by the smartd daemon:
Device: /dev/sda [SAT], 3 Offline uncorrectable sectors

Any idea what actions I have to take now?
 
Could you provide the output of smartctl -a /dev/sda? You could also run a self-test with smartctl -t short /dev/sda (the -t option needs a test type, such as short or long). It looks like you have some bad sectors. This might be an indication that your disk is dying, so you should back up your data as soon as possible. If you are running ZFS or a different RAID solution, you could also look into replacing your disk.
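For reference, the self-test could be run roughly like this. It is only a minimal sketch: `run_short_selftest`, `SMARTCTL`, and `WAIT_SECONDS` are made-up names for illustration, and the 120-second default wait is an assumption based on the 2-minute polling time this drive reports for the short test in the smartctl output further down the thread.

```shell
#!/bin/sh
# Sketch only: start a short SMART self-test, wait, then print the self-test log.
# SMARTCTL and WAIT_SECONDS are hypothetical override knobs, not smartctl features.

run_short_selftest() {
    dev="$1"
    # -t requires a test type; "short" is the quick variant, "long" the extended one.
    "${SMARTCTL:-smartctl}" -t short "$dev"
    # Give the drive time to finish before reading the log.
    sleep "${WAIT_SECONDS:-120}"
    # -l selftest prints the self-test log with the result of each test.
    "${SMARTCTL:-smartctl}" -l selftest "$dev"
}

# Example (needs root and a real device):
#   run_short_selftest /dev/sda
```

If the log still shows the test as in progress, waiting a bit longer and running `smartctl -l selftest /dev/sda` again should show the final result.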
 

Sure!

Code:
Linux pve 5.13.19-6-pve #1 SMP PVE 5.13.19-14 (Thu, 10 Mar 2022 16:24:52 +0100) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Mon Mar 14 23:02:19 CET 2022 on pts/0
root@pve:~# smartctl -a /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.13.19-6-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Silicon Motion based SSDs
Device Model:     TS120GMTS420S
Serial Number:    G838720635
LU WWN Device Id: 5 7c3548 1d339627b
Firmware Version: U1001A0
User Capacity:    120,034,123,776 bytes [120 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      M.2
TRIM Command:     Available
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Mar 16 09:51:50 2022 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  120) seconds.
Offline data collection
capabilities:                    (0x11) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0002) Does not save SMART data before
                                        entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  10) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   100   100   050    Old_age   Always       -       0
  5 Reallocated_Sector_Ct   0x0032   100   100   050    Old_age   Always       -       2
  9 Power_On_Hours          0x0032   100   100   050    Old_age   Always       -       1299
 12 Power_Cycle_Count       0x0032   100   100   050    Old_age   Always       -       32
160 Uncorrectable_Error_Cnt 0x0032   100   100   050    Old_age   Always       -       3
161 Valid_Spare_Block_Cnt   0x0033   100   100   050    Pre-fail  Always       -       95
163 Initial_Bad_Block_Count 0x0032   100   100   050    Old_age   Always       -       4
164 Total_Erase_Count       0x0032   100   100   050    Old_age   Always       -       21382
165 Max_Erase_Count         0x0032   100   100   050    Old_age   Always       -       54
166 Min_Erase_Count         0x0032   100   100   050    Old_age   Always       -       2
167 Average_Erase_Count     0x0032   100   100   050    Old_age   Always       -       29
168 Max_Erase_Count_of_Spec 0x0032   100   100   050    Old_age   Always       -       1000
169 Remaining_Lifetime_Perc 0x0032   100   100   050    Old_age   Always       -       98
175 Program_Fail_Count_Chip 0x0032   100   100   050    Old_age   Always       -       0
176 Erase_Fail_Count_Chip   0x0032   100   100   050    Old_age   Always       -       0
177 Wear_Leveling_Count     0x0032   100   100   050    Old_age   Always       -       0
178 Runtime_Invalid_Blk_Cnt 0x0032   100   100   050    Old_age   Always       -       2
181 Program_Fail_Cnt_Total  0x0032   100   100   050    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   050    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   050    Old_age   Always       -       31
194 Temperature_Celsius     0x0022   100   100   050    Old_age   Always       -       43
195 Hardware_ECC_Recovered  0x0032   100   100   050    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   050    Old_age   Always       -       3
197 Current_Pending_Sector  0x0032   100   100   050    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0032   100   100   050    Old_age   Always       -       3
199 UDMA_CRC_Error_Count    0x0032   100   100   050    Old_age   Always       -       0
232 Available_Reservd_Space 0x0032   100   100   050    Old_age   Always       -       95
241 Host_Writes_32MiB       0x0030   100   100   050    Old_age   Offline      -       22992
242 Host_Reads_32MiB        0x0030   100   100   050    Old_age   Offline      -       9515
245 TLC_Writes_32MiB        0x0032   100   100   050    Old_age   Always       -       17548

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Selective Self-tests/Logging not supported

It's a fairly new SSD, so it might still be under warranty.
 
I think so too. If your disk is still covered by some kind of warranty, I'd suggest you back up your data and replace it.
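While waiting on a replacement, it may help to track the three counters behind the smartd emails: attributes 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector), and 198 (Offline_Uncorrectable), which read 2, 2, and 3 in the output above. The following is a rough sketch that pulls their raw values out of `smartctl -A` output. The function names `smart_raw` and `sector_summary` are invented for this example, and it assumes the raw value is the tenth whitespace-separated column, as it is in the table posted here.

```shell
#!/bin/sh
# Sketch: extract the raw values of the sector-related attributes from
# `smartctl -A` output. Function names are illustrative, not part of smartmontools.

# Print the RAW_VALUE column (10th field) of the attribute whose ID is $1.
smart_raw() {
    awk -v id="$1" '$1 == id { print $10 }'
}

# Read a full attribute table on stdin and summarize the three counters
# mentioned in this thread: 5, 197, and 198.
sector_summary() {
    table="$(cat)"
    for id in 5 197 198; do
        raw="$(printf '%s\n' "$table" | smart_raw "$id")"
        printf 'attribute %s: raw value %s\n' "$id" "${raw:-not reported}"
    done
}

# Example (needs root and a real device):
#   smartctl -A /dev/sda | sector_summary
```

Running this once a day and comparing the numbers would show whether the counts are stable or still climbing. Counts that keep growing are generally a stronger argument for replacing the disk soon.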
 
