smartd false positive SSD CurrentPendingSector?

Apr 7, 2016
Since April 30, I've received 7 warning emails from one host, 5 from another, and 2 from a third.

Each email claims there is 1 CurrentPendingSector failure, that is, 1 currently unreadable (pending) sector.

On each host, it's the same type of drive, a CT1000MX500SSD1 (Crucial 1TB)

Running smartctl manually shows *no* sectors failed.

systemctl status shows info like this:
Jun 13 21:29:19 pm2 smartd[1196]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors
Jun 14 00:29:18 pm2 smartd[1196]: Device: /dev/sda [SAT], No more Currently unreadable (pending) sectors, warning condition reset after 1 email

Does this imply that the drives are actually kicking up errors but fixing them? (I thought SSDs did that silently, until wearout, with a different SMART attribute indicating percentage.)

It doesn't appear to be service impacting, but I'm not finding good info via Google.

Has anybody else seen this, or is this an obvious question for Crucial?
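
For anyone who wants to tally these warnings per host, here is a minimal Python sketch that classifies the two smartd messages quoted above. The regular expression is written only against those two sample lines, and the function name `classify` is made up for illustration, so other smartd message variants may need adjustments.

```python
import re

# Matches the two smartd messages quoted in this post.
# Assumption: the device type tag is always "[SAT]", as in the sample lines.
PENDING_RE = re.compile(
    r"Device: (?P<dev>\S+) \[SAT\], "
    r"(?:(?P<count>\d+) Currently unreadable \(pending\) sectors"
    r"|(?P<reset>No more Currently unreadable \(pending\) sectors))"
)


def classify(line):
    """Return (device, pending_count) for a pending-sector line, else None.

    A "No more ..." reset line is reported with a count of 0.
    """
    m = PENDING_RE.search(line)
    if m is None:
        return None
    count = 0 if m.group("reset") else int(m.group("count"))
    return m.group("dev"), count


log = [
    "Jun 13 21:29:19 pm2 smartd[1196]: Device: /dev/sda [SAT], "
    "1 Currently unreadable (pending) sectors",
    "Jun 14 00:29:18 pm2 smartd[1196]: Device: /dev/sda [SAT], "
    "No more Currently unreadable (pending) sectors, "
    "warning condition reset after 1 email",
]
for line in log:
    print(classify(line))
```

Feeding it the journal of each host makes it easy to see how often a warning is followed by a reset.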
 

Andrew Hart

Member
Dec 1, 2017
At least on the newest Intel SSDs, Pending Sector no longer means the same thing. As far as I can tell, it is used to indicate that a block will be remapped soon.
On an HDD it has always meant that a sector could not be read and the drive is hoping you will write to it so that it can be remapped (also as far as I know).
 

123paul

New Member
Aug 31, 2018
"On each host, it's the same type of drive, a CT1000MX500SSD1 (Crucial 1TB)"
Did you manage to find a solution for this? I started getting these messages on my homelab today.

By the way, I found a specification from Micron covering all their SMART attributes. It might be useful for anyone who has the same problem and is starting to worry about wearout on their disks.
 

Andrew Hart

If it is the same problem, you'll find that the pending sector count increases, maybe up to 17, then resets to 0, and the remapped count increases by just 1.

17 is the highest I've seen, I think. So keep an eye on it and check that your SSD has the latest firmware.
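
If you log the two raw values over time, the rise-then-reset pattern described above can be picked out mechanically. This is only a sketch: the function name `remap_cycles` and the sample series are hypothetical, and it assumes you already collect (pending, reallocated) raw values in chronological order.

```python
def remap_cycles(samples):
    """Summarize pending-sector cycles in a chronological series.

    samples: iterable of (pending_raw, reallocated_raw) tuples.
    Returns a list of (peak_pending, reallocated_delta), one entry per
    cycle. A cycle ends when pending returns to 0 after being nonzero.
    A cycle already in progress at the start of the series is skipped,
    because its reallocated baseline is unknown.
    """
    cycles = []
    peak = 0
    baseline = None  # reallocated count the last time pending was 0
    for pending, reallocated in samples:
        if pending > 0:
            peak = max(peak, pending)
        else:
            if peak > 0 and baseline is not None:
                cycles.append((peak, reallocated - baseline))
            peak = 0
            baseline = reallocated
    return cycles


# Hypothetical readings, made up to mirror the behaviour described above.
samples = [(0, 0), (1, 0), (9, 0), (17, 0), (0, 1), (0, 1), (2, 1), (0, 2)]
print(remap_cycles(samples))
```

A cycle whose reallocated delta stays at 1 matches what I described; a delta that keeps growing would be a reason to look closer.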

If you think there are pending sectors, you can read the whole disk with "dd if=/dev/sda of=/dev/null bs=1M". If there are real pending sectors, it will probably crash your system.
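
The same full-read check can be sketched in Python if you want the position of the first failing read instead of just a dd exit status. This is a rough equivalent under stated assumptions, not a replacement for dd: the function name `read_all` is made up, reading /dev/sda needs root, and the 1 MiB block size simply mirrors bs=1M.

```python
def read_all(path, block_size=1024 * 1024):
    """Read `path` sequentially in fixed-size blocks, like dd with bs=1M.

    Returns (bytes_read, error). error is None when everything was
    readable, otherwise the OSError raised by the failing open or read.
    """
    total = 0
    try:
        # buffering=0 requests raw, unbuffered reads, so an I/O error is
        # raised at the block that triggered it.
        with open(path, "rb", buffering=0) as f:
            while True:
                chunk = f.read(block_size)
                if not chunk:
                    break
                total += len(chunk)
    except OSError as exc:
        return total, exc
    return total, None
```

Calling `read_all("/dev/sda")` as root returns the number of bytes read and, if a read failed, the exception; the byte count tells you roughly where on the disk the failure happened.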
 

stormtronix

New Member
Jul 23, 2014
We have the same issue with the same SSDs, CT1000MX500SSD1 (Crucial 1TB).
I also noticed that smartctl does not know many of the attributes these SSDs report:

Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0032   100   100   010    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       561
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       7
171 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
173 Unknown_Attribute       0x0032   098   098   000    Old_age   Always       -       34
174 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       6
180 Unused_Rsvd_Blk_Cnt_Tot 0x0033   000   000   000    Pre-fail  Always       -       42
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   074   055   000    Old_age   Always       -       26 (Min/Max 0/45)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
202 Unknown_SSD_Attribute   0x0030   098   098   001    Old_age   Offline      -       2
206 Unknown_SSD_Attribute   0x000e   100   100   000    Old_age   Always       -       0
210 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
246 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       4555280040
247 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       74855687
248 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       991046634

I did not find anything about what these unknown attributes could be - any idea?
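
Until smartctl knows the drive, the unknown IDs can be relabelled by hand. The sketch below uses the attribute names from the smartmontools drive database entry quoted later in this thread; the dictionary and the helper `relabel` are illustrative only and are not part of smartmontools.

```python
# Names taken from the "ATTRIBUTE OPTIONS" list of the drivedb entry
# quoted later in this thread (Crucial/Micron BX/MX, M5/600, 1100 SSDs).
MX500_NAMES = {
    5: "Reallocate_NAND_Blk_Cnt",
    170: "Reserved_Block_Count",
    171: "Program_Fail_Count",
    172: "Erase_Fail_Count",
    173: "Ave_Block-Erase_Count",
    174: "Unexpect_Power_Loss_Ct",
    180: "Unused_Reserve_NAND_Blk",
    183: "SATA_Interfac_Downshift",
    184: "Error_Correction_Count",
    195: "Cumulativ_Corrected_ECC",
    202: "Percent_Lifetime_Remain",
    206: "Write_Error_Rate",
    210: "Success_RAIN_Recov_Cnt",
    246: "Total_Host_Sector_Write",
    247: "Host_Program_Page_Count",
    248: "FTL_Program_Page_Count",
}


def relabel(attr_id, reported_name):
    """Return the database name for attr_id, else the name smartctl printed."""
    return MX500_NAMES.get(attr_id, reported_name)


print(relabel(202, "Unknown_SSD_Attribute"))
print(relabel(197, "Current_Pending_Sector"))
```

With that mapping, 173 in the table above is the average block erase count and 202 is the remaining lifetime percentage.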
 

123paul

I see I forgot to link the document I found in my previous reply. It seems I can't paste external links because my account is too new.

Just Google this: "tnfd22_client_ssd_smart_attributes.pdf"

I'm still having this issue; it seems to be Crucial-specific.
 

123paul

I found out that a firmware update (M3CR022) was released in June to fix this issue. I will try it and report back in a couple of days on whether it fixed the problem.
 

123paul

I haven't been able to update the firmware yet because I couldn't boot from a USB drive with the firmware release from Crucial. I would need to attach the drive to a Windows machine to try the update there, but I only have Macs around for now.

So if someone else manages to test it sooner I would be curious to know the outcome.
 

Paspao

Member
Aug 1, 2017
I have been getting the same error for a couple of days, and it seems to clear itself after an hour.

Apr 23 14:27:14 proxmox smartd[1495]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors
...
Apr 23 15:27:14 proxmox smartd[1495]: Device: /dev/sda [SAT], No more Currently unreadable (pending) sectors, warning condition reset after 1 email

My drive is an Intel SSD DC S3520 1.2TB with latest firmware (N2010121).

It is one of 10 Ceph OSDs.

I will keep it monitored.
 

Paspao

Hello,

I am still getting the same messages, but only for one of my Ceph OSDs.

I ran SMART self-tests, and they completed successfully:
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      1707         -
# 2  Short offline       Completed without error       00%      1705         -

And I see 1 Reallocated_Sector_Ct:

Code:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       1
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       1793
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       5
170 Unknown_Attribute       0x0033   099   099   010    Pre-fail  Always       -       0
171 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
174 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       3
175 Program_Fail_Count_Chip 0x0033   100   100   010    Pre-fail  Always       -       69088392870
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   090    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   073   063   000    Old_age   Always       -       27 (Min/Max 16/37)
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       3
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       27
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
225 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       343954
226 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       327
227 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       19
228 Power-off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       107596
232 Available_Reservd_Space 0x0033   099   099   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
234 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       343954
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       85337
243 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       698199

As this is a new SSD, should I be worried and ask for a replacement, or do you suggest running other tests on it?
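
To keep watching the two relevant counters without reading the whole table each time, the attribute rows can be parsed with a few lines of Python. This is a minimal sketch that assumes the ten-column layout shown above; the function name `parse_attributes` is made up, and the sample rows are copied from the table in this post.

```python
def parse_attributes(text):
    """Parse smartctl attribute rows into {id: (name, raw_value)}.

    Assumes the ten-column layout shown above. Lines whose first field
    is not a number (such as the header) are skipped.
    """
    attrs = {}
    for line in text.splitlines():
        # At most 9 splits, so a raw value such as "27 (Min/Max 16/37)"
        # stays together in the last field.
        fields = line.split(None, 9)
        if len(fields) == 10 and fields[0].isdigit():
            attrs[int(fields[0])] = (fields[1], fields[9].strip())
    return attrs


sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       1
190 Airflow_Temperature_Cel 0x0022   073   063   000    Old_age   Always       -       27 (Min/Max 16/37)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
"""
attrs = parse_attributes(sample)
print(attrs[5], attrs[197])
```

Logging attributes 5 and 197 this way over a few days shows whether the pending count clears while the reallocated count stays flat.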

Thank you.
P.
 

jjd

New Member
May 17, 2019
Hi,

The drive database is out of date.

Update it by grabbing the latest db file from here:
www-smartmontools-org/export/4914/trunk/smartmontools/drivedb.h
(replace the dashes with dots; the site won't let me post a URL).

Put the file here:
/var/lib/smartmontools/drivedb/drivedb.h

Then restart the smartd service on the Proxmox VM or the Proxmox server:

systemctl restart smartd.service

/var/lib/smartmontools/smartd.CT1000MX500SSD1-1912E1F3465D.ata.state

For example, on my box, smartctl -P show /dev/sdb shows:

smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.15.18-12-pve] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke

Drive found in smartmontools Database. Drive identity strings:
MODEL: CT1000MX500SSD1
FIRMWARE: M3CR023
match smartmontools Drive Database entry:
MODEL REGEXP: Crucial_CT(128|256|512)MX100SSD1|Crucial_CT(200|250|256|500|512|1000|1024)MX200SSD[1346]|Crucial_CT(275|525|750|1050|2050)MX300SSD[14]|Crucial_CT(120|240|480|960)M500SSD[134]|Crucial_CT(128|256|512|1024)M550SSD[134]|CT(120|240|480)BX300SSD1|CT(120|240|480|960)BX500SSD1|CT(250|500|1000|2000)MX500SSD[14]|Micron_M500_MTFDDA[KTV](120|240|480|960)MAV|Micron_M500DC_(EE|MT)FDDA[AK](120|240|480|800)MBB|(Micron[_ ])?M500IT[_ ]MTFDDA[KTY](032|050|060|064|120|128|240|256)[MS]BD|(Micron_)?M510[_-]MTFDDA[KTV](128|256)MAZ|MICRON_M510DC_(EE|MT)FDDAK(120|240|480|800|960)MBP|(Micron_)?M550[_-]MTFDDA[KTV](064|128|256|512|1T0)MAY|Micron_M600_(EE|MT)FDDA[KTV](128|256|512|1T0)MBF[25Z]?|(Micron_1100_)?MTFDDA[KV](256|512|1T0|2T0)TBN|Micron 1100 SATA (256G|512G|1T|2T)B
FIRMWARE REGEXP: .*
MODEL FAMILY: Crucial/Micron BX/MX1/2/3/500, M5/600, 1100 SSDs
ATTRIBUTE OPTIONS: 005 Reallocate_NAND_Blk_Cnt
170 Reserved_Block_Count
171 Program_Fail_Count
172 Erase_Fail_Count
173 Ave_Block-Erase_Count
174 Unexpect_Power_Loss_Ct
180 Unused_Reserve_NAND_Blk
183 SATA_Interfac_Downshift
184 Error_Correction_Count
195 Cumulativ_Corrected_ECC
202 Percent_Lifetime_Remain
206 Write_Error_Rate
210 Success_RAIN_Recov_Cnt
246 Total_Host_Sector_Write
247 Host_Program_Page_Count
248 FTL_Program_Page_Count
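
If you have several hosts to check, the output above can be tested for a database match in a script. The sketch below only looks for the strings that appear in my output; the helper name `parse_identity` is made up, and I have not checked what smartctl prints for a drive that is not in the database, so treat the `in_database` flag as an assumption.

```python
def parse_identity(output):
    """Extract model, firmware and database status from the text printed
    by `smartctl -P show`.

    Assumption: a matched drive is announced with the sentence
    "Drive found in smartmontools Database", as in the output above.
    """
    info = {"in_database": "Drive found in smartmontools Database" in output}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        # Only the exact keys MODEL and FIRMWARE are taken; "MODEL REGEXP"
        # and "MODEL FAMILY" are ignored.
        if sep and key.strip() in ("MODEL", "FIRMWARE"):
            info[key.strip().lower()] = value.strip()
    return info


sample = """\
Drive found in smartmontools Database. Drive identity strings:
MODEL: CT1000MX500SSD1
FIRMWARE: M3CR023
match smartmontools Drive Database entry:
FIRMWARE REGEXP: .*
MODEL FAMILY: Crucial/Micron BX/MX1/2/3/500, M5/600, 1100 SSDs
"""
print(parse_identity(sample))
```

The real text can be captured with `subprocess.run(["smartctl", "-P", "show", "/dev/sdb"], capture_output=True, text=True).stdout` and passed to the function.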



I am still assessing whether this will stop the error, and am waiting in anticipation.
But at least with the drive recognized you stand a better chance.

According to the PDF (google tnfd22_client_ssd_smart_attributes.pdf), attribute 202 is the one that holds the wear value.

In my case:

202 Percent_Lifetime_Remain 0x0030 100 100 001

The first number after the flag is the current value, so 100% remaining. When it drops to the threshold (001), it counts as a fail and the device will go read-only.

"This value gives the threshold inverted value of the raw data value below. That is, if 30% of the lifetime has been used, this value will report 70%. A value of 0% indicates that 100% of the expected lifetime has been used."

Regards
Joe.
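
The inversion described in the quoted Micron text is simple arithmetic, sketched here in Python. The function name is made up, and clamping the result to the 0 to 100 range is my own assumption rather than something the document states.

```python
def percent_lifetime_remaining(percent_used):
    """Normalized value of attribute 202 for a given raw "percent used".

    Follows the quoted description: 30% used reports 70% remaining, and
    100% used reports 0%. The clamp to 0..100 is an assumption.
    """
    return max(0, min(100, 100 - percent_used))


print(percent_lifetime_remaining(30))
print(percent_lifetime_remaining(2))
```

This matches the table posted earlier in the thread, where attribute 202 shows a raw value of 2 next to a normalized value of 098.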
 

Paspao

Hello,

Thank you, I updated the drive database and have not received any alerts for days.

So should those alerts be considered false positives?

Thanks,
P.
 
