Negative SSD wearout: "-140%"

udotirol

One of our servers uses two Crucial MX500 SSDs in a ZFS RAID1 setup as boot drives. By chance, I checked the server's SMART values in the UI and it shows a whopping -140% wearout. Not sure what to make of this?

Screenshot From 2025-05-07 00-26-06.png

In the shell, smartctl -a doesn't show anything extraordinary except "Percent_Lifetime_Remain", which happens to be 116%.

Code:
$ smartctl -a /dev/sda
[...]
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       19715
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       20
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 Ave_Block-Erase_Count   0x0032   240   240   000    Old_age   Always       -       1161
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       2
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       30
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   078   050   000    Old_age   Always       -       22 (Min/Max 0/50)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_ECC_Cnt 0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
202 Percent_Lifetime_Remain 0x0030   240   240   001    Old_age   Offline      -       116
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0
246 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       116458851999
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       1084277588
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       8546353571

The drives have about 20K power-on hours and about 55TB written so far, so given their 180 TBW spec, they should be far from worn out ...
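(For reference, that ~55TB figure is just attribute 246 multiplied by the 512-byte logical sector size the drive reports; a rough sketch in Python, assuming that is indeed the unit the counter uses:)

Code:
# Rough check of host writes from attribute 246 (Total_LBAs_Written),
# assuming the counter uses the drive's reported 512-byte logical sectors.
lbas_written = 116_458_851_999           # RAW_VALUE of attribute 246
total_bytes = lbas_written * 512
print(f"{total_bytes / 1e12:.1f} TB")    # ~59.6 TB
print(f"{total_bytes / 2**40:.1f} TiB")  # ~54.2 TiB, i.e. the "about 55TB"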

No self-test errors have been logged either, so I'm wondering what's going on?
 
Percent lifetime remaining says 240, so 140% over the 100% the manufacturer has specified. It starts at 0 and counts up from there; Proxmox does 100 - $value to get you a 'remaining' life value.

Given a 4kB LBA size, it seems you have written 465TB, or ~2.6x your rated TBW, so the -140% is probably accurate. Given the age of these drives, and the fact that I've seen one after another of that model fail, I'm surprised they still function at all; there was a bug in the firmware when they were 'new' (what, 7 years ago) that caused them to brick under certain conditions.
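Spelling that out as a sketch (the UI deriving wearout as 100 minus the normalized VALUE, and attribute 246 counting 4 KiB blocks; the 4 KiB unit is an assumption on my part):

Code:
# Where -140% and the ~2.6x TBW figure could come from, assuming:
#  - the UI computes wearout as 100 - VALUE of attribute 202
#  - attribute 246 counts 4 KiB (physical-sector-sized) blocks
value_202 = 240                          # normalized VALUE of attribute 202
print(100 - value_202)                   # -140 -> the "-140% wearout"

lbas_written = 116_458_851_999           # RAW_VALUE of attribute 246
written_bytes = lbas_written * 4096
print(f"{written_bytes / 1e12:.0f} TB")  # ~477 TB
print(f"{written_bytes / 180e12:.2f}x")  # ~2.65x the 180 TBW rating

I get ~477 TB in straight decimal terabytes; the exact figure depends on the TB/TiB convention, but either way it's roughly 2.6x the 180 TBW rating.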
 
No, percent lifetime remaining says 116 (RAW_VALUE). The 240 you are referring to is from the VALUE column. The VALUE column represents the "normalized" current value of the attribute, on a scale of usually 1 to 253 (sometimes also 1 to 100). WORST is the worst value observed so far in the drive's lifetime, and THRESH is the threshold at which the drive manufacturer considers the drive to have failed. And as you can see in my example, with a VALUE of 240, the drive is not even remotely close to the THRESH of 001.

RAW_VALUE is the actual raw data coming from the drive, and the "magic" required to convert this into something meaningful is often difficult to figure out, because it is highly dependent on the manufacturer.

See https://www.smartmontools.org/wiki/Howto_ReadSmartctlReports_ATA for a comprehensive explanation.
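To make the column layout concrete, a quick illustrative sketch that splits one of the attribute lines above into its fields (the RAW_VALUE field can carry extra tokens, like the Min/Max of the temperature attribute, hence the limited split):

Code:
# Splitting one "smartctl -A" attribute line into its columns.
line = ("202 Percent_Lifetime_Remain 0x0030   240   240   001    "
        "Old_age   Offline      -       116")
(attr_id, name, flag, value, worst, thresh,
 attr_type, updated, when_failed, raw) = line.split(None, 9)
print(value, raw)   # 240 116 -> normalized VALUE vs. RAW_VALUE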

As for the LBA size, I am not so sure. There is the logical block size and the physical block size, and I am uncertain which one to use for the LBA calculation. From what I've read, this isn't consistent between drive vendors. In my case, the drives report a "512 bytes logical, 4096 bytes physical" sector size.

But yes, if the physical block size is the one being counted, the drive has exceeded its 180TBW. On the other hand, I highly doubt this, because the SSDs are only used as (cheap) boot drives; it's hard to believe that just running Proxmox would exceed those 180TBW. The actual storage for the VMs comes from ZFS via iSCSI.
 
Raw values are converted to normalized values using a table referenced by the smartctl software. Users should always look at the normalized value; the raw value is converted through logic in drivedb.h, which has an entry for your model.

-140 + 256 = 116

Per Crucial: The new SSD's Attribute 202 will report "0", and when its specified lifetime has been reached, it will show "100," reporting that 100 percent of the lifetime has been used. On these models, the percentage can exceed 100 as more write operations are done, but the data retention concerns are the same.
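A minimal sketch of that arithmetic (assuming the drive stores 100 minus the percent used as the normalized VALUE in an unsigned byte, and the UI then shows 100 - VALUE):

Code:
# How 116% lifetime used can end up as VALUE 240 and then "-140% wearout",
# assuming the normalized value is (100 - percent_used) in an unsigned byte.
percent_used = 116                 # RAW_VALUE of attribute 202
value = (100 - percent_used) % 256 # unsigned-byte wraparound
print(value)                       # 240
print(100 - value)                 # -140 -> what the UI shows
print(100 - value + 256)           # 116  -> back to percent used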
 
The Crucial MX500 SSD has 4K physical alignment. If you do 512b sync reads/writes, then internally the drive reads the 4K block, modifies 512b of it and writes the 4K block back (this is the worst case; if the drive has a capacitor, it will optimize this through its on-board write cache). So 4K of data written as 512b sync writes with long pauses between them can cost eight 4K writes. I believe these MX500s reported a 512b sector size and could not be changed to 4K AF.

That is why you see such a difference between TB written to the interface and internal blocks written; there is potentially 8x write amplification baked into these early SSDs for compatibility with Windows 7.
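The page-count attributes in your smartctl output point the same way; a rough estimate, assuming host and FTL program pages are the same size (Crucial doesn't document that, so take it as a ballpark):

Code:
# Worst-case read-modify-write factor for 512-byte sync writes on 4 KiB pages
print(4096 // 512)                    # 8

# Write amplification estimated from the page-count attributes,
# assuming host pages and FTL pages are the same size.
host_pages = 1_084_277_588            # attr 247 Host_Program_Page_Count
ftl_pages  = 8_546_353_571            # attr 248 FTL_Program_Page_Count
print(f"{(host_pages + ftl_pages) / host_pages:.1f}x")   # ~8.9x overall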
 
Yes, I'm aware of that limitation (which is shared by many cheap SSDs, even today).

Like I said, though, it's hard to believe that a pure boot device running PVE can generate so much written data.
 
Your calculation of -140 + 256 = 116 makes sense, yes ... and yet, it is somehow hard to believe ...

So I decided to have a look at some other servers and devices that came with the same drives, and what can I say: it appears you're correct after all :)

With a similar drive and identical firmware, I see this, for example:
202 Percent_Lifetime_Remain 0x0030 047 047 001 Old_age Offline - 53

And a bigger variant, CT1000MX500SSD1 (=1 TB) with an older firmware:
202 Percent_Lifetime_Remain 0x0030 100 100 001 Old_age Offline - 0

TL;DR: the drives will get replaced - thanks for your support!
 
(55 terabytes) / (22,000 hours) ≈ 694 kB/s

Don't see a problem here (packages, logs, sync tokens, status, config files ...).

I am at 22TB after ~3 years in a large 14-node cluster (wearout is still low as these are rated for 3x daily writes).
 