Disk overview: Wearout percentage shows 0%, IPMI shows 17% ...

Rainerle

Renowned Member
Jan 29, 2019
Hi,
we are running an older Proxmox Ceph cluster here and I am currently looking through the disks.

[Screenshot: Proxmox VE disk overview (1634639453671.png)]


So the OS disks show a wearout of 2%, but the Ceph OSDs still show 0%?!

So I looked into the Lenovo XClarity Controller:

[Screenshot: Lenovo XClarity Controller drive view (1634639632671.png)]

So for the OS disks it looks the same, but the Ceph OSDs show 17% wearout.

Looking at the smartctl output:

[Screenshot: smartctl output (1634639810424.png)]

So how can I be safe, then?

Best regards
Rainer
 

As far as I know, PVE parses the output of smartctl to get the disk wearout, so it just looks for "Wear_Leveling_Count" and similar lines. The problem is that SMART attributes aren't standardized: every manufacturer, and often every disk model, uses different SMART attributes. Also, the attribute name "Wear_Leveling_Count" isn't provided by the disk; the disk only provides the ID "177" and its values. smartctl ships a crowdsourced database that tells it which ID stands for which attribute name. So the attribute names can always be wrong, either because someone entered incorrect data into the database or because the SSD is too new and not supported yet.
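To illustrate why a name-based lookup breaks, here is a minimal Python sketch. It assumes the ATA attribute table layout printed by `smartctl -A` (ID, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE) and the common convention that the normalized VALUE counts down from 100. This is not the actual Proxmox VE code: `WEAR_NAMES`, the helper names and the sample rows are made up for illustration. Because the drive only reports IDs, a lookup by name finds nothing when smartctl's database labels the ID as `Unknown_Attribute`.

```python
import re
from typing import NamedTuple, Optional


class SmartAttr(NamedTuple):
    id: int     # attribute ID reported by the drive, e.g. 177
    name: str   # label taken from smartctl's database, NOT from the drive
    value: int  # normalized VALUE column (vendor-defined scale, often 100 = new)
    raw: str    # RAW_VALUE column, kept as text because its format varies


# One row of the ATA attribute table printed by `smartctl -A`.
_ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+"  # ID, ATTRIBUTE_NAME, FLAG
    r"\s+(\d+)\s+\S+\s+\S+"                # VALUE, WORST, THRESH
    r"\s+\S+\s+\S+\s+\S+"                  # TYPE, UPDATED, WHEN_FAILED
    r"\s+(.*)$"                            # RAW_VALUE (may contain spaces)
)


def parse_attributes(text: str) -> dict[int, SmartAttr]:
    """Map attribute ID -> parsed row; header and other lines are ignored."""
    attrs: dict[int, SmartAttr] = {}
    for line in text.splitlines():
        m = _ROW.match(line)
        if m:
            attr_id = int(m.group(1))
            attrs[attr_id] = SmartAttr(attr_id, m.group(2), int(m.group(3)), m.group(4).strip())
    return attrs


# Hypothetical list of names treated as wear indicators (not Proxmox VE's real list).
WEAR_NAMES = ("Wear_Leveling_Count", "Media_Wearout_Indicator", "Percent_Lifetime_Remain")


def wearout_by_name(attrs: dict[int, SmartAttr]) -> Optional[int]:
    """Return 100 - VALUE for the first attribute whose name matches, else None."""
    for attr in attrs.values():
        if attr.name in WEAR_NAMES:
            return 100 - attr.value
    return None


# Made-up sample: IDs 173 and 202 are not labeled by the database.
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       21000
173 Unknown_Attribute       0x0032   095   095   000    Old_age   Always       -       150
202 Unknown_Attribute       0x0030   090   090   001    Old_age   Offline      -       10
"""

attrs = parse_attributes(sample)
print(wearout_by_name(attrs))  # -> None: no known name, even though the data is there
```

Matching on IDs instead of names would avoid depending on the database labels, but then the per-model meaning of each ID has to come from somewhere else, which is the datasheet problem described next.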
If you really want to be sure which attribute represents your disk's wearout, you need to check whether the manufacturer has published a datasheet for your disk model that explains the SMART attributes in detail. The problem is that most manufacturers don't provide this information (of all my drives, only Intel does), so the smartctl team has to reverse-engineer or guess the meaning of the attributes.

All your "Unknown_Attribute" entries are a sign that your drives aren't fully covered by smartctl yet.

I can't find any explanation of the SMART attributes in the datasheet for the Micron 5100 ECO 1.92 TB.

Edit:
Here is an additional SMART datasheet for the Micron 5100, but I'm not sure whether it covers your ECO model too. It has no ID 177, for example. Instead, ID 173 would be "173 AD Average Block Erase Count No Average erase count of all good blocks", and ID 202 should be the wearout: "202 CAh Percentage of Lifetime Remaining No Percentage lifetime remaining", which your drives don't report.
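If the datasheet's IDs do apply, the wearout could be derived from them directly instead of from smartctl's labels. A minimal sketch, assuming the percentage for ID 202 is in the normalized VALUE column and the average erase count for ID 173 is the raw value; neither is confirmed for the 5100 ECO. The function names and the `rated_pe_cycles` parameter are hypothetical: the rated program/erase cycles per block are not given in this thread and would have to come from Micron.

```python
from typing import Optional

# IDs from the Micron 5100 SMART datasheet quoted above (unconfirmed for the ECO model).
LIFETIME_REMAINING_ID = 202  # "Percentage of Lifetime Remaining"
AVG_ERASE_COUNT_ID = 173     # "Average Block Erase Count"


def wearout_from_lifetime_remaining(normalized: dict[int, int]) -> Optional[int]:
    """Wearout % = 100 - remaining %, assuming ID 202's normalized VALUE is the remaining %.

    Returns None when the drive does not report ID 202 at all.
    """
    remaining = normalized.get(LIFETIME_REMAINING_ID)
    return None if remaining is None else 100 - remaining


def wearout_from_erase_count(raw: dict[int, int], rated_pe_cycles: int) -> Optional[float]:
    """Estimate wearout % as average erase count / rated P/E cycles * 100.

    rated_pe_cycles is a hypothetical input that must come from the manufacturer.
    Returns None when the drive does not report ID 173.
    """
    if rated_pe_cycles <= 0:
        raise ValueError("rated_pe_cycles must be positive")
    avg = raw.get(AVG_ERASE_COUNT_ID)
    return None if avg is None else 100.0 * avg / rated_pe_cycles


# Made-up numbers, only to show the arithmetic.
print(wearout_from_lifetime_remaining({202: 95}))  # -> 5
print(wearout_from_lifetime_remaining({173: 95}))  # -> None (no ID 202 reported)
print(wearout_from_erase_count({173: 300}, rated_pe_cycles=3000))  # -> 10.0
```

The erase-count estimate is only as good as the rated P/E figure fed into it, which is another reason a manufacturer datasheet is needed to trust any of these numbers.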
 