Received SMART error for a disk. How to map to the right disk?

edtrumbull

Jun 24, 2022
I have received the following error message from one of my nodes:

Code:
This message was generated by the smartd daemon running on:

   host name:  ceph-00
   DNS domain: snc.as2inc.com

The following warning/error was logged by the smartd daemon:

Device: /dev/bus/0 [megaraid_disk_32] [SAT], ATA error count increased from 0 to 288

Device info:
ST8000NM000A-2KE101, S/N:WSD1D271, WWN:5-000c50-0c95f01f4, FW:SN03, 8.00 TB

When I look at the disks known to that host, I see the following:

ceph-disks.png

I can't see how to map either the serial number or the WWN that smartd reports to the /dev/sdX names that Ceph and the host know. I've looked in dmesg, the smartmontools output, and various other places, and cannot figure out how to make the connection, but I know it must be possible, since smartd knows it. I'm sure I'm missing something. Could one of you point me in the right direction?
 
I suspect you're right.
But that does not really answer the question of how do I find which drive the email points to.
 
You could run lsblk -o NAME,FSTYPE,UUID,SIZE,STATE,TYPE,MOUNTPOINT,LABEL,MODEL,SERIAL,WWN and see if you can identify the serial or WWN there. Then you can check which /dev/sdX that corresponds to in the web UI. If the disks sit behind the MegaRAID controller and the serial does not show up, you may need to use Broadcom's software instead.
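
If the serial does show up in lsblk, a small filter saves scanning the table by eye. This is only a sketch: match_serial is a hypothetical helper name, and it assumes lsblk reports the physical disk's serial, which may not be the case if the controller presents virtual drives to the OS.

```shell
# Hypothetical helper: print the /dev name of every whole disk whose
# serial matches $1. Reads "NAME SERIAL" pairs on stdin, as produced by
#   lsblk -dn -o NAME,SERIAL
# (-d: whole disks only, -n: no header line).
match_serial() {
    awk -v sn="$1" '$2 == sn { print "/dev/" $1 }'
}
```

Usage would be something like lsblk -dn -o NAME,SERIAL | match_serial WSD1D271. No output means no block device on that host reports that serial.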
 
S/N:WSD1D271
There is your "identification". That number is printed on the physical disk, so you "just" have to find the right one. If you have not written the S/N on your caddies, or do not have a picture of your machine with each S/N mapped directly to the drive bay it sits in, now is the time to do that.

For enterprise hardware, the S/N is often written directly on the caddy. Most RAID controllers also have an "identify" command that blinks the identify LED on the caddy, so that you can just "see" the drive. Often, if the drive "really" fails, you will also get a red light. But "simple" SMART errors do not mean that the disk is already faulty. They can indicate a degradation that will eventually lead to a failed disk, but not immediately.
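
Before pulling anything, it may help to query the exact disk smartd complained about. The "[megaraid_disk_32]" in the alert corresponds to smartctl's -d megaraid,N device type, so the alert line can be turned into a smartctl call. A rough sketch, where smartd_to_smartctl is a hypothetical helper name and only the "[megaraid_disk_NN]" form from the alert above is handled:

```shell
# Hypothetical helper: turn a smartd "Device:" line into the matching
# smartctl command. Assumes the "[megaraid_disk_NN]" form seen in the alert;
# prints nothing for lines in any other form.
smartd_to_smartctl() {
    printf '%s\n' "$1" | sed -n \
        's|^Device: \([^ ]*\) \[megaraid_disk_\([0-9][0-9]*\)\].*|smartctl -i -d megaraid,\2 \1|p'
}

# Prints: smartctl -i -d megaraid,32 /dev/bus/0
smartd_to_smartctl 'Device: /dev/bus/0 [megaraid_disk_32] [SAT], ATA error count increased from 0 to 288'
```

Running the printed command as root should show the model and serial of that disk, which you can compare with the S/N from the email. It tells you the controller's device ID, not the /dev/sdX name, so you still need the caddy label or the controller's own tool to find the physical slot.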