SMART error mails from node(s), can't find issue outside of syslog.

Kephin

We have a 9 node cluster.
Two of the nodes have started sending me SMART mails.
One started this morning, after I installed the latest updates and rebooted it yesterday. The other started logging this about a month ago.

Subject: SMART error (Health) detected on host: <node>

Snippet from mail:

The following warning/error was logged by the smartd daemon:

Device: /dev/bus/0 [megaraid_disk_00], SMART Failure: WARNING: ascq=0x4

Both of these nodes are PowerEdge R620 machines.

When I check SMART status in the Proxmox GUI, it shows as OK on both of these boxes.
When I check array status in iDRAC, my arrays are healthy.

I'm not quite sure where to go from here; Proxmox now mails me daily.

Syslog regularly logs:

May 17 13:31:32 px6pve4 smartd[886]: Device: /dev/bus/0 [megaraid_disk_00], SMART Failure: WARNING: ascq=0x4
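
To see how often, and for which disks, smartd logs this, the syslog lines can be tallied. A minimal sketch, assuming every relevant line has the same shape as the one quoted above; the function name and the regular expression are my own, and other smartd messages may be formatted differently.

```python
import re
from collections import Counter

# Pattern modelled on the single smartd line quoted in this post.
# Assumption: other "SMART Failure" lines follow the same layout.
SMARTD_LINE = re.compile(
    r"smartd\[\d+\]: Device: (?P<device>\S+) \[(?P<disk>[^\]]+)\], "
    r"SMART Failure: (?P<message>.+)$"
)


def tally_smart_failures(lines):
    """Count SMART Failure messages per (device, disk, message)."""
    counts = Counter()
    for line in lines:
        match = SMARTD_LINE.search(line.strip())
        if match:
            key = (match["device"], match["disk"], match["message"])
            counts[key] += 1
    return counts


sample = [
    # Line taken from the syslog excerpt above.
    "May 17 13:31:32 px6pve4 smartd[886]: Device: /dev/bus/0 "
    "[megaraid_disk_00], SMART Failure: WARNING: ascq=0x4",
    # Made-up, unrelated line to show that non-matching input is skipped.
    "May 17 13:31:33 px6pve4 example[1]: unrelated message",
]

for key, count in tally_smart_failures(sample).items():
    print(count, *key)
```

On a real node the lines would come from the syslog file or the journal instead of the `sample` list.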

How can I narrow this down and, above all, make sure I'm actually replacing defective disks and that this is not some sort of software-induced problem?

I have a hard time accepting two actual disk failures in the same timeframe. These disks are from mid-2020 and mid-2018 respectively, and somehow neither of them triggers a health warning on the RAID controller?
 
I dove into the SMART codes a little more.

https://en.wikipedia.org/wiki/S.M.A.R.T. lists 0x04 as Start/Stop Count.

Using smartctl -a -d megaraid,<disk#> /dev/sda I can get some individual disk data.
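
Checking every disk behind the controller by hand gets tedious, so the same command can be generated per disk number. A rough sketch, assuming the disks are reachable through /dev/sda as in the command above; the helper names are hypothetical, and the range of disk numbers depends on the machine.

```python
import subprocess


def smartctl_command(disk_id, device="/dev/sda", flags=("-a",)):
    """Build the smartctl argument list for one disk behind the MegaRAID controller."""
    return ["smartctl", *flags, "-d", f"megaraid,{disk_id}", device]


def health_line(disk_id, device="/dev/sda"):
    """Run `smartctl -H` for one disk and return its health line, or None.

    Only the "SMART Health Status" line shown in this thread is looked for;
    disks that report their health in another format return None.
    """
    result = subprocess.run(
        smartctl_command(disk_id, device, flags=("-H",)),
        capture_output=True,
        text=True,
        check=False,  # smartctl uses non-zero exit codes to signal warnings
    )
    for line in result.stdout.splitlines():
        if "SMART Health Status" in line:
            return line.strip()
    return None


# Example use (needs root and smartmontools; the disk count 8 is a guess):
# for disk_id in range(8):
#     print(disk_id, health_line(disk_id))
```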

On both servers, disk 0 reports:

=== START OF READ SMART DATA SECTION ===
SMART Health Status: WARNING: ascq=0x4 [asc=b, ascq=4]

(other drives report Health Status OK)

However, "Accumulated start-stop cycles:" is either 9 or 10 on all my disks... so I'm assuming the error that smartctl logs there isn't the actual SMART attribute code?
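
The bracketed `[asc=b, ascq=4]` part suggests the value is a SCSI additional sense code and qualifier pair (printed in hex) rather than an ATA SMART attribute ID, which would explain why the start-stop counters look normal. The sketch below only pulls the pair out of the health line. The one lookup entry is an assumption from my reading of the T10 ASC/ASCQ list and should be checked against the official table before acting on it.

```python
import re

# Matches the health line printed by smartctl in this thread; the bracketed
# asc/ascq part is optional because healthy disks only print "OK".
HEALTH_LINE = re.compile(
    r"SMART Health Status:\s*(?P<text>.*?)\s*"
    r"(?:\[asc=(?P<asc>[0-9a-fA-F]+),\s*ascq=(?P<ascq>[0-9a-fA-F]+)\])?$"
)

# Assumption, not taken from this thread: unverified reading of the T10 list.
KNOWN_SENSE = {
    (0x0B, 0x04): "WARNING - BACKGROUND PRE-SCAN DETECTED MEDIUM ERROR",
}


def parse_health_status(line):
    """Return the status text and asc/ascq as integers, or None if no match."""
    match = HEALTH_LINE.search(line)
    if match is None:
        return None
    asc, ascq = match["asc"], match["ascq"]
    return {
        "text": match["text"],
        "asc": int(asc, 16) if asc is not None else None,
        "ascq": int(ascq, 16) if ascq is not None else None,
    }


def describe(status):
    """Look up a description for the pair, falling back to the raw text."""
    return KNOWN_SENSE.get((status["asc"], status["ascq"]), status["text"])


status = parse_health_status(
    "SMART Health Status: WARNING: ascq=0x4 [asc=b, ascq=4]"
)
print(status, describe(status))
```

If that reading is right, the warning comes from the drive's own background scan, which may not be something the RAID controller reports as an array health problem.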
 
For anybody who stumbles onto this topic:
The nodes still do this, and according to iDRAC lights-out management my arrays are still healthy.

I've elected to just ignore this.
Haven't found the culprit.
 
