Incorrect Temps detected on all HDDs via System Log

ashstavegas

New Member
Jul 28, 2024
Hello.

Running Proxmox Virtual Environment 8.2.4.
I have:
2x 4TB WD Red spinning disks (both logging incorrect temps)
2x 2TB M.2 (reporting temps OK, not an issue)
1x 500GB 2.5" SSD (logging incorrect temps)


In my system logs I constantly see the 2x 4tb WD Red and the 500gb 2.5" ssd complain about temps that are pretty much impossible:

Jul 28 16:58:42 proxmox smartd[1651]: Device: /dev/sdc [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 124 to 125
Jul 28 15:28:42 proxmox smartd[1651]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 126 to 125
Jul 28 15:28:42 proxmox smartd[1651]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 77 to 76

As per the above, sda/sdb/sdc

1722145308472.png


From what I understand, running "smartctl -a /dev/sda" gives me the correct values:

SDA:
1722145403890.png

SDB: (500gb crucial ssd)
1722145498695.png

SDC:
1722145532037.png


How do I get Proxmox to pick up the correct values?
Or, and I know this may be a frowned-upon question, how do I suppress those system log entries, or maybe double the threshold so that a syslog message is only logged when temps reach a level of concern? For example, setting the threshold to 250 when a drive currently reports 125 but is actually running at around 20 degrees according to the smartctl output. That way it would take about 40 degrees Celsius before a syslog entry is logged.

The server case housing this gear is well ventilated, with good airflow and good thermals overall. Plus the 4TB WD Reds sit in front of 3x 120mm intake fans and are under very little load.

Thanks in advance.

Regards,

Ash.
 
AFAIK all your values for Temperature_Celsius are perfectly normal. The way it works: the drive takes a number (usually 150) and subtracts the RAW_VALUE (the actual temperature) from it. This lets the general SMART threshold system work as normal, where a value above the threshold is considered good and a value below it is considered bad. For temperature the reverse is true (higher is worse), so the subtraction "flips" it. Hope this helps you.
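As a worked example of that subtraction, here is a minimal shell sketch. It assumes the drive computes VALUE as a fixed base minus the raw temperature. The base is vendor-specific, so both bases below are assumptions: 150 fits the roughly 20 degrees reported for the WD Reds, while the 76 logged for sdb only looks plausible with a different base (100 is a guess). The helper name raw_temp is made up for illustration. The RAW_VALUE column in smartctl remains the authoritative number.

```shell
#!/bin/sh
# raw_temp NORMALISED [BASE]
# Hypothetical helper: undo a "base minus raw" normalisation.
# BASE defaults to 150, which is an assumption, not part of any SMART standard.
raw_temp() {
    normalised=$1
    base=${2:-150}
    echo $((base - normalised))
}

raw_temp 125        # sda/sdc as logged above, assuming base 150: prints 25
raw_temp 76 100     # sdb, assuming this SSD uses base 100: prints 24
```

Under these assumptions the logged 125 corresponds to 25 degrees Celsius, which is in line with a well-ventilated case.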

Please note, that in general SMART values are at best counter-intuitive, & at worst completely misleading/wrong.
 
Code:
Jul 28 16:58:42 proxmox smartd[1651]: Device: /dev/sdc [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 124 to 125
Jul 28 15:28:42 proxmox smartd[1651]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 126 to 125
Jul 28 15:28:42 proxmox smartd[1651]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 77 to 76


From what I understand, running "smartctl -a /dev/sda" gives me the correct values:

SDA:
View attachment 71972

No, they give you exactly the same values. Look at the VALUE column: that is what smartd logs. It is "normalised"; what you are after is the RAW_VALUE column.
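To see the two columns side by side for attribute 194, a small sketch along these lines may help. The function name show_temp and the sample row are illustrative only (the numbers are made up to match the log lines, not copied from the screenshots). On a real system you would pipe the output of "smartctl -A /dev/sda" into it as root.

```shell
#!/bin/sh
# show_temp: read `smartctl -A` style output on stdin and print the
# normalised VALUE (column 4) and RAW_VALUE (column 10) of attribute 194.
show_temp() {
    awk '$1 == 194 { print "normalised=" $4, "raw=" $10 }'
}

# Illustrative input only; not real output from the drives in this thread.
show_temp <<'EOF'
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
194 Temperature_Celsius     0x0022   125   107   000    Old_age   Always       -       25
EOF
```

Usage on a live system would look like: smartctl -A /dev/sda | show_temp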

Or, and I know this may be a frowned-upon question, how do I suppress those system log entries....

If you worry about "clean" logs, you need to read them with filtering, e.g.:

Code:
journalctl -p 0..3 -u smartd

And if you really want to push it, you can have the logs processed somewhere centrally and, if you care, build a dashboard etc. That way, you are not concerned about INFO entries.

If you check the log, you will notice those are INFO level messages.

The smartd.conf man page then allows you to tweak what is logged at which level:

-W DIFF[,INFO[,CRIT]] Report if the current temperature had changed by at least DIFF degrees since last report, or if new min or max temperature is detected. Report or Warn if the temperature is greater or equal than one of INFO or CRIT degrees Celsius. If the limit CRIT is reached, a message with loglevel 'LOG_CRIT' will be logged to syslog and a warning email will be send if '-m' is specified. If only the limit INFO is reached, a message with loglevel 'LOG_INFO' will be logged.
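As a sketch of how that could look in practice: the fragment below assumes the stock Debian-style DEVICESCAN line in /etc/smartd.conf (yours may differ, so keep whatever options you already have and only append the last two). -I 194 tells smartd to stop tracking changes of the normalised attribute 194, which is what produces the "changed from 124 to 125" lines. -W works on the real temperature in degrees Celsius. The limits 45 and 55 are illustrative values, not recommendations.

```
# /etc/smartd.conf (sketch; everything before -I is assumed to be the distribution default)
# -I 194      : do not log changes of normalised attribute 194
# -W 0,45,55  : no change reports, LOG_INFO at 45 degrees C, LOG_CRIT at 55 degrees C
DEVICESCAN -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner -I 194 -W 0,45,55

# Alternative if you would rather keep the change messages but see the real
# temperature in them: use -r 194 instead of -I 194, so the raw value is
# reported along with the normalised one.
```

smartd only reads its configuration at startup or on reload, so restart the service after editing the file.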

The server case housing this gear is well ventilated, good airflow and overall good thermals.

And that's why you only get INFO messages.
 
Please note, that in general SMART values are at best counter-intuitive, & at worst completely misleading/wrong.

:D With some drives this is more fun than with others, e.g. a brand new Seagate drive can appear to report errors when you look at the raw values, when in fact it is all about how to interpret those numbers ... it can be what you expect, or it can require beyond-trivial calculations.
 
