SSD Temperature issues smartd

AngryAdm

Member
Sep 5, 2020
Hi, we have 4 servers with various SSDs from various vendors. Today I noticed something interesting in the syslog of all servers and went to investigate.

I even went so far as to send a trainee to the server room, ready to pull out the disk I specified as soon as the syslog entry appeared, to check whether it really was 112C. He did not cry into the phone, so in reality it was not 112C.

When viewing the SMART status in the disk info panel, the disks show the temperatures below. I do not see how that is possible. They live in 24x3.5" trays in a 16C room, with fans pulling cold air in from the cold aisle at the front and exhausting to the hot aisle at the back.

I get this in syslog, note it's from 4 different servers:
Code:
Nov 01 15:37:51 pve01 smartd[4926]: Device: /dev/sdd [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 69 to 70
Nov 01 15:37:51 pve01 smartd[4926]: Device: /dev/sde [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 68 to 69
Nov 01 15:37:52 pve01 smartd[4926]: Device: /dev/sdj [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 80 to 81
Nov 01 15:37:52 pve01 smartd[4926]: Device: /dev/sdk [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 69 to 70


Nov 01 15:40:38 pve02 smartd[6221]: Device: /dev/sdd [SAT], SMART Prefailure Attribute: 194 Temperature_Celsius changed from 110 to 112
Nov 01 15:40:38 pve02 smartd[6221]: Device: /dev/sde [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 82 to 83
Nov 01 15:40:38 pve02 smartd[6221]: Device: /dev/sdf [SAT], SMART Prefailure Attribute: 194 Temperature_Celsius changed from 110 to 111
Nov 01 15:40:38 pve02 smartd[6221]: Device: /dev/sdg [SAT], SMART Prefailure Attribute: 194 Temperature_Celsius changed from 111 to 112
Nov 01 15:40:38 pve02 smartd[6221]: Device: /dev/sdh [SAT], SMART Prefailure Attribute: 194 Temperature_Celsius changed from 111 to 112
Nov 01 15:40:38 pve02 smartd[6221]: Device: /dev/sdi [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 80 to 81
Nov 01 15:40:38 pve02 smartd[6221]: Device: /dev/sdj [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 80 to 82
Nov 01 15:40:38 pve02 smartd[6221]: Device: /dev/sdk [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 81 to 82
Nov 01 15:40:38 pve02 smartd[6221]: Device: /dev/sdl [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 20 to 19
Nov 01 15:40:38 pve02 smartd[6221]: Device: /dev/sdm [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 82 to 83
Nov 01 15:40:38 pve02 smartd[6221]: Device: /dev/sdn [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 81 to 82
Nov 01 15:40:38 pve02 smartd[6221]: Device: /dev/sds [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 80 to 81
Nov 01 15:40:38 pve02 smartd[6221]: Device: /dev/sdt [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 81 to 82
Nov 01 15:40:38 pve02 smartd[6221]: Device: /dev/sdu [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 81 to 82

Nov 01 15:32:32 pve03 smartd[2032]: Device: /dev/sdd [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 72 to 71
Nov 01 15:32:32 pve03 smartd[2032]: Device: /dev/sdf [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 63 to 62
Nov 01 15:32:32 pve03 smartd[2032]: Device: /dev/sdg [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 25 to 24


Nov 01 15:37:41 pve04 smartd[2433]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 39 to 40
Nov 01 15:37:41 pve04 smartd[2433]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 63 to 65
Nov 01 15:37:41 pve04 smartd[2433]: Device: /dev/sdf [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 73 to 72
 
Those are not temperatures in degrees. Each SMART attribute has a normalized value (VALUE) and a raw value (RAW_VALUE).
Look here, for example:
Code:
root@Hypervisor:~# smartctl -a /dev/sda
...
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
...
190 Temperature_Case        0x0022   073   072   000    Old_age   Always       -       27 (Min/Max 23/28)
...
What you see in the logs should be the VALUE (in my example "73"), but the actual temperature is the RAW_VALUE, which here is 27 degrees C.

So if you want to check the actual temperatures, you should run smartctl and read the RAW_VALUE column.
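
If you want to compare the two numbers for many disks at once, something like the following could help. It is only a rough sketch: the function name `parse_temperature_rows` is made up for this example, it only looks at attributes 190 and 194, and it assumes the leading integer of RAW_VALUE is the temperature in degrees C, which holds for the example output above but is vendor-dependent.

```python
import re

# Matches one attribute row of `smartctl -A` output, for example:
# 190 Temperature_Case  0x0022  073  072  000  Old_age  Always  -  27 (Min/Max 23/28)
ATTR_ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+(?P<flag>0x[0-9a-fA-F]+)\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"(?P<type>\S+)\s+(?P<updated>\S+)\s+(?P<when_failed>\S+)\s+(?P<raw>.+)$"
)

# Attribute IDs seen in the logs above (190 and 194).
TEMPERATURE_IDS = (190, 194)


def parse_temperature_rows(smartctl_output):
    """Return the normalized VALUE and the raw reading for temperature rows."""
    rows = []
    for line in smartctl_output.splitlines():
        m = ATTR_ROW.match(line)
        if not m or int(m["id"]) not in TEMPERATURE_IDS:
            continue
        raw = m["raw"].strip()
        # Assumption: the first integer in RAW_VALUE is degrees Celsius.
        first = re.match(r"\d+", raw)
        rows.append({
            "id": int(m["id"]),
            "name": m["name"],
            "value": int(m["value"]),  # what smartd writes to syslog
            "raw": raw,
            "raw_celsius": int(first.group()) if first else None,
        })
    return rows


sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
190 Temperature_Case        0x0022   073   072   000    Old_age   Always       -       27 (Min/Max 23/28)
"""

for row in parse_temperature_rows(sample):
    print(f"{row['id']} {row['name']}: VALUE={row['value']} RAW={row['raw_celsius']}C")
```

On a real host you would feed it the text printed by `smartctl -A /dev/sdX` (run as root) instead of the `sample` string.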
 
There is nothing to worry about.
It is the temperature on the Fahrenheit scale.
For example:
75F = 24C
85F = 29C
115F = 46C
Worry about the SSD when the temperature reaches:
149F = 65C: attention, check cooling
158F = 70C: critical, replace the drive
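
Whatever scale the logged numbers are actually on, Fahrenheit-to-Celsius pairs like the ones above can be checked with the standard formula C = (F - 32) * 5 / 9. A minimal sketch (the function name is just for illustration):

```python
def fahrenheit_to_celsius(deg_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32) * 5 / 9


# The two threshold values quoted above.
for deg_f in (149, 158):
    print(f"{deg_f}F = {fahrenheit_to_celsius(deg_f):.1f}C")
```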
 