[SOLVED] Disk Error

Oct 1, 2020
Hi,
we are getting errors on the Proxmox side. Please look at the issue below:
Oct 15 10:10:41 Nokta-1 smartd[5945]: Device: /dev/bus/0 [megaraid_disk_04], SMART Failure: DATA CHANNEL IMPENDING FAILURE GENERAL HARD DRIVE FAILURE
Oct 15 10:10:44 Nokta-1 smartd[5945]: Device: /dev/bus/0 [megaraid_disk_16] [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 79 to 80
Oct 15 10:10:44 Nokta-1 smartd[5945]: Device: /dev/bus/0 [megaraid_disk_16] [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 79 to 80
Oct 15 10:10:44 Nokta-1 smartd[5945]: Device: /dev/bus/0 [megaraid_disk_17] [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 79 to 81
Oct 15 10:10:44 Nokta-1 smartd[5945]: Device: /dev/bus/0 [megaraid_disk_17] [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 79 to 81

We cannot see any error on the server itself, and the server room (cabin) is at 18 degrees.
What can we do?
Please check and help.
 
Maybe your disk has failed:
[megaraid_disk_04], SMART Failure: DATA CHANNEL IMPENDING FAILURE GENERAL HARD DRIVE FAILURE

and the other disks have to work harder. Still, I think the temperature is too high and the cooling in the server is not working.
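The log in the first post mixes two kinds of smartd messages: a "SMART Failure" line for megaraid_disk_04 and several "SMART Usage Attribute ... changed from X to Y" lines that only report that a normalized value moved. A minimal Python sketch for telling them apart; `classify_smartd_line` is a hypothetical helper (not part of smartmontools) and assumes the exact message wording quoted in this thread, so any other smartd message simply returns None.

```python
import re

# The two smartd message shapes quoted in this thread. The first bracketed
# label is the MegaRAID disk name, or the device type (e.g. SAT) for plain disks.
FAILURE_RE = re.compile(r"Device: (\S+) \[([^\]]+)\].*SMART Failure: (.+)$")
USAGE_RE = re.compile(
    r"Device: (\S+) \[([^\]]+)\].*SMART Usage Attribute: (\d+) (\S+) "
    r"changed from (\d+) to (\d+)"
)


def classify_smartd_line(line):
    """Describe a smartd syslog line as a dict, or return None if not recognized."""
    m = FAILURE_RE.search(line)
    if m:
        return {"kind": "failure", "device": m.group(1), "label": m.group(2),
                "message": m.group(3)}
    m = USAGE_RE.search(line)
    if m:
        return {"kind": "usage_change", "device": m.group(1), "label": m.group(2),
                "attr_id": int(m.group(3)), "attr_name": m.group(4),
                "old": int(m.group(5)), "new": int(m.group(6))}
    return None


# Two lines taken from the first post's log.
log = [
    "Oct 15 10:10:41 Nokta-1 smartd[5945]: Device: /dev/bus/0 [megaraid_disk_04], "
    "SMART Failure: DATA CHANNEL IMPENDING FAILURE GENERAL HARD DRIVE FAILURE",
    "Oct 15 10:10:44 Nokta-1 smartd[5945]: Device: /dev/bus/0 [megaraid_disk_16] [SAT], "
    "SMART Usage Attribute: 194 Temperature_Celsius changed from 79 to 80",
]
for entry in log:
    info = classify_smartd_line(entry)
    if info and info["kind"] == "failure":
        # prints: megaraid_disk_04 -> DATA CHANNEL IMPENDING FAILURE GENERAL HARD DRIVE FAILURE
        print(info["label"], "->", info["message"])
```

Only the failure line is printed; the attribute-change line is classified as "usage_change" and skipped.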
 
Maybe smartd is misinterpreting the SMART information because it does not (yet) know this hard drive, as it is a very new model?
Can you carefully touch the metal side of the drive while it is operating? If it is indeed 80 degrees Celsius, it will be too hot to touch for more than a few seconds.
 
Hi, I am facing the same issue.
My machine is in a garden shed outside, so it is a mostly cold environment. I am not sure why it is throwing 190 and 194 errors:
Mar 18 13:12:50 homepxmx smartd[746]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 85 to 84
Mar 18 13:12:50 homepxmx smartd[746]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 15 to 16
Mar 18 13:12:50 homepxmx smartd[746]: Device: /dev/sdb [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 84 to 83
Mar 18 13:12:50 homepxmx smartd[746]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 16 to 17

Is it too hot? How did you resolve the issue?
 
Maybe "85" is just the normalized value and not the raw value? My SSD, for example, reports this:
190 Temperature_Case 0x0022 069 067 000 Old_age Always - 31 (Min/Max 23/34)
In my case the value of 69 does not mean 69 degrees C; the raw value shows it is 31 degrees C.

What does smartctl -a /dev/sda return?
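To make the VALUE versus RAW_VALUE distinction concrete, here is a minimal Python sketch that splits one row of the smartctl attribute table. It assumes the default column layout (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE); `parse_attribute_row` and `SmartAttribute` are hypothetical names, and RAW_VALUE is kept as text because its format differs between attributes.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    attr_id: int
    name: str
    value: int   # normalized VALUE column, scaled by the drive vendor
    worst: int   # lowest normalized value recorded
    thresh: int  # failure threshold for the normalized value
    raw: str     # RAW_VALUE column, kept as text because its format varies


def parse_attribute_row(row):
    """Split one row of smartctl's 'Vendor Specific SMART Attributes' table."""
    # Columns: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    # maxsplit=9 keeps the raw value (which may contain spaces) in one piece.
    parts = row.split(None, 9)
    return SmartAttribute(
        attr_id=int(parts[0]),
        name=parts[1],
        value=int(parts[3]),
        worst=int(parts[4]),
        thresh=int(parts[5]),
        raw=parts[9],
    )


attr = parse_attribute_row(
    "190 Temperature_Case 0x0022 069 067 000 Old_age Always - 31 (Min/Max 23/34)"
)
print(attr.value, attr.raw)  # 69 31 (Min/Max 23/34)
```

For this row the normalized value is 69, while the first number of the raw value (31) is the temperature in degrees C.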
 
Hi,

this is what I am getting:
root@homepxmx:~# smartctl -a /dev/sda
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always   -           0
190 Airflow_Temperature_Cel 0x0022   085   051   040    Old_age   Always   -           15 (Min/Max 15/20)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always   -           1
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always   -           3
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always   -           290
194 Temperature_Celsius     0x0022   015   049   000    Old_age   Always   -           15 (0 7 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always   -           0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count

But in the syslog I see this:

Mar 21 19:50:01 homepxmx systemd[1]: Started Proxmox VE replication runner.
Mar 21 19:50:48 homepxmx smartd[766]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 84 to 85
Mar 21 19:50:48 homepxmx smartd[766]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 16 to 15

So I cannot tell whether it is getting hot, even though it is in a shed and the weather is still cold.
 
190 Airflow_Temperature_Cel 0x0022 085 051 040 Old_age Always - 15 (Min/Max 15/20)
smartd logs every change of an attribute, not only critical changes. So when it reports "SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 84 to 85", that is no problem, because "85" represents 15 degrees C.
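The numbers quoted in this thread fit a simple encoding for attribute 190: normalized value = 100 minus the temperature in degrees C (85 -> 15, 84 -> 16, 83 -> 17), while attribute 194 on this drive reports the temperature directly (VALUE 015, raw 15). The sketch below checks that assumption against the pairs from the smartctl output and the paired 190/194 smartd log lines. Treat the formula as inferred from this drive's output: it is a common vendor convention for 190, but other drives may normalize differently.

```python
def airflow_value_to_celsius(normalized):
    """Assumed encoding for attribute 190: normalized value = 100 - degrees C."""
    return 100 - normalized


# (attribute 190 normalized value, temperature in degrees C) pairs from this
# thread: the smartctl row for 190 and the paired 190/194 smartd log lines.
observed = [(85, 15), (84, 16), (83, 17)]

for normalized, celsius in observed:
    assert airflow_value_to_celsius(normalized) == celsius
print("all pairs match 100 - value")
```

Under this reading, a log line saying 190 "changed from 84 to 85" means the drive cooled from 16 to 15 degrees C.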
 
Hence, I think it must be something else that is triggering my VMs' shutdowns without any apparent reason. Thanks.
I would like to figure out why Proxmox shuts down the VMs and how to fix it.
 
Did you check the logs for OOM (out-of-memory) events? Most of the time, VMs are shut down because the host runs out of memory.
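One way to look for OOM kills is to scan the kernel log (for example the output of `journalctl -k`) for the OOM killer's messages. A minimal Python sketch, assuming the usual kernel wording "Out of memory: Killed process PID (NAME)" (older kernels print "Kill process", and memory-cgroup kills start with "Memory cgroup out of memory"); `find_oom_kills` is a hypothetical helper, and the exact message format can vary between kernel versions.

```python
import re

# Matches kernel OOM-killer messages such as
#   "Out of memory: Killed process 1234 (kvm) total-vm:..."
# Older kernels print "Kill process" instead of "Killed process".
# IGNORECASE also catches "Memory cgroup out of memory: Killed process ...".
OOM_RE = re.compile(r"out of memory: kill(?:ed)? process (\d+) \(([^)]+)\)",
                    re.IGNORECASE)


def find_oom_kills(lines):
    """Return (pid, command) tuples for every OOM kill found in the log lines."""
    kills = []
    for line in lines:
        m = OOM_RE.search(line)
        if m:
            kills.append((int(m.group(1)), m.group(2)))
    return kills
```

On a Proxmox host, a killed VM would most likely show up with the process name `kvm`, since that is the QEMU binary Proxmox starts for each guest.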
 
There is nothing to worry about;
it is the temperature on the Fahrenheit scale.
For example:
75F = 24C
85F = 29C
115F = 46C
Worry about the SSD when the temperature reaches:
149F (65C): attention, check cooling
158F (70C): critical, replace the drive
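The conversion and thresholds in the post above can be written out as a small Python sketch, using the standard formula C = (F - 32) * 5 / 9. The 65C and 70C thresholds are the ones suggested in the post, not vendor specifications; the helper names and the "ok" label for temperatures below both thresholds are hypothetical.

```python
def fahrenheit_to_celsius(f):
    """Standard conversion: C = (F - 32) * 5 / 9."""
    return (f - 32) * 5 / 9


def ssd_temperature_status(celsius):
    """Classify a temperature using the thresholds suggested in the post above."""
    if celsius >= 70:
        return "critical - replace the drive"
    if celsius >= 65:
        return "attention - check cooling"
    return "ok"  # hypothetical label for anything below both thresholds


print(fahrenheit_to_celsius(149))  # 65.0
print(ssd_temperature_status(fahrenheit_to_celsius(158)))  # critical - replace the drive
```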
 