Hi all:
I'm getting daily e-mails of this:
The following warning/error was logged by the smartd daemon:
Device: /dev/nvme0, Critical Warning (0x02): Temperature
When I run smartctl -a /dev/nvme0, I get:
Code:
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.2.16-14-pve] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: Samsung SSD 980 1TB
Serial Number: xxxx
Firmware Version: 2B4QFXO7
PCI Vendor/Subsystem ID: 0x144d
IEEE OUI Identifier: 0x002538
Total NVM Capacity: 1,000,204,886,016 [1.00 TB]
Unallocated NVM Capacity: 0
Controller ID: 5
NVMe Version: 1.4
Number of Namespaces: 1
Namespace 1 Size/Capacity: 1,000,204,886,016 [1.00 TB]
Namespace 1 Utilization: 186,906,210,304 [186 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 002538 d12142ee3d
Local Time is: Sat Mar 9 17:47:04 2024 PST
Firmware Updates (0x16): 3 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x0055): Comp DS_Mngmt Sav/Sel_Feat Timestmp
Log Page Attributes (0x0f): S/H_per_NS Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size: 512 Pages
Warning Comp. Temp. Threshold: 82 Celsius
Critical Comp. Temp. Threshold: 85 Celsius
Namespace 1 Features (0x10): NP_Fields
Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     5.24W       -        -    0  0  0  0        0       0
 1 +     4.49W       -        -    1  1  1  1        0       0
 2 +     2.19W       -        -    2  2  2  2        0     500
 3 -   0.0500W       -        -    3  3  3  3      210    1200
 4 -   0.0050W       -        -    4  4  4  4     1000    9000
Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 43 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 1%
Data Units Read: 5,076,879 [2.59 TB]
Data Units Written: 9,893,908 [5.06 TB]
Host Read Commands: 156,410,825
Host Write Commands: 287,179,799
Controller Busy Time: 1,299
Power Cycles: 8
Power On Hours: 2,396
Unsafe Shutdowns: 2
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 748
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 43 Celsius
Temperature Sensor 2: 47 Celsius
Thermal Temp. 2 Transition Count: 171808
Thermal Temp. 2 Total Time: 37434
Error Information (NVMe Log 0x01, 16 of 64 entries)
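I ran this within 30 minutes of getting the e-mail. I know we had an HVAC failure one night, and the room got HOT. That was promptly corrected, but every night since, I've continued to receive this e-mail. It appears to me that there is no current temperature warning, just a past one: Critical Warning now reads 0x00 and the drive is at 43 Celsius, well under the 82 Celsius warning threshold, while Warning Comp. Temperature Time still shows 748.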
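To spell out how I'm reading the two values, here is a rough Python sketch that decodes the Critical Warning byte from the smartd e-mail (0x02) and from the smartctl output above (0x00). The function names are made up for illustration, and the bit descriptions are my own paraphrase of the NVMe SMART/Health log fields, so treat it as an assumption-laden sketch rather than how smartd itself works.

```python
import re

# Bits 0-4 of the NVMe "Critical Warning" byte. The descriptions are a
# paraphrase (assumption), not the exact wording smartd or smartctl prints.
CRITICAL_WARNING_BITS = {
    0x01: "Available spare below threshold",
    0x02: "Temperature",
    0x04: "NVM subsystem reliability degraded",
    0x08: "Media in read-only mode",
    0x10: "Volatile memory backup failed",
}


def decode_critical_warning(value):
    """Return the names of the warning bits that are set in `value`."""
    return [name for bit, name in CRITICAL_WARNING_BITS.items() if value & bit]


def current_critical_warning(smartctl_text):
    """Pull the 'Critical Warning' byte out of `smartctl -a` text output."""
    match = re.search(
        r"^Critical Warning:\s*(0x[0-9a-fA-F]+)", smartctl_text, re.MULTILINE
    )
    if match is None:
        raise ValueError("no 'Critical Warning' line found")
    return int(match.group(1), 16)


# Excerpt of the smartctl output pasted above.
sample = """Critical Warning: 0x00
Temperature: 43 Celsius
Warning Comp. Temperature Time: 748
"""

print(decode_critical_warning(0x02))                              # value from the e-mail
print(decode_critical_warning(current_critical_warning(sample)))  # value from smartctl now
```

The first call reports the temperature bit and the second reports nothing set, which is why I think the e-mail is describing a condition that is no longer present.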
Is there some way to acknowledge or accept this past issue and reset it (or Proxmox) so it doesn't continue to send these now-false warnings daily?
Thanks!
--Jim