I have Proxmox installed on a Dell OptiPlex.
Recently the NVMe drive has been disappearing after a few hours, and the only way to bring it back is to physically reboot the machine.
Code:
root@pve:~# smartctl -a /dev/nvme0
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.2.16-6-pve] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: XPG GAMMIX S70 BLADE
Serial Number: 2N242L1J4KJA
Firmware Version: 3.2.F.83
PCI Vendor/Subsystem ID: 0x1cc1
IEEE OUI Identifier: 0x707c18
Controller ID: 0
NVMe Version: 1.4
Number of Namespaces: 1
Namespace 1 Size/Capacity: 2,048,408,248,320 [2.04 TB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 707c18 242010400a
Local Time is: Tue Jan 23 12:28:28 2024 NZDT
Firmware Updates (0x0e): 7 Slots
Optional Admin Commands (0x0016): Format Frmw_DL Self_Test
Optional NVM Commands (0x0014): DS_Mngmt Sav/Sel_Feat
Log Page Attributes (0x0e): Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size: 512 Pages
Warning Comp. Temp. Threshold: 100 Celsius
Critical Comp. Temp. Threshold: 110 Celsius
Supported Power States
St Op Max     Active Idle RL RT WL WT Ent_Lat Ex_Lat
0  +  3.50W   -      -    0  0  0  0  5       5
1  +  3.30W   -      -    1  1  1  1  50      100
2  +  2.80W   -      -    2  2  2  2  50      200
3  -  0.1700W -      -    3  3  3  3  500     7500
4  -  0.0200W -      -    4  4  4  4  2000    70000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0  -   512  0      0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 51 Celsius
Available Spare: 98%
Available Spare Threshold: 25%
Percentage Used: 0%
Data Units Read: 2,499,837,497 [1.27 PB]
Data Units Written: 1,971,315 [1.00 TB]
Host Read Commands: 35,663,715,100
Host Write Commands: 112,663,478
Controller Busy Time: 540
Power Cycles: 78
Power On Hours: 931
Unsafe Shutdowns: 41
Media and Data Integrity Errors: 41
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 51 Celsius
Thermal Temp. 1 Transition Count: 176
Thermal Temp. 1 Total Time: 39422
Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged
free(): invalid pointer
Aborted
When the NVMe drive disappears:
Output of lvs:
Code:
LV            VG  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
data          pve twi-aotz-- 360.68g             17.17  1.03
root          pve -wi-ao----  96.00g
swap          pve -wi-ao----   8.00g
vm-215-disk-0 pve Vwi-a-tz--   4.00m data        14.06
vm-215-disk-1 pve Vwi-a-tz-- 200.00g data        30.96
vm-215-disk-2 pve Vwi-a-tz--   4.00m data        1.56
Output of pvs:
Code:
PV        VG  Fmt  Attr PSize    PFree
/dev/sda3 pve lvm2 a--  <488.05g 16.00g
As soon as I physically reboot the system, everything comes back up normally for a few hours.
Output of lvs after the reboot:
Code:
LV            VG            Attr       LSize   Pool          Origin Data%  Meta%  Move Log Cpy%Sync Convert
data          pve           twi-aotz-- 360.68g                      17.27  1.03
root          pve           -wi-ao----  96.00g
swap          pve           -wi-ao----   8.00g
vm-215-disk-0 pve           Vwi-aotz--   4.00m data                 14.06
vm-215-disk-1 pve           Vwi-aotz-- 200.00g data                 31.14
vm-215-disk-2 pve           Vwi-aotz--   4.00m data                 1.56
vm-212-disk-0 vmstoragenvme Vwi-aotz-- 200.00g vmstoragenvme        18.92
vm-213-disk-0 vmstoragenvme Vwi-aotz-- 100.00g vmstoragenvme        38.02
vmstoragenvme vmstoragenvme twi-aotz--   1.83t                      4.04   0.31
Output of pvs after the reboot:
Code:
PV           VG            Fmt  Attr PSize    PFree
/dev/nvme0n1 vmstoragenvme lvm2 a--     1.86t 376.00m
/dev/sda3    pve           lvm2 a--  <488.05g  16.00g
Could someone please help with what else I should check?
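One thing that might help narrow this down is capturing the kernel messages at the moment the device vanishes, since they are lost on reboot. Below is a minimal sketch of a watcher, not a tested fix. It assumes the device node is /dev/nvme0n1 (as in the pvs output after a reboot); the function names `dev_state` and `watch_dev` and the log path in the usage line are made up for illustration. It polls until the node is gone, then appends a timestamp and the most recent NVMe/PCIe-related `dmesg` lines to a log file.

```shell
#!/bin/sh
# dev_state PATH: print "present" if PATH exists, otherwise "missing".
# (Hypothetical helper name, not part of any Proxmox tooling.)
dev_state() {
    if [ -e "$1" ]; then
        echo present
    else
        echo missing
    fi
}

# watch_dev DEV INTERVAL LOGFILE: poll every INTERVAL seconds until DEV
# is missing, then append a timestamp and the last matching kernel
# messages to LOGFILE. The log should live on a disk that stays online
# (here that would be the root volume on /dev/sda3).
watch_dev() {
    dev=$1
    interval=$2
    log=$3
    while [ "$(dev_state "$dev")" = present ]; do
        sleep "$interval"
    done
    {
        echo "$(date) $dev is missing"
        # dmesg may need root; errors and "no match" are tolerated.
        dmesg 2>/dev/null | grep -iE 'nvme|pcie|aer' | tail -n 50 || true
    } >> "$log"
}
```

Run as root in the background, for example `watch_dev /dev/nvme0n1 60 /root/nvme-watch.log &`, and look at the log once the drive has dropped out.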