Server hangs for no apparent reason

itamarbudin

New Member
Aug 13, 2024
Hi everyone

I have been experiencing an annoying issue with my home lab "server" for the last week, and I hope someone can help.

The server "hangs" or "freezes" without any error message. I have been running this setup for two years now without any issues.

Setup details:

Core Server:
  • CPU: AMD Ryzen 9 5900x
  • Motherboard: ASUS ROG Crosshair VIII Hero (WI-FI) with the latest BIOS (5002)
  • RAM: 4X Kingston KHX3000C16D4/32GX 32GB - 128GB Total (Memory was on the motherboard QVL list)
  • Storage: 2x WD Blue SA510 2TB running as a ZFS mirror
  • NIC: Intel Corporation Ethernet Controller I225-V (I am not using the NIC on my motherboard as it is only 2.5 Gb/s)

The primary VM on this server is TrueNAS SCALE, with the following hardware passed through to it:
  • ASUS Hyper M.2 X16 PCIe 4.0 X4 Expansion Card (supports 4 NVMe M.2 drives) hosting the following NVMe drives:
    • 2 x INTEL MEMPEK1J064GA
    • 2 x SABRENT 256GB Rocket NVMe PCIe M.2 2280
  • LSI 9300-16i 16-Port 12Gb/s SAS Controller HBA Card with the following HDDs connected to it:
    • 6 x Seagate IronWolf 10TB connected to the HBA
    • 6 x Seagate IronWolf 6TB connected through an external QNAP TL-D800S 8 Bay SATA 6Gbps JBOD Storage Enclosure using a nifty external Mini SAS 26pin (SFF-8088) Male to Mini SAS 26 (SFF-8088)
  • NVIDIA Quadro 4000

The TrueNAS VM provides the primary storage for my other VMs, of which I only run two: a simple Windows machine and an Ubuntu machine.

This setup was working fine for the last two years without any issues.

Here is some additional information:

Code:
root@pve:~# pveversion -v
proxmox-ve: 8.4.0 (running kernel: 6.8.12-11-pve)
pve-manager: 8.4.1 (running version: 8.4.1/2a5fa54a8503f96d)
proxmox-kernel-helper: 8.1.1
proxmox-kernel-6.8.12-11-pve-signed: 6.8.12-11
proxmox-kernel-6.8: 6.8.12-11
proxmox-kernel-6.8.12-10-pve-signed: 6.8.12-10
proxmox-kernel-6.8.12-9-pve-signed: 6.8.12-9
proxmox-kernel-6.5.13-6-pve-signed: 6.5.13-6
proxmox-kernel-6.5: 6.5.13-6
proxmox-kernel-6.5.11-4-pve-signed: 6.5.11-4
ceph-fuse: 17.2.8-pve2
corosync: 3.1.9-pve1
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.30-pve2
libproxmox-acme-perl: 1.6.0
libproxmox-backup-qemu0: 1.5.1
libproxmox-rs-perl: 0.3.5
libpve-access-control: 8.2.2
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.1.0
libpve-cluster-perl: 8.1.0
libpve-common-perl: 8.3.1
libpve-guest-common-perl: 5.2.2
libpve-http-server-perl: 5.2.2
libpve-network-perl: 0.11.2
libpve-rs-perl: 0.9.4
libpve-storage-perl: 8.3.6
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.6.0-2
proxmox-backup-client: 3.4.1-1
proxmox-backup-file-restore: 3.4.1-1
proxmox-firewall: 0.7.1
proxmox-kernel-helper: 8.1.1
proxmox-mail-forward: 0.3.2
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.11
pve-cluster: 8.1.0
pve-container: 5.2.6
pve-docs: 8.4.0
pve-edk2-firmware: 4.2025.02-3
pve-esxi-import-tools: 0.7.4
pve-firewall: 5.1.1
pve-firmware: 3.15-4
pve-ha-manager: 4.0.7
pve-i18n: 3.4.4
pve-qemu-kvm: 9.2.0-5
pve-xtermjs: 5.5.0-2
qemu-server: 8.3.12
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.7-pve2

There is no apparent error message in the logs. For example, the server crashed on 6/28/2025 at 23:49:24, and I had to physically reset it at 6:36:53 this morning. I see the same log entries every time the server crashes.

Note: the server crashed while I was typing this post as well

Code:
Jun 28 23:49:23 pve kernel: scsi host4: iSCSI Initiator over TCP/IP
Jun 28 23:49:23 pve kernel:  connection1200:0: detected conn error (1020)
Jun 28 23:49:24 pve iscsid[2458]: Connection-1:0 to [target: iqn.2005-10.org.freenas.ctl:vm, portal: 172.17.21.5,3260] through [iface: default] is shutdown.
Jun 28 23:49:24 pve iscsid[2458]: Connection-1:0 to [target: iqn.2005-10.org.freenas.ctl:vm, portal: 172.17.213.154,3260] through [iface: default] is shutdow>
Jun 28 23:49:24 pve iscsid[2458]: Connection-1:0 to [target: iqn.2005-10.org.freenas.ctl:vm, portal: 172.17.177.242,3260] through [iface: default] is shutdow>
Jun 28 23:49:24 pve iscsid[2458]: Connection-1:0 to [target: iqn.2005-10.org.freenas.ctl:vm, portal: 172.17.0.10,3260] through [iface: default] is shutdown.
Jun 28 23:49:24 pve iscsid[2458]: Connection-1:0 to [target: iqn.2005-10.org.freenas.ctl:vm, portal: 172.17.0.1,3260] through [iface: default] is shutdown.
Jun 28 23:49:24 pve iscsid[2458]: Connection-1:0 to [target: iqn.2005-10.org.freenas.ctl:vm, portal: 172.16.0.1,3260] through [iface: default] is shutdown.
Jun 28 23:49:24 pve iscsid[2458]: Connection-1:0 to [target: iqn.2005-10.org.freenas.ctl:vm, portal: 172.17.212.255,3260] through [iface: default] is shutdow>
Jun 28 23:49:24 pve iscsid[2458]: Connection-1:0 to [target: iqn.2005-10.org.freenas.ctl:vm, portal: 172.17.97.104,3260] through [iface: default] is shutdown.
Jun 28 23:49:24 pve iscsid[2458]: Connection-1:0 to [target: iqn.2005-10.org.freenas.ctl:vm, portal: 172.17.60.9,3260] through [iface: default] is shutdown.
Jun 28 23:49:24 pve iscsid[2458]: Connection-1:0 to [target: iqn.2005-10.org.freenas.ctl:vm, portal: 172.17.125.90,3260] through [iface: default] is shutdown.
Jun 28 23:49:24 pve iscsid[2458]: Connection-1:0 to [target: iqn.2005-10.org.freenas.ctl:vm, portal: 172.17.201.100,3260] through [iface: default] is shutdow>
Jun 28 23:49:24 pve iscsid[2458]: Connection-1:0 to [target: iqn.2005-10.org.freenas.ctl:vm, portal: 172.17.225.244,3260] through [iface: default] is shutdow>
Jun 28 23:49:24 pve iscsid[2458]: Connection-1:0 to [target: iqn.2005-10.org.freenas.ctl:vm, portal: 172.17.229.164,3260] through [iface: default] is shutdow>
Jun 28 23:49:24 pve iscsid[2458]: Could not set session1200 priority. READ/WRITE throughout and latency could be affected.
Jun 28 23:49:24 pve iscsid[2458]: connection1200:0 login rejected: initiator error - target not found (02/03)
Jun 28 23:49:24 pve iscsid[2458]: Connection1200:0 to [target: iqn.2005-10.org.freenas.ctl:vm, portal: 192.168.1.232,3260] through [iface: default] is shutdo>
-- Boot b2298eda57b040af8f7baa73314b624f --
Jun 29 06:36:53 pve kernel: Linux version 6.8.12-11-pve (build@proxmox) (gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP>
Jun 29 06:36:53 pve kernel: Command line: initrd=\EFI\proxmox\6.8.12-11-pve\initrd.img-6.8.12-11-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs
Jun 29 06:36:53 pve kernel: KERNEL supported cpus:
Jun 29 06:36:53 pve kernel:   Intel GenuineIntel
Jun 29 06:36:53 pve kernel:   AMD AuthenticAMD
Jun 29 06:36:53 pve kernel:   Hygon HygonGenuine
Jun 29 06:36:53 pve kernel:   Centaur CentaurHauls
Jun 29 06:36:53 pve kernel:   zhaoxin   Shanghai 
Jun 29 06:36:53 pve kernel: BIOS-provided physical RAM map:
Jun 29 06:36:53 pve kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
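When a machine freezes hard, the journal usually just stops, so one way to document each freeze is to look at the last entry before every "-- Boot ... --" marker and the first entry after it. The following is a minimal Python sketch of that idea, not a definitive tool. The function names are hypothetical, the year is passed in by hand because this journal format does not print it, and the sample lines are shortened copies of the log above.

```python
import re
from datetime import datetime

# Marker that journalctl prints between boots, e.g. "-- Boot b2298e... --"
BOOT_MARKER = re.compile(r"^-- Boot ([0-9a-f]+) --$")
# Leading "Jun 28 23:49:24 " timestamp of a regular journal line
STAMP = re.compile(r"^([A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) ")


def parse_stamp(line, year):
    """Return the line's timestamp as a datetime, or None if it has none."""
    m = STAMP.match(line)
    if not m:
        return None
    return datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")


def find_boot_gaps(journal_text, year):
    """Return (boot_id, last_before, first_after, gap) for every boot marker."""
    gaps = []
    last_seen = None   # timestamp of the most recent regular line
    pending = None     # (boot_id, last timestamp seen before the marker)
    for line in journal_text.splitlines():
        marker = BOOT_MARKER.match(line.strip())
        if marker:
            pending = (marker.group(1), last_seen)
            continue
        stamp = parse_stamp(line, year)
        if stamp is None:
            continue
        if pending is not None:
            boot_id, before = pending
            if before is not None:
                gaps.append((boot_id, before, stamp, stamp - before))
            pending = None
        last_seen = stamp
    return gaps


SAMPLE = """\
Jun 28 23:49:24 pve iscsid[2458]: connection1200:0 login rejected: initiator error - target not found (02/03)
-- Boot b2298eda57b040af8f7baa73314b624f --
Jun 29 06:36:53 pve kernel: BIOS-provided physical RAM map:
"""

if __name__ == "__main__":
    for boot_id, before, after, gap in find_boot_gaps(SAMPLE, year=2025):
        print(f"boot {boot_id}: silent from {before} to {after} ({gap})")
```

The gap only shows how long the machine sat frozen before the reset, not why it froze; if the last lines before each marker are always the same routine iSCSI messages, that suggests the kernel had no chance to log anything.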

What did I try already?
  1. I thought it might be a thermal issue, so I refreshed my CPU thermal paste (it was a bit dry), and temps are stable:

  2. Code:
    root@pve:~# sensors
    nouveau-pci-0500
    Adapter: PCI adapter
    fan1:        2460 RPM
    temp1:        +50.0°C  (high = +95.0°C, hyst =  +3.0°C)
                           (crit = +105.0°C, hyst =  +5.0°C)
                           (emerg = +135.0°C, hyst =  +5.0°C)
    
    nct6798-isa-0290
    Adapter: ISA adapter
    in0:                        1.42 V  (min =  +0.00 V, max =  +1.74 V)
    in1:                      992.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
    in2:                        3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
    in3:                        3.30 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
    in4:                        1.74 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
    in5:                      592.00 mV (min =  +0.00 V, max =  +0.00 V)
    in6:                      992.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
    in7:                        3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
    in8:                        3.34 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
    in9:                        1.78 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
    in10:                       0.00 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
    in11:                     192.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
    in12:                       1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
    in13:                       1.34 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
    in14:                     888.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
    fan1:                     1679 RPM  (min =    0 RPM)
    fan2:                      809 RPM  (min =    0 RPM)
    fan3:                        0 RPM  (min =    0 RPM)
    fan4:                        0 RPM  (min =    0 RPM)
    fan5:                        0 RPM  (min =    0 RPM)
    fan6:                        0 RPM  (min =    0 RPM)
    fan7:                        0 RPM  (min =    0 RPM)
    SYSTIN:                    +45.0°C  (high = +80.0°C, hyst = +75.0°C)
                                        (crit = +125.0°C)  sensor = thermistor
    CPUTIN:                    +46.0°C  (high = +80.0°C, hyst = +75.0°C)
                                        (crit = +125.0°C)  sensor = thermistor
    AUXTIN0:                   +26.0°C  (high = +80.0°C, hyst = +75.0°C)
                                        (crit = +125.0°C)  sensor = thermistor
    AUXTIN1:                  +127.0°C  (high = +80.0°C, hyst = +75.0°C)  ALARM
                                        (crit = +125.0°C)  sensor = thermistor
    AUXTIN2:                  +100.0°C  (high = +80.0°C, hyst = +75.0°C)  ALARM
                                        (crit = +125.0°C)  sensor = thermistor
    AUXTIN3:                   +32.0°C  (high = +80.0°C, hyst = +75.0°C)
                                        (crit = +100.0°C)  sensor = thermistor
    AUXTIN4:                   +50.0°C  (high = +80.0°C, hyst = +75.0°C)
                                        (crit = +100.0°C)
    PECI Agent 0 Calibration:  +51.0°C  (high = +80.0°C, hyst = +75.0°C)
    PCH_CHIP_CPU_MAX_TEMP:      +0.0°C 
    PCH_CHIP_TEMP:              +0.0°C 
    PCH_CPU_TEMP:               +0.0°C 
    PCH_MCH_TEMP:               +0.0°C 
    TSI0_TEMP:                 +63.5°C 
    TSI1_TEMP:                 +66.8°C 
    intrusion0:               ALARM
    intrusion1:               ALARM
    beep_enable:              disabled
    
    asusec-isa-0000
    Adapter: ISA adapter
    CPU Core:      1.45 V 
    CPU_Opt:        0 RPM
    Chipset:     2839 RPM
    Water_Flow:     0 RPM
    Chipset:      +66.0°C 
    CPU:          +50.0°C 
    Motherboard:  +45.0°C 
    T_Sensor:     -40.0°C 
    VRM:          +44.0°C 
    Water_In:     -40.0°C 
    Water_Out:    -40.0°C 
    CPU:          21.00 A 
    
    k10temp-pci-00c3
    Adapter: PCI adapter
    Tctl:         +65.0°C 
    Tccd1:        +47.2°C 
    Tccd2:        +59.8°C
  3. I shut down all other VMs and kept only the TrueNAS one running, but the server still froze.
  4. I detached the NVIDIA card from the TrueNAS VM to see whether any errors would show up on the screen. Nothing showed up, and it didn't help.
  5. I ran memtest86 to make sure the memory is OK, and the test completed successfully:
    20250628_184201.jpg
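
The sensors output in step 1 is long, and a few lines carry an ALARM flag. As a rough aid, the sketch below lists every temperature reading that is above its own "high" limit. The function name is hypothetical and the sample lines are copied from the output above. Whether a flagged input is a real sensor or an unconnected header (a common explanation for fixed values such as +127.0°C, but only an assumption here) still has to be checked against the motherboard.

```python
import re

# Matches lines such as:
#   "AUXTIN1:   +127.0°C  (high = +80.0°C, hyst = +75.0°C)  ALARM"
# The "high" part is optional; lines without it are ignored later.
TEMP_LINE = re.compile(
    r"^\s*(?P<name>[^:]+):\s+(?P<value>[+-]\d+(?:\.\d+)?)°C"
    r"(?:\s+\(high = (?P<high>[+-]\d+(?:\.\d+)?)°C)?"
)


def temps_over_high(sensors_text):
    """Return (name, value, high) for readings above their high limit."""
    flagged = []
    for line in sensors_text.splitlines():
        m = TEMP_LINE.match(line)
        if not m or m.group("high") is None:
            continue
        value = float(m.group("value"))
        high = float(m.group("high"))
        if value > high:
            flagged.append((m.group("name").strip(), value, high))
    return flagged


SAMPLE = """\
    temp1:        +50.0°C  (high = +95.0°C, hyst =  +3.0°C)
    in0:                        1.42 V  (min =  +0.00 V, max =  +1.74 V)
    SYSTIN:                    +45.0°C  (high = +80.0°C, hyst = +75.0°C)
                                        (crit = +125.0°C)  sensor = thermistor
    AUXTIN1:                  +127.0°C  (high = +80.0°C, hyst = +75.0°C)  ALARM
    AUXTIN2:                  +100.0°C  (high = +80.0°C, hyst = +75.0°C)  ALARM
    Tctl:         +65.0°C 
"""

if __name__ == "__main__":
    for name, value, high in temps_over_high(SAMPLE):
        print(f"{name}: {value:.1f} C is above its high limit of {high:.1f} C")
```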

What's bugging me is that there is no error message or anything, it just freezes...

Do you know what else I can do to get more data on why this happens?

Thanks, in advance

Itamar
 
I can't find any information about the flash cell type, PLP, or DRAM cache of your WD Blue SA510 2TB drives.
Do you have any information?

# https://documents.sandisk.com/conte...-ssd/product-brief-wd-blue-sa510-sata-ssd.pdf

How full are your ZFS pools? Did you set a system-wide quota on every ZFS pool?

What does smartctl -a <ssd-device> report for the WD Blue SA510 2TB? Is it past its rated endurance (TBW > 500 TB)?

I don't know if the drives have DRAM. I "cheaped out" on these, as this home lab's I/O throughput was not projected to be high. Most of the I/O is on the proper NAS storage I manage through my TrueNAS setup. These drives only contain the Proxmox binaries and the TrueNAS VM.

I am not using that much space on it.

Untitled.png

Here is the output of the smartctl command.


Code:
root@pve:~# smartctl -a /dev/sdg
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.8.12-11-pve] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     WD Blue SA510 2.5 2TB
Serial Number:    23492G448702
LU WWN Device Id: 5 001b44 4a558364a
Firmware Version: 530400WD
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic, zeroed
Device is:        Not in smartctl database 7.3/5319
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sun Jun 29 11:53:43 2025 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0002) Does not save SMART data before
                                        entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (   2) minutes.
Conveyance self-test routine
recommended polling time:        (   3) minutes.
SCT capabilities:              (0x0035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       1509
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       33
165 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       3710
166 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       91
167 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       51
168 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       192
170 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
171 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
173 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       139
174 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       20
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0032   100   100   000    Old_age   Always       -       41 (Min/Max 11/56)
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
230 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       4
232 Available_Reservd_Space 0x0033   100   100   001    Pre-fail  Always       -       100
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       286924
234 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       36525
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       41053
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       3432
244 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Completed [00% left] (0-65535)
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Code:
root@pve:~# smartctl -a /dev/sdh
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.8.12-11-pve] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     WD Blue SA510 2.5 2TB
Serial Number:    23492G448710
LU WWN Device Id: 5 001b44 4a5583666
Firmware Version: 530400WD
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic, zeroed
Device is:        Not in smartctl database 7.3/5319
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Jun 29 11:54:38 2025 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0002) Does not save SMART data before
                                        entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (   2) minutes.
Conveyance self-test routine
recommended polling time:        (   3) minutes.
SCT capabilities:              (0x0035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       1541
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       31
165 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       3097
166 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       109
167 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       59
168 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       209
170 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
171 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
173 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       160
174 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       18
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0032   100   100   000    Old_age   Always       -       44 (Min/Max 12/60)
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       1
230 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       5
232 Available_Reservd_Space 0x0033   100   100   001    Pre-fail  Always       -       100
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       327701
234 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       30486
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       41015
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       3400
244 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Completed [00% left] (0-65535)
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
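
Both reports are nearly identical, so the few differences are easy to miss. The sketch below is a minimal, hypothetical helper that parses the attribute table from smartctl -a output and returns the error-related counters that are not zero. The choice of IDs (5, 187, 188, 199) is an assumption about which counters matter, and the sample rows are copied from the /dev/sdh report above.

```python
import re

# One row of the "Vendor Specific SMART Attributes" table:
# ID, name, flag, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+0x[0-9a-fA-F]{4}\s+"
    r"\d+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(?P<raw>\d+)"
)

# Assumption: these attribute IDs are the ones worth flagging when non-zero.
ERROR_COUNTERS = {5, 187, 188, 199}


def parse_attributes(smartctl_text):
    """Map attribute ID -> (name, first integer of the raw value)."""
    attrs = {}
    for line in smartctl_text.splitlines():
        m = ROW.match(line)
        if m:
            attrs[int(m.group("id"))] = (m.group("name"), int(m.group("raw")))
    return attrs


def nonzero_error_counters(smartctl_text):
    """Return {name: raw} for the error counters that are above zero."""
    attrs = parse_attributes(smartctl_text)
    return {
        name: raw
        for attr_id, (name, raw) in attrs.items()
        if attr_id in ERROR_COUNTERS and raw > 0
    }


SAMPLE_SDH = """\
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0032   100   100   000    Old_age   Always       -       44 (Min/Max 12/60)
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       1
"""

if __name__ == "__main__":
    print(nonzero_error_counters(SAMPLE_SDH))
```

On the data above this flags only UDMA_CRC_Error_Count on /dev/sdh. A CRC count is usually associated with the SATA link or cable rather than with the flash itself, and a single event is not necessarily related to the freezes.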
 
Please check whether your CPU is affected by the C6 sleep-state problem. My 2400GE is: after disabling C6 in the BIOS it runs stably, at the cost of about 4 W more power consumption when idle.

If your CPU is not affected, it was just an idea.
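
To see from the running host which idle states the kernel actually uses, the cpuidle entries in sysfs can be read. The sketch below is a minimal example under some assumptions: it reads only cpu0, the function name is hypothetical, and the names the kernel reports (for example C1 or C2) need not match the BIOS label "C6", so the output is a hint rather than proof that the BIOS setting took effect.

```python
from pathlib import Path


def _read(path, default=""):
    """Read a small sysfs file, returning a default if it is missing."""
    try:
        return path.read_text().strip()
    except OSError:
        return default


def read_idle_states(cpu_dir="/sys/devices/system/cpu/cpu0"):
    """Return one dict per cpuidle state of the given CPU directory."""
    cpuidle = Path(cpu_dir) / "cpuidle"
    state_dirs = sorted(
        cpuidle.glob("state[0-9]*"),
        key=lambda p: int(p.name[len("state"):]),  # numeric order: state2 < state10
    )
    states = []
    for state_dir in state_dirs:
        states.append(
            {
                "state": state_dir.name,
                "name": _read(state_dir / "name", "?"),
                # Number of times this state was entered since boot
                "usage": int(_read(state_dir / "usage", "0") or 0),
                # "1" means the state was disabled through sysfs
                "disabled": _read(state_dir / "disable", "0") == "1",
            }
        )
    return states


if __name__ == "__main__":
    for s in read_idle_states():
        flag = " (disabled)" if s["disabled"] else ""
        print(f'{s["state"]}: {s["name"]} entered {s["usage"]} times{flag}')
```

Running it once before and once after changing the BIOS setting shows whether the list of states or their usage counters changed.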
 
Thanks! I used to disable this feature regularly, but for some reason, I forgot to do it this time. The machine had been functioning well for over two years. I've disabled it now, and I'm hopeful it will work properly.

I have a good feeling about this because yesterday, while I was using the machine, everything was fine. The crash only occurred when I left it unattended.

I'll provide an update soon.
 
So far, so good. The server has not crashed. Everything is looking good.

The server hasn’t crashed in 18 hours, so it seems that was the issue.

Thank you so much :)