Hi everyone,
I have been experiencing an annoying issue with my home lab "server" for the last week, and I hope someone can help.
The server "hangs" or "freezes" without any error message. I have been running this setup for two years now without any issues.
Setup details:
Core Server:
- CPU: AMD Ryzen 9 5900x
- Motherboard: ASUS ROG Crosshair VIII Hero (WI-FI) with the latest BIOS (5002)
- RAM: 4X Kingston KHX3000C16D4/32GX 32GB - 128GB Total (Memory was on the motherboard QVL list)
- Storage:
  - 2 x WD Blue SA510 2TB running as ZFS Mirror
- NIC: Intel Corporation Ethernet Controller I225-V (I am not using the NIC on my motherboard as it is only 2.5 Gb/s)
- ASUS Hyper M.2 X16 PCIe 4.0 X4 Expansion Card Supports 4 NVMe M.2 hosting the following NVMe drives:
  - 2 x INTEL MEMPEK1J064GA
  - 2 x SABRENT 256GB Rocket NVMe PCIe M.2 2280
- LSI 9300-16i 16-Port 12Gb/s SAS Controller HBA Card with the following HDDs connected to it:
  - 6 x Seagate IronWolf 10TB connected to the HBA
  - 6 x Seagate IronWolf 6TB connected through an external QNAP TL-D800S 8 Bay SATA 6Gbps JBOD Storage Enclosure using a nifty external Mini SAS 26pin (SFF-8088) Male to Mini SAS 26 (SFF-8088)
- NVIDIA Quadro 4000
The TrueNAS VM exposes the primary storage for my other VMs, but I only run a simple Windows machine and an Ubuntu machine.
This setup was working fine for the last two years without any issues.
Here is some additional information:
Code:
root@pve:~# pveversion -v
proxmox-ve: 8.4.0 (running kernel: 6.8.12-11-pve)
pve-manager: 8.4.1 (running version: 8.4.1/2a5fa54a8503f96d)
proxmox-kernel-helper: 8.1.1
proxmox-kernel-6.8.12-11-pve-signed: 6.8.12-11
proxmox-kernel-6.8: 6.8.12-11
proxmox-kernel-6.8.12-10-pve-signed: 6.8.12-10
proxmox-kernel-6.8.12-9-pve-signed: 6.8.12-9
proxmox-kernel-6.5.13-6-pve-signed: 6.5.13-6
proxmox-kernel-6.5: 6.5.13-6
proxmox-kernel-6.5.11-4-pve-signed: 6.5.11-4
ceph-fuse: 17.2.8-pve2
corosync: 3.1.9-pve1
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.30-pve2
libproxmox-acme-perl: 1.6.0
libproxmox-backup-qemu0: 1.5.1
libproxmox-rs-perl: 0.3.5
libpve-access-control: 8.2.2
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.1.0
libpve-cluster-perl: 8.1.0
libpve-common-perl: 8.3.1
libpve-guest-common-perl: 5.2.2
libpve-http-server-perl: 5.2.2
libpve-network-perl: 0.11.2
libpve-rs-perl: 0.9.4
libpve-storage-perl: 8.3.6
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.6.0-2
proxmox-backup-client: 3.4.1-1
proxmox-backup-file-restore: 3.4.1-1
proxmox-firewall: 0.7.1
proxmox-kernel-helper: 8.1.1
proxmox-mail-forward: 0.3.2
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.11
pve-cluster: 8.1.0
pve-container: 5.2.6
pve-docs: 8.4.0
pve-edk2-firmware: 4.2025.02-3
pve-esxi-import-tools: 0.7.4
pve-firewall: 5.1.1
pve-firmware: 3.15-4
pve-ha-manager: 4.0.7
pve-i18n: 3.4.4
pve-qemu-kvm: 9.2.0-5
pve-xtermjs: 5.5.0-2
qemu-server: 8.3.12
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.7-pve2
There is no apparent error message in the log file. For example, the server crashed on 6/28/2025 at 23:49:24, and I had to physically reset it at 6:36:53 this morning. I see log entries like these every time the server crashes.
Note: the server crashed while I was typing this post as well.
Code:
Jun 28 23:49:23 pve kernel: scsi host4: iSCSI Initiator over TCP/IP
Jun 28 23:49:23 pve kernel: connection1200:0: detected conn error (1020)
Jun 28 23:49:24 pve iscsid[2458]: Connection-1:0 to [target: iqn.2005-10.org.freenas.ctl:vm, portal: 172.17.21.5,3260] through [iface: default] is shutdown.
Jun 28 23:49:24 pve iscsid[2458]: Connection-1:0 to [target: iqn.2005-10.org.freenas.ctl:vm, portal: 172.17.213.154,3260] through [iface: default] is shutdow>
Jun 28 23:49:24 pve iscsid[2458]: Connection-1:0 to [target: iqn.2005-10.org.freenas.ctl:vm, portal: 172.17.177.242,3260] through [iface: default] is shutdow>
Jun 28 23:49:24 pve iscsid[2458]: Connection-1:0 to [target: iqn.2005-10.org.freenas.ctl:vm, portal: 172.17.0.10,3260] through [iface: default] is shutdown.
Jun 28 23:49:24 pve iscsid[2458]: Connection-1:0 to [target: iqn.2005-10.org.freenas.ctl:vm, portal: 172.17.0.1,3260] through [iface: default] is shutdown.
Jun 28 23:49:24 pve iscsid[2458]: Connection-1:0 to [target: iqn.2005-10.org.freenas.ctl:vm, portal: 172.16.0.1,3260] through [iface: default] is shutdown.
Jun 28 23:49:24 pve iscsid[2458]: Connection-1:0 to [target: iqn.2005-10.org.freenas.ctl:vm, portal: 172.17.212.255,3260] through [iface: default] is shutdow>
Jun 28 23:49:24 pve iscsid[2458]: Connection-1:0 to [target: iqn.2005-10.org.freenas.ctl:vm, portal: 172.17.97.104,3260] through [iface: default] is shutdown.
Jun 28 23:49:24 pve iscsid[2458]: Connection-1:0 to [target: iqn.2005-10.org.freenas.ctl:vm, portal: 172.17.60.9,3260] through [iface: default] is shutdown.
Jun 28 23:49:24 pve iscsid[2458]: Connection-1:0 to [target: iqn.2005-10.org.freenas.ctl:vm, portal: 172.17.125.90,3260] through [iface: default] is shutdown.
Jun 28 23:49:24 pve iscsid[2458]: Connection-1:0 to [target: iqn.2005-10.org.freenas.ctl:vm, portal: 172.17.201.100,3260] through [iface: default] is shutdow>
Jun 28 23:49:24 pve iscsid[2458]: Connection-1:0 to [target: iqn.2005-10.org.freenas.ctl:vm, portal: 172.17.225.244,3260] through [iface: default] is shutdow>
Jun 28 23:49:24 pve iscsid[2458]: Connection-1:0 to [target: iqn.2005-10.org.freenas.ctl:vm, portal: 172.17.229.164,3260] through [iface: default] is shutdow>
Jun 28 23:49:24 pve iscsid[2458]: Could not set session1200 priority. READ/WRITE throughout and latency could be affected.
Jun 28 23:49:24 pve iscsid[2458]: connection1200:0 login rejected: initiator error - target not found (02/03)
Jun 28 23:49:24 pve iscsid[2458]: Connection1200:0 to [target: iqn.2005-10.org.freenas.ctl:vm, portal: 192.168.1.232,3260] through [iface: default] is shutdo>
-- Boot b2298eda57b040af8f7baa73314b624f --
Jun 29 06:36:53 pve kernel: Linux version 6.8.12-11-pve (build@proxmox) (gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP>
Jun 29 06:36:53 pve kernel: Command line: initrd=\EFI\proxmox\6.8.12-11-pve\initrd.img-6.8.12-11-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs
Jun 29 06:36:53 pve kernel: KERNEL supported cpus:
Jun 29 06:36:53 pve kernel: Intel GenuineIntel
Jun 29 06:36:53 pve kernel: AMD AuthenticAMD
Jun 29 06:36:53 pve kernel: Hygon HygonGenuine
Jun 29 06:36:53 pve kernel: Centaur CentaurHauls
Jun 29 06:36:53 pve kernel: zhaoxin Shanghai
Jun 29 06:36:53 pve kernel: BIOS-provided physical RAM map:
Jun 29 06:36:53 pve kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
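To make it easier to compare crashes, the last journal line written before each `-- Boot ... --` separator can be pulled out automatically. The following is only a minimal sketch: the function name `last_before_boot` is made up for illustration, and it assumes the journal is persistent and that `journalctl` prints the same separator lines shown in the excerpt above.

```shell
# Print the last journal line seen before each "-- Boot <id> --" separator,
# i.e. the final message logged before the machine froze or was reset.
# Reads journal text on stdin; the function name is hypothetical.
last_before_boot() {
  awk '
    /^-- Boot [0-9a-f]+ --$/ {      # separator between two boots
      if (prev != "") print prev    # emit the line just before it
      next
    }
    { prev = $0 }                   # remember the most recent log line
  '
}

# Usage (assumes journalctl is installed and the journal spans several boots):
#   journalctl --no-pager --since "2025-06-28" | last_before_boot
```

If the printed lines always come from the same unit (here `iscsid`), that would at least suggest where to look first, although it does not prove that unit caused the freeze.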
What did I try already?
- I thought it might be a thermal issue, so I refreshed my CPU thermal paste (it was a bit dry), and temps are stable:
Code:
root@pve:~# sensors
nouveau-pci-0500
Adapter: PCI adapter
fan1: 2460 RPM
temp1: +50.0°C (high = +95.0°C, hyst = +3.0°C) (crit = +105.0°C, hyst = +5.0°C) (emerg = +135.0°C, hyst = +5.0°C)

nct6798-isa-0290
Adapter: ISA adapter
in0: 1.42 V (min = +0.00 V, max = +1.74 V)
in1: 992.00 mV (min = +0.00 V, max = +0.00 V) ALARM
in2: 3.39 V (min = +0.00 V, max = +0.00 V) ALARM
in3: 3.30 V (min = +0.00 V, max = +0.00 V) ALARM
in4: 1.74 V (min = +0.00 V, max = +0.00 V) ALARM
in5: 592.00 mV (min = +0.00 V, max = +0.00 V)
in6: 992.00 mV (min = +0.00 V, max = +0.00 V) ALARM
in7: 3.39 V (min = +0.00 V, max = +0.00 V) ALARM
in8: 3.34 V (min = +0.00 V, max = +0.00 V) ALARM
in9: 1.78 V (min = +0.00 V, max = +0.00 V) ALARM
in10: 0.00 V (min = +0.00 V, max = +0.00 V) ALARM
in11: 192.00 mV (min = +0.00 V, max = +0.00 V) ALARM
in12: 1.02 V (min = +0.00 V, max = +0.00 V) ALARM
in13: 1.34 V (min = +0.00 V, max = +0.00 V) ALARM
in14: 888.00 mV (min = +0.00 V, max = +0.00 V) ALARM
fan1: 1679 RPM (min = 0 RPM)
fan2: 809 RPM (min = 0 RPM)
fan3: 0 RPM (min = 0 RPM)
fan4: 0 RPM (min = 0 RPM)
fan5: 0 RPM (min = 0 RPM)
fan6: 0 RPM (min = 0 RPM)
fan7: 0 RPM (min = 0 RPM)
SYSTIN: +45.0°C (high = +80.0°C, hyst = +75.0°C) (crit = +125.0°C) sensor = thermistor
CPUTIN: +46.0°C (high = +80.0°C, hyst = +75.0°C) (crit = +125.0°C) sensor = thermistor
AUXTIN0: +26.0°C (high = +80.0°C, hyst = +75.0°C) (crit = +125.0°C) sensor = thermistor
AUXTIN1: +127.0°C (high = +80.0°C, hyst = +75.0°C) ALARM (crit = +125.0°C) sensor = thermistor
AUXTIN2: +100.0°C (high = +80.0°C, hyst = +75.0°C) ALARM (crit = +125.0°C) sensor = thermistor
AUXTIN3: +32.0°C (high = +80.0°C, hyst = +75.0°C) (crit = +100.0°C) sensor = thermistor
AUXTIN4: +50.0°C (high = +80.0°C, hyst = +75.0°C) (crit = +100.0°C)
PECI Agent 0 Calibration: +51.0°C (high = +80.0°C, hyst = +75.0°C)
PCH_CHIP_CPU_MAX_TEMP: +0.0°C
PCH_CHIP_TEMP: +0.0°C
PCH_CPU_TEMP: +0.0°C
PCH_MCH_TEMP: +0.0°C
TSI0_TEMP: +63.5°C
TSI1_TEMP: +66.8°C
intrusion0: ALARM
intrusion1: ALARM
beep_enable: disabled

asusec-isa-0000
Adapter: ISA adapter
CPU Core: 1.45 V
CPU_Opt: 0 RPM
Chipset: 2839 RPM
Water_Flow: 0 RPM
Chipset: +66.0°C
CPU: +50.0°C
Motherboard: +45.0°C
T_Sensor: -40.0°C
VRM: +44.0°C
Water_In: -40.0°C
Water_Out: -40.0°C
CPU: 21.00 A

k10temp-pci-00c3
Adapter: PCI adapter
Tctl: +65.0°C
Tccd1: +47.2°C
Tccd2: +59.8°C
- I shut down all other VMs and kept only the TrueNAS one running, but the server still froze.
- I detached the NVIDIA card from the TrueNAS VM to see if I saw any errors on the screen. Nothing showed up, and it didn't help.
- I ran memtest86 to make sure the memory is OK, and the test completed successfully.
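On the thermal point in the list above: a single `sensors` reading only shows the state after a reboot, not the moments before a freeze. One possible way to capture those is to append timestamped snapshots to a file and flush them to disk each time. This is a rough sketch under stated assumptions: the helper name `snapshot_to_log`, the log path, and the 10-second interval are all hypothetical choices, not something from my current setup.

```shell
# Append a timestamped snapshot of a command's output to a log file and
# flush it to disk, so the last reading has a chance to survive a hard freeze.
# Arguments: <logfile> <command> [args...]. The helper name is hypothetical.
snapshot_to_log() {
  logfile=$1
  shift
  {
    printf '=== %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')"
    "$@" 2>&1                 # run the given command, capture stderr too
  } >> "$logfile"
  sync                        # push buffered writes out to the disk
}

# Usage (path and interval are assumptions; run in tmux/screen or as a service):
#   while true; do snapshot_to_log /root/sensors.log sensors; sleep 10; done
```

After the next freeze, the last `===` block in the file would show the temperatures and voltages closest to the hang, which can be compared against the journal timestamp of the final log line.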
What's bugging me is that there is no error message or anything; it just freezes...
Do you know what else I can do to get more data on why this happens?
Thanks in advance,
Itamar