Random restarts/shut offs, At My Wits’ End

elospace

New Member
Mar 8, 2024
Hey everyone. I’ve been experiencing frustrating random restarts on my Proxmox server and I can’t pinpoint the cause. No shutdown process is visible in the logs; it looks like a straight power cut, and because the BIOS is set to return to the on state after power recovery, the machine then turns itself back on. Here are the specs:

  • Motherboard: ASUS Prime B760M-A D4-CSM (recently replaced, problem continues)
  • CPU: i5-12500T (bought second hand)
  • RAM: 128 GB (Memtested with no errors, and running with no XMP/EXPO memory profile enabled)
  • Storage: 2× Intel DC SSDs (ZFS mirror for boot/VMs) + 6× HDDs for media
  • HBA: Fujitsu D3307-A12
  • NICs: 2× I226-V (added a different NIC around when the reboots started, but that could be coincidence or me misremembering)
  • PSU: Fractal Ion Gold 750W (about to replace it, just in case)
  • Cooling: Cranked up all fans, plus a PCIe dual-fan expansion to cool HBA & NIC
The server is hooked up to a UPS alongside two other machines that never experience any issues (UPS load ~20%). Restarts happen sporadically—sometimes multiple times in a single day, other times weeks apart. I’ve scoured the logs and haven’t found errors or abnormal CPU/RAM usage or temps before these events.

So far I have:

  1. Memtested all the RAM (no errors).
  2. Swapped out the motherboard entirely.
  3. Checked logs for CPU usage, temps, etc.
  4. Added extra cooling with a PCIe fan expansion.
  5. Planned a PSU replacement as the next step.
  6. Reset the motherboard BIOS settings to defaults and disabled C-states.
Is it possible that settings like PCIe ASPM are causing issues?

Nothing has conclusively fixed the issue. Has anyone else here dealt with random restarts? Any suggestions on further troubleshooting steps or weird one-off issues I might be overlooking? I’d appreciate any advice. Thanks in advance!
 
There is no shutdown process visible in the logs, just what seems to be a straight power cut
Did you see all the LEDs switch off, the fans stop, etc.? I ask because, the way you describe it, this sounds like a hardware issue, maybe the PSU. If that is not the case (e.g. the system restarts but does NOT power off first), there are some things we can check.
There is no shutdown process visible in the logs
Are you talking about the journal? What does journalctl --since <TIME>, with a time around the time of the crash, show?
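
For anyone following along, here is a minimal sketch of one way to pull a fixed window of the journal around a suspected crash time. `--since`, `--until` and `--no-pager` are standard journalctl options; the helper name `journal_window_cmd`, the 10-minute window and the example timestamp are assumptions made up for illustration, not something from this thread.

```python
from datetime import datetime, timedelta


def journal_window_cmd(crash_time: datetime, minutes: int = 10) -> list[str]:
    """Build a journalctl command covering +/- `minutes` around crash_time."""
    fmt = "%Y-%m-%d %H:%M:%S"  # absolute timestamp format that journalctl accepts
    since = crash_time - timedelta(minutes=minutes)
    until = crash_time + timedelta(minutes=minutes)
    return [
        "journalctl", "--no-pager",
        "--since", since.strftime(fmt),
        "--until", until.strftime(fmt),
    ]


if __name__ == "__main__":
    # Hypothetical crash time; replace it with the time of the actual event.
    cmd = journal_window_cmd(datetime(2025, 2, 26, 3, 40, 0))
    print(" ".join(cmd))
    # On the affected host the list could be passed to subprocess.run(cmd).
```

If the window turns out to be empty right up to the reboot, that in itself points toward an abrupt power loss or hard reset rather than an orderly shutdown.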
 
Did you see all the LEDs switch off, the fans stop, etc.? I ask because, the way you describe it, this sounds like a hardware issue, maybe the PSU. If that is not the case (e.g. the system restarts but does NOT power off first), there are some things we can check.
Unfortunately, I have not been able to watch the machine restart. However, no shutdown process is triggered (in the sense of PVE shutting down LXCs/VMs etc.). I've now replaced the PSU and turned off "Restore on Power loss", so the next time it happens (if it does) I'll know whether it's just a power cut or an actual reboot.


Are you talking about the journal? What does journalctl --since <TIME>, with a time around the time of the crash, show?
journalctl shows nothing around the time of the crash. After each crash I also checked the system logs in the Proxmox web UI; they showed nothing out of the ordinary, and then the log suddenly says -- Reboot --.

Code:
Feb 26 03:38:46 bedrock pveproxy[125731]: worker exit
Feb 26 03:38:46 bedrock pveproxy[2553]: worker 125731 finished
Feb 26 03:38:46 bedrock pveproxy[2553]: starting 1 worker(s)
Feb 26 03:38:46 bedrock pveproxy[2553]: worker 141001 started
-- Reboot --
Feb 26 03:41:45 bedrock kernel: Linux version 6.8.12-8-pve (build@proxmox) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-8 (2025-01-24T12:32Z) ()
Feb 26 03:41:45 bedrock kernel: Command line: initrd=\EFI\proxmox\6.8.12-8-pve\initrd.img-6.8.12-8-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet intel_iommu=on
Feb 26 03:41:45 bedrock kernel: KERNEL supported cpus:
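
As a rough way to check many of these events at once, the sketch below scans journal text in the default short format (as in the excerpt above) for `-- Reboot --` markers and reports how long the gap was and whether anything resembling an orderly shutdown was logged just before it. The function name `find_reboots`, the keyword list `SHUTDOWN_HINTS`, the 20-line lookback and the `year` argument are all assumptions for illustration; the short format carries no year, and the keywords are a heuristic, not an official list of systemd messages.

```python
import re
from datetime import datetime

# Default short journal prefix: "Feb 26 03:38:46 host unit[pid]: message"
LINE_RE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) (\S+) (.*)$")
# Assumed hints of an orderly shutdown; adjust to what your system really logs.
SHUTDOWN_HINTS = ("shutdown", "rebooting", "powering down", "powering off")


def find_reboots(journal_text, lookback=20, year=2025):
    """Return one dict per '-- Reboot --' marker found in journal_text."""
    lines = journal_text.splitlines()
    results = []
    for i, line in enumerate(lines):
        if line.strip() != "-- Reboot --":
            continue
        # Parsable log lines just before the marker, and the first one after it.
        before = [m for m in map(LINE_RE.match, lines[max(0, i - lookback):i]) if m]
        after = next((m for m in map(LINE_RE.match, lines[i + 1:]) if m), None)
        if not before or after is None:
            continue
        t_before = datetime.strptime(f"{year} {before[-1].group(1)}", "%Y %b %d %H:%M:%S")
        t_after = datetime.strptime(f"{year} {after.group(1)}", "%Y %b %d %H:%M:%S")
        clean = any(h in m.group(3).lower() for m in before for h in SHUTDOWN_HINTS)
        results.append({
            "last_before": t_before,
            "first_after": t_after,
            "gap_seconds": (t_after - t_before).total_seconds(),
            "clean_shutdown_logged": clean,
        })
    return results


if __name__ == "__main__":
    import sys
    # Example: journalctl --no-pager | python3 this_script.py
    for event in find_reboots(sys.stdin.read()):
        print(event)
```

On the excerpt above this finds a single event with no shutdown-like messages before it, with roughly three minutes between the last pveproxy line (03:38:46) and the first kernel line (03:41:45).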