Proxmox crashing seemingly at random

bawjaws

I've had Proxmox running for a couple of years now without issue. However, over the last few weeks the server has been crashing at seemingly random times, although mostly in the early hours of the morning.

I've attached an excerpt from the log file from around the time of the crash, but here is where the crash seems to happen (newest entries first):

Code:
Oct 21 04:37:44 pve kernel: BIOS-e820: [mem 0x0000000000100000-0x0000000064841fff] usable
Oct 21 04:37:44 pve kernel: BIOS-e820: [mem 0x00000000000a0000-0x00000000000fffff] reserved
Oct 21 04:37:44 pve kernel: BIOS-e820: [mem 0x000000000009f000-0x000000000009ffff] usable
Oct 21 04:37:44 pve kernel: BIOS-e820: [mem 0x000000000009e000-0x000000000009efff] reserved
Oct 21 04:37:44 pve kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009dfff] usable
Oct 21 04:37:44 pve kernel: BIOS-provided physical RAM map:
Oct 21 04:37:44 pve kernel: x86/split lock detection: #AC: crashing the kernel on kernel split_locks and warning on user-space split_locks
Oct 21 04:37:44 pve kernel: x86/tme: not enabled by BIOS
Oct 21 04:37:44 pve kernel:   zhaoxin   Shanghai 
Oct 21 04:37:44 pve kernel:   Centaur CentaurHauls
Oct 21 04:37:44 pve kernel:   Hygon HygonGenuine
Oct 21 04:37:44 pve kernel:   AMD AuthenticAMD
Oct 21 04:37:44 pve kernel:   Intel GenuineIntel
Oct 21 04:37:44 pve kernel: KERNEL supported cpus:
Oct 21 04:37:44 pve kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.14.11-4-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on iommu=pt pcie_aspm=off pcie_acs_override=downstream,multi>
Oct 21 04:37:44 pve kernel: Linux version 6.14.11-4-pve (build@proxmox) (gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC PMX 6.14.11-4 (2025-10>
-- Boot 5eca0a1af73b44e382716c32e152e741 --
Oct 21 04:36:40 pve pvestatd[1799]: pbs: error fetching datastores - 500 Can't connect to 192.168.1.81:8007 (No route to host)
Oct 21 04:36:30 pve pvestatd[1799]: pbs: error fetching datastores - 500 Can't connect to 192.168.1.81:8007 (No route to host)
Oct 21 04:36:21 pve pvestatd[1799]: pbs: error fetching datastores - 500 Can't connect to 192.168.1.81:8007 (No route to host)

The "error fetching datastores" is expected as I tend to leave the Proxmox Backup Server offline until I'm doing my monthly backups. From what I can see the crash is sudden and not logged. I'm wondering if anyone else can spot anything which might hint as to what the problem is? Failing that, is there anything else I can do to try to work out what is causing the server to crash and reboot?
 

Hi, I can't spot the reason in the log.
If the server crashes and stays down (doesn't reboot on its own), I would check for any messages left on the console.
Another idea is to check and reseat all cables, and to run memtest for at least one night.
 
Thanks - it reboots automatically. I'll try the memtest and will check the cables.
 
Are you running on a UPS? It's possible you are experiencing power fluctuations/momentary outages causing reboots.
Funny you should suggest that. A mains power wobble was my initial suspect, as it always seems to happen in the early hours when our solar battery is charging from the mains. I had a look at the voltage monitor on some Meross smart plugs we have that log mains voltage to Home Assistant and the voltage was pretty steady in that time period. I think it only logs every minute so it's possible there was a short-lived spike or drop.

I have been thinking about getting a UPS but would rather not add more kit to my home office, especially if it's going to be noisy. I do need to get a remote syslog server set up, though, so I can look at the Proxmox and VM logs in one place and see if that gives any hints.
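
As a stopgap until a proper syslog server is in place, something like the following minimal sketch could collect forwarded messages on another machine. It assumes the Proxmox host is already configured to forward syslog over UDP (that setup is not shown here); the port 5514, the file name in LOG_PATH and the names decode_pri, SyslogHandler and run_server are all made up for illustration. The only protocol detail it relies on is the standard syslog priority prefix, where the number in angle brackets equals facility * 8 + severity.

```python
import re
import socketserver

LOG_PATH = "remote-syslog.log"  # hypothetical output file
PRI_RE = re.compile(r"^<(\d{1,3})>")
SEVERITIES = ["emerg", "alert", "crit", "err",
              "warning", "notice", "info", "debug"]


def decode_pri(message):
    """Split a syslog message into (facility, severity_name, rest).

    The leading <PRI> value encodes facility * 8 + severity.
    Messages without a PRI prefix are returned unchanged.
    """
    match = PRI_RE.match(message)
    if not match:
        return None, None, message
    pri = int(match.group(1))
    return pri // 8, SEVERITIES[pri % 8], message[match.end():]


class SyslogHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # For UDP servers, self.request is a (data, socket) pair.
        data, _sock = self.request
        text = data.decode("utf-8", errors="replace").strip()
        _facility, severity, rest = decode_pri(text)
        with open(LOG_PATH, "a", encoding="utf-8") as fh:
            fh.write(f"{self.client_address[0]} {severity} {rest}\n")


def run_server(host="0.0.0.0", port=5514):
    """Listen for UDP syslog datagrams until interrupted."""
    with socketserver.UDPServer((host, port), SyslogHandler) as server:
        server.serve_forever()
```

Calling run_server() starts the listener. Because each message is written to disk on the receiving machine as it arrives, anything the host manages to send just before it goes down would still be there after the reboot.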
 
Hi bawjaws,

Hmm, the interesting part might be in the log entries shortly before the reboot.

Which log level is configured on the system?
Since the crashes recur regularly, you might want to increase it temporarily.
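
To narrow down where to look, here is a rough Python sketch that finds the last entry before a "-- Boot ... --" marker and the first entry after it, and reports the silent gap between them. It assumes the lines are in the same newest-first order as the excerpt above, and it assumes the year 2025 because the timestamps don't carry one; the function names are made up for illustration.

```python
from datetime import datetime

BOOT_MARKER = "-- Boot "


def parse_timestamp(line, year=2025):
    """Parse the leading 'Oct 21 04:37:44' timestamp of a log line.

    The year is an assumption, since the log lines omit it.
    """
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def gap_around_boot(lines, newest_first=True):
    """Return (last_line_before, first_line_after, gap_seconds).

    Returns None if there is no boot marker or it has no entry
    on both sides.
    """
    ordered = list(reversed(lines)) if newest_first else list(lines)
    for i, line in enumerate(ordered):
        if line.startswith(BOOT_MARKER):
            if i == 0 or i == len(ordered) - 1:
                return None
            before, after = ordered[i - 1], ordered[i + 1]
            gap = parse_timestamp(after) - parse_timestamp(before)
            return before, after, gap.total_seconds()
    return None
```

On the excerpt above, the last entry before the marker is the pvestatd message at 04:36:40 and the first kernel line of the new boot is at 04:37:44, so the gap is 64 seconds. Any extra messages produced by a higher log level would need to show up just before that 04:36:40 entry or inside that window.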

BR, Lucas