node went down and I had to restart it to get it back online

m0t0k0

New Member
Nov 27, 2022
Around 4 am this morning my Proxmox node went offline. When I woke up I reset the box and it came back up as if there had never been an issue.

I have the syslog from when it happened, but it only shows that all my backups ran correctly; after that the node stopped logging anything until the system was rebooted.

This has happened once before, but as it was an isolated incident I thought nothing of it and assumed we might have had a power cut in the night or something. That was over a month ago now.

It seems similar to this post: https://forum.proxmox.com/threads/sporadic-node-crashes-no-ssh-no-logs.118613/ but we have different hardware.

How would I go about diagnosing the issue, please?

System
Proxmox 7.2-11
Intel i5-5200U


Nov 27 04:01:16 proxmox kernel: vmbr0: port 4(tap101i0) entered disabled state
Nov 27 04:01:16 proxmox qmeventd[1470]: read: Connection reset by peer
Nov 27 04:01:17 proxmox systemd[1]: 101.scope: Succeeded.
Nov 27 04:01:17 proxmox systemd[1]: 101.scope: Consumed 26.349s CPU time.
Nov 27 04:01:17 proxmox pvescheduler[2055325]: INFO: Finished Backup of VM 101 (00:00:44)
Nov 27 04:01:18 proxmox pvescheduler[2055325]: INFO: Starting Backup of VM 200 (lxc)
Nov 27 04:01:18 proxmox qmeventd[2061635]: Starting cleanup for 101
Nov 27 04:01:18 proxmox qmeventd[2061635]: Finished cleanup for 101
Nov 27 04:02:08 proxmox pvescheduler[2055325]: INFO: Finished Backup of VM 200 (00:00:51)
Nov 27 04:02:08 proxmox pvescheduler[2055325]: INFO: Starting Backup of VM 201 (lxc)
Nov 27 04:03:20 proxmox pvescheduler[2055325]: INFO: Finished Backup of VM 201 (00:01:12)
Nov 27 04:03:20 proxmox pvescheduler[2055325]: INFO: Backup job finished successfully
Nov 27 04:17:01 proxmox CRON[2078095]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Nov 27 04:17:01 proxmox CRON[2078096]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Nov 27 04:17:01 proxmox CRON[2078095]: pam_unix(cron:session): session closed for user root
-- Reboot --
Nov 27 09:05:44 proxmox kernel: Linux version 5.15.53-1-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PVE 5.15.53-1 (Fri, 26 Aug 2022 16:53:52 +0200) ()
Nov 27 09:05:44 proxmox kernel: Command line: BOOT_IMAGE=/vmlinuz-5.15.53-1-pve root=ZFS=rpool/ROOT/pve-1 ro root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet intel_iommu=on
Nov 27 09:05:44 proxmox kernel: KERNEL supported cpus:
Nov 27 09:05:44 proxmox kernel: Intel GenuineIntel
Nov 27 09:05:44 proxmox kernel: AMD AuthenticAMD
Nov 27 09:05:44 proxmox kernel: Hygon HygonGenuine
Nov 27 09:05:44 proxmox kernel: Centaur CentaurHauls
Nov 27 09:05:44 proxmox kernel: zhaoxin Shanghai
 
Sporadic resets or crashes that leave nothing in the logs usually hint at a hardware failure of some kind, and they are hard to diagnose. The simplest things to rule out are the disks (check their health with smartctl) and the memory (run a memtest). You could also check whether the problem appears on another installation/distro, but if it doesn't happen very often, this might simply be too inconvenient.
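
As a first step it can help to pin down how long the node was actually silent. Below is a minimal Python sketch, not an official tool, that takes the last line before the "-- Reboot --" marker and the first line after it and prints the gap. The function names parse_ts and silent_window are made up for illustration, the three sample lines are copied from the excerpt above, and the year is an assumption because syslog timestamps do not carry one.

```python
from datetime import datetime

REBOOT_MARKER = "-- Reboot --"  # separator line shown between boots in the excerpt


def parse_ts(line, year=2022):
    """Parse the leading 'Mon DD HH:MM:SS' stamp; the year is assumed."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def silent_window(lines):
    """Return (last entry before the reboot marker, first entry after it)."""
    for i, line in enumerate(lines):
        if line.strip() == REBOOT_MARKER:
            if i == 0 or i + 1 >= len(lines):
                raise ValueError("marker has no log line on both sides")
            return parse_ts(lines[i - 1]), parse_ts(lines[i + 1])
    raise ValueError("no reboot marker found")


# Three lines taken verbatim from the log posted above.
log = [
    "Nov 27 04:17:01 proxmox CRON[2078095]: pam_unix(cron:session): session closed for user root",
    "-- Reboot --",
    "Nov 27 09:05:44 proxmox kernel: KERNEL supported cpus:",
]

last_seen, first_seen = silent_window(log)
gap = first_seen - last_seen
print(f"last entry {last_seen}, first entry after reboot {first_seen}, gap {gap}")
# gap prints as 4:48:43
```

The last entry is the cron.hourly run at 04:17:01 and there is no 05:17 run in the log. Assuming the hourly cron job would have logged again an hour later, the node probably stopped some time between 04:17 and 05:17, at least 14 minutes after the backup job had finished at 04:03:20.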

As user "apoc" put it very well in the post you linked:
If the behaviour is so unpredictable then it is really hard to get to the bottom of it. You only can rule out things one by one. From the hardware side it could be a bad power supply, which reacts on environmental changes (e.g. heat) and/or what comes from the socket. It also can be an issue with your memory, but typically this leads to a kernel panic which you should be able to see on the console. Then there might be the disk/ssd, but I would expect some more stable behaviour in this case, but again you never know. I'd expect the file system to go read only... Theoretically it also can be an out of memory condition. Have you limited the zfs memory usage? Start investigating 1by1. Take your time. Be patient.
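
On the ZFS memory question in the quote: on Proxmox VE the ARC size is normally capped with the zfs_arc_max module parameter, given in bytes in /etc/modprobe.d/zfs.conf. Here is a small sketch for working out the value. The 4 GiB cap is only a placeholder, since the amount of RAM in the box was not posted, and arc_max_option is a hypothetical helper name.

```python
GIB = 1024 ** 3  # bytes per GiB


def arc_max_option(limit_gib):
    """Build the modprobe option line that caps the ZFS ARC at limit_gib GiB."""
    if limit_gib <= 0:
        raise ValueError("limit must be positive")
    return f"options zfs zfs_arc_max={int(limit_gib * GIB)}"


# Placeholder cap of 4 GiB; choose a value that fits the RAM actually installed.
line = arc_max_option(4)
print(line)  # options zfs zfs_arc_max=4294967296
```

The printed line goes into /etc/modprobe.d/zfs.conf. With root on ZFS, as the kernel command line above shows, the initramfs has to be refreshed (update-initramfs -u -k all) and the node rebooted before the new limit takes effect.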
 
