Unexpected reboot last night

SamTzu

I can't explain this reboot on a KVM host last night.

Code:
Jun 09 00:24:01 vm2405 CRON[2014686]: (root) CMD (if [ $(date +%w) -eq 0 ] && [ -x /usr/lib/zfs-linux/scrub ]; then /usr/lib/zfs-linux/scrub; fi)
Jun 09 00:24:01 vm2405 zed[2014704]: eid=97 class=scrub_start pool='sdd'
Jun 09 00:24:04 vm2405 zed[2015034]: eid=99 class=scrub_start pool='vdd'
Jun 09 00:24:25 vm2405 zed[2015314]: eid=102 class=scrub_finish pool='sdd'
Jun 09 00:24:30 vm2405 CRON[2014685]: pam_unix(cron:session): session closed for user root
-- Reboot --
Jun 09 00:45:17 vm2405 kernel: Linux version 6.5.13-1-pve (build@proxmox) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC PMX 6.5.13-1 (2024-02-05T13:50Z) ()
Jun 09 00:45:17 vm2405 kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.5.13-1-pve root=/dev/mapper/pve-root ro quiet
Jun 09 00:45:17 vm2405 kernel: KERNEL supported cpus:
Jun 09 00:45:17 vm2405 kernel:   Intel GenuineIntel
 
There is nothing in the log to go on. It could have been a power interruption (PSU or wall socket), and/or the logs could not be written to disk (drive or connector issue), or some hardware issue (memory, CPU, or motherboard). It could be as simple as a voltage drop caused by very active drives (the vdd scrub) combined with a worn-out (and/or hot) PSU.
Or something completely different. Unless you can reproduce it, it's impossible to test what it might be.
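A first thing to check is whether the journal simply stops mid-stream (typical of power loss) or shows an orderly shutdown. Something like this should tell, assuming journald keeps a persistent journal on that host:

Code:
# list the boots journald knows about
journalctl --list-boots
# tail of the previous boot (-1): the last messages written before the reboot
journalctl -b -1 -n 50

An orderly shutdown would show systemd stopping services at the end; a log that just cuts off points at power or hardware.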
 
@SamTzu Don't suppose the box that rebooted is a server with some form of BMC or similar controller?

If it is, that would generally have a record of any hardware problems/faults that caused a reboot.
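For example, if it has an IPMI-style BMC, something like this should dump the hardware event log (assuming ipmitool is installed and the local BMC interface is available):

Code:
# read the BMC's System Event Log; power faults, ECC errors, and thermal trips land here
ipmitool sel elist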

From a different angle, is the host part of a cluster? If there was some form of network problem causing the host to lose quorum, then it's possible it could have been rebooted by the watchdog.
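If it is clustered, something along these lines would show the quorum state and any HA/watchdog activity (pvecm and the pve-ha-* units ship with Proxmox VE; -b -1 reads the journal of the previous boot):

Code:
# current cluster membership and quorum status
pvecm status
# HA manager and watchdog activity from the boot before the reboot
journalctl -b -1 -u pve-ha-lrm -u pve-ha-crm -u watchdog-mux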
 
@SamTzu Don't suppose the box that rebooted is a server with some form of BMC or similar controller?

If it is, that would generally have a record of any hardware problems/faults that caused a reboot.
Good point!
From a different angle, is the host part of a cluster? If there was some form of network problem causing the host to lose quorum, then it's possible it could have been rebooted by the watchdog.
Would that not show up in the system log? I don't have much experience, and maybe the OP did not show all of the logs, but I would expect some information from this mechanism.
 
Would that not show up in the system log?
When I was testing the watchdog recently in a 2-node cluster, prior to learning how to use a QDevice, I was seeing watchdog reboots occur that didn't make it to the log on disk.

Using some kind of remote (off-host) syslog collector would probably help with that particular log collection case, but even then, if the watchdog reboots things fast enough, it might not. Not super sure. :confused:
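For reference, a minimal rsyslog forwarding rule looks roughly like this (the collector name and port are placeholders; @@ means TCP, a single @ would be UDP):

Code:
# /etc/rsyslog.d/90-remote.conf -- forward everything to an off-host collector
*.* @@logserver.example.com:514
# apply it with: systemctl restart rsyslog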
 
There are 5 KVM hosts on that server hardware.
vm2405 was the only Proxmox that rebooted.
All logs are fed to a log server, but unfortunately they do not reveal anything suspicious.
 
There are 5 KVM hosts on that server hardware.
Do you mean 5 VMs on Proxmox or 5 Proxmox on your server (using which hypervisor?)?
vm2405 was the only Proxmox that rebooted.
All logs are fed to a log server, but unfortunately they do not reveal anything suspicious.
Are we looking at the logs of a Proxmox host, or at the logs of a VM? I assumed you showed Proxmox logs and the system rebooted, but now it sounds like only a single VM rebooted. In case of the latter: check the Proxmox system log, for example for the Out Of Memory (OOM) killer terminating the VM.
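A quick way to check for that, assuming the journal from around the incident is still retained:

Code:
# search the kernel log for OOM killer activity
journalctl -k | grep -iE 'out of memory|oom-kill'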
 
Do you mean 5 VMs on Proxmox or 5 Proxmox on your server (using which hypervisor?)
I mean exactly what I wrote: 5 KVM hosts, most of them hosting Proxmox instances that nest LXC containers.

The log snippet I posted was copied from the Proxmox host that rebooted.
 
Do you mean 5 VMs on Proxmox or 5 Proxmox on your server (using which hypervisor?)
I mean exactly what I wrote: 5 KVM hosts, most of them hosting Proxmox instances that nest LXC containers.
That was not clear from the first post. Maybe check the logs from the KVM host that runs this Proxmox instance. Maybe it killed Proxmox for a reason (like OOM)?
Do you run the various Proxmoxes (on KVM hosts) as a cluster or as separate independent nodes?
The log snippet I posted was copied from the Proxmox host that rebooted.
Thank you for clearing that up. There is still no clue in that log, but it makes a hardware error unlikely (since the physical hardware did not reboot; only one KVM guest did). It does, however, raise the question of why that KVM process was restarted. Maybe you can find out by looking at the logging of your KVM hosts?
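What exactly to look at depends on how the outer KVM layer is managed. If it happens to be libvirt-based, for instance, each guest keeps its own QEMU log (the path below is the libvirt default, and the guest name vm2405 is just a guess):

Code:
# per-guest QEMU log on the outer host: unexpected exits of the VM process show up here
less /var/log/libvirt/qemu/vm2405.log
# or search the outer host's journal for the QEMU/KVM process
journalctl -b | grep -i qemu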
 
There are 5 KVM hosts on that server hardware.
Weirdly enough, that's actually an unclear statement.

Does that mean:
  1. You have 5 separate physical machines, and all five are running the same server hardware?
  2. You have 1 physical server, and you have 5 instances of Proxmox running as VMs (or similar) on that one server?
  3. Something else?
 
