Are reboots logged before, or after, a reboot?

surfrock66

I'm troubleshooting random host reboots which have been causing me huge headaches. I see them in the logs, and nothing appears to precede them.
Code:
Mar 17 17:06:28 sr66-prox-03 pveproxy[348253]: Clearing outdated entries from certificate cache
Mar 17 17:07:10 sr66-prox-03 snmpd[4026]: systemstats_linux: unexpected header length in /proc/net/snmp. 237 != 224
Mar 17 17:07:37 sr66-prox-03 pveproxy[340366]: worker exit
Mar 17 17:07:37 sr66-prox-03 pveproxy[4548]: worker 340366 finished
Mar 17 17:07:37 sr66-prox-03 pveproxy[4548]: starting 1 worker(s)
Mar 17 17:07:37 sr66-prox-03 pveproxy[4548]: worker 348605 started
Mar 17 17:07:45 sr66-prox-03 pveproxy[348605]: Clearing outdated entries from certificate cache
Mar 17 17:08:10 sr66-prox-03 snmpd[4026]: systemstats_linux: unexpected header length in /proc/net/snmp. 237 != 224
Mar 17 17:09:09 sr66-prox-03 pmxcfs[4186]: [status] notice: received log
Mar 17 17:09:10 sr66-prox-03 snmpd[4026]: systemstats_linux: unexpected header length in /proc/net/snmp. 237 != 224
Mar 17 17:10:10 sr66-prox-03 snmpd[4026]: systemstats_linux: unexpected header length in /proc/net/snmp. 237 != 224
Mar 17 17:11:10 sr66-prox-03 snmpd[4026]: systemstats_linux: unexpected header length in /proc/net/snmp. 237 != 224
Mar 17 17:12:10 sr66-prox-03 snmpd[4026]: systemstats_linux: unexpected header length in /proc/net/snmp. 237 != 224
-- Reboot --
Mar 17 17:16:52 sr66-prox-03 kernel: Linux version 6.8.12-8-pve (build@proxmox) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-8 (2025-01-24T12:32Z) ()
Mar 17 17:16:52 sr66-prox-03 kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.8.12-8-pve root=UUID=4fbd2c0b-dcd7-44d9-9139-495d8f107f19 ro quiet
Mar 17 17:16:52 sr66-prox-03 kernel: KERNEL supported cpus:
Mar 17 17:16:52 sr66-prox-03 kernel:   Intel GenuineIntel
Mar 17 17:16:52 sr66-prox-03 kernel:   AMD AuthenticAMD

I have correlated this with the iDRAC hardware logs. I do see a backplane reset, but my research suggests this is a normal operation and should NOT be an issue; at worst, the OS would see the disk in slot 8 disappear and reappear, which I have not observed. I have also seen this log entry at times that did NOT trigger a reboot, as is the case at the top of this screenshot:

[Screenshot: iDRAC hardware log, 1742310947661.png]

So, going back to the PVE logs, I see the "-- Reboot --" line, and I'm wondering: is it logged at the END of the old session, or at the BEGINNING of the new one? That is the difference between the OS knowing a reboot is happening and the OS figuring out afterwards that one happened. The iDRAC log indicates the CPU reset was caused by the power cycle, but it doesn't indicate the source: whether the OS triggered it, or whether it was some sort of hard reset.
 
The -- Reboot -- line gets inserted as a marker between boots when the journal is displayed; it is not written by either session. If the reboot had been initiated by the OS, that would be clearly visible in the lines before it (services being shut down in an orderly fashion, shutdown.target being reached, ...). So this looks more like a hard crash or power reset.
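A rough way to check for those shutdown lines is to grep the tail of the previous boot's journal. This is only a sketch: `classify_shutdown` is a made-up helper name, and the patterns are assumptions about typical systemd shutdown messages, which vary between systemd versions.

```shell
# classify_shutdown (hypothetical helper): reads journal text on stdin and
# prints "clean" if it finds typical orderly-shutdown messages, else "unclean".
# The patterns are assumptions; adjust them to what a known-good reboot
# looks like on your own host.
classify_shutdown() {
  if grep -Eqi 'Reached target .*(shutdown|reboot|power-off)|Journal stopped|systemd-shutdown'; then
    echo clean
  else
    echo unclean
  fi
}

# Usage on the affected host, checking the last 200 lines of the previous boot:
#   journalctl -b -1 -n 200 --no-pager | classify_shutdown
```

If this prints "unclean" for the boot that ended at 17:12, it is consistent with the excerpt above: the journal simply stops, with no shutdown sequence. `journalctl --list-boots` shows the first and last timestamp of each boot if you need to find the right one.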
 
Frustrating, as the iDRAC logs expose no smoking gun. I actually got a new backplane thinking it was the issue, and it didn't reduce how often that message appears. There's a loose correlation with the drive reset message, but it doesn't seem plausible that it would cause an unexpected reboot. Nothing shows up in the UPS log either.
 
If you can get a serial console or netconsole (or a persistent SSH connection tailing the logs), you might get more information if it happens again. Sometimes the kernel is able to print something on the console but is no longer able to persist anything to disk. Of course, if it's a more fundamental hardware problem, even that might not happen or be possible.
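For the netconsole route, a minimal sketch follows. The helper name `netconsole_param` is made up, and every IP address, interface name, and MAC address shown is a placeholder to replace with your own values. The parameter layout (`src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac`) and the default ports 6665/6666 follow the kernel's netconsole documentation.

```shell
# netconsole_param (hypothetical helper): builds the value for the
# netconsole= module parameter.
#   $1 = source IP of the crashing host    $2 = its network interface
#   $3 = IP of the machine receiving logs  $4 = that machine's MAC (optional;
#        the kernel falls back to broadcast when it is left empty)
netconsole_param() {
  printf '%s@%s/%s,%s@%s/%s\n' 6665 "$1" "$2" 6666 "$3" "${4:-}"
}

# On the crashing host, as root (all values are placeholders):
#   modprobe netconsole "netconsole=$(netconsole_param 192.0.2.10 eno1 192.0.2.20 aa:bb:cc:dd:ee:ff)"
#   dmesg -n 8    # optional: raise the console log level so more messages are sent
#
# On the receiving machine, listen for the UDP packets and keep a copy
# (netcat flag syntax differs between variants):
#   nc -u -l 6666 | tee netconsole.log
#
# The simpler SSH alternative, run from another machine:
#   ssh root@sr66-prox-03 'journalctl -kf'
```

Neither approach is guaranteed to capture anything on a hard power loss, but a kernel panic or machine-check message that never reached the disk has a chance of showing up on the receiving side.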