Random node shutdown/reboots

Republicus

Aug 7, 2017
I have one node that reboots on its own.
I haven't pinned down what's causing the system to shut down/reboot.

I've replaced all of the memory (which tested fine before I replaced it), and temps appear okay on the CPUs. Otherwise I have a 10 GbE fiber card and an InfiniBand card that I will replace next.


These are the best logs I have identified but I'm still not sure what exactly is causing my woes.
IPMI shutdown?

Anyone think they can help?

Code:
/var/log/messages:Dec  1 00:00:00 node01 rsyslogd:  [origin software="rsyslogd" swVersion="8.1901.0" x-pid="1029" x-info="https://www.rsyslog.com"] rsyslogd was HUPed
/var/log/messages:Dec  1 03:58:36 node01 kernel: [    1.787465] reboot: Dell PowerEdge C6100 series board detected. Selecting PCI-method for reboots.
/var/log/messages:Dec  1 03:58:36 node01 rsyslogd: imuxsock: Acquired UNIX socket '/run/systemd/journal/syslog' (fd 3) from systemd.  [v8.1901.0]
/var/log/messages:Dec  1 03:58:36 node01 rsyslogd:  [origin software="rsyslogd" swVersion="8.1901.0" x-pid="1049" x-info="https://www.rsyslog.com"] start
/var/log/messages:Dec  1 03:58:36 node01 kernel: [    6.607460] raid6: using ssse3x2 recovery algorithm
/var/log/messages:Dec  1 16:13:35 node01 kernel: [44108.234490] EXT4-fs (loop0): recovery complete
/var/log/messages:Dec  1 16:21:25 node01 rsyslogd:  [origin software="rsyslogd" swVersion="8.1901.0" x-pid="1049" x-info="https://www.rsyslog.com"] exiting on signal 15.
/var/log/messages:Dec  1 16:21:25 node01 rsyslogd: imuxsock: Acquired UNIX socket '/run/systemd/journal/syslog' (fd 3) from systemd.  [v8.1901.0]
/var/log/messages:Dec  1 16:21:25 node01 rsyslogd:  [origin software="rsyslogd" swVersion="8.1901.0" x-pid="30737" x-info="https://www.rsyslog.com"] start
/var/log/syslog:Dec  1 00:00:00 node01 rsyslogd:  [origin software="rsyslogd" swVersion="8.1901.0" x-pid="1029" x-info="https://www.rsyslog.com"] rsyslogd was HUPed
/var/log/syslog:Dec  1 00:00:00 node01 rsyslogd:  [origin software="rsyslogd" swVersion="8.1901.0" x-pid="1029" x-info="https://www.rsyslog.com"] rsyslogd was HUPed
/var/log/syslog:Dec  1 03:58:36 node01 kernel: [    1.787465] reboot: Dell PowerEdge C6100 series board detected. Selecting PCI-method for reboots.
/var/log/syslog:Dec  1 03:58:36 node01 systemd[1]: Started Update UTMP about System Boot/Shutdown.
/var/log/syslog:Dec  1 03:58:36 node01 rsyslogd: imuxsock: Acquired UNIX socket '/run/systemd/journal/syslog' (fd 3) from systemd.  [v8.1901.0]
/var/log/syslog:Dec  1 03:58:36 node01 rsyslogd:  [origin software="rsyslogd" swVersion="8.1901.0" x-pid="1049" x-info="https://www.rsyslog.com"] start
/var/log/syslog:Dec  1 03:58:36 node01 kernel: [    6.607460] raid6: using ssse3x2 recovery algorithm
/var/log/syslog:Dec  1 03:58:39 node01 corosync[1438]:   [WD    ] resource load_15min missing a recovery key.
/var/log/syslog:Dec  1 03:58:39 node01 corosync[1438]:   [WD    ] resource memory_used missing a recovery key.
/var/log/syslog:Dec  1 16:13:35 node01 kernel: [44108.234490] EXT4-fs (loop0): recovery complete
/var/log/syslog:Dec  1 16:21:16 node01 pve-firewall[1454]: server shutdown (restart)
/var/log/syslog:Dec  1 16:21:22 node01 pve-ha-crm[1479]: server received shutdown request
/var/log/syslog:Dec  1 16:21:25 node01 rsyslogd:  [origin software="rsyslogd" swVersion="8.1901.0" x-pid="1049" x-info="https://www.rsyslog.com"] exiting on signal 15.
/var/log/syslog:Dec  1 16:21:25 node01 rsyslogd: imuxsock: Acquired UNIX socket '/run/systemd/journal/syslog' (fd 3) from systemd.  [v8.1901.0]
/var/log/syslog:Dec  1 16:21:25 node01 rsyslogd:  [origin software="rsyslogd" swVersion="8.1901.0" x-pid="30737" x-info="https://www.rsyslog.com"] start
/var/log/syslog:Dec  1 16:21:37 node01 pvedaemon[1473]: server shutdown (restart)
/var/log/syslog:Dec  1 16:21:38 node01 pveproxy[1481]: server shutdown (restart)
/var/log/syslog:Dec  1 16:21:39 node01 spiceproxy[1487]: server shutdown (restart)
/var/log/syslog:Dec  1 16:21:39 node01 pvestatd[1456]: server shutdown (restart)
grep: /var/log/apcupsd*: No such file or directory
 
Hi,
this is a requested shutdown.
The origin of the shutdown is not shown in these logs.
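
If it helps to narrow things down, here is a rough sketch of how one could check whether each boot in a syslog file was preceded by an orderly stop. It is only an illustration: the function name `classify_boots`, the `early_seconds` threshold, and the list of "clean stop" phrases are my own assumptions (the phrases are copied from the excerpt above), and a message like "exiting on signal 15" can also appear when only the rsyslog service is restarted. Run it against the full log rather than grep output, since a filter may hide the relevant lines.

```python
import re

# Kernel lines carry seconds-since-boot in brackets, e.g. "kernel: [    1.787465]".
KERNEL_UPTIME = re.compile(r"kernel: \[\s*(\d+\.\d+)\]")

# Phrases copied from the excerpt above that hint at an orderly stop.
# This list is an assumption for illustration, not an exhaustive set.
CLEAN_STOP_HINTS = ("received shutdown request", "exiting on signal 15")


def classify_boots(lines, early_seconds=60.0):
    """Return one (first_kernel_line, preceded_by_clean_stop) tuple per boot.

    A boot is detected when the kernel uptime counter is small and has
    dropped compared with the previous kernel line (or is the first seen).
    """
    boots = []
    saw_clean_stop = False
    last_uptime = None
    for line in lines:
        match = KERNEL_UPTIME.search(line)
        if match:
            uptime = float(match.group(1))
            is_new_boot = uptime < early_seconds and (
                last_uptime is None or uptime < last_uptime
            )
            if is_new_boot:
                boots.append((line, saw_clean_stop))
                saw_clean_stop = False  # reset for the next boot cycle
            last_uptime = uptime
        elif any(hint in line for hint in CLEAN_STOP_HINTS):
            saw_clean_stop = True
    return boots


if __name__ == "__main__":
    # Hypothetical path; adjust to wherever the full log lives.
    with open("/var/log/syslog", errors="replace") as handle:
        for first_line, clean in classify_boots(handle):
            label = "after clean stop" if clean else "NO clean stop seen"
            print(label, "|", first_line.rstrip())
```

A boot reported with no clean stop before it would point toward a power loss, hardware reset, or watchdog rather than a requested shutdown, but treat that as a hint and not proof.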