Strange Incident - Server Self-Powered Off

Elfy

Well-Known Member
Dec 29, 2016
I recently encountered a rather bizarre situation and thought I'd share it. I have this server situated in a secure data center where only one other person and I have access. Nevertheless, the server unexpectedly shut itself down.

The system in question is a SuperMicro 2028TP-HC0R 2U, outfitted with the X10DRT-P motherboard. I've attached a screenshot of the logs, capturing the power-down event for reference.

Any ideas on how such an occurrence could even take place? Your thoughts and suggestions are greatly appreciated.
Screenshot 2023-08-07 101923.jpg
It then shuts itself down again shortly after being powered back on via IPMI:
1691428471470.png

Code:
root@Janeway:~# pveversion -v
proxmox-ve: 7.4-1 (running kernel: 5.15.108-1-pve)
pve-manager: 7.4-16 (running version: 7.4-16/0f39f621)
pve-kernel-5.15: 7.4-4
pve-kernel-5.15.108-1-pve: 5.15.108-2
pve-kernel-5.15.102-1-pve: 5.15.102-1
pve-kernel-5.15.83-1-pve: 5.15.83-1
pve-kernel-5.15.39-1-pve: 5.15.39-1
ceph: 16.2.13-pve1
ceph-fuse: 16.2.13-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx4
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4.1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-2
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.7
libpve-storage-perl: 7.4-3
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.3-1
proxmox-backup-file-restore: 2.4.3-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.2
proxmox-widget-toolkit: 3.7.3
pve-cluster: 7.3-3
pve-container: 4.4-6
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-4~bpo11+1
pve-firewall: 4.3-5
pve-firmware: 3.6-5
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-2
qemu-server: 7.4-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.11-pve1

(For the Star Trek nerds, it can only be Q).
 
I am having the exact same issue. The only thing I've found in the logs is that power button press. The fix I am currently trying is to disable the power button as follows:

Change the power button behavior:

nano /etc/systemd/logind.conf

Change #HandlePowerKey=poweroff to HandlePowerKey=ignore

systemctl restart systemd-logind
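
The same setting can probably also be applied through a drop-in file instead of editing /etc/systemd/logind.conf directly, which keeps the change separate from the packaged config. A minimal sketch, assuming the standard drop-in directory /etc/systemd/logind.conf.d; the function name write_powerkey_dropin and the file name 10-ignore-power-key.conf are made up for illustration:

```shell
# Hypothetical helper: write a logind drop-in that ignores the power key.
# The optional first argument overrides the target directory, which
# defaults to the usual systemd drop-in location (an assumption here).
write_powerkey_dropin() {
    dir="${1:-/etc/systemd/logind.conf.d}"
    mkdir -p "$dir" || return 1
    # HandlePowerKey belongs in the [Login] section of logind.conf.
    printf '[Login]\nHandlePowerKey=ignore\n' > "$dir/10-ignore-power-key.conf"
}

# Usage (as root):
#   write_powerkey_dropin && systemctl restart systemd-logind
```

Note that this only changes how the operating system reacts to a power-button event. It would not stop a power-off that the BMC carries out on its own.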
Thank you so much @RolandK and @johndoe297 for your suggestions. After a lot of trial and error, I determined that the most likely cause was an overheating CPU, which triggered the BMC (IPMI, iDRAC, whatever you want to call it) to shut down the system. Strangely, nothing in the IPMI logs indicates this, apart from a few lines about "CPU temperature assertion". I purchased new CPUs, swapped out the old ones, and the problem has not recurred since.
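
For anyone chasing a similar shutdown, a rough way to look for temperature or power-button entries from the host is to filter the BMC event log with ipmitool. This is only a sketch: filter_sel_events is a hypothetical helper, and the keyword list is a guess at what such entries contain, since the exact wording depends on the BMC firmware.

```shell
# Hypothetical helper: keep only event-log lines that mention
# temperature, button, or power events (case-insensitive).
# The keywords are assumptions; adjust them to your BMC's wording.
filter_sel_events() {
    grep -iE 'temperature|button|power unit|power off'
}

# Usage on the host (needs ipmitool and the IPMI kernel modules):
#   ipmitool sel elist | filter_sel_events   # matching event-log entries
#   ipmitool sdr type Temperature            # current temperature sensors
```

An empty result does not rule out overheating. As noted above, the logs in this case showed very little beyond the temperature assertions.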