ghostly reboot at midnight

Nemesiz

The server has rebooted unattended for the last 3 nights.

The first night I thought it might be related to sending backups to another server. In that process, the backup server is started via IPMI and shut down after completion. But in the morning the backup server was still running.

The last 2 reboots happened outside of the backup process.

The log shows nothing interesting.

I don't want to switch to ZFS 2.2 yet, so I'm running kernel 6.2.16-20-pve with ZFS 2.1.14-pve1.

Has anyone else had a similar problem?


Code:
# pveversion -v
proxmox-ve: 8.1.0 (running kernel: 6.2.16-20-pve)
pve-manager: 8.1.3 (running version: 8.1.3/b46aac3b42da5d15)
proxmox-kernel-helper: 8.1.0
pve-kernel-6.2: 8.0.5
proxmox-kernel-6.5: 6.5.3-1
proxmox-kernel-6.5.3-1-pve: 6.5.3-1
proxmox-kernel-6.2.16-20-pve: 6.2.16-20
proxmox-kernel-6.2: 6.2.16-20
ceph-fuse: 17.2.6-pve1+3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.0.7
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.1.0
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.5
libpve-rs-perl: 0.8.7
libpve-storage-perl: 8.0.5
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve4
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.1.2-1
proxmox-backup-file-restore: 3.1.2-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.2
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.1.3
pve-cluster: 8.0.5
pve-container: 5.0.8
pve-docs: 8.1.3
pve-edk2-firmware: 4.2023.08-2
pve-firewall: 5.0.3
pve-firmware: 3.9-1
pve-ha-manager: 4.0.3
pve-i18n: 3.1.5
pve-qemu-kvm: 8.1.2-6
pve-xtermjs: 5.3.0-3
qemu-server: 8.0.10
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.2-pve1


Code:
# journalctl --list-boots
IDX BOOT ID                          FIRST ENTRY                 LAST ENTRY
 -5 42a4ac5b43e1487ea25b3fda78a0e935 Wed 2023-11-29 19:07:43 UTC Sun 2024-01-07 01:31:54 UTC
 -4 78f9ee8b95ce45bba68478cac9daec0b Sun 2024-01-07 01:37:54 UTC Sun 2024-01-07 01:52:00 UTC
 -3 98344cf970554554b5db4746a90c1a5f Sun 2024-01-07 01:53:49 UTC Sun 2024-01-14 23:34:29 UTC
 -2 999d72e4c5b247aaae478cc8b3034496 Sun 2024-01-14 23:38:16 UTC Mon 2024-01-15 21:41:53 UTC
 -1 a5f3a9845e114b86a806b433998ef28d Mon 2024-01-15 21:45:17 UTC Wed 2024-01-17 00:10:01 UTC
  0 a7247037834c442b9006bd201c39e281 Wed 2024-01-17 00:14:17 UTC Wed 2024-01-17 06:43:29 UTC
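The listing above can also be read as a timeline of outages. A minimal sketch, assuming GNU date and the column layout shown above; the helper name boot_gaps is made up for illustration:

```shell
# boot_gaps: read `journalctl --list-boots` output on stdin and print, for
# every boot after the first, the seconds between the previous boot's last
# journal entry and this boot's first entry.
boot_gaps() {
    prev_last=""
    while read -r idx _id _dow1 fdate ftime ftz _dow2 ldate ltime ltz; do
        # Skip the header row and blank lines.
        case "$idx" in ''|IDX) continue ;; esac
        first=$(date -u -d "$fdate $ftime $ftz" +%s)   # GNU date syntax
        if [ -n "$prev_last" ]; then
            printf '%s %s %s gap=%ss\n' "$idx" "$fdate" "$ftime" "$((first - prev_last))"
        fi
        prev_last=$(date -u -d "$ldate $ltime $ltz" +%s)
    done
}

# Usage on the affected node:
#   journalctl --list-boots | boot_gaps
```

A gap of only a few minutes with no shutdown messages at the end of the previous boot would point to a hard reset rather than a clean reboot.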
 
Hi,

Nemesiz said: "The server has rebooted unattended for the last 3 nights."
Is there any update you could correlate this with? Do you have the latest BIOS update and CPU microcode installed? See https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#sysadmin_firmware_cpu

Nemesiz said: "The log shows nothing interesting."
It might still be worth sharing the last bit before the reboots. It might be a crash where the log doesn't make it to the disk. You could connect from another host via SSH, run journalctl -f and wait for the next time it crashes. If you are lucky, you'll get more log output there.
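The SSH suggestion could look roughly like the sketch below. The function name follow_remote_journal, the host root@pve-node and the log path are placeholders, not anything from this thread; the point is only that the lines are written on the watching machine, so they survive even if the crashing node never flushes its own journal.

```shell
# follow_remote_journal HOST LOGFILE
# Stream the journal of HOST over SSH and append each line, prefixed with
# the local receive time (UTC), to LOGFILE on the watching machine.
follow_remote_journal() {
    host=$1
    logfile=$2
    ssh "$host" journalctl -f -o short-precise |
    while IFS= read -r line; do
        printf '%s %s\n' "$(date -u +%FT%TZ)" "$line"
    done >> "$logfile"
}

# Usage from another machine (placeholder host and path):
#   follow_remote_journal root@pve-node /var/tmp/pve-node-journal.log
#
# To share the last lines written before the most recent reboot, run on the node:
#   journalctl -b -1 -n 100 --no-pager
```

When the SSH session drops, the tail of the log file is the last output the node managed to send.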
 
Hello fiona,

It happened again last night. BIOS and CPU microcode are up to date. External log monitoring didn't give a clue.
The system had worked very well for a long time before that. I updated on 2024-01-07, so perhaps I need to go back to proxmox-kernel-6.2.16-18-pve.
 

Attachment: update.txt (80.4 KB)

Nemesiz said: "It happened again last night. BIOS and CPU microcode are up to date. External log monitoring didn't give a clue."
That's unfortunate. Without any logs, we're in the dark.

Nemesiz said: "The system had worked very well for a long time before that. I updated on 2024-01-07, so perhaps I need to go back to proxmox-kernel-6.2.16-18-pve."
You can certainly try booting an older kernel to see if the issue lies there.
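One way to boot the older kernel persistently is to pin it with proxmox-boot-tool. A minimal sketch, assuming the host supports kernel pinning through proxmox-boot-tool and that `kernel list` prints installed versions one per line; the wrapper name pin_kernel is made up for illustration:

```shell
# pin_kernel VERSION: make VERSION the default boot kernel, but only if
# proxmox-boot-tool reports it as installed.
pin_kernel() {
    version=$1
    if ! proxmox-boot-tool kernel list | grep -qF -- "$version"; then
        echo "kernel $version not listed by proxmox-boot-tool" >&2
        return 1
    fi
    proxmox-boot-tool kernel pin "$version"
}

# Usage (as root), then reboot and confirm with `uname -r`:
#   pin_kernel 6.2.16-18-pve
# Undo later with:
#   proxmox-boot-tool kernel unpin
```

Pinning keeps the choice across reboots, which matters here because the node restarts on its own.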
 
No need to wait for the night: it rebooted again.

I'm now running kernel 6.2.16-18-pve with ZFS 2.1.13-pve1. Is it possible to use ZFS 2.1.14 with this kernel?
 
Nemesiz said: "I'm now running kernel 6.2.16-18-pve with ZFS 2.1.13-pve1. Is it possible to use ZFS 2.1.14 with this kernel?"
I'm afraid you'd have to compile it yourself. At least the version of the kernel module is fixed in a kernel build.
 
Is this a standalone Proxmox VE node, or is it part of an HA cluster? If it is the latter, maybe your node gets fenced because of increased latency on the corosync link around that time?
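If the node is in an HA cluster, the previous boot's journal for the cluster services is one place to look for signs of fencing. A rough sketch, assuming the usual Proxmox VE unit names (corosync, pve-ha-lrm, pve-ha-crm, watchdog-mux); the search patterns are only guesses at typical messages, not an exhaustive list, and fencing_hints is a made-up helper name:

```shell
# fencing_hints: filter journal lines on stdin for messages that often
# accompany watchdog resets or corosync link trouble (patterns are guesses).
fencing_hints() {
    grep -Ei 'watchdog|fenc(e|ing)|token has not been received|link: [0-9]+ is down|quorum'
}

# Usage on the affected node (previous boot, cluster-related units only):
#   journalctl -b -1 --no-pager -u corosync -u pve-ha-lrm -u pve-ha-crm -u watchdog-mux | fencing_hints
```

No output does not rule out fencing, since a hard reset can cut the log short.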
 
The server has now been running for 19 days without a random reboot. I replaced the network card with an Intel 10 Gb card and increased the CPU cooler's fan speed. One cause I had in mind was a CPU temperature spike.
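To check the temperature theory on a node that still reboots, the readings could be logged to disk so the last values before a reset are preserved. A minimal sketch, assuming the kernel exposes sensors through the standard hwmon sysfs files (values in millidegrees Celsius); the function name max_temp_c and the log path are made up for illustration:

```shell
# max_temp_c [BASE]: print the highest temperature (whole degrees Celsius)
# found in BASE/hwmon*/temp*_input; BASE defaults to /sys/class/hwmon.
# Returns non-zero if no readable sensor is found.
max_temp_c() {
    base=${1:-/sys/class/hwmon}
    max=""
    for f in "$base"/hwmon*/temp*_input; do
        [ -r "$f" ] || continue
        v=$(cat "$f" 2>/dev/null) || continue
        # Ignore anything that is not a plain integer.
        case "$v" in ''|*[!0-9-]*) continue ;; esac
        if [ -z "$max" ] || [ "$v" -gt "$max" ]; then max=$v; fi
    done
    [ -n "$max" ] && echo $((max / 1000))
}

# Usage: append a reading every 10 seconds (placeholder path):
#   while sleep 10; do echo "$(date -u +%FT%TZ) $(max_temp_c)"; done >> /var/log/max-temp.log
```

This reports the hottest sensor of any kind, not specifically the CPU, so it is only a coarse indicator.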
 
