Unexpected Shutdown

Iceman13

New Member
Aug 15, 2021
Hey Everyone,

I have had a Proxmox server running for about 2 years now. Recently the server has been shutting down and not rebooting. Usually, if power fails, the server reboots and functions properly. In the past week it has gone down 3 times with no explanation and had to be manually started up again. I have checked the web UI system logs to no avail.
Any advice on how to troubleshoot this behavior?

thanks
 
If a machine does not start automatically when power is restored, it is usually a BIOS setting (or did I not understand your question correctly?).
If the BIOS is set correctly, then it might not be unexpected power loss but a software shutdown, configured somewhere (sorry, I would not know where to look).
Is your journal persistent? Is there any useful information in journalctl -b -1 (or however many boots back the shutdown happened, if not -1)?
If power restoration is intermittent, the machine might interpret it as a failure to POST or boot and decide to quit trying. In data centers it sometimes happens that, when power is restored, all servers pull a lot of power to start up (many hard drives spinning up, for example) and the power distribution network fails again (with a breaker tripping, for example). Resetting the breaker only results in the same effect, tripping it again. If power loss is common, a UPS with delayed start might be helpful.
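
A rough way to answer the persistence question from a script, offered as a sketch rather than a definitive check: with journald's default Storage=auto, logs survive a reboot only if /var/log/journal exists, and journalctl --list-boots shows which boots are available to -b. The helper names below (journal_is_persistent, parse_list_boots, list_boots) are made up for illustration, and the parser only assumes that each output line starts with a boot offset followed by a boot ID.

```python
import os
import shutil
import subprocess


def journal_is_persistent(journal_dir="/var/log/journal"):
    """Return True if the persistent journal directory exists.

    Assumption: journald runs with Storage=auto or Storage=persistent.
    With Storage=volatile the directory could exist and still be unused.
    """
    return os.path.isdir(journal_dir)


def parse_list_boots(text):
    """Parse `journalctl --list-boots` output into (offset, boot_id) pairs.

    Only the first two whitespace-separated fields are used. Lines that
    do not start with an integer offset (such as a header) are skipped.
    """
    boots = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            offset = int(fields[0])
        except ValueError:
            continue
        boots.append((offset, fields[1]))
    return boots


def list_boots():
    """Run `journalctl --list-boots`; return [] if journalctl is unavailable."""
    if shutil.which("journalctl") is None:
        return []
    result = subprocess.run(
        ["journalctl", "--list-boots"],
        capture_output=True, text=True, check=False,
    )
    return parse_list_boots(result.stdout)
```

If list_boots() returns only the entry with offset 0, there is no earlier boot to inspect, and -b -1 will have nothing to show.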
 
Hey avw
Thanks for the reply. I don't think it's a BIOS issue, since the restarts were fine for nearly two years. These shutdowns have been happening since upgrading to Proxmox 7.
Here is what I got from journalctl; a few of the lines were in yellow.
Aug 12 21:10:56 PVE kernel: .... node #1, CPUs: #6
Aug 12 21:10:56 PVE kernel: smpboot: CPU 6 Converting physical 0 to logical die 1
Aug 12 21:10:56 PVE kernel: #7 #8 #9 #10 #11
Aug 12 21:10:56 PVE kernel: .... node #0, CPUs: #12
Aug 12 21:10:56 PVE kernel: MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
Aug 12 21:10:56 PVE kernel: #13 #14 #15 #16 #17
Aug 12 21:10:56 PVE kernel: .... node #1, CPUs: #18 #19 #20 #21 #22 #23
Aug 12 21:10:56 PVE kernel: smp: Brought up 2 nodes, 24 CPUs
Aug 12 21:10:56 PVE kernel: smpboot: Max logical packages: 2
Aug 12 21:10:56 PVE kernel: pci_bus 0000:08: resource 1 [mem 0xc4000000-0xc60fffff]
Aug 12 21:10:56 PVE kernel: pci_bus 0000:80: resource 4 [io 0xc000-0xffff window]
Aug 12 21:10:56 PVE kernel: pci_bus 0000:80: resource 5 [mem 0xc8000000-0xfbffbfff window]
Aug 12 21:10:56 PVE kernel: pci 0000:00:05.0: disabled boot interrupts on device [8086:2f28]
Aug 12 21:10:56 PVE kernel: pci 0000:01:00.0: [Firmware Bug]: disabling VPD access (can't determine size of non-standard VPD format)
Aug 12 21:10:56 PVE kernel: pci 0000:01:00.0: CLS mismatch (64 != 32), using 64 bytes
Aug 12 21:10:56 PVE kernel: pci 0000:08:00.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]
Aug 12 21:10:56 PVE kernel: pci 0000:80:05.0: disabled boot interrupts on device [8086:2f28]
Aug 12 21:10:56 PVE kernel: Trying to unpack rootfs image as initramfs...
Aug 12 21:10:56 PVE kernel: tun: Universal TUN/TAP device driver, 1.6
Aug 12 21:10:56 PVE kernel: PPP generic driver version 2.4.2
Aug 12 21:10:56 PVE kernel: i8042: PNP: No PS/2 controller found.
Aug 12 21:10:56 PVE kernel: mousedev: PS/2 mouse device common for all mice
Aug 12 21:10:56 PVE kernel: rtc_cmos 00:00: RTC can wake from S4
Aug 12 21:10:56 PVE kernel: rtc_cmos 00:00: registered as rtc0
Aug 12 21:10:56 PVE kernel: rtc_cmos 00:00: setting system clock to 2021-08-13T01:10:53 UTC (1628817053)
Aug 12 21:10:56 PVE kernel: rtc_cmos 00:00: alarms up to one month, y3k, 114 bytes nvram, hpet irqs
Aug 12 21:10:56 PVE kernel: i2c /dev entries driver
Aug 12 21:10:56 PVE kernel: device-mapper: uevent: version 1.0.3
Aug 12 21:10:56 PVE kernel: device-mapper: ioctl: 4.43.0-ioctl (2020-10-01) initialised: dm-devel@redhat.com
Aug 12 21:10:56 PVE kernel: platform eisa.0: Probing EISA bus 0
Aug 12 21:10:56 PVE kernel: platform eisa.0: EISA: Cannot allocate resource for mainboard
Aug 12 21:10:56 PVE kernel: platform eisa.0: Cannot allocate resource for EISA slot 1
Aug 12 21:10:56 PVE kernel: platform eisa.0: Cannot allocate resource for EISA slot 2
Aug 12 21:10:56 PVE kernel: platform eisa.0: Cannot allocate resource for EISA slot 3
Aug 12 21:10:56 PVE kernel: platform eisa.0: Cannot allocate resource for EISA slot 4
Aug 12 21:10:56 PVE kernel: platform eisa.0: Cannot allocate resource for EISA slot 5
Aug 12 21:10:56 PVE kernel: platform eisa.0: Cannot allocate resource for EISA slot 6
Aug 12 21:10:56 PVE kernel: platform eisa.0: Cannot allocate resource for EISA slot 7
Aug 12 21:10:56 PVE kernel: platform eisa.0: Cannot allocate resource for EISA slot 8
Aug 12 21:10:56 PVE kernel: platform eisa.0: EISA: Detected 0 cards
Aug 12 21:10:56 PVE kernel: intel_pstate: Intel P-state driver initializing
Aug 12 21:10:56 PVE kernel: ledtrig-cpu: registered to indicate activity on CPUs
Aug 12 21:10:56 PVE kernel: drop_monitor: Initializing network drop monitor service
Aug 12 21:10:56 PVE kernel: NET: Registered protocol family 10
Aug 12 21:10:56 PVE kernel: nvme nvme0: missing or invalid SUBNQN field.
Aug 12 21:10:56 PVE kernel: nvme nvme0: Shutdown timeout set to 8 seconds
Aug 12 21:10:56 PVE systemd[1]: Finished Create Static Device Nodes in /dev.
Aug 12 21:10:56 PVE systemd[1]: Starting Rule-based Manager for Device Events and Files...
Aug 12 21:10:56 PVE kernel: iscsi: registered transport (iser)
Aug 12 21:10:56 PVE kernel: spl: loading out-of-tree module taints kernel.
Aug 12 21:10:56 PVE kernel: znvpair: module license 'CDDL' taints kernel.
Aug 12 21:10:56 PVE kernel: Disabling lock debugging due to kernel taint
Aug 12 21:10:56 PVE systemd[1]: Started Rule-based Manager for Device Events and Files.
Aug 12 21:10:56 PVE systemd-journald[688]: Journal started
Aug 12 21:10:56 PVE systemd-journald[688]: Runtime Journal (/run/log/journal/8950705d296240d9b6f728752bac0e06) is 8.0M, max 320.2M, 312.2M free.
Aug 12 21:10:56 PVE systemd-modules-load[689]: Inserted module 'vfio'
Aug 12 21:10:56 PVE kernel: RAPL PMU: hw unit of domain dram 2^-16 Joules
Aug 12 21:10:56 PVE kernel: cryptd: max_cpu_qlen set to 1000
Aug 12 21:10:56 PVE kernel: power_meter ACPI000D:00: Found ACPI power meter.
Aug 12 21:10:56 PVE kernel: power_meter ACPI000D:00: Ignoring unsafe software power cap!
Aug 12 21:10:56 PVE kernel: power_meter ACPI000D:00: hwmon_device_register() is deprecated. Please convert the driver to use hwmon_device_register_with_info().
Aug 12 21:10:56 PVE kernel: AVX2 version of gcm_enc/dec engaged.

Also, I noticed another thread on this topic where members were showing their logs, and theirs looked the same as mine: the replication runner was working and then the log just says reboot, but the server does not reboot. That's the thread: https://forum.proxmox.com/threads/pve-server-reboot.94502/
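
For what it's worth, the "-- Reboot --" line that journalctl prints between boots only marks where one boot's log ends and the next begins. It does not by itself mean a reboot was requested, so the entries just before it are the interesting ones. The sketch below is one hedged way to look at them: it splits saved journalctl output at those markers and checks whether the tail of each boot contains journald's "Journal stopped" message, which normally appears on an orderly shutdown. The function names, the 20-line window, and treating a missing message as a sign of an abrupt stop are assumptions for illustration, not anything from the Proxmox tooling.

```python
def split_boots(lines):
    """Split journalctl output into per-boot lists of lines.

    A new boot starts at a '-- Reboot --' marker (older systemd) or a
    '-- Boot <id> --' marker (newer systemd). Empty boots are dropped.
    """
    boots = [[]]
    for line in lines:
        stripped = line.strip()
        if stripped == "-- Reboot --" or stripped.startswith("-- Boot "):
            boots.append([])
        else:
            boots[-1].append(line.rstrip("\n"))
    return [boot for boot in boots if boot]


def ended_cleanly(boot_lines, window=20):
    """Heuristic: True if 'Journal stopped' is in the last `window` lines."""
    return any("Journal stopped" in line for line in boot_lines[-window:])


def summarize(lines):
    """Return (last_line, ended_cleanly) for every boot except the newest.

    The newest boot is skipped because it is still running and has not
    ended yet.
    """
    boots = split_boots(lines)
    return [(boot[-1], ended_cleanly(boot)) for boot in boots[:-1]]
```

Feeding it the lines of a saved journal and looking at the boots where the second value is False would show the last thing the server logged before it went down without an orderly shutdown.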

thanks for the help
