Uncontrolled shutdown

innostus

Once in a while, my PVE server shuts itself down, but not cleanly: the VMs are not shut down before the server stops. This is what I can find in the logs:

Mar 07 00:00:52 pveserver pvefw-logger[28347]: received terminate request (signal)
Mar 07 00:00:52 pveserver systemd[1]: Reloaded PVE SPICE Proxy Server.
Mar 07 00:00:52 pveserver spiceproxy[1929]: server shutdown (restart)
Mar 07 00:00:52 pveserver spiceproxy[1929]: server closing
Mar 07 00:00:52 pveserver spiceproxy[1929]: received signal HUP
Mar 07 00:00:52 pveserver spiceproxy[1982172]: send HUP to 1929
Mar 07 00:00:52 pveserver systemd[1]: Reloading PVE SPICE Proxy Server.
Mar 07 00:00:52 pveserver systemd[1]: Reloaded PVE API Proxy Server.
Mar 07 00:00:52 pveserver pveproxy[1923]: server shutdown (restart)
Mar 07 00:00:52 pveserver pveproxy[1923]: server closing
Mar 07 00:00:52 pveserver pveproxy[1923]: received signal HUP
Mar 07 00:00:52 pveserver pveproxy[1982149]: send HUP to 1923

However, after this the server kept running for about 2 hours before it actually stopped. It seems to be waiting for some internal processes to end. Normally I see a message like this at around 03:05:

Mar 07 02:05:30 pveserver pmxcfs[1747]: [dcdb] notice: data verification successful

This time that message was not logged. Because one of the VMs is a Zabbix server, I can pin the actual stop time to between 02:50 and 02:53.
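Narrowing down the stop time like this can also be done from the log timestamps themselves, by looking for long silent stretches. Below is a minimal Python sketch of that idea, not something taken from the thread: `find_gaps`, the 30-minute threshold, and the sample timestamps are all hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, max_quiet=timedelta(minutes=30)):
    """Return (last_seen, next_seen) pairs where nothing was logged
    for longer than max_quiet."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_quiet]


# Made-up timestamps, for illustration only (not from the real server).
SAMPLE = [
    datetime(2000, 1, 1, 2, 40),
    datetime(2000, 1, 1, 2, 45),
    datetime(2000, 1, 1, 2, 50),
    datetime(2000, 1, 1, 9, 15),  # first entry after the host came back
    datetime(2000, 1, 1, 9, 20),
]

if __name__ == "__main__":
    for last_seen, next_seen in find_gaps(SAMPLE):
        print(f"silent from {last_seen} until {next_seen}")
```

The start of each gap is the last moment the host is known to have been alive, which can then be compared with what an external monitor such as Zabbix recorded.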

Does anyone have an explanation for why pveproxy and spiceproxy received a HUP signal?

Kind regards,

Arnold Boer
 
Most likely logrotate, and thus probably unrelated to your shutdown issue.
 
But why does logrotate result in a shutdown of pveproxy and spiceproxy? And why does that not happen every day at the same time?
 
It doesn't shut them down; it reloads the services so that the log files are flushed. This happens daily by default.
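To check that the HUP lines show up on days without a crash as well, the journal output can be filtered for them. Here is a rough Python sketch, assuming the line format of the excerpt quoted in the first post; the regular expression and the `hup_reloads` helper are illustrative names, not part of any Proxmox tooling.

```python
import re

# Matches journal lines in the format quoted in this thread, e.g.
# "Mar 07 00:00:52 pveserver pveproxy[1923]: received signal HUP"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"(?P<unit>[\w-]+)\[(?P<pid>\d+)\]: (?P<msg>.*)$"
)


def hup_reloads(lines):
    """Return (timestamp, unit) for every 'received signal HUP' line."""
    found = []
    for line in lines:
        # Drop whitespace and zero-width spaces left over from copy/paste.
        match = LINE_RE.match(line.strip(" \t\r\n\u200b"))
        if match and match.group("msg") == "received signal HUP":
            found.append((match.group("ts"), match.group("unit")))
    return found


# Lines copied from the log excerpt in the first post.
SAMPLE = [
    "Mar 07 00:00:52 pveserver spiceproxy[1929]: received signal HUP",
    "Mar 07 00:00:52 pveserver spiceproxy[1982172]: send HUP to 1929",
    "Mar 07 00:00:52 pveserver pveproxy[1923]: received signal HUP",
    "Mar 07 00:00:52 pveserver systemd[1]: Reloaded PVE API Proxy Server.",
]

if __name__ == "__main__":
    for ts, unit in hup_reloads(SAMPLE):
        print(f"{ts}  {unit}: reloaded via HUP")
```

If the same pair of entries appears around midnight on days when the server kept running, that supports the reading that the reload is routine and not the trigger.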
 
Setting up serial or network console access might also help to catch errors.
 
I already have network console access. The server seems to just power off. At first I thought it was a mainboard issue. I got help from the mainboard manufacturer, but without success and without any clues. It could be the mainboard, CPU, PSU, memory, or the OS. The only line that appears in the IPMI event log is:


Tuesday, March 7th 2023, 2:50:56 am | Unknown | 602ch | SPS / ME | 16h | microcontroller_or_coprocessor | 16h | 0ah | 00h | ffh | ffh | ME Power State - Asserted
Event Data1 ME Power State
Event Data2 Transition to Running
Event Data3 N/A

No good explanation from the mainboard manufacturer. Help is very much appreciated. I already had to buy a second server to be able to deal with these outages.
 
that does sound like a hardware issue
 
I ran memtest several times. It passed. I also tried to stress the server with stress-ng. No problem at all. During normal operation, the load on the server is low. I do not expect that load is an issue. Any ideas on how to proceed?
 
if there is no output on the console whatsoever there's nothing to go on - it could be an issue with thermals/power (maybe not dependent on the load on the system itself, but something else that's roughly happening at the same time every day?). I'd contact your hardware vendor and find out what exactly the IPMI event entails (e.g., does that mean somebody pressed the virtual/physical power button? does this mean the ME triggered "restart after powerloss"? does it say that for regular reboots? ...)
 
It happened again. This time clearly nothing in the logs on Proxmox. So I contacted the manufacturer, again. Now we wait.....
 
Interestingly I'm having the same issue, at the exact same time as the log rotation too.

Even more strangely, mine is a check_mk server, yours is zabbix.

Did you get anywhere?
 
I got the mainboard replaced by the manufacturer. That solved the issue.
 