PBS random shutdown

bunk3m

Hi,

I've been running PBS as a backup of my home Proxmox for at least 6-9 months without any issues.

Over the past couple of weeks, I've had backups fail because PBS has shut down. This appears to have started with the second-to-last update, which I installed maybe a month ago.

It is getting frustrating as I can't figure out why it randomly restarts or shuts down. (Two days ago it shut down at 09:22. Yesterday it shut down at 22:53.)

When checking the log it usually says:
Code:
May 15 22:39:47 pbs systemd-logind[1021]: Power key pressed short.
May 15 22:39:47 pbs systemd-logind[1021]: Powering off...
May 15 22:39:47 pbs systemd-logind[1021]: System is powering down.

Given that no one was around and no one else has access to the PBS, is there any reason it would randomly register "Power key pressed short" and initiate a shutdown?

Thanks in advance.
 
Hi,
that is indeed strange behavior. Have you checked for hardware issues already? The most likely causes are a bad connection or a faulty power button. You could try to physically disconnect the power button from the motherboard and see if the issue persists.

Also, see https://bbs.archlinux.org/viewtopic.php?id=282064 and maybe try to use the described workaround by setting HandlePowerKey=ignore in /etc/systemd/logind.conf.
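For anyone who wants to keep the stock /etc/systemd/logind.conf untouched, the same setting can go into a drop-in file instead. This is only a sketch: the file name 10-ignore-power-key.conf is a made-up example, and the setting should take effect once systemd-logind is restarted or the machine is rebooted.

```ini
# /etc/systemd/logind.conf.d/10-ignore-power-key.conf
# (hypothetical file name; any *.conf file in this directory is read)
[Login]
# Ignore short presses of the power button instead of powering off
HandlePowerKey=ignore
```

Note that this only hides the symptom. If the button or its wiring is sending spurious presses, the underlying hardware fault is still there.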
 
Thank you @Chris. I'll open it up, clean it and discharge any static first. Then I'll see what happens. The next step after that will be to set `HandlePowerKey=ignore`. I usually try to stay as "stock" as possible so that if there is an update, I don't have to remember to manually change any settings.
 
Thanks @Chris for your suggestions. I completed the cleaning and re-seated the RAM and various SATA cables. Since the cleaning, PBS has been running without issue. I'm crossing my fingers and assuming that this was the fix. Thanks again for your quick reply and help!
 
Now 11 days later and it started acting up again.

Code:
Jun 08 12:09:26 pbs systemd-logind[1024]: Power key pressed short.
Jun 08 12:09:26 pbs systemd-logind[1024]: Powering off...
Jun 08 12:09:26 pbs systemd-logind[1024]: System is powering down.
Jun 08 12:09:26 pbs systemd-logind[1024]: Power key pressed short.
Then three hours later it reboots ...????
Code:
Jun 08 12:09:29 pbs systemd-journald[490]: Journal stopped
-- Reboot --
Jun 08 15:45:38 pbs kernel: Linux version 6.8.12-10-pve (build@proxmox) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-10 (2025-04-18T07:39Z) ()

Then a bit over 2 days later
Code:
Jun 10 18:40:00 pbs proxmox-backup-proxy[1336]: rrd journal successfully committed (33 files in 0.065 seconds)
Jun 10 19:04:33 pbs systemd-logind[1024]: Power key pressed short.
Jun 10 19:04:33 pbs systemd-logind[1024]: Powering off...
Jun 10 19:04:33 pbs systemd-logind[1024]: System is powering down.
and random restart 5 hrs later
Code:
Jun 10 19:04:37 pbs systemd-journald[493]: Journal stopped
-- Reboot --
Jun 11 00:05:19 pbs kernel: Linux version 6.8.12-10-pve (build@proxmox) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-10 (2025-04-18T07:39Z) ()

Going to have to collect some more data and watch the log some more.
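One way to collect that data is a small script that pairs each "Power key pressed" shutdown with the next boot and reports how long the box stayed off. This is a rough sketch, not a finished tool: the function name `downtime_after_power_key` is made up, the year has to be supplied because the short journal format omits it (2025 is assumed here from the kernel build date), and it expects the `-- Reboot --` separator seen in the logs above (a `-- Boot ... --` variant is also accepted, as an assumption about other journalctl versions).

```python
import re
from datetime import datetime

# Matches lines like: "Jun 08 12:09:26 pbs systemd-logind[1024]: Power key pressed short."
LINE_RE = re.compile(
    r"^(\w{3} \d{2} \d{2}:\d{2}:\d{2}) \S+ ([^:\[]+)(?:\[\d+\])?: (.*)$"
)
# Boot separator printed between boots in the journal output
BOOT_RE = re.compile(r"^-- (Reboot|Boot .*) --$")


def parse_stamp(stamp, year):
    """Parse a short journal timestamp; the year is supplied by the caller."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def downtime_after_power_key(lines, year=2025):
    """Return a list of (shutdown_time, next_boot_time, gap) tuples.

    A shutdown is the first logind "Power key pressed" line of a boot;
    the next boot time is the first timestamped line after the separator.
    Logs that cross a year boundary are not handled.
    """
    results = []
    pending = None        # power-key shutdown still waiting for its reboot
    after_reboot = False  # True right after a boot separator
    for line in lines:
        line = line.strip()
        if BOOT_RE.match(line):
            after_reboot = True
            continue
        match = LINE_RE.match(line)
        if not match:
            continue
        when = parse_stamp(match.group(1), year)
        unit, message = match.group(2), match.group(3)
        if after_reboot:
            if pending is not None:
                results.append((pending, when, when - pending))
                pending = None
            after_reboot = False
        if (
            unit == "systemd-logind"
            and message.startswith("Power key pressed")
            and pending is None
        ):
            pending = when
    return results
```

To use it, save the journal to a text file (for example with `journalctl --no-pager > journal.txt`), pass the lines of that file to `downtime_after_power_key`, and print each tuple. If the gaps turn out to be regular, or the shutdowns cluster at certain times of day, that would hint at something scheduled or environmental rather than a flaky button.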
 
What kind of server do ya have there?

Backup is usually an afterthought. Backup quite frequently happens on the oldest, lousiest piece of gear you have. Does the PBS server have a large dent and several coffee stains? Maybe it's running an original Itanium?

Is it suffocating? Got enough air? ... or is it in the shed ... without any AC at all?

How's the power situation? Any possibility of brownouts when your AC kicks on? That's likely to have started happening with the hot weather. Maybe try a different circuit?

Does your cat sleep in that room? Does she hate servers?
 
Or do you use a UPS that is overloaded? Is your PSU sufficient, and not china-fire-cracker quality?
 
For reference, here is a possible cause: https://forum.proxmox.com/threads/strange-incident-server-self-powered-off.131826/
 
Thank you everyone for your help and suggestions!

@Chris , I'll review that thread and cross my fingers that it helps.

To answer @tcabernoch
I have an older Lenovo workstation with an Intel Core i7-2600 CPU @ 3.40GHz (4 cores, 8 threads) and 8GB RAM. The OS is on a WD Black 1 TB drive. So yes, it is older hardware, but still capable. It has enough air as it sits in the open in the basement. The power is fine. It hasn't been hot enough for the AC to kick in and there have been no brownouts.

@cwt At the moment it is plugged directly into wall power, bypassing the UPS, as I try to reduce the variables while debugging. Our cat passed away before covid, so that isn't a variable. The UPS is a few years old and is APS brand.

What I scratch my head about is that it turns off by itself. Then randomly starts up again.

I'm not sure if the power supply is getting finicky, and I don't know how to check that. Or perhaps the fan is failing. There must be some logs about this somewhere, but I haven't found them yet.
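On the fan and temperature question, Linux exposes whatever sensors the kernel drivers detect under /sys/class/hwmon, with temperatures in millidegrees Celsius (`temp*_input`) and fan speeds in RPM (`fan*_input`). The sketch below reads them all. The function name `read_hwmon` is made up, and whether this particular Lenovo board exposes any fan sensors at all is an assumption. It will not say anything direct about the power supply.

```python
from pathlib import Path


def read_hwmon(base="/sys/class/hwmon"):
    """Return {"chip/sensor": value} for temperature and fan sensors.

    Temperatures are converted from millidegrees to degrees Celsius,
    fan speeds are left in RPM. Unreadable sensors are skipped.
    """
    readings = {}
    for chip in sorted(Path(base).glob("hwmon*")):
        name_file = chip / "name"
        chip_name = name_file.read_text().strip() if name_file.exists() else chip.name
        sensors = sorted(chip.glob("temp*_input")) + sorted(chip.glob("fan*_input"))
        for sensor in sensors:
            try:
                raw = int(sensor.read_text().strip())
            except (OSError, ValueError):
                continue  # sensor present but not readable right now
            label = sensor.name[: -len("_input")]
            if sensor.name.startswith("temp"):
                value = raw / 1000.0  # millidegrees C -> degrees C
            else:
                value = float(raw)    # fan speed in RPM
            readings[f"{chip_name}/{label}"] = value
    return readings
```

Calling `read_hwmon()` from a cron job or systemd timer every minute and appending the result to a file would leave a record of the last readings before each shutdown, which would at least rule overheating or a stalled fan in or out.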
 
If it's not enterprise gear, I don't know what to expect here, but if it were enterprise gear, I'd enable the Watchdog Timer.

Watchdog is a BIOS-level function. Go into the BIOS and see if you have that feature.

If you do have Watchdog and it's enabled, the machine will continually check itself for a hung CPU.
If Watchdog finds that it has crashed, the machine will log a Watchdog event and reboot.
And then you have the event log to refer to. It doesn't tell you much, mostly just CPU or RAM faults.

Or no Watchdog event is registered.
You can't infer too much from the negative case, but if Watchdog is _not_ triggering, then it's unlikely to be the CPU or RAM.