Random Shut Down – pveproxy Triggering Reboot

Deleted member 306531

Guest
Hello everyone,

I've been a long-time Proxmox user, but recently I’ve encountered an issue that I just can't figure out.
About a week ago, my Proxmox server started randomly shutting down instantly, without any prior warning.
journalctl doesn't show anything unusual right before the shutdown occurs; instead, it displays what looks like a normal shutdown sequence:

Code:
Jun 05 18:25:00 Ripper systemd[1]: Stopping user@0.service - User Manager for UID 0...
Jun 05 18:25:00 Ripper systemd[47517]: Activating special unit exit.target...
Jun 05 18:25:00 Ripper systemd[47517]: Stopped target default.target - Main User Target.
Jun 05 18:25:00 Ripper systemd[47517]: Stopped target basic.target - Basic System.
Jun 05 18:25:00 Ripper systemd[47517]: Reached target shutdown.target - Shutdown.
Jun 05 18:25:00 Ripper systemd[47517]: Finished systemd-exit.service - Exit the Session.
Jun 05 18:25:00 Ripper systemd[1]: user@0.service: Deactivated successfully.
Jun 05 18:25:00 Ripper systemd[1]: Removed slice user-0.slice - User Slice of UID 0.

Additionally, there is nothing relevant in dmesg, and the BMC doesn't report any hardware-related errors or issues.
Interestingly, this issue persisted even after I reinstalled Proxmox completely, making me suspect hardware problems. However, all hardware tests, including memtest and a 24-hour stress test, showed no issues.
Temperatures look completely normal and the disks are working as expected.
I then decided to monitor all system_shutdown calls using auditd. Surprisingly, I found pveproxy frequently calling reboot, sometimes multiple times per second:
Code:
Jun 05 22:28:04 Ripper audit[34422]: SYSCALL ... comm=70766570726F787920776F726B6572 exe="/usr/bin/perl" key="system_shutdown"
Jun 05 22:28:05 Ripper audit[34724]: SYSCALL ... comm=70766570726F787920776F726B6572 exe="/usr/bin/perl" key="system_shutdown"
Jun 05 22:28:10 Ripper audit[34422]: SYSCALL ... comm=70766570726F787920776F726B6572 exe="/usr/bin/perl" key="system_shutdown"
(70766570726F787920776F726B6572 -> pveproxy worker)
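For anyone reading along: auditd writes a field such as `comm=` as hex when the value contains a space or other special characters, which is why the worker name is unreadable in the raw record. A minimal Python sketch of the decoding follows; the helper name `decode_comm` is made up for illustration, and `ausearch -i` should give the same interpretation directly.

```python
def decode_comm(value: str) -> str:
    """Decode an audit `comm=` value (hypothetical helper, not part of auditd).

    Quoted values are already plain text; unquoted values are hex-encoded.
    """
    if value.startswith('"') and value.endswith('"'):
        return value[1:-1]
    return bytes.fromhex(value).decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Value copied from the audit records above.
    print(decode_comm("70766570726F787920776F726B6572"))  # pveproxy worker
```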

Has anyone experienced a similar issue? Why would the server only sporadically shut down, despite pveproxy calling reboot multiple times per second?
Any insight or guidance would be greatly appreciated!

My Specs:
MB: WRX80 Creator R2.0
CPU: AMD Ryzen Threadripper PRO 3955WX
RAM: 8x32GB DDR4 ECC
DISK: Running on an NVMe RAID 1, though the problem also occurs when installed on other disks.

I would greatly appreciate any help!
Thank you!
 
that journal excerpt is not a system shutdown, that's a root user session being torn down ;)

could you maybe post the complete audit lines? I think those mean something else, but without more data it's hard to tell what..

I'd run memtest on the memory just to make sure it's not faulty.. and install any available system firmware updates
 
Hey,
Thank you for your answer.

I must have somehow truncated my journal log. Here is the full output, though it seems like it really is just the session:
Code:
Jun 05 18:25:00 Ripper systemd[1]: Stopping user@0.service - User Manager for UID 0...
Jun 05 18:25:00 Ripper systemd[47517]: Activating special unit exit.target...
Jun 05 18:25:00 Ripper systemd[47517]: Stopped target default.target - Main User Target.
Jun 05 18:25:00 Ripper systemd[47517]: Stopped target basic.target - Basic System.
Jun 05 18:25:00 Ripper systemd[47517]: Stopped target paths.target - Paths.
Jun 05 18:25:00 Ripper systemd[47517]: Stopped target sockets.target - Sockets.
Jun 05 18:25:00 Ripper systemd[47517]: Stopped target timers.target - Timers.
Jun 05 18:25:00 Ripper systemd[47517]: Closed dirmngr.socket - GnuPG network certificate management daemon.
Jun 05 18:25:00 Ripper systemd[47517]: Closed gpg-agent-browser.socket - GnuPG cryptographic agent and passphrase cache (access for web browsers).
Jun 05 18:25:00 Ripper systemd[47517]: Closed gpg-agent-extra.socket - GnuPG cryptographic agent and passphrase cache (restricted).
Jun 05 18:25:00 Ripper systemd[47517]: Closed gpg-agent-ssh.socket - GnuPG cryptographic agent (ssh-agent emulation).
Jun 05 18:25:00 Ripper systemd[47517]: Closed gpg-agent.socket - GnuPG cryptographic agent and passphrase cache.
Jun 05 18:25:00 Ripper systemd[47517]: Removed slice app.slice - User Application Slice.
Jun 05 18:25:00 Ripper systemd[47517]: Reached target shutdown.target - Shutdown.
Jun 05 18:25:00 Ripper systemd[47517]: Finished systemd-exit.service - Exit the Session.
Jun 05 18:25:00 Ripper systemd[47517]: Reached target exit.target - Exit the Session.
Jun 05 18:25:00 Ripper systemd[1]: user@0.service: Deactivated successfully.
Jun 05 18:25:00 Ripper systemd[1]: Stopped user@0.service - User Manager for UID 0.
Jun 05 18:25:00 Ripper systemd[1]: Stopping user-runtime-dir@0.service - User Runtime Directory /run/user/0...
Jun 05 18:25:00 Ripper systemd[1]: run-user-0.mount: Deactivated successfully.
Jun 05 18:25:00 Ripper systemd[1]: user-runtime-dir@0.service: Deactivated successfully.
Jun 05 18:25:00 Ripper systemd[1]: Stopped user-runtime-dir@0.service - User Runtime Directory /run/user/0.
Jun 05 18:25:00 Ripper systemd[1]: Removed slice user-0.slice - User Slice of UID 0.
--- Reboot happened ---

You were completely right about the syscalls being something else:
Although syscall 48 is called shutdown, it's for shutting down a socket, not the system :(
Code:
Jun 05 22:28:10 Ripper audit[34724]: SYSCALL arch=c000003e syscall=48 success=yes exit=0 a0=d a1=1 a2=5ac3f7700210 a3=5ac40feced60 items=0 ppid=2075 pid=34724 auid=4294967295 uid=33 gid=33 euid=33 suid=33 fsuid=33 egid=33 sgid=33 fsgid=33 tty=(none) ses=4294967295 comm=70766570726F787920776F726B6572 exe="/usr/bin/perl" subj=unconfined key="system_shutdown"
Jun 05 22:28:15 Ripper audit[34303]: SYSCALL arch=c000003e syscall=48 success=yes exit=0 a0=a a1=1 a2=5ac3f7700210 a3=5ac40fadd4e0 items=0 ppid=2075 pid=34303 auid=4294967295 uid=33 gid=33 euid=33 suid=33 fsuid=33 egid=33 sgid=33 fsgid=33 tty=(none) ses=4294967295 comm=70766570726F787920776F726B6572 exe="/usr/bin/perl" subj=unconfined key="system_shutdown"
Jun 05 22:28:17 Ripper audit[34303]: SYSCALL arch=c000003e syscall=48 success=yes exit=0 a0=a a1=1 a2=5ac3f7700210 a3=5ac40fb9c628 items=0 ppid=2075 pid=34303 auid=4294967295 uid=33 gid=33 euid=33 suid=33 fsuid=33 egid=33 sgid=33 fsgid=33 tty=(none) ses=4294967295 comm=70766570726F787920776F726B6572 exe="/usr/bin/perl" subj=unconfined key="system_shutdown"
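To make the reading of these records concrete, here is a rough Python sketch that splits a SYSCALL record into fields and names the call. It assumes the x86_64 numbering (`arch=c000003e`), where 48 is `shutdown` on a socket and 169 is `reboot`; the table below is only a two-entry excerpt and `describe` is an illustrative name, not an audit tool.

```python
import re

# Excerpt of the x86_64 syscall table: only the two entries relevant here.
X86_64_SYSCALLS = {48: "shutdown (socket)", 169: "reboot"}
# Values of the second argument of shutdown(2).
SHUT_HOW = {0: "SHUT_RD", 1: "SHUT_WR", 2: "SHUT_RDWR"}


def describe(record: str) -> str:
    """Summarise an audit SYSCALL record (illustrative helper)."""
    fields = dict(re.findall(r'(\w+)=("[^"]*"|\S+)', record))
    if fields.get("arch") != "c000003e":
        return "not an x86_64 record; syscall numbers differ"
    nr = int(fields["syscall"])
    name = X86_64_SYSCALLS.get(nr, f"syscall {nr}")
    if nr == 48:
        # a0 and a1 are logged in hex: a0 is the socket fd, a1 is `how`.
        fd = int(fields["a0"], 16)
        how = SHUT_HOW.get(int(fields["a1"], 16), "unknown")
        return f"{name}: fd={fd}, how={how}"
    return name


if __name__ == "__main__":
    # Start of the first record above, copied from the log.
    record = ("SYSCALL arch=c000003e syscall=48 success=yes exit=0 a0=d a1=1 "
              "a2=5ac3f7700210 a3=5ac40feced60 items=0 ppid=2075 pid=34724")
    print(describe(record))  # shutdown (socket): fd=13, how=SHUT_WR
```

So the records show a worker closing the write side of a socket, which is why an audit rule keyed on the name "shutdown" catches them.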

I am now running another memtest, hopefully my problem will reveal itself.
 
no, it really is just a user session. you can see systemd[1] there, which is the system-wide systemd instance, and it is logging that it is tearing down the user session for uid 0. all the systemd[47517] log lines are logged by the systemd instance in that user session, including the "Reached target shutdown.target - Shutdown" message - that just refers to the user session being shut down ;)

a (clean) system shutdown would end with something like:

Code:
Jun 05 16:43:27 host systemd[1]: Reached target shutdown.target - System Shutdown.
Jun 05 16:43:27 host systemd[1]: Reached target final.target - Late Shutdown Services.
Jun 05 16:43:27 host systemd[1]: systemd-poweroff.service: Deactivated successfully.
Jun 05 16:43:27 host systemd[1]: Finished systemd-poweroff.service - System Power Off.
Jun 05 16:43:27 host systemd[1]: Reached target poweroff.target - System Power Off.
Jun 05 16:43:27 host systemd[1]: Shutting down.
Jun 05 16:43:27 host systemd-shutdown[1]: Syncing filesystems and block devices.
Jun 05 16:43:27 host systemd-shutdown[1]: Sending SIGTERM to remaining processes...
Jun 05 16:43:27 host systemd-journald[545]: Received SIGTERM from PID 1 (systemd-shutdow).
Jun 05 16:43:27 host systemd-journald[545]: Journal stopped
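As a rough way to apply that distinction to a journal excerpt, the Python sketch below only counts a shutdown as system-wide when PID 1 logs "Shutting down." or when `systemd-shutdown[1]` appears. The marker strings are taken from the example above and may differ between systemd versions; the function name is invented for illustration.

```python
import re

# "Jun 05 18:25:00 Ripper systemd[47517]: message"
LINE = re.compile(
    r"^\w{3} \d{2} [\d:]{8} \S+ (?P<unit>[\w.-]+)\[(?P<pid>\d+)\]: (?P<msg>.*)$"
)


def is_system_shutdown(lines):
    """Return True if the excerpt contains markers of a clean system shutdown."""
    for line in lines:
        m = LINE.match(line)
        if not m or m["pid"] != "1":
            # Lines from a per-user systemd instance (PID != 1) never count.
            continue
        if m["unit"] == "systemd-shutdown":
            return True
        if m["unit"] == "systemd" and m["msg"] == "Shutting down.":
            return True
    return False


if __name__ == "__main__":
    session = [
        "Jun 05 18:25:00 Ripper systemd[47517]: Reached target shutdown.target - Shutdown.",
        "Jun 05 18:25:00 Ripper systemd[1]: Removed slice user-0.slice - User Slice of UID 0.",
    ]
    system = [
        "Jun 05 16:43:27 host systemd[1]: Shutting down.",
        "Jun 05 16:43:27 host systemd-shutdown[1]: Syncing filesystems and block devices.",
    ]
    print(is_system_shutdown(session), is_system_shutdown(system))  # False True
```

If the journal of the previous boot ends without any such marker, the machine most likely lost power or reset without going through an orderly shutdown.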

suspected as much :) the shutdown syscall on a socket is a normal part of connection teardown.
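A generic illustration of what that call does, in Python rather than the Perl that pveproxy actually runs (so this is not its code, just the same syscall): one side of a local socket pair sends data and then shuts down its write side with `SHUT_WR`, which is the `a1=1` seen in the audit records. The peer reads the data and then sees end-of-stream.

```python
import socket


def half_close_demo():
    """Send data, half-close the writer, and show what the peer reads."""
    a, b = socket.socketpair()
    with a, b:
        a.sendall(b"response")
        # This is the shutdown(2) syscall with how=SHUT_WR (value 1).
        a.shutdown(socket.SHUT_WR)
        first = b.recv(1024)   # the data that was sent
        second = b.recv(1024)  # empty bytes: the peer finished writing
    return first, second


if __name__ == "__main__":
    print(half_close_demo())  # (b'response', b'')
```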

keep in mind it could also be some other kind of instability or hardware issue. on consumer hardware, it's sometimes necessary to disable power saving features/states or other UEFI options, for example. if there is a UEFI update, I'd recommend installing it!