Need help troubleshooting Proxmox failing to restart - syslog included

Jay L

Active Member
Jan 27, 2018
19
3
43
53
Hi,

I have an issue with Proxmox not coming up after a reboot.

Code:
uname -a
Linux proxmox1 4.13.16-2-pve #1 SMP PVE 4.13.16-47 (Mon, 9 Apr 2018 09:58:12 +0200) x86_64 GNU/Linux

Update: adding PVE version:

Code:
root@proxmox1:~# pveversion -v
proxmox-ve: 5.1-42 (running kernel: 4.13.16-2-pve)
pve-manager: 5.1-51 (running version: 5.1-51/96be5354)
pve-kernel-4.13: 5.1-44
pve-kernel-4.13.16-2-pve: 4.13.16-47
pve-kernel-4.13.16-1-pve: 4.13.16-46
pve-kernel-4.13.13-2-pve: 4.13.13-33
corosync: 2.4.2-pve4
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-4
libpve-common-perl: 5.0-30
libpve-guest-common-perl: 2.0-14
libpve-http-server-perl: 2.0-8
libpve-storage-perl: 5.0-18
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.0-2
lxcfs: 3.0.0-1
novnc-pve: 0.6-4
proxmox-widget-toolkit: 1.0-15
pve-cluster: 5.0-25
pve-container: 2.0-21
pve-docs: 5.1-17
pve-firewall: 3.0-8
pve-firmware: 2.0-4
pve-ha-manager: 2.0-5
pve-i18n: 1.0-4
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.1-5
pve-xtermjs: 1.0-2
qemu-server: 5.0-25
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.7-pve1~bpo9

I am running PM with LXC containers and experience the following:
  1. The core Linux system comes up as I can SSH into the Proxmox host using the root login
  2. None of the containers come up even though they are configured to auto start. I see the following
    1. "pct list" and "pct status <vmid>" time out
    2. "pct start" fails too
    3. The web gui is inaccesible
    4. I even tried "reboot" and the system hung without rebooting. ("reboot -f" worked)
This is a recurring problem and the system eventually comes up after multiple reboots.

I have never seen this issue in previous Proxmox versions and am wondering how I troubleshoot it. (My concern is that this could be a hardware issue. Could it be a bug?) I have full root access to the server and so can access any logs that are required.

Thank you in advance.
 
Last edited:
Hi,

So I have been reviewing the logs and find something very interesting. This morning around 8:30 I changed the network settings in the PM host and then issued the "reboot" command. However, wtmp does not show a reboot until 8:40. As I recall, I did issue a "shutdown -r" around 8:40.

I think that the problem could have been that the the system did not fully shutdown initially and potentially got stuck somehow. It is not clear if my "shutdown -r" helped at all or if it just happened to coincide when the system successfully rebooted.

The relevant syslog from this time is below. A couple of things jump out at me:
  1. I triggered the shutdown at 8:30:14, and it does not appear to complete until 8:39:16.
  2. There is also a weird gap from 8:31:01 and 8:37:46.
  3. SSH was still running on the Proxmox through 8:38:13 because I was able to SSH in at that time
Is there something obviously problematic here?

Code:
Apr 25 08:30:14 proxmox1 pvedaemon[1564]: <root@pam> successful auth for user 'root@pam'
Apr 25 08:30:49 proxmox1 systemd[1]: Stopped target Sound Card.
Apr 25 08:30:49 proxmox1 systemd[1]: Stopping User Manager for UID 0...
Apr 25 08:30:49 proxmox1 systemd[1]: Stopped target Graphical Interface.
Apr 25 08:30:49 proxmox1 systemd[1]: Stopped target Multi-User System.
Apr 25 08:30:49 proxmox1 systemd[10648]: Stopped target Default.
Apr 25 08:30:49 proxmox1 systemd[10648]: Stopped target Basic System.
Apr 25 08:30:49 proxmox1 systemd[10648]: Stopped target Sockets.
Apr 25 08:30:49 proxmox1 systemd[10648]: Closed GnuPG cryptographic agent (access for web browsers).
Apr 25 08:30:49 proxmox1 systemd[10648]: Closed GnuPG cryptographic agent (ssh-agent emulation).
Apr 25 08:30:49 proxmox1 smartd[850]: smartd received signal 15: Terminated
Apr 25 08:30:49 proxmox1 systemd[10648]: Closed GnuPG cryptographic agent and passphrase cache (restricted).
Apr 25 08:30:49 proxmox1 smartd[850]: Device: /dev/sda [SAT], state written to /var/lib/smartmontools/smartd.LITEONIT_LMH_256V2M_41_MSATA_256GB_SED-TW035DNT5508565KH0UZ.ata.state
Apr 25 08:30:49 proxmox1 systemd[10648]: Closed GnuPG network certificate management daemon.
Apr 25 08:30:49 proxmox1 smartd[850]: smartd is exiting (exit status 0)
Apr 25 08:30:49 proxmox1 systemd[10648]: Closed GnuPG cryptographic agent and passphrase cache.
Apr 25 08:30:49 proxmox1 systemd[10648]: Reached target Shutdown.
Apr 25 08:30:49 proxmox1 systemd[1]: Stopping LXC Container Monitoring Daemon...
Apr 25 08:30:49 proxmox1 systemd[1]: Stopped target Login Prompts.
Apr 25 08:30:49 proxmox1 systemd[1]: Stopping Regular background program processing daemon...
Apr 25 08:30:49 proxmox1 systemd[10648]: Starting Exit the Session...
Apr 25 08:30:49 proxmox1 systemd[10648]: Stopped target Paths.
Apr 25 08:30:49 proxmox1 systemd[10648]: Stopped target Timers.
Apr 25 08:30:49 proxmox1 systemd[1]: Stopped Postfix Mail Transport Agent.
Apr 25 08:30:49 proxmox1 systemd[1]: Stopping Postfix Mail Transport Agent (instance -)...
Apr 25 08:30:49 proxmox1 systemd[1]: Stopped target ZFS startup target.
Apr 25 08:30:49 proxmox1 systemd[1]: Closed Load/Save RF Kill Switch Status /dev/rfkill Watch.
Apr 25 08:30:49 proxmox1 systemd[1]: Stopped target RPC Port Mapper.
Apr 25 08:30:49 proxmox1 systemd[1]: Stopping Kernel Samepage Merging (KSM) Tuning Daemon...
Apr 25 08:30:49 proxmox1 systemd[1]: Unmounting RPC Pipe File System...
Apr 25 08:30:49 proxmox1 systemd[1]: Stopped ZFS file system shares.
Apr 25 08:30:49 proxmox1 systemd[1]: Stopped target ZFS pool import target.
Apr 25 08:30:49 proxmox1 systemd[1]: Stopped target Timers.
Apr 25 08:30:49 proxmox1 systemd[1]: Stopped Proxmox VE replication runner.
Apr 25 08:30:49 proxmox1 systemd[1]: Stopped Daily PVE download activities.
Apr 25 08:30:49 proxmox1 systemd[1]: Stopped Daily apt upgrade and clean activities.
Apr 25 08:30:49 proxmox1 systemd[1]: Stopped Daily apt download activities.
Apr 25 08:30:49 proxmox1 systemd[1]: Stopping Session 92 of user root.
Apr 25 08:30:49 proxmox1 postfix[12144]: Postfix is running with backwards-compatible default settings
Apr 25 08:30:49 proxmox1 systemd[1]: Stopping PVE guests...
Apr 25 08:30:49 proxmox1 postfix[12144]: See http://www.postfix.org/COMPATIBILITY_README.html for details
Apr 25 08:30:49 proxmox1 systemd[1]: Stopping Self Monitoring and Reporting Technology (SMART) Daemon...
Apr 25 08:30:49 proxmox1 postfix[12144]: To disable backwards compatibility use "postconf compatibility_level=2" and "postfix reload"
Apr 25 08:30:49 proxmox1 systemd[1]: Stopping D-Bus System Message Bus...
Apr 25 08:30:49 proxmox1 systemd[1]: Stopped Daily Cleanup of Temporary Directories.
Apr 25 08:30:49 proxmox1 systemd[1]: Stopping LSB: Ceph RBD Mapping...
Apr 25 08:30:49 proxmox1 systemd[1]: Stopping Getty on tty1...
Apr 25 08:30:49 proxmox1 systemd[1]: Stopped Self Monitoring and Reporting Technology (SMART) Daemon.
Apr 25 08:30:49 proxmox1 systemd[1]: Stopped Kernel Samepage Merging (KSM) Tuning Daemon.
Apr 25 08:30:49 proxmox1 systemd[1]: Stopped LXC Container Monitoring Daemon.
Apr 25 08:30:49 proxmox1 systemd[10648]: Received SIGRTMIN+24 from PID 12117 (kill).
Apr 25 08:30:49 proxmox1 systemd[1]: Stopped Getty on tty1.
Apr 25 08:30:49 proxmox1 systemd[1]: Stopped Regular background program processing daemon.
Apr 25 08:30:49 proxmox1 systemd[1]: Failed to propagate agent release message: Connection reset by peer
Apr 25 08:30:49 proxmox1 systemd[1]: Failed to propagate agent release message: Transport endpoint is not connected
Apr 25 08:30:49 proxmox1 systemd[1]: Unmounted RPC Pipe File System.
Apr 25 08:30:49 proxmox1 systemd[1]: Stopped User Manager for UID 0.
Apr 25 08:30:49 proxmox1 systemd[1]: Stopped Session 92 of user root.
Apr 25 08:30:49 proxmox1 systemd[1]: Failed to propagate agent release message: Transport endpoint is not connected
Apr 25 08:30:49 proxmox1 systemd[1]: Failed to propagate agent release message: Transport endpoint is not connected
Apr 25 08:30:49 proxmox1 systemd[1]: Failed to propagate agent release message: Transport endpoint is not connected
Apr 25 08:30:49 proxmox1 systemd[1]: Failed to propagate agent release message: Transport endpoint is not connected
Apr 25 08:30:49 proxmox1 systemd[1]: Failed to propagate agent release message: Transport endpoint is not connected
Apr 25 08:30:49 proxmox1 systemd[1]: Removed slice User Slice of root.
Apr 25 08:30:49 proxmox1 systemd[1]: Stopping Login Service...
Apr 25 08:30:49 proxmox1 systemd[1]: Stopping Permit User Sessions...
Apr 25 08:30:49 proxmox1 systemd[1]: Removed slice system-getty.slice.
Apr 25 08:30:49 proxmox1 systemd[1]: Stopped Permit User Sessions.
Apr 25 08:30:49 proxmox1 systemd[1]: Stopped D-Bus System Message Bus.
Apr 25 08:30:49 proxmox1 systemd[1]: Stopped LSB: Ceph RBD Mapping.
Apr 25 08:30:49 proxmox1 postfix/postfix-script[12164]: stopping the Postfix mail system
Apr 25 08:30:49 proxmox1 postfix/master[1474]: terminating on signal 15
Apr 25 08:30:49 proxmox1 systemd[1]: Stopped Postfix Mail Transport Agent (instance -).
Apr 25 08:30:49 proxmox1 systemd[1]: Removed slice system-postfix.slice.
Apr 25 08:30:50 proxmox1 systemd[1]: Stopped Login Service.
Apr 25 08:30:53 proxmox1 pve-guests[12209]: <root@pam> starting task UPID:proxmox1:00002FB9:004301A9:5AE074FD:stopall::root@pam:
Apr 25 08:30:53 proxmox1 pvesh[12209]: Stopping CT 139 (timeout = 60 seconds)
Apr 25 08:30:53 proxmox1 pve-guests[12217]: <root@pam> starting task UPID:proxmox1:00002FBB:004301AD:5AE074FD:vzshutdown:139:root@pam:
Apr 25 08:30:53 proxmox1 pve-guests[12219]: shutdown CT 139: UPID:proxmox1:00002FBB:004301AD:5AE074FD:vzshutdown:139:root@pam:
Apr 25 08:30:53 proxmox1 pvesh[12209]: Stopping CT 138 (timeout = 60 seconds)
Apr 25 08:30:53 proxmox1 pve-guests[12217]: <root@pam> starting task UPID:proxmox1:00002FBD:004301B0:5AE074FD:vzshutdown:138:root@pam:
Apr 25 08:30:53 proxmox1 pve-guests[12221]: shutdown CT 138: UPID:proxmox1:00002FBD:004301B0:5AE074FD:vzshutdown:138:root@pam:
Apr 25 08:30:53 proxmox1 pvesh[12209]: Stopping CT 137 (timeout = 60 seconds)
Apr 25 08:30:53 proxmox1 pve-guests[12217]: <root@pam> starting task UPID:proxmox1:00002FC0:004301B2:5AE074FD:vzshutdown:137:root@pam:
Apr 25 08:30:53 proxmox1 pve-guests[12224]: shutdown CT 137: UPID:proxmox1:00002FC0:004301B2:5AE074FD:vzshutdown:137:root@pam:
Apr 25 08:30:53 proxmox1 pvesh[12209]: Stopping CT 135 (timeout = 60 seconds)
Apr 25 08:30:53 proxmox1 pve-guests[12217]: <root@pam> starting task UPID:proxmox1:00002FD0:004301B8:5AE074FD:vzshutdown:135:root@pam:
Apr 25 08:30:53 proxmox1 pve-guests[12240]: shutdown CT 135: UPID:proxmox1:00002FD0:004301B8:5AE074FD:vzshutdown:135:root@pam:
Apr 25 08:30:53 proxmox1 kernel: [43914.803598] vmbr0: port 4(veth138i0) entered disabled state
Apr 25 08:30:53 proxmox1 kernel: [43914.829356] device veth138i0 left promiscuous mode
Apr 25 08:30:53 proxmox1 kernel: [43914.829367] vmbr0: port 4(veth138i0) entered disabled state
Apr 25 08:30:53 proxmox1 kernel: [43914.862761] vmbr0: port 5(veth139i0) entered disabled state
Apr 25 08:30:53 proxmox1 kernel: [43914.870542] device veth139i0 left promiscuous mode
Apr 25 08:30:53 proxmox1 kernel: [43914.870556] vmbr0: port 5(veth139i0) entered disabled state
Apr 25 08:30:57 proxmox1 kernel: [43918.223566] vmbr0: port 2(veth135i0) entered disabled state
Apr 25 08:30:57 proxmox1 kernel: [43918.231937] device veth135i0 left promiscuous mode
Apr 25 08:30:57 proxmox1 kernel: [43918.231949] vmbr0: port 2(veth135i0) entered disabled state
Apr 25 08:31:00 proxmox1 pve-guests[12217]: end task UPID:proxmox1:00002FBD:004301B0:5AE074FD:vzshutdown:138:root@pam:
Apr 25 08:31:00 proxmox1 pve-guests[12217]: end task UPID:proxmox1:00002FBB:004301AD:5AE074FD:vzshutdown:139:root@pam:
Apr 25 08:31:01 proxmox1 pve-guests[12217]: end task UPID:proxmox1:00002FD0:004301B8:5AE074FD:vzshutdown:135:root@pam:
Apr 25 08:37:46 proxmox1 pct[12603]: <root@pam> starting task UPID:proxmox1:00003141:0043A30E:5AE0769A:vzstart:135:root@pam:
Apr 25 08:37:46 proxmox1 pct[12609]: starting CT 135: UPID:proxmox1:00003141:0043A30E:5AE0769A:vzstart:135:root@pam:
Apr 25 08:37:46 proxmox1 systemd[1]: Starting PVE LXC Container: 135...
Apr 25 08:37:48 proxmox1 kernel: [44329.307985] EXT4-fs (dm-10): mounted filesystem with ordered data mode. Opts: (null)
Apr 25 08:37:48 proxmox1 kernel: [44329.383835] IPv6: ADDRCONF(NETDEV_UP): veth135i0: link is not ready
Apr 25 08:37:48 proxmox1 systemd-udevd[12623]: Could not generate persistent MAC address for vethOEJSI7: No such file or directory
Apr 25 08:37:49 proxmox1 kernel: [44330.802632] vmbr0: port 2(veth135i0) entered blocking state
Apr 25 08:37:49 proxmox1 kernel: [44330.802637] vmbr0: port 2(veth135i0) entered disabled state
Apr 25 08:37:49 proxmox1 kernel: [44330.802795] device veth135i0 entered promiscuous mode
Apr 25 08:38:13 proxmox1 pct[12609]: command 'systemctl start pve-container@135' failed: received interrupt
Apr 25 08:38:13 proxmox1 pct[12603]: <root@pam> end task UPID:proxmox1:00003141:0043A30E:5AE0769A:vzstart:135:root@pam: command 'systemctl start pve-container@135' failed: received interrupt
Apr 25 08:39:16 proxmox1 systemd[1]: pve-container@135.service: Start operation timed out. Terminating.
Apr 25 08:39:16 proxmox1 systemd[1]: Failed to start PVE LXC Container: 135.
Apr 25 08:39:16 proxmox1 systemd[1]: pve-container@135.service: Unit entered failed state.
Apr 25 08:39:16 proxmox1 systemd[1]: pve-container@135.service: Failed with result 'timeout'.

****Boot Starts Below****

Apr 25 08:42:03 proxmox1 systemd-modules-load[279]: Inserted module 'iscsi_tcp'
Apr 25 08:42:03 proxmox1 systemd-modules-load[279]: Inserted module 'ib_iser'
Apr 25 08:42:03 proxmox1 systemd-modules-load[279]: Inserted module 'vhost_net'
Apr 25 08:42:03 proxmox1 systemd[1]: Starting Flush Journal to Persistent Storage...
Apr 25 08:42:03 proxmox1 systemd[1]: Started udev Kernel Device Manager.
Apr 25 08:42:03 proxmox1 systemd[1]: Started Flush Journal to Persistent Storage.
Apr 25 08:42:03 proxmox1 systemd[1]: Started udev Coldplug all Devices.
Apr 25 08:42:03 proxmox1 systemd[1]: Starting udev Wait for Complete Device Initialization...
Apr 25 08:42:03 proxmox1 systemd[1]: Started Set the console keyboard layout.
Apr 25 08:42:03 proxmox1 systemd[1]: Listening on Load/Save RF Kill Switch Status /dev/rfkill Watch.
Apr 25 08:42:03 proxmox1 systemd[1]: Starting Load/Save RF Kill Switch Status...
Apr 25 08:42:03 proxmox1 systemd[1]: Found device LITEONIT_LMH-256V2M-41_MSATA_256GB_SED 2.
Apr 25 08:42:03 proxmox1 systemd[1]: Found device /dev/pve/swap.
Apr 25 08:42:03 proxmox1 systemd[1]: Activating swap /dev/pve/swap...
Apr 25 08:42:03 proxmox1 systemd[1]: Started Load/Save RF Kill Switch Status.
Apr 25 08:42:03 proxmox1 systemd[1]: Activated swap /dev/pve/swap.
Apr 25 08:42:03 proxmox1 systemd[1]: Reached target Swap.
Apr 25 08:42:03 proxmox1 systemd-modules-load[279]: Inserted module 'zfs'
and so on...
 
I am hoping that someone can provide some guidance on this.

Thank you!
 
Hi,

what do you see on the console of the proxmox host when you do the reboot? Are you able / have you tried to 'tweak' the 'default' boot stanza being passed to the kernel? (ie, don't let it auto-boot with default!)

My test-dev proxmox box was giving me some pain recently after a reboot, with very little visible on console (ie, nothing ..) . I had to reboot, intervene and edit my boot option stanza,

-- removed the 'quiet' so that I could better see what was happening
-- added the 'fastboot' option to discourage it from doing a FSCK on a disk that it thought was a good idea to force a check on.

in my case this helped - I had a 2Tb non-raid 'junk misc storage drive' attached and mounted and .. default filesystem options decided after my nice long uptime, a forced checkdisk was a good thing. Which was hard to see - when it was 'quiet' booting.

clearly - you may be in a different circumstance entirely. But sometimes by default console can be encouraged to give you more info / if you haven't done this already ?

Tim
 
Tim,

Thank you. I have not checked the console. I will perform some more exploration and capture it to see if I can get more details.

Thank you again.
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!