Proxmox sometimes hangs on reboot

jaytee129

Member
Jun 16, 2022
142
10
23
I'm new to proxmox and my set up has worked fine for about a week but starting yesterday would hang sometimes when I called for a reboot.

I thought it was the fault of a Windows VM that didn't shutdown when told (a separate problem I'll have to post on) or one of the other linux based VMs running Sophos UTM but it happens even (though not always) when no VM is running.

I mv'ed syslog to syslog.bak, did a reboot, and went through the syslog file after the hung reboot (hard reset) and restart. Here's a segment of the file. What am I looking for? Is there a better file to look at for clues?

Code:
Jun 23 16:43:23 thibworldpx systemd[1]: Stopped Proxmox VE firewall logger.
Jun 23 16:43:24 thibworldpx pve-ha-crm[1366]: received signal TERM
Jun 23 16:43:24 thibworldpx pve-ha-crm[1366]: server received shutdown request
Jun 23 16:43:24 thibworldpx pve-ha-crm[1366]: server stopped
Jun 23 16:43:24 thibworldpx pveproxy[1367]: received signal TERM
Jun 23 16:43:24 thibworldpx pveproxy[1367]: server closing
Jun 23 16:43:24 thibworldpx pveproxy[1369]: worker exit
Jun 23 16:43:24 thibworldpx pveproxy[1370]: worker exit
Jun 23 16:43:24 thibworldpx pveproxy[1367]: worker 1368 finished
Jun 23 16:43:24 thibworldpx pveproxy[1367]: worker 1370 finished
Jun 23 16:43:24 thibworldpx pveproxy[1367]: worker 1369 finished
Jun 23 16:43:24 thibworldpx pveproxy[1367]: server stopped
Jun 23 16:43:25 thibworldpx systemd[1]: pve-ha-crm.service: Succeeded.
Jun 23 16:43:25 thibworldpx systemd[1]: Stopped PVE Cluster HA Resource Manager Daemon.
Jun 23 16:43:25 thibworldpx systemd[1]: pve-ha-crm.service: Consumed 1.430s CPU time.

hung - hard reset


Jun 23 16:48:34 thibworldpx systemd-modules-load[383]: Inserted module 'vfio'
Jun 23 16:48:34 thibworldpx dmeventd[394]: dmeventd ready for processing.
Jun 23 16:48:34 thibworldpx systemd-modules-load[383]: Inserted module 'vfio_pci'
Jun 23 16:48:34 thibworldpx lvm[394]: Monitoring thin pool pve-data-tpool.
Jun 23 16:48:34 thibworldpx systemd-modules-load[383]: Inserted module 'iscsi_tcp'
Jun 23 16:48:34 thibworldpx systemd[1]: Starting Flush Journal to Persistent Storage...
Jun 23 16:48:34 thibworldpx systemd-modules-load[383]: Inserted module 'ib_iser'
Jun 23 16:48:34 thibworldpx lvm[376]:   9 logical volume(s) in volume group "pve" monitored
Jun 23 16:48:34 thibworldpx systemd-modules-load[383]: Inserted module 'vhost_net'
Jun 23 16:48:34 thibworldpx systemd[1]: Finished Coldplug All udev Devices.
Jun 23 16:48:34 thibworldpx systemd[1]: Starting Helper to synchronize boot up for ifupdown...
Jun 23 16:48:34 thibworldpx systemd[1]: Starting Wait for udev To Complete Device Initialization...
Jun 23 16:48:34 thibworldpx systemd[1]: Started Rule-based Manager for Device Events and Files.
Jun 23 16:48:34 thibworldpx systemd[1]: Finished Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling.
Jun 23 16:48:34 thibworldpx systemd[1]: Reached target Local File Systems (Pre).
Jun 23 16:48:34 thibworldpx systemd[1]: Finished Flush Journal to Persistent Storage.
Jun 23 16:48:34 thibworldpx systemd-udevd[424]: Using default interface naming scheme 'v247'.
Jun 23 16:48:34 thibworldpx systemd-udevd[424]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jun 23 16:48:34 thibworldpx systemd[1]: Listening on Load/Save RF Kill Switch Status /dev/rfkill Watch.
Jun 23 16:48:34 thibworldpx systemd-udevd[416]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jun 23 16:48:34 thibworldpx systemd[1]: Found device My_Book_25ED My_Book.
Jun 23 16:48:34 thibworldpx systemd-modules-load[383]: Inserted module 'zfs'
Jun 23 16:48:34 thibworldpx systemd[1]: Found device Samsung_SSD_850_PRO_256GB 2.
Jun 23 16:48:34 thibworldpx systemd[1]: Finished Load Kernel Modules.
 
Below is more useless info from syslog (to me anyway) from what is now a hanging every time I reboot - with or without any VMs running.

Is there another place to look for clues as to what is happening? Are there any clues in this extract? Do I need to reinstall? Is there a solution? I'm running version 7.2-5 for what that's worth.

Code:
Jun 24 16:08:48 thibworldpx pve-firewall[1330]: server closing
Jun 24 16:08:48 thibworldpx pve-firewall[1330]: clear firewall rules
Jun 24 16:08:48 thibworldpx pve-firewall[1330]: server stopped
Jun 24 16:08:48 thibworldpx pvestatd[1329]: received signal TERM
Jun 24 16:08:48 thibworldpx pvestatd[1329]: server closing
Jun 24 16:08:48 thibworldpx pvestatd[1329]: server stopped
Jun 24 16:08:48 thibworldpx pve-ha-lrm[1376]: received signal TERM
Jun 24 16:08:48 thibworldpx pve-ha-lrm[1376]: got shutdown request with shutdown policy 'conditional'
Jun 24 16:08:48 thibworldpx pve-ha-lrm[1376]: reboot LRM, stop and freeze all services
Jun 24 16:08:48 thibworldpx pve-ha-lrm[1376]: server stopped
Jun 24 16:08:49 thibworldpx systemd[1]: spiceproxy.service: Succeeded.
Jun 24 16:08:49 thibworldpx systemd[1]: Stopped PVE SPICE Proxy Server.
Jun 24 16:08:49 thibworldpx systemd[1]: spiceproxy.service: Consumed 1.297s CPU time.
Jun 24 16:08:49 thibworldpx systemd[1]: pve-firewall.service: Succeeded.
Jun 24 16:08:49 thibworldpx systemd[1]: Stopped Proxmox VE firewall.
Jun 24 16:08:49 thibworldpx systemd[1]: pve-firewall.service: Consumed 36.639s CPU time.
Jun 24 16:08:49 thibworldpx pvefw-logger[597]: received terminate request (signal)
Jun 24 16:08:49 thibworldpx systemd[1]: Stopping Proxmox VE firewall logger...
Jun 24 16:08:49 thibworldpx pvefw-logger[597]: stopping pvefw logger
Jun 24 16:08:49 thibworldpx systemd[1]: pvestatd.service: Succeeded.
Jun 24 16:08:49 thibworldpx systemd[1]: Stopped PVE Status Daemon.
Jun 24 16:08:49 thibworldpx systemd[1]: pvestatd.service: Consumed 1min 9.428s CPU time.
Jun 24 16:08:49 thibworldpx systemd[1]: pve-ha-lrm.service: Succeeded.
Jun 24 16:08:49 thibworldpx systemd[1]: Stopped PVE Local HA Resource Manager Daemon.
Jun 24 16:08:49 thibworldpx systemd[1]: pve-ha-lrm.service: Consumed 2.582s CPU time.
Jun 24 16:08:49 thibworldpx systemd[1]: Stopping LXC Container Initialization and Autoboot Code...
Jun 24 16:08:49 thibworldpx systemd[1]: Stopping PVE Cluster HA Resource Manager Daemon...
Jun 24 16:08:49 thibworldpx systemd[1]: Stopping PVE API Proxy Server...
Jun 24 16:08:49 thibworldpx systemd[1]: Stopping PVE Qemu Event Daemon...
Jun 24 16:08:49 thibworldpx systemd[1]: qmeventd.service: Succeeded.
Jun 24 16:08:49 thibworldpx systemd[1]: Stopped PVE Qemu Event Daemon.
Jun 24 16:08:49 thibworldpx systemd[1]: qmeventd.service: Consumed 4.396s CPU time.
Jun 24 16:08:49 thibworldpx systemd[1]: lxc.service: Control process exited, code=exited, status=1/FAILURE
Jun 24 16:08:49 thibworldpx systemd[1]: lxc.service: Failed with result 'exit-code'.
Jun 24 16:08:49 thibworldpx systemd[1]: Stopped LXC Container Initialization and Autoboot Code.
Jun 24 16:08:49 thibworldpx systemd[1]: Stopping LXC network bridge setup...
Jun 24 16:08:49 thibworldpx lxcfs[610]: Running destructor lxcfs_exit
Jun 24 16:08:49 thibworldpx systemd[1]: Stopping FUSE filesystem for LXC...
Jun 24 16:08:49 thibworldpx systemd[1]: var-lib-lxcfs.mount: Succeeded.
Jun 24 16:08:49 thibworldpx systemd[1]: Unmounted /var/lib/lxcfs.
Jun 24 16:08:49 thibworldpx systemd[1]: lxc-net.service: Succeeded.
Jun 24 16:08:49 thibworldpx systemd[1]: Stopped LXC network bridge setup.
Jun 24 16:08:49 thibworldpx fusermount[18013]: /bin/fusermount: failed to unmount /var/lib/lxcfs: Invalid argument
Jun 24 16:08:49 thibworldpx systemd[1]: lxcfs.service: Succeeded.
Jun 24 16:08:49 thibworldpx systemd[1]: Stopped FUSE filesystem for LXC.
Jun 24 16:08:49 thibworldpx systemd[1]: pvefw-logger.service: Succeeded.
Jun 24 16:08:49 thibworldpx systemd[1]: Stopped Proxmox VE firewall logger.
Jun 24 16:08:50 thibworldpx pve-ha-crm[1365]: received signal TERM
Jun 24 16:08:50 thibworldpx pve-ha-crm[1365]: server received shutdown request
Jun 24 16:08:50 thibworldpx pve-ha-crm[1365]: server stopped
Jun 24 16:08:50 thibworldpx pveproxy[1366]: received signal TERM
Jun 24 16:08:50 thibworldpx pveproxy[1366]: server closing
Jun 24 16:08:50 thibworldpx pveproxy[15353]: worker exit
Jun 24 16:08:50 thibworldpx pveproxy[13860]: worker exit
Jun 24 16:08:50 thibworldpx pveproxy[1366]: worker 15353 finished
Jun 24 16:08:50 thibworldpx pveproxy[1366]: worker 13988 finished
Jun 24 16:08:50 thibworldpx pveproxy[1366]: worker 13860 finished
Jun 24 16:08:50 thibworldpx pveproxy[1366]: server stopped
Jun 24 16:08:51 thibworldpx systemd[1]: pve-ha-crm.service: Succeeded.
Jun 24 16:08:51 thibworldpx systemd[1]: Stopped PVE Cluster HA Resource Manager Daemon.
Jun 24 16:08:51 thibworldpx systemd[1]: pve-ha-crm.service: Consumed 2.003s CPU time.


hangs here and needs a hard reset (hold power button down). A simple press of power button, which used to do a shutdown, no longer works.


Jun 24 17:01:05 thibworldpx kernel: [    0.000000] Linux version 5.15.35-3-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PVE 5.15.35-6 (Fri, 17 Jun 2022 13:42:35 +0200) ()
Jun 24 17:01:05 thibworldpx kernel: [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.15.35-3-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction video=simplefb:off video=vesafb:off video=efifb:off video=vesa:off disable_vga=1 vfio_iommu_type1.allow_unsafe_interrupts=1 kvm.ignore_msrs=1 modprobe.blacklist=radeon,nouveau,nvidia,nvidiafb,nvidia-gpu,snd_hda_intel,snd_hda_codec_hdmi,i915
Jun 24 17:01:05 thibworldpx kernel: [    0.000000] KERNEL supported cpus:
Jun 24 17:01:05 thibworldpx kernel: [    0.000000]   Intel GenuineIntel
Jun 24 17:01:05 thibworldpx kernel: [    0.000000]   AMD AuthenticAMD
Jun 24 17:01:05 thibworldpx kernel: [    0.000000]   Hygon HygonGenuine
Jun 24 17:01:05 thibworldpx kernel: [    0.000000]   Centaur CentaurHauls
Jun 24 17:01:05 thibworldpx kernel: [    0.000000]   zhaoxin   Shanghai
 
I built another proxmox from scratch and restored the VM's and reboot/shutdowns were working fine for past week or more then today the hanging on reboot started again. Needs hard reset - a soft reset (push computer button once), which worked before if needed doesn't work now

Below are the last few syslog lines. Any clues as to what's causing the hanging?

Code:
Jul  3 11:15:03 thibworldpx2 pve-ha-lrm[1482]: server stopped
Jul  3 11:15:04 thibworldpx2 systemd[1]: spiceproxy.service: Succeeded.
Jul  3 11:15:04 thibworldpx2 systemd[1]: Stopped PVE SPICE Proxy Server.
Jul  3 11:15:04 thibworldpx2 systemd[1]: pve-firewall.service: Succeeded.
Jul  3 11:15:04 thibworldpx2 systemd[1]: Stopped Proxmox VE firewall.
Jul  3 11:15:04 thibworldpx2 systemd[1]: pve-firewall.service: Consumed 22.844s CPU time.
Jul  3 11:15:04 thibworldpx2 systemd[1]: Stopping Proxmox VE firewall logger...
Jul  3 11:15:04 thibworldpx2 pvefw-logger[681]: received terminate request (signal)
Jul  3 11:15:04 thibworldpx2 pvefw-logger[681]: stopping pvefw logger
Jul  3 11:15:04 thibworldpx2 systemd[1]: pve-ha-lrm.service: Succeeded.
Jul  3 11:15:04 thibworldpx2 systemd[1]: Stopped PVE Local HA Resource Manager Daemon.
Jul  3 11:15:04 thibworldpx2 systemd[1]: pve-ha-lrm.service: Consumed 1.761s CPU time.
Jul  3 11:15:04 thibworldpx2 systemd[1]: Stopping LXC Container Initialization and Autoboot Code...
Jul  3 11:15:04 thibworldpx2 systemd[1]: Stopping PVE Cluster HA Resource Manager Daemon...
Jul  3 11:15:04 thibworldpx2 systemd[1]: Stopping PVE API Proxy Server...
Jul  3 11:15:04 thibworldpx2 systemd[1]: Stopping PVE Qemu Event Daemon...
Jul  3 11:15:04 thibworldpx2 systemd[1]: qmeventd.service: Succeeded.
Jul  3 11:15:04 thibworldpx2 systemd[1]: Stopped PVE Qemu Event Daemon.
Jul  3 11:15:04 thibworldpx2 systemd[1]: qmeventd.service: Consumed 1.446s CPU time.
Jul  3 11:15:04 thibworldpx2 systemd[1]: lxc.service: Control process exited, code=exited, status=1/FAILURE
Jul  3 11:15:04 thibworldpx2 systemd[1]: lxc.service: Failed with result 'exit-code'.
Jul  3 11:15:04 thibworldpx2 systemd[1]: Stopped LXC Container Initialization and Autoboot Code.
Jul  3 11:15:04 thibworldpx2 systemd[1]: Stopping LXC network bridge setup...
Jul  3 11:15:04 thibworldpx2 systemd[1]: Stopping FUSE filesystem for LXC...
Jul  3 11:15:04 thibworldpx2 systemd[1]: var-lib-lxcfs.mount: Succeeded.
Jul  3 11:15:04 thibworldpx2 systemd[1]: Unmounted /var/lib/lxcfs.
Jul  3 11:15:04 thibworldpx2 systemd[1]: lxc-net.service: Succeeded.
Jul  3 11:15:04 thibworldpx2 systemd[1]: Stopped LXC network bridge setup.
Jul  3 11:15:04 thibworldpx2 lxcfs[685]: Running destructor lxcfs_exit
Jul  3 11:15:04 thibworldpx2 fusermount[21912]: /bin/fusermount: failed to unmount /var/lib/lxcfs: Invalid argument
Jul  3 11:15:04 thibworldpx2 systemd[1]: lxcfs.service: Succeeded.
Jul  3 11:15:04 thibworldpx2 systemd[1]: Stopped FUSE filesystem for LXC.
Jul  3 11:15:04 thibworldpx2 systemd[1]: pvefw-logger.service: Succeeded.
Jul  3 11:15:04 thibworldpx2 systemd[1]: Stopped Proxmox VE firewall logger.
Jul  3 11:15:04 thibworldpx2 systemd[1]: pvefw-logger.service: Consumed 1.091s CPU time.
Jul  3 11:15:04 thibworldpx2 pve-ha-crm[1472]: received signal TERM
Jul  3 11:15:04 thibworldpx2 pve-ha-crm[1472]: server received shutdown request
Jul  3 11:15:04 thibworldpx2 pve-ha-crm[1472]: server stopped
Jul  3 11:15:04 thibworldpx2 pveproxy[1473]: received signal TERM
Jul  3 11:15:04 thibworldpx2 pveproxy[1473]: server closing
Jul  3 11:15:04 thibworldpx2 pveproxy[21433]: worker exit
Jul  3 11:15:04 thibworldpx2 pveproxy[1474]: worker exit
Jul  3 11:15:04 thibworldpx2 pveproxy[1473]: worker 1474 finished
Jul  3 11:15:04 thibworldpx2 pveproxy[1473]: worker 1475 finished
Jul  3 11:15:04 thibworldpx2 pveproxy[1473]: worker 21433 finished
Jul  3 11:15:04 thibworldpx2 pveproxy[1473]: server stopped
Jul  3 11:15:05 thibworldpx2 systemd[1]: pve-ha-crm.service: Succeeded.
 
Some BIOS and kernel versions ago, my system would always freeze on reboot (with shutdown working fine). I used the reboot= kernel parameter as a work-around. For me (AMD X470 Zen+) reboot=efi worked but you might have to try each of the possible types. Or maybe your problem is different from mine and this won't help at all.
 
Some BIOS and kernel versions ago, my system would always freeze on reboot (with shutdown working fine). I used the reboot= kernel parameter as a work-around. For me (AMD X470 Zen+) reboot=efi worked but you might have to try each of the possible types. Or maybe your problem is different from mine and this won't help at all.
hmmm. interesting. the thing is it's been working fine through a couple dozen+ reboots I've done as I configure and test things. why would it stop working all of a sudden? I haven't been making changes to proxmox for last couple of days as I focus on configuring UPS software in Windows VM the past couple of days (and doing a lot of successful shutdown/restart tests until now)

Where do I put "reboot=efi" (or whatever option I want to try)? Is it when I call "shutdown -r now"? if so, where in that command syntax?
 
Maybe it's not the same issue at all, because it is not as consistent. Kernel parameters go in /etc/default/grub or /etc/kernel/cmdline, depending on your bootloader. You need to run a command after making changes, but it depends on which bootloader. Here is a nice example if your are using GRUB. You can check the current kernel parameters, and whether your change was applied, with cat /proc/cmdline.
 
I tried different kernels, tried reboot=efi and none of these helped. I solved the problem by adding to /etc/default/grub the parameter nomodeset to the line GRUB_CMDLINE_LINUX_DEFAULT
 
  • Like
Reactions: rhe8502
Hi everyone,
I am experiencing exactly the same problem as jaytee129 (see screenshot).

I would like to try unholy's solution, but I'm not an expert and I'm worried that the server will no longer boot.

Perhaps someone from the Proxmox team can say something about the problem?

Thank you and best regards.
Stefan
 

Attachments

  • hangs-on-reboot.png
    hangs-on-reboot.png
    40.4 KB · Views: 24
I'm not 100% sure but I think this may have been solved after I disabled C-States in BIOS.
I just ran into this issue yesterday on Proxmox 8.2.2 and it's fairly annoying considering the server is in another country, any chance you could confirm if this indeed solved the issue?
 
Last edited:
so for sure disabling C-States fixed crashing issues for me and others (on 7.x). Whether it was this one or another one is what I can't be sure about. Good luck.
 
How did it go?

So it appears the issue either went away on its own or it took a bit to set in, but I added this to kernel cmdline prior to noticing the server stop having issues on boot.
Bash:
processor.max_cstate=0
It's also possible that kernel updates resolved it. I'm probably going to just not update kernel for a while until after updates are out for a bit to prevent any more problems from surfacing.
 
Last edited:

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!