I first noticed this issue with Proxmox 5.4 and suspect it has everything to do with the systemd integration and very little to do with Proxmox itself, but it bites me in the backside every so often, so I'd like to bring it up and see whether there's something I should be doing differently, a fix for it, or what.
When I have kernel updates to apply on my Proxmox 5.4 hosts, I schedule an off-hours at job to run a script which calls 'qm shutdown' on all running VMs, unmounts all iSCSI volumes (my VMs run from remote SSD storage), and then issues a reboot command for the Proxmox hypervisor itself. When Proxmox comes back from the reboot, the volumes mount automatically, the VMs start up, and life goes on. Years ago we used to simply issue a reboot from an at job (much simpler), but VMs would often boot with corrupted filesystems, so we surmised that was the wrong way to go about it.
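For context, the script is roughly the following. This is a simplified sketch, not the exact script: the 'qm wait' call and the /mnt/iscsi-* mount points are placeholders for my actual setup.

```shell
#!/bin/sh
# Sketch of the off-hours maintenance reboot script described above.
# Assumes the Proxmox 5.x 'qm' CLI and iSCSI-backed mounts under
# /mnt/iscsi-* (placeholder paths, adjust for your storage layout).

# VMIDs of running guests ('qm list' columns: VMID NAME STATUS ...)
running_vmids() {
    qm list | awk '$3 == "running" {print $1}'
}

maintenance_reboot() {
    # Cleanly shut down every running VM and wait for it to power off
    for vmid in $(running_vmids); do
        qm shutdown "$vmid" && qm wait "$vmid"
    done

    # Unmount the iSCSI-backed storage before rebooting the host
    for mnt in /mnt/iscsi-*; do
        [ -e "$mnt" ] || continue
        umount "$mnt"
    done

    # Reboot the hypervisor itself
    systemctl reboot
}
```

The point of the 'qm wait' step is to make sure no guest is mid-write when the volumes get unmounted, which is what the old plain-reboot approach got wrong.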
The problem I'm having is that there are times when the reboot command fails completely. I've never seen anything like it. It happened maybe twice in 2019, so not a huge percentage of the time, but it's baffling and a real pain to deal with. I'll remote in around 6am to take a quick look around, and one or several Proxmox hosts will just be sitting there: the automated reboot command didn't work. I'll try to reboot by hand with 'reboot', 'init 6', 'shutdown -r now', 'systemctl reboot' .. nothing works and no reboot happens. systemctl will sometimes return a message that it's scheduling a reboot for a minute or so into the future, but it never happens, and I never find any related errors in syslog or dmesg. It happened again this past weekend, and I'm ready to look into IPMI rebooting, some kind of remote power relay, or something else lower level. Am I just doing this wrong? Any suggestions?
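In case it helps frame suggestions: the kind of "lower level" fallback I have in mind, short of full IPMI, is the kernel's magic SysRq trigger, which bypasses systemd entirely. A minimal sketch, wrapped in a guard so it can't fire by accident (the FORCE variable is just my own safety latch, not a kernel knob):

```shell
#!/bin/sh
# Last-resort reboot via magic SysRq, bypassing systemd entirely.
# WARNING: 'echo b' reboots immediately with NO clean shutdown, so
# it should only ever run after guests and storage are already down.
# Guarded behind FORCE=1 so sourcing or running it by mistake is safe.

force_reboot() {
    if [ "${FORCE:-0}" != "1" ]; then
        echo "refusing: set FORCE=1 to force an immediate SysRq reboot" >&2
        return 1
    fi
    sync                                  # flush pending writes first
    echo 1 > /proc/sys/kernel/sysrq      # enable all SysRq functions
    echo b > /proc/sysrq-trigger         # immediate reboot, no shutdown
}
```

If 'systemctl reboot' is wedged but the kernel itself is fine, this should still work; if even SysRq doesn't fire, then it really is IPMI or power-relay territory.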
Thanks