Shutdown/reboot errors on nodes

kbelov

Renowned Member (joined Jun 13, 2012)
Hi.

I am setting up a three-node Proxmox HA cluster with the following package versions:
Code:
proxmox-ve: 6.2-2 (running kernel: 5.4.65-1-pve)
pve-manager: 6.2-15 (running version: 6.2-15 / 48bd51b6)
pve-kernel-5.4: 6.2-7
pve-kernel-helper: 6.2-7
pve-kernel-5.4.65-1-pve: 5.4.65-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: not correctly installed
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-4
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.2-10
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.1-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.3-10
pve-cluster: 6.2-1
pve-container: 3.2-2
pve-docs: 6.2-6
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-6
pve-xtermjs: 4.7.0-2
qemu-server: 6.2-19
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.4-pve2


During testing, I ran into a problem when shutting down or rebooting the nodes: the guests do not appear to shut down normally, so the host waits for them until the timeout expires and then kills them, even though the virtual machines themselves have actually shut down correctly.


Code:
Nov 12 00:41:42 pve3 pve-guests[14365]: <root@pam> starting task UPID:pve3:0000381F:0004F226:5FAC5A96:stopall::root@pam:
Nov 12 00:41:42 pve3 pvesh[14365]: Stopping VM 107 (timeout = 180 seconds)
Nov 12 00:41:42 pve3 pve-guests[14367]: <root@pam> starting task UPID:pve3:00003820:0004F22B:5FAC5A96:qmshutdown:107:root@pam:
Nov 12 00:41:42 pve3 pve-guests[14368]: shutdown VM 107: UPID:pve3:00003820:0004F22B:5FAC5A96:qmshutdown:107:root@pam:
Nov 12 00:41:42 pve3 pvesh[14365]: Stopping VM 102 (timeout = 180 seconds)
Nov 12 00:41:42 pve3 pve-guests[14367]: <root@pam> starting task UPID:pve3:00003821:0004F22D:5FAC5A96:qmshutdown:102:root@pam:
Nov 12 00:41:42 pve3 pve-guests[14369]: shutdown VM 102: UPID:pve3:00003821:0004F22D:5FAC5A96:qmshutdown:102:root@pam:
Nov 12 00:41:46 pve3 QEMU[36643]: kvm: Unable to connect character device qmp-event: Failed to connect socket /var/run/qmeventd.sock: Connection refused
Nov 12 00:41:46 pve3 QEMU[2865]: kvm: Unable to connect character device qmp-event: Failed to connect socket /var/run/qmeventd.sock: Connection refused
Nov 12 00:44:42 pve3 pve-guests[14368]: VM 107 qmp command failed - VM 107 qmp command 'guest-shutdown' failed - got timeout
Nov 12 00:44:42 pve3 QEMU[36643]: kvm: terminating on signal 15 from pid 14368 (task UPID:pve3:00003820:0004F22B:5FAC5A96:qmshutdown:107:root@pam:)
Nov 12 00:44:42 pve3 pve-guests[14369]: VM 102 qmp command failed - VM 102 qmp command 'guest-shutdown' failed - got timeout
Nov 12 00:44:42 pve3 QEMU[2865]: kvm: terminating on signal 15 from pid 14369 (task UPID:pve3:00003821:0004F22D:5FAC5A96:qmshutdown:102:root@pam:)
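To check whether a node hits the same problem, the two telltale messages from the log above can be filtered out of the journal. This is a minimal sketch: the helper name find_qmeventd_symptoms is made up for this example, it matches on the exact message text quoted above (which may differ in other versions), and checking the previous boot with `journalctl -b -1` assumes the journal is persistent.

```shell
#!/bin/sh
# Hypothetical helper: read log text on stdin and print only the lines that
# show QEMU failing to reach qmeventd or a guest-shutdown timing out.
find_qmeventd_symptoms() {
    grep -E "qmeventd\.sock: Connection refused|'guest-shutdown' failed - got timeout"
}

# On a node with a persistent journal, inspect the shutdown of the previous boot:
#   journalctl -b -1 | find_qmeventd_symptoms
```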

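The problem turned out to be that the qmeventd service is stopped before pve-guests and pve-ha-lrm, so QEMU can no longer connect to /var/run/qmeventd.sock (the "Connection refused" lines above) while the guests are being shut down.
Since systemd stops units in the reverse of their start order, adding Wants= and After= dependencies on qmeventd.service to both services keeps qmeventd running until they have stopped. With that change the errors are gone and the nodes reboot and shut down normally.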

Code:
--- old/pve-ha-lrm.service    2020-11-12 10:10:57.535673153 +0300
+++ new/pve-ha-lrm.service    2020-11-12 02:23:58.611344867 +0300
@@ -4,6 +4,7 @@
 Wants=pve-cluster.service
 Wants=watchdog-mux.service
 Wants=pvedaemon.service
+Wants=qmeventd.service
 Wants=pve-ha-crm.service
 Wants=lxc.service
 Wants=pve-storage.target
@@ -14,6 +15,7 @@
 After=pve-storage.target
 After=pvedaemon.service
 After=pveproxy.service
+After=qmeventd.service
 After=ssh.service
 After=syslog.service
 After=watchdog-mux.service

Code:
--- old/pve-guests.service    2020-11-12 10:10:44.975752106 +0300
+++ new/pve-guests.service    2020-11-12 01:01:50.887244327 +0300
@@ -3,11 +3,13 @@
 ConditionPathExists=/usr/bin/pvesh
 RefuseManualStart=true
 RefuseManualStop=true
+Wants=qmeventd.service
 Wants=pvestatd.service
 Wants=pveproxy.service
 Wants=spiceproxy.service
 Wants=pve-firewall.service
 Wants=lxc.service
+After=qmeventd.service
 After=pveproxy.service
 After=pvestatd.service
 After=spiceproxy.service
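To confirm that the ordering is actually in effect, whether it comes from the edited unit file or from a drop-in, the merged After= list can be read back with `systemctl show`. A minimal sketch; the helper name orders_after_qmeventd is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: succeed if qmeventd.service appears in a
# space-separated After= list.
orders_after_qmeventd() {
    for dep in $1; do
        [ "$dep" = "qmeventd.service" ] && return 0
    done
    return 1
}

# On a live node, check both units changed in the patches.
if command -v systemctl >/dev/null 2>&1; then
    for unit in pve-guests.service pve-ha-lrm.service; do
        # --value prints only the property value (unit file plus drop-ins merged)
        if orders_after_qmeventd "$(systemctl show -p After --value "$unit")"; then
            echo "$unit: ordered after qmeventd.service"
        else
            echo "$unit: NOT ordered after qmeventd.service"
        fi
    done
fi
```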

My solution may not be the most correct one, but it solves the problem. I hope it is useful to someone, and that the developers fix this in a future release.
 
In my case the (standalone) host waited for 3 minutes even though all guests had shut down long before.
Note that I used a drop-in snippet instead of editing the original unit file.

Code:
root@pve:~# systemctl edit pve-guests
...
root@pve:~# cat /etc/systemd/system/pve-guests.service.d/override.conf
# stop qmeventd.service after pve-guests.service
# https://lists.proxmox.com/pipermail/pve-devel/2020-November/045897.html
[Unit]
After=qmeventd.service
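The same drop-in can also be created non-interactively, which is handy when applying it to every node of a cluster and, on HA nodes, to pve-ha-lrm.service as well (the first post orders both units after qmeventd). A drop-in under /etc/systemd/system also survives package upgrades, unlike edits to the packaged unit files. This is a minimal sketch: the file name qmeventd-order.conf and the SYSTEMD_DIR variable are choices made for this example, and it only adds the After= ordering from the snippet above, not the Wants= lines from the patch.

```shell
#!/bin/sh
# SYSTEMD_DIR (example variable) defaults to the admin unit directory;
# point it at a scratch directory to try the script out first.
SYSTEMD_DIR="${SYSTEMD_DIR:-/etc/systemd/system}"

# Write UNIT.d/qmeventd-order.conf (example file name) so that UNIT starts
# after, and therefore stops before, qmeventd.service.
write_qmeventd_dropin() {
    dir="$SYSTEMD_DIR/$1.d"
    mkdir -p "$dir" || return 1
    cat > "$dir/qmeventd-order.conf" <<'EOF'
# stop qmeventd.service after this unit
[Unit]
After=qmeventd.service
EOF
}

# Usage: sh qmeventd-dropin.sh pve-guests.service pve-ha-lrm.service
for unit in "$@"; do
    write_qmeventd_dropin "$unit" || exit 1
    echo "wrote $SYSTEMD_DIR/$unit.d/qmeventd-order.conf"
done

# Unlike `systemctl edit`, writing the file by hand needs an explicit reload.
if [ "$#" -gt 0 ] && [ "$SYSTEMD_DIR" = /etc/systemd/system ]; then
    systemctl daemon-reload
fi
```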

Your report and fix solved the issue for me. Now the host shuts down immediately after the last VM. Thank you.

Best regards
Stefan