Hi.
I am building a three-node Proxmox HA cluster. Package versions:
Code:
proxmox-ve: 6.2-2 (running kernel: 5.4.65-1-pve)
pve-manager: 6.2-15 (running version: 6.2-15/48bd51b6)
pve-kernel-5.4: 6.2-7
pve-kernel-helper: 6.2-7
pve-kernel-5.4.65-1-pve: 5.4.65-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: not correctly installed
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-4
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.2-10
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.1-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.3-10
pve-cluster: 6.2-1
pve-container: 3.2-2
pve-docs: 6.2-6
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-6
pve-xtermjs: 4.7.0-2
qemu-server: 6.2-19
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.4-pve2
During testing I ran into a problem when shutting down or rebooting nodes: the guests do not shut down normally. The node waits for them until the timeout expires and then terminates them forcibly, even though the virtual machines themselves have shut down correctly.
Code:
Nov 12 00:41:42 pve3 pve-guests[14365]: <root@pam> starting task UPID:pve3:0000381F:0004F226:5FAC5A96:stopall::root@pam:
Nov 12 00:41:42 pve3 pvesh[14365]: Stopping VM 107 (timeout = 180 seconds)
Nov 12 00:41:42 pve3 pve-guests[14367]: <root@pam> starting task UPID:pve3:00003820:0004F22B:5FAC5A96:qmshutdown:107:root@pam:
Nov 12 00:41:42 pve3 pve-guests[14368]: shutdown VM 107: UPID:pve3:00003820:0004F22B:5FAC5A96:qmshutdown:107:root@pam:
Nov 12 00:41:42 pve3 pvesh[14365]: Stopping VM 102 (timeout = 180 seconds)
Nov 12 00:41:42 pve3 pve-guests[14367]: <root@pam> starting task UPID:pve3:00003821:0004F22D:5FAC5A96:qmshutdown:102:root@pam:
Nov 12 00:41:42 pve3 pve-guests[14369]: shutdown VM 102: UPID:pve3:00003821:0004F22D:5FAC5A96:qmshutdown:102:root@pam:
Nov 12 00:41:46 pve3 QEMU[36643]: kvm: Unable to connect character device qmp-event: Failed to connect socket /var/run/qmeventd.sock: Connection refused
Nov 12 00:41:46 pve3 QEMU[2865]: kvm: Unable to connect character device qmp-event: Failed to connect socket /var/run/qmeventd.sock: Connection refused
Nov 12 00:44:42 pve3 pve-guests[14368]: VM 107 qmp command failed - VM 107 qmp command 'guest-shutdown' failed - got timeout
Nov 12 00:44:42 pve3 QEMU[36643]: kvm: terminating on signal 15 from pid 14368 (task UPID:pve3:00003820:0004F22B:5FAC5A96:qmshutdown:107:root@pam:)
Nov 12 00:44:42 pve3 pve-guests[14369]: VM 102 qmp command failed - VM 102 qmp command 'guest-shutdown' failed - got timeout
Nov 12 00:44:42 pve3 QEMU[2865]: kvm: terminating on signal 15 from pid 14369 (task UPID:pve3:00003821:0004F22D:5FAC5A96:qmshutdown:102:root@pam:)
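The problem turned out to be that the qmeventd service is stopped before pve-guests and pve-ha-lrm, which is why QEMU reports "Connection refused" for /var/run/qmeventd.sock in the log above.
Adding a dependency on qmeventd to both services got rid of these errors, and the nodes now reboot and shut down without problems. Because systemd stops units in the reverse of their start order, After=qmeventd.service means qmeventd is stopped only after pve-guests and pve-ha-lrm have stopped, and Wants=qmeventd.service makes sure it is started together with them.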
Code:
--- old/pve-ha-lrm.service 2020-11-12 10:10:57.535673153 +0300
+++ new/pve-ha-lrm.service 2020-11-12 02:23:58.611344867 +0300
@@ -4,6 +4,7 @@
 Wants=pve-cluster.service
 Wants=watchdog-mux.service
 Wants=pvedaemon.service
+Wants=qmeventd.service
 Wants=pve-ha-crm.service
 Wants=lxc.service
 Wants=pve-storage.target
@@ -14,6 +15,7 @@
 After=pve-storage.target
 After=pvedaemon.service
 After=pveproxy.service
+After=qmeventd.service
 After=ssh.service
 After=syslog.service
 After=watchdog-mux.service
Code:
--- old/pve-guests.service 2020-11-12 10:10:44.975752106 +0300
+++ new/pve-guests.service 2020-11-12 01:01:50.887244327 +0300
@@ -3,11 +3,13 @@
 ConditionPathExists=/usr/bin/pvesh
 RefuseManualStart=true
 RefuseManualStop=true
+Wants=qmeventd.service
 Wants=pvestatd.service
 Wants=pveproxy.service
 Wants=spiceproxy.service
 Wants=pve-firewall.service
 Wants=lxc.service
+After=qmeventd.service
 After=pveproxy.service
 After=pvestatd.service
 After=spiceproxy.service
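Editing the unit files shipped with the packages works, but those files are normally overwritten when the packages are upgraded. The same two dependencies could instead be added as systemd drop-in files, which systemd merges into the packaged units. This is a sketch only: it was not the variant tested here, and the drop-in file name qmeventd.conf is an arbitrary, hypothetical choice (any name ending in .conf works).
Code:
# /etc/systemd/system/pve-ha-lrm.service.d/qmeventd.conf (file name is arbitrary)
[Unit]
Wants=qmeventd.service
After=qmeventd.service

# /etc/systemd/system/pve-guests.service.d/qmeventd.conf (file name is arbitrary)
[Unit]
Wants=qmeventd.service
After=qmeventd.service

After creating the files, run systemctl daemon-reload; systemctl show -p After pve-guests.service should then list qmeventd.service among the dependencies.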
My solution may not be the most correct one, but it solves the problem. I hope it will be useful to someone, and that the developers will fix this bug in a future release.