Critical: qemu-server / qmp issues - VMs stop working with: qmp command [...] failed

optymale

When VMs are backed up by PBS, some of them often end up with QMP errors.

All VMs with QMP errors stop working sooner or later. Graceful reboot/shutdown and the console no longer work. The only solution is a hard power-off, boot, and migration.
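
Our current workaround looks roughly like this (VMID 170 as in the log below; the target node name is just a placeholder):

Bash:
# remove a stale lock left behind by the failed backup, if any
qm unlock 170

# graceful shutdown no longer works, so hard-stop the VM
qm stop 170

# start it again and move it away from the affected node
qm start 170
qm migrate 170 <target-node> --online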

Log:
Code:
Mar 22 00:53:08 2670-010 pvedaemon[3114999]: VM 170 qmp command failed - VM 170 qmp command 'query-machines' failed - unable to connect to VM 170 qmp socket - timeout after 31 retries
Mar 22 00:53:15 2670-010 pvestatd[2203]: VM 170 qmp command failed - VM 170 qmp command 'query-proxmox-support' failed - unable to connect to VM 170 qmp socket - timeout after 31 retries
Mar 22 00:53:16 2670-010 pvedaemon[3115109]: VM 170 qmp command failed - VM 170 qmp command 'quit' failed - unable to connect to VM 170 qmp socket - timeout after 31 retries
Mar 22 00:53:18 2670-010 pvedaemon[2972890]: VM 170 qmp command failed - VM 170 qmp command 'query-proxmox-support' failed - unable to connect to VM 170 qmp socket - timeout after 31 retries
Mar 22 00:53:25 2670-010 pvestatd[2203]: VM 170 qmp command failed - VM 170 qmp command 'query-proxmox-support' failed - unable to connect to VM 170 qmp socket - timeout after 31 retries
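
To check whether the QMP socket of an affected VM is still responsive at all, a minimal probe looks roughly like this (assuming the default socket path /var/run/qemu-server/<VMID>.qmp and that socat is installed):

Bash:
# QMP requires a capabilities handshake before any other command;
# a healthy VM answers with the greeting and two "return" objects,
# on an affected VM this hangs just like pvedaemon/pvestatd do
printf '%s\n' '{"execute":"qmp_capabilities"}' '{"execute":"query-status"}' \
  | socat -t 5 - UNIX-CONNECT:/var/run/qemu-server/170.qmp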

This is extremely critical. Similar problems occurred in previous versions and were always reported as fixed, see here:

https://forum.proxmox.com/threads/w...vm-100-socket-timeout-after-31-retries.20350/

https://forum.proxmox.com/threads/error-vm-103-qmp-command-query-backup-failed-got-timeout.76554/

https://forum.proxmox.com/threads/certain-vms-from-a-cluster-cannot-be-backed-up-and-managed.57016/

A real, long-term bugfix is urgently needed, as VMs keep crashing randomly at night and cannot be migrated by HA because of the same problem.

With this setup, we currently cannot run a working virtualization cluster on Proxmox. The problem occurs sporadically and unpredictably in several independent clusters.

Bash:
root@pvenode1:~# pveversion -v
proxmox-ve: 6.3-1 (running kernel: 5.4.98-1-pve)
pve-manager: 6.3-4 (running version: 6.3-4/0a38c56f)
pve-kernel-5.4: 6.3-5
pve-kernel-helper: 6.3-5
pve-kernel-5.4.98-1-pve: 5.4.98-1
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.78-1-pve: 5.4.78-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph: 15.2.8-pve2
ceph-fuse: 15.2.8-pve2
corosync: 3.1.0-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.0.7
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-4
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
openvswitch-switch: 2.12.3-1
proxmox-backup-client: 1.0.8-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-5
pve-cluster: 6.2-1
pve-container: 3.3-4
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.2-2
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.2.0-1
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-5
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.3-pve1


Information about the cluster/nodes:

3-node cluster with Ceph; exclusively QEMU/KVM VMs, all stored on Ceph.

The cluster is backed up daily at night by local Proxmox backups and additionally by Proxmox Backup Server.

The backup jobs do not overlap or interfere with each other.
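
To illustrate the scheduling (storage IDs and times below are placeholders, not the real configuration), the jobs in /etc/pve/vzdump.cron are staggered roughly like this:

Code:
# /etc/pve/vzdump.cron (sketch with placeholder storage IDs and times)
PATH="/usr/sbin:/usr/bin:/sbin:/bin"

# nightly vzdump backup to a local backup storage
0 1 * * *           root vzdump --all 1 --mode snapshot --storage local-backup --quiet 1

# nightly backup to the Proxmox Backup Server, scheduled after the local job has finished
0 3 * * *           root vzdump --all 1 --mode snapshot --storage pbs01 --quiet 1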

Most VMs have the QEMU guest agent installed and enabled.
 
With this setup, we currently cannot run a working virtualization cluster on Proxmox

Just run the recommended packages from the enterprise repository and you are not affected by this issue.

But yes, the issue has been found and you will see a fix soon.
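
For PVE 6.x that means roughly the following (a valid subscription key must be registered on the node, otherwise apt update will fail against the enterprise repository):

Bash:
# PVE 6.x is based on Debian Buster
echo "deb https://enterprise.proxmox.com/debian/pve buster pve-enterprise" \
  > /etc/apt/sources.list.d/pve-enterprise.list

# disable or remove any pve-no-subscription entry, then update
apt update
apt full-upgrade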
 
Just run the recommended packages from the enterprise repository and you are not affected by this issue.

But yes, the issue has been found and you will see a fix soon.

@tom
If I buy a subscription now and use the enterprise repository right away, will the bug be solved immediately after updating? Or do I have to wait for the next minor/major update?
 
apt will not downgrade packages, so you have to downgrade manually.
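
A manual downgrade looks roughly like this (the packages and the target version are placeholders; check first which versions your configured repositories actually provide):

Bash:
# show which versions are available from the configured repositories
apt-cache policy pve-qemu-kvm qemu-server

# install a specific older version explicitly, e.g.
apt install pve-qemu-kvm=<older-version> qemu-server=<older-version>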
 
Same problem with 6.4:

Code:
May 31 09:31:13 pm-cal-56-02 pvestatd[3589]: VM 134 qmp command failed - VM 134 qmp command 'query-proxmox-support' failed - unable to connect to VM 134 qmp socket - timeout after 31 retries
May 31 09:31:16 pm-cal-56-02 pvestatd[3589]: VM 138 qmp command failed - VM 138 qmp command 'query-proxmox-support' failed - got timeout
May 31 09:31:19 pm-cal-56-02 pvestatd[3589]: status update time (13.761 seconds)
May 31 09:31:25 pm-cal-56-02 pvestatd[3589]: VM 138 qmp command failed - VM 138 qmp command 'query-proxmox-support' failed - unable to connect to VM 138 qmp socket - timeout after 31 retries
May 31 09:31:28 pm-cal-56-02 pvestatd[3589]: VM 134 qmp command failed - VM 134 qmp command 'query-proxmox-support' failed - unable to connect to VM 134 qmp socket - timeout after 31 retries
May 31 09:31:33 pm-cal-56-02 pvestatd[3589]: got timeout
May 31 09:31:33 pm-cal-56-02 pvestatd[3589]: status update time (14.259 seconds)
May 31 09:31:42 pm-cal-56-02 pvestatd[3589]: VM 138 qmp command failed - VM 138 qmp command 'query-proxmox-support' failed - got timeout
May 31 09:31:45 pm-cal-56-02 pvestatd[3589]: VM 134 qmp command failed - VM 134 qmp command 'query-proxmox-support' failed - unable to connect to VM 134 qmp socket - timeout after 31 retries
May 31 09:31:45 pm-cal-56-02 pvestatd[3589]: status update time (12.259 seconds)
May 31 09:31:54 pm-cal-56-02 pvestatd[3589]: VM 138 qmp command failed - VM 138 qmp command 'query-proxmox-support' failed - unable to connect to VM 138 qmp socket - timeout after 31 retries
May 31 09:31:57 pm-cal-56-02 pvestatd[3589]: VM 134 qmp command failed - VM 134 qmp command 'query-proxmox-support' failed - unable to connect to VM 134 qmp socket - timeout after 31 retries
May 31 09:32:00 pm-cal-56-02 systemd[1]: Starting Proxmox VE replication runner...
May 31 09:32:01 pm-cal-56-02 systemd[1]: pvesr.service: Succeeded.
May 31 09:32:01 pm-cal-56-02 systemd[1]: Started Proxmox VE replication runner.
May 31 09:32:06 pm-cal-56-02 pvestatd[3589]: status update time (20.287 seconds)
May 31 09:32:12 pm-cal-56-02 pvestatd[3589]: VM 134 qmp command failed - VM 134 qmp command 'query-proxmox-support' failed - unable to connect to VM 134 qmp socket - timeout after 31 retries

Bash:
root@pm-cal-56-02:~# pveversion -v
proxmox-ve: 6.4-1 (running kernel: 5.4.106-1-pve)
pve-manager: 6.4-4 (running version: 6.4-4/337d6701)
pve-kernel-5.4: 6.4-1
pve-kernel-helper: 6.4-1
pve-kernel-5.4.106-1-pve: 5.4.106-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph-fuse: 14.2.20-pve1
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.0.8
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.4-1
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-2
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-1
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.5-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.5-3
pve-cluster: 6.4-1
pve-container: 3.3-5
pve-docs: 6.4-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.2-2
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-1
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.4-pve1
 
Same here on 6.4, we can't move VMs between nodes or from local to shared storage... :(

Code:
Jun 21 14:17:54 cloud09 pvestatd[2481]: VM 303 qmp command failed - VM 303 qmp command 'query-proxmox-support' failed - got timeout
Jun 21 14:17:54 cloud09 pvedaemon[28078]: VM 303 qmp command failed - VM 303 qmp command 'query-block-jobs' failed - got wrong command id '2481:485164' (expected 28078:1251)
Jun 21 14:17:56 cloud09 pvestatd[2481]: status update time (7.549 seconds)
Jun 21 14:18:00 cloud09 systemd[1]: Starting Proxmox VE replication runner...
Jun 21 14:18:01 cloud09 systemd[1]: pvesr.service: Succeeded.
Jun 21 14:18:01 cloud09 systemd[1]: Started Proxmox VE replication runner.
Jun 21 14:18:16 cloud09 pmxcfs[2344]: [status] notice: received log
Jun 21 14:18:42 cloud09 pvedaemon[28078]: storage migration failed: block job (mirror) error: VM 303 qmp command 'query-block-jobs' failed - got wrong command id '2481:485164' (expected 28078:1251)
migration failed: block job (mirror) error: VM 303 qmp command 'query-block-jobs' failed - got wrong command id '2481:485164' (expected 28078:1251)
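
For reference, the CLI equivalent for moving a disk from local to shared storage is roughly (VMID 303 from the log above; disk name and target storage are placeholders):

Bash:
# move a disk of the running VM to shared storage
# (roughly what the GUI "Move disk" action does)
qm move_disk 303 scsi0 <shared-storage> --delete 1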
 
