qmp command 'query-proxmox-support' failed

ozgurerdogan

One node logs this error and the VMs are not reachable:

VM 221 qmp command failed - VM 221 qmp command 'query-proxmox-support' failed - unable to connect to VM 221 qmp socket - timeout after 31 retries

Could this be the VM eating too many resources, or something related to PVE? We have all updates installed.

This causes downtime, so any suggestion is welcome. :)

proxmox-ve: 7.2-1 (running kernel: 5.15.64-1-pve)
pve-manager: 7.2-11 (running version: 7.2-11/b76d3178)
pve-kernel-5.15: 7.2-13
pve-kernel-helper: 7.2-13
pve-kernel-5.15.64-1-pve: 5.15.64-1
pve-kernel-5.15.60-1-pve: 5.15.60-1
pve-kernel-5.15.39-4-pve: 5.15.39-4
pve-kernel-5.15.39-3-pve: 5.15.39-3
pve-kernel-5.15.39-2-pve: 5.15.39-2
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph-fuse: 15.2.16-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-3
libpve-guest-common-perl: 4.1-4
libpve-http-server-perl: 4.1-4
libpve-storage-perl: 7.2-10
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.7-1
proxmox-backup-file-restore: 2.2.7-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-2
pve-container: 4.2-3
pve-docs: 7.2-2
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-6
pve-firmware: 3.5-6
pve-ha-manager: 3.4.0
pve-i18n: 2.7-2
pve-qemu-kvm: 7.0.0-4
pve-xtermjs: 4.16.0-1
pve-zsync: 2.2.3
qemu-server: 7.2-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.6-pve1
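For what it's worth, the failing check can be reproduced outside of pvestatd by talking to the VM's QMP socket directly. Below is a minimal Python sketch of that dialogue; the socket path and VMID 221 are assumptions based on the standard Proxmox layout, not taken from this node:

```python
import json
import os
import socket

def qmp_cmd(name: str) -> bytes:
    """Serialize a bare QMP command as newline-terminated JSON."""
    return (json.dumps({"execute": name}) + "\n").encode()

def query_qmp(vmid: int, command: str, timeout: float = 5.0) -> dict:
    """Connect to the VM's QMP socket, perform the capabilities
    handshake, run one command, and return its reply as a dict."""
    path = f"/var/run/qemu-server/{vmid}.qmp"  # assumed standard path
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(path)
        reader = sock.makefile("r")
        json.loads(reader.readline())            # QMP greeting banner
        sock.sendall(qmp_cmd("qmp_capabilities"))
        json.loads(reader.readline())            # handshake ack
        sock.sendall(qmp_cmd(command))
        return json.loads(reader.readline())

if __name__ == "__main__":
    if os.path.exists("/var/run/qemu-server/221.qmp"):
        print(query_qmp(221, "query-proxmox-support"))
```

If even this handshake times out, the QEMU process's main loop is most likely blocked (for example on I/O), which would match the qmp socket timeouts above; the Proxmox API layer would then not be the culprit.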
 
Hi,
please post a big enough part of /var/log/syslog from around the time the issue occurs. Are there any specific operations (e.g. backup/replication/...) running before the issue occurs?
 
No specific task, but one VM is doing heavy disk I/O (12 TB of disk reads in 8 days, at NVMe speed).

I set the disk's Async IO to native (as per a thread in the forum). The disk type is Hard Disk (scsi0) with VirtIO SCSI. (Shall I try SATA?)

You can see the logs in the attached files. There are some "Tainted" messages that follow, but these are not present on all hangs.

INFO: task z_wr_int_3:3898 blocked for more than 120 seconds.
Tainted: P O 5.15.64-1-pve #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:z_wr_int_3 state:D stack: 0 pid: 3898 ppid: 2 flags:0x00004000
Call Trace:
<TASK>
__schedule+0x34e/0x1740
? cpumask_next_wrap+0x33/0x90
? select_task_rq_fair+0x18b/0x1af0
schedule+0x69/0x110
schedule_preempt_disabled+0xe/0x20
__mutex_lock.constprop.0+0x255/0x480
__mutex_lock_slowpath+0x13/0x20
mutex_lock+0x38/0x50
arc_buf_destroy+0x65/0x110 [zfs]
dbuf_destroy+0x31/0x500 [zfs]
? __cond_resched+0x1a/0x50
dbuf_evict_one+0x13c/0x1a0 [zfs]
dbuf_rele_and_unlock+0x727/0x7e0 [zfs]
? dsl_dataset_block_born+0x256/0x400 [zfs]
dbuf_write_done+0xeb/0x200 [zfs]
arc_write_done+0x8f/0x420 [zfs]
zio_done+0x40b/0x1290 [zfs]
zio_execute+0x95/0x160 [zfs]
taskq_thread+0x29f/0x4d0 [spl]
? wake_up_q+0x90/0x90
? zio_gang_tree_free+0x70/0x70 [zfs]
? taskq_thread_spawn+0x60/0x60 [spl]
kthread+0x12a/0x150
? set_kthread_struct+0x50/0x50
ret_from_fork+0x22/0x30
</TASK>
INFO: task txg_sync:4039 blocked for more than 120 seconds.
Tainted: P O 5.15.64-1-pve #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:txg_sync state:D stack: 0 pid: 4039 ppid: 2 flags:0x00004000
Call Trace:
<TASK>
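For reference, the Async IO change mentioned above ends up as an aio property on the disk line of the VM config; a sketch of what that line might look like (the storage name and size here are illustrative, not taken from this setup):

```
# /etc/pve/qemu-server/221.conf (illustrative)
scsi0: local-zfs:vm-221-disk-0,size=100G,aio=native
```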
 

Attachments

  • syslog.txt
    33.1 KB
I'm not sure about a kernel update, but I can say I updated the kernel about 2 weeks ago.
There are 15 GB of free RAM out of 128 GB total, but there are some backup ZFS snapshots from other nodes. They do not need RAM when not in use, right?
I will consider limiting the disk, but I saw many similar threads in the forum, so I wanted to check if there is a quick workaround.

Again, those ZFS hangs are not in the logs on some other hangs...
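If I do end up limiting the disk, my understanding is that it is done with per-disk throttle options on the same disk line in the VM config; a sketch (the values are placeholders to illustrate the syntax, not recommendations):

```
# /etc/pve/qemu-server/221.conf (illustrative; mbps_* in MB/s)
scsi0: local-zfs:vm-221-disk-0,size=100G,mbps_rd=200,mbps_wr=200,iops_rd=4000,iops_wr=4000
```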
 
I have a similar issue: enough resources, 1.5 TB of RAM (more than 500 GB free), all 15 TB enterprise NVMe drives in Ceph, enough free space, etc...

VM 1023 qmp command 'query-status' failed - got timeout

INFO: include disk 'scsi0' 'RBDNVME:vm-1023-disk-0' 400G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/1023/2024-08-15T15:19:46Z'
ERROR: QMP command query-proxmox-support failed - VM 1023 qmp command 'query-proxmox-support' failed - got timeout
INFO: aborting backup job


Any bright ideas on this?
 
Hi,
I have a similar issue: enough resources, 1.5 TB of RAM (more than 500 GB free), all 15 TB enterprise NVMe drives in Ceph, enough free space, etc...

VM 1023 qmp command 'query-status' failed - got timeout

INFO: include disk 'scsi0' 'RBDNVME:vm-1023-disk-0' 400G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/1023/2024-08-15T15:19:46Z'
ERROR: QMP command query-proxmox-support failed - VM 1023 qmp command 'query-proxmox-support' failed - got timeout
INFO: aborting backup job


Any bright ideas on this?
please share the output of pveversion -v and the VM configuration (qm config 1023), as well as an excerpt of the system log/journal from around the time the issue happens. Is the VM still responsive after the issue occurs?
 
Also having the same issue. In my case I'm using Veeam to back up a Windows VM in Proxmox. The VM is up and running fine, but the backup eventually fails with many errors in the log, as seen below. The Windows VM is located on a ZFS datastore.

proxmox-kernel-6.8: 6.8.12-1


Sep 05 11:25:23 pve pvestatd[1525]: VM 109 qmp command failed - VM 109 qmp command 'query-proxmox-support' failed - unable to connect to VM 109 qmp socket - timeout after 51 retries
Sep 05 11:25:23 pve pvestatd[1525]: status update time (8.105 seconds)
Sep 05 11:25:28 pve pvedaemon[2215255]: VM 109 qmp command failed - VM 109 qmp command 'query-proxmox-support' failed - unable to connect to VM 109 qmp socket - timeout after 51 retries
Sep 05 11:25:33 pve pvestatd[1525]: VM 109 qmp command failed - VM 109 qmp command 'query-proxmox-support' failed - unable to connect to VM 109 qmp socket - timeout after 51 retries
Sep 05 11:25:33 pve pvestatd[1525]: status update time (8.111 seconds)
Sep 05 11:25:43 pve pvestatd[1525]: VM 109 qmp command failed - VM 109 qmp command 'query-proxmox-support' failed - unable to connect to VM 109 qmp socket - timeout after 51 retries
Sep 05 11:25:43 pve pvestatd[1525]: status update time (8.126 seconds)
Sep 05 11:25:53 pve pvestatd[1525]: VM 109 qmp command failed - VM 109 qmp command 'query-proxmox-support' failed - unable to connect to VM 109 qmp socket - timeout after 51 retries
Sep 05 11:25:53 pve pvestatd[1525]: status update time (8.115 seconds)
Sep 05 11:25:56 pve pvedaemon[2198028]: VM 109 qmp command failed - VM 109 qmp command 'query-proxmox-support' failed - unable to connect to VM 109 qmp socket - timeout after 51 retries
Sep 05 11:26:03 pve pvestatd[1525]: VM 109 qmp command failed - VM 109 qmp command 'query-proxmox-support' failed - unable to connect to VM 109 qmp socket - timeout after 51 retries
 
Hi,
Also having the same issue. In my case I'm using Veeam to back up a Windows VM in Proxmox. The VM is up and running fine, but the backup eventually fails with many errors in the log, as seen below. The Windows VM is located on a ZFS datastore.

proxmox-kernel-6.8: 6.8.12-1


Sep 05 11:25:23 pve pvestatd[1525]: VM 109 qmp command failed - VM 109 qmp command 'query-proxmox-support' failed - unable to connect to VM 109 qmp socket - timeout after 51 retries
Sep 05 11:25:23 pve pvestatd[1525]: status update time (8.105 seconds)
Sep 05 11:25:28 pve pvedaemon[2215255]: VM 109 qmp command failed - VM 109 qmp command 'query-proxmox-support' failed - unable to connect to VM 109 qmp socket - timeout after 51 retries
Sep 05 11:25:33 pve pvestatd[1525]: VM 109 qmp command failed - VM 109 qmp command 'query-proxmox-support' failed - unable to connect to VM 109 qmp socket - timeout after 51 retries
Sep 05 11:25:33 pve pvestatd[1525]: status update time (8.111 seconds)
Sep 05 11:25:43 pve pvestatd[1525]: VM 109 qmp command failed - VM 109 qmp command 'query-proxmox-support' failed - unable to connect to VM 109 qmp socket - timeout after 51 retries
Sep 05 11:25:43 pve pvestatd[1525]: status update time (8.126 seconds)
Sep 05 11:25:53 pve pvestatd[1525]: VM 109 qmp command failed - VM 109 qmp command 'query-proxmox-support' failed - unable to connect to VM 109 qmp socket - timeout after 51 retries
Sep 05 11:25:53 pve pvestatd[1525]: status update time (8.115 seconds)
Sep 05 11:25:56 pve pvedaemon[2198028]: VM 109 qmp command failed - VM 109 qmp command 'query-proxmox-support' failed - unable to connect to VM 109 qmp socket - timeout after 51 retries
Sep 05 11:26:03 pve pvestatd[1525]: VM 109 qmp command failed - VM 109 qmp command 'query-proxmox-support' failed - unable to connect to VM 109 qmp socket - timeout after 51 retries
these error messages are from pvestatd, not from the backup task. Please share the output of pveversion -v and the VM configuration (qm config 109), as well as the full backup task log. Is the VM still responsive after the issue occurs?
 
Hi,

these error messages are from pvestatd, not from the backup task. Please share the output of pveversion -v and the VM configuration (qm config 109), as well as the full backup task log. Is the VM still responsive after the issue occurs?

Yes, the info you requested is attached. I only included the info about the backup to point out that the VM was likely undergoing higher than normal I/O. I tried moving VM 109 to a non-ZFS datastore; the backup still encountered the same errors, but it was able to complete successfully.
 

Attachments

  • qmconfig.txt
    648 bytes
  • pveversion.txt
    1.7 KB
The task log of the backup task is still missing. Again, the error messages you posted are not from the backup itself, but from pvestatd. If they occur during backup, you can also try setting a bandwidth limit or reducing the number of workers (see the Advanced tab in the backup job configuration, or use /etc/vzdump.conf for node-wide defaults). The latter helps in particular if the I/O wait is high during backup.
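As a sketch, node-wide defaults along those lines could look like this in /etc/vzdump.conf (the values are examples, not recommendations):

```
# /etc/vzdump.conf -- node-wide backup defaults
# bandwidth limit in KiB/s (here ~100 MiB/s)
bwlimit: 102400
# fewer parallel workers reduces IO pressure during backup
performance: max-workers=4
```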
 
The task log of the backup task is still missing. Again, the error messages you posted are not from the backup itself, but from pvestatd. If they occur during backup, you can also try setting a bandwidth limit or reducing the number of workers (see the Advanced tab in the backup job configuration, or use /etc/vzdump.conf for node-wide defaults). The latter helps in particular if the I/O wait is high during backup.

Task log of the backup task? You mean the logs from Veeam? This specific backup is set to bare-minimum settings (I believe it's 4 max workers, i.e. 4 threads), just to see how it worked. This is the most bare-bones setup you can get, and I would sure hope Proxmox can handle it; there is literally nothing on the VM but a default Windows installation. If it cannot, how would the Proxmox folks expect me to recommend this to my employers for use?
 
Oh sorry, I missed that you are using Veeam and was thinking about backup tasks in Proxmox VE. What does the load on the host system look like during backup (network load, I/O/CPU pressure)? What kind of physical disks and CPU do you have? Many people are using Proxmox VE with Windows VMs, so it rather sounds like an issue specific to your setup.
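One way to answer the pressure question is the kernel's pressure-stall information (PSI) under /proc/pressure/. A small Python sketch that reads and parses it (the parsing follows the standard PSI line format; nothing here is specific to Proxmox):

```python
import os

def parse_psi(text: str) -> dict:
    """Parse the body of a /proc/pressure/* file, e.g.
    'some avg10=1.50 avg60=0.80 avg300=0.20 total=123456'
    into {"some": {"avg10": 1.5, ...}, ...}."""
    result = {}
    for line in text.strip().splitlines():
        kind, rest = line.split(None, 1)
        result[kind] = {k: float(v) for k, v in
                        (kv.split("=") for kv in rest.split())}
    return result

if __name__ == "__main__":
    for res in ("cpu", "io", "memory"):
        path = f"/proc/pressure/{res}"
        if os.path.exists(path):
            with open(path) as f:
                psi = parse_psi(f.read())
            # avg10: share of the last 10s in which at least one
            # task was stalled waiting on this resource
            print(res, psi["some"]["avg10"])
```

avg10 is roughly the percentage of the last 10 seconds in which at least one task was stalled on that resource; sustained high io values during backup would point at the bandwidth-limit/worker suggestions mentioned earlier in the thread.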
 
