qmp command 'query-proxmox-support' failed

ozgurerdogan

One node logs this error and the VMs are not reachable:

VM 221 qmp command failed - VM 221 qmp command 'query-proxmox-support' failed - unable to connect to VM 221 qmp socket - timeout after 31 retries

Could this be the VM eating too many resources, or something related to PVE? We have all updates installed.

This causes downtime, so any suggestion is welcome. :)

proxmox-ve: 7.2-1 (running kernel: 5.15.64-1-pve)
pve-manager: 7.2-11 (running version: 7.2-11/b76d3178)
pve-kernel-5.15: 7.2-13
pve-kernel-helper: 7.2-13
pve-kernel-5.15.64-1-pve: 5.15.64-1
pve-kernel-5.15.60-1-pve: 5.15.60-1
pve-kernel-5.15.39-4-pve: 5.15.39-4
pve-kernel-5.15.39-3-pve: 5.15.39-3
pve-kernel-5.15.39-2-pve: 5.15.39-2
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph-fuse: 15.2.16-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-3
libpve-guest-common-perl: 4.1-4
libpve-http-server-perl: 4.1-4
libpve-storage-perl: 7.2-10
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.7-1
proxmox-backup-file-restore: 2.2.7-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-2
pve-container: 4.2-3
pve-docs: 7.2-2
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-6
pve-firmware: 3.5-6
pve-ha-manager: 3.4.0
pve-i18n: 2.7-2
pve-qemu-kvm: 7.0.0-4
pve-xtermjs: 4.16.0-1
pve-zsync: 2.2.3
qemu-server: 7.2-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.6-pve1
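For what it's worth, the failing check can be reproduced outside of pvestatd by talking to the VM's QMP socket directly. Below is a minimal Python sketch of that dialogue; the socket path and VMID 221 are assumptions based on the standard Proxmox layout, not taken from this node:

```python
import json
import os
import socket

def qmp_cmd(name: str) -> bytes:
    """Serialize a bare QMP command as newline-terminated JSON."""
    return (json.dumps({"execute": name}) + "\n").encode()

def query_qmp(vmid: int, command: str, timeout: float = 5.0) -> dict:
    """Connect to the VM's QMP socket, perform the capabilities
    handshake, run one command, and return its reply as a dict."""
    path = f"/var/run/qemu-server/{vmid}.qmp"  # assumed standard path
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(path)
        reader = sock.makefile("r")
        json.loads(reader.readline())            # QMP greeting banner
        sock.sendall(qmp_cmd("qmp_capabilities"))
        json.loads(reader.readline())            # handshake ack
        sock.sendall(qmp_cmd(command))
        return json.loads(reader.readline())

if __name__ == "__main__":
    if os.path.exists("/var/run/qemu-server/221.qmp"):
        print(query_qmp(221, "query-proxmox-support"))
```

If even this handshake times out, the QEMU process's main loop is most likely blocked (for example on I/O), which would match the qmp socket timeouts above; the Proxmox API layer would then not be the culprit.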
 
Hi,
please post a big enough part of /var/log/syslog from around the time the issue occurs. Are there any specific operations (e.g. backup/replication/...) running before the issue occurs?
 
No specific task, but one VM is doing heavy disk I/O (12 TB of disk reads in 8 days, at NVMe speed).

I set the disk's Async IO to native (as per a thread in the forum). The disk type is Hard Disk (scsi0) with VirtIO SCSI. (Shall I try SATA?)

You can see the logs in the attached files. There are some "Tainted" messages that follow, but these are not present on all hangs.

INFO: task z_wr_int_3:3898 blocked for more than 120 seconds.
Tainted: P O 5.15.64-1-pve #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:z_wr_int_3 state:D stack: 0 pid: 3898 ppid: 2 flags:0x00004000
Call Trace:
<TASK>
__schedule+0x34e/0x1740
? cpumask_next_wrap+0x33/0x90
? select_task_rq_fair+0x18b/0x1af0
schedule+0x69/0x110
schedule_preempt_disabled+0xe/0x20
__mutex_lock.constprop.0+0x255/0x480
__mutex_lock_slowpath+0x13/0x20
mutex_lock+0x38/0x50
arc_buf_destroy+0x65/0x110 [zfs]
dbuf_destroy+0x31/0x500 [zfs]
? __cond_resched+0x1a/0x50
dbuf_evict_one+0x13c/0x1a0 [zfs]
dbuf_rele_and_unlock+0x727/0x7e0 [zfs]
? dsl_dataset_block_born+0x256/0x400 [zfs]
dbuf_write_done+0xeb/0x200 [zfs]
arc_write_done+0x8f/0x420 [zfs]
zio_done+0x40b/0x1290 [zfs]
zio_execute+0x95/0x160 [zfs]
taskq_thread+0x29f/0x4d0 [spl]
? wake_up_q+0x90/0x90
? zio_gang_tree_free+0x70/0x70 [zfs]
? taskq_thread_spawn+0x60/0x60 [spl]
kthread+0x12a/0x150
? set_kthread_struct+0x50/0x50
ret_from_fork+0x22/0x30
</TASK>
INFO: task txg_sync:4039 blocked for more than 120 seconds.
Tainted: P O 5.15.64-1-pve #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:txg_sync state:D stack: 0 pid: 4039 ppid: 2 flags:0x00004000
Call Trace:
<TASK>
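For reference, the Async IO change mentioned above ends up as an aio property on the disk line of the VM config; a sketch of what that line might look like (the storage name and size here are illustrative, not taken from this setup):

```
# /etc/pve/qemu-server/221.conf (illustrative)
scsi0: local-zfs:vm-221-disk-0,size=100G,aio=native
```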
 

Attachments

  • syslog.txt
    33.1 KB
I'm not sure about a kernel update, but I can say I updated the kernel about 2 weeks ago.
There are 15 GB of free RAM out of 128 GB total, but there are some backup ZFS snapshots from other nodes. They do not need RAM when not in use, right?
I will consider limiting the disk, but I saw many similar threads in the forum, so I wanted to check if there is a quick workaround.

Again, those ZFS hangs are not in the logs on some other hangs...
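If I do end up limiting the disk, my understanding is that it is done with per-disk throttle options on the same disk line in the VM config; a sketch (the values are placeholders to illustrate the syntax, not recommendations):

```
# /etc/pve/qemu-server/221.conf (illustrative; mbps_* in MB/s)
scsi0: local-zfs:vm-221-disk-0,size=100G,mbps_rd=200,mbps_wr=200,iops_rd=4000,iops_wr=4000
```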
 
I have a similar issue: enough resources, 1.5 TB of RAM (more than 500 GB free), all 15 TB enterprise NVMe drives in Ceph, enough free space, etc...

VM 1023 qmp command 'query-status' failed - got timeout

INFO: include disk 'scsi0' 'RBDNVME:vm-1023-disk-0' 400G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/1023/2024-08-15T15:19:46Z'
ERROR: QMP command query-proxmox-support failed - VM 1023 qmp command 'query-proxmox-support' failed - got timeout
INFO: aborting backup job


Any bright ideas on this?
 
Hi,
I have a similar issue: enough resources, 1.5 TB of RAM (more than 500 GB free), all 15 TB enterprise NVMe drives in Ceph, enough free space, etc...

VM 1023 qmp command 'query-status' failed - got timeout

INFO: include disk 'scsi0' 'RBDNVME:vm-1023-disk-0' 400G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/1023/2024-08-15T15:19:46Z'
ERROR: QMP command query-proxmox-support failed - VM 1023 qmp command 'query-proxmox-support' failed - got timeout
INFO: aborting backup job


Any bright ideas on this?
please share the output of pveversion -v and the VM configuration (qm config 1023), as well as an excerpt of the system log/journal from around the time the issue happens. Is the VM still responsive after the issue occurs?
 
Also having the same issue. In my case I'm using Veeam to back up a Windows VM in Proxmox. The VM is up and running fine, but the backup eventually fails with many errors in the log, as seen below. The Windows VM is located on a ZFS datastore.

proxmox-kernel-6.8: 6.8.12-1


Sep 05 11:25:23 pve pvestatd[1525]: VM 109 qmp command failed - VM 109 qmp command 'query-proxmox-support' failed - unable to connect to VM 109 qmp socket - timeout after 51 retries
Sep 05 11:25:23 pve pvestatd[1525]: status update time (8.105 seconds)
Sep 05 11:25:28 pve pvedaemon[2215255]: VM 109 qmp command failed - VM 109 qmp command 'query-proxmox-support' failed - unable to connect to VM 109 qmp socket - timeout after 51 retries
Sep 05 11:25:33 pve pvestatd[1525]: VM 109 qmp command failed - VM 109 qmp command 'query-proxmox-support' failed - unable to connect to VM 109 qmp socket - timeout after 51 retries
Sep 05 11:25:33 pve pvestatd[1525]: status update time (8.111 seconds)
Sep 05 11:25:43 pve pvestatd[1525]: VM 109 qmp command failed - VM 109 qmp command 'query-proxmox-support' failed - unable to connect to VM 109 qmp socket - timeout after 51 retries
Sep 05 11:25:43 pve pvestatd[1525]: status update time (8.126 seconds)
Sep 05 11:25:53 pve pvestatd[1525]: VM 109 qmp command failed - VM 109 qmp command 'query-proxmox-support' failed - unable to connect to VM 109 qmp socket - timeout after 51 retries
Sep 05 11:25:53 pve pvestatd[1525]: status update time (8.115 seconds)
Sep 05 11:25:56 pve pvedaemon[2198028]: VM 109 qmp command failed - VM 109 qmp command 'query-proxmox-support' failed - unable to connect to VM 109 qmp socket - timeout after 51 retries
Sep 05 11:26:03 pve pvestatd[1525]: VM 109 qmp command failed - VM 109 qmp command 'query-proxmox-support' failed - unable to connect to VM 109 qmp socket - timeout after 51 retries
 
Hi,
Also having the same issue. In my case I'm using Veeam to back up a Windows VM in Proxmox. The VM is up and running fine, but the backup eventually fails with many errors in the log, as seen below. The Windows VM is located on a ZFS datastore.

proxmox-kernel-6.8: 6.8.12-1


Sep 05 11:25:23 pve pvestatd[1525]: VM 109 qmp command failed - VM 109 qmp command 'query-proxmox-support' failed - unable to connect to VM 109 qmp socket - timeout after 51 retries
Sep 05 11:25:23 pve pvestatd[1525]: status update time (8.105 seconds)
Sep 05 11:25:28 pve pvedaemon[2215255]: VM 109 qmp command failed - VM 109 qmp command 'query-proxmox-support' failed - unable to connect to VM 109 qmp socket - timeout after 51 retries
Sep 05 11:25:33 pve pvestatd[1525]: VM 109 qmp command failed - VM 109 qmp command 'query-proxmox-support' failed - unable to connect to VM 109 qmp socket - timeout after 51 retries
Sep 05 11:25:33 pve pvestatd[1525]: status update time (8.111 seconds)
Sep 05 11:25:43 pve pvestatd[1525]: VM 109 qmp command failed - VM 109 qmp command 'query-proxmox-support' failed - unable to connect to VM 109 qmp socket - timeout after 51 retries
Sep 05 11:25:43 pve pvestatd[1525]: status update time (8.126 seconds)
Sep 05 11:25:53 pve pvestatd[1525]: VM 109 qmp command failed - VM 109 qmp command 'query-proxmox-support' failed - unable to connect to VM 109 qmp socket - timeout after 51 retries
Sep 05 11:25:53 pve pvestatd[1525]: status update time (8.115 seconds)
Sep 05 11:25:56 pve pvedaemon[2198028]: VM 109 qmp command failed - VM 109 qmp command 'query-proxmox-support' failed - unable to connect to VM 109 qmp socket - timeout after 51 retries
Sep 05 11:26:03 pve pvestatd[1525]: VM 109 qmp command failed - VM 109 qmp command 'query-proxmox-support' failed - unable to connect to VM 109 qmp socket - timeout after 51 retries
these error messages are from pvestatd, not from the backup task. Please share the output of pveversion -v and the VM configuration (qm config 109), as well as the full backup task log. Is the VM still responsive after the issue occurs?
 
Hi,

these error messages are from pvestatd, not from the backup task. Please share the output of pveversion -v and the VM configuration (qm config 109), as well as the full backup task log. Is the VM still responsive after the issue occurs?

Yes, the info you requested is attached. I only included the info about the backup to point out that the VM was likely undergoing higher than normal I/O. I tried moving VM 109 to a non-ZFS datastore; the backup still encountered the same errors, but it was able to complete successfully.
 

Attachments

  • qmconfig.txt
    648 bytes
  • pveversion.txt
    1.7 KB
The task log of the backup task is still missing. Again, the error messages you posted are not from the backup itself, but from pvestatd. If they occur during backup, you can also try setting a bandwidth limit or reducing the number of workers (see the Advanced tab in the backup job configuration, or use /etc/vzdump.conf for node-wide defaults). The latter helps in particular if the I/O wait is high during backup.
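As a sketch, node-wide defaults along those lines could look like this in /etc/vzdump.conf (the values are examples, not recommendations):

```
# /etc/vzdump.conf -- node-wide backup defaults
# bandwidth limit in KiB/s (here ~100 MiB/s)
bwlimit: 102400
# fewer parallel workers reduces IO pressure during backup
performance: max-workers=4
```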
 
The task log of the backup task is still missing. Again, the error messages you posted are not from the backup itself, but from pvestatd. If they occur during backup, you can also try setting a bandwidth limit or reducing the number of workers (see the Advanced tab in the backup job configuration, or use /etc/vzdump.conf for node-wide defaults). The latter helps in particular if the I/O wait is high during backup.

Task log of the backup task? You mean the logs from Veeam? This specific backup is set to bare-minimum settings (I believe it's 4 max workers, i.e. 4 threads), just to see how it worked. This is the most bare-bones setup you can get, and I would sure hope Proxmox can handle it; there is literally nothing on the VM but a default Windows installation. If it cannot, how would the Proxmox folks expect me to recommend this to my employers for use?
 
Oh sorry, I missed that you are using Veeam and was thinking about backup tasks in Proxmox VE. What does the load on the host system look like during backup (network load, I/O/CPU pressure)? What kind of physical disks and CPU do you have? Many people are using Proxmox VE with Windows VMs, so it rather sounds like an issue specific to your setup.
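One way to answer the pressure question is the kernel's pressure-stall information (PSI) under /proc/pressure/. A small Python sketch that reads and parses it (the parsing follows the standard PSI line format; nothing here is specific to Proxmox):

```python
import os

def parse_psi(text: str) -> dict:
    """Parse the body of a /proc/pressure/* file, e.g.
    'some avg10=1.50 avg60=0.80 avg300=0.20 total=123456'
    into {"some": {"avg10": 1.5, ...}, ...}."""
    result = {}
    for line in text.strip().splitlines():
        kind, rest = line.split(None, 1)
        result[kind] = {k: float(v) for k, v in
                        (kv.split("=") for kv in rest.split())}
    return result

if __name__ == "__main__":
    for res in ("cpu", "io", "memory"):
        path = f"/proc/pressure/{res}"
        if os.path.exists(path):
            with open(path) as f:
                psi = parse_psi(f.read())
            # avg10: share of the last 10s in which at least one
            # task was stalled waiting on this resource
            print(res, psi["some"]["avg10"])
```

avg10 is roughly the percentage of the last 10 seconds in which at least one task was stalled on that resource; sustained high io values during backup would point at the bandwidth-limit/worker suggestions mentioned earlier in the thread.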
 
