VM ### qmp command failed - VM ### qmp command 'query-proxmox-support' failed

MikeAndreev

Member
Feb 3, 2021
12
1
8
54
Hi all!
I'm getting these errors on 3 large VMs (only the large ones, 128-256 GB RAM) running Ubuntu as the guest OS on PVE 6.4; storage is an external Ceph cluster (RBD pools).

May 31 09:31:13 pm-cal-56-02 pvestatd[3589]: VM 134 qmp command failed - VM 134 qmp command 'query-proxmox-support' failed - unable to connect to VM 134 qmp socket - timeout after 31 retries
May 31 09:31:16 pm-cal-56-02 pvestatd[3589]: VM 138 qmp command failed - VM 138 qmp command 'query-proxmox-support' failed - got timeout
May 31 09:31:19 pm-cal-56-02 pvestatd[3589]: status update time (13.761 seconds)
May 31 09:31:25 pm-cal-56-02 pvestatd[3589]: VM 138 qmp command failed - VM 138 qmp command 'query-proxmox-support' failed - unable to connect to VM 138 qmp socket - timeout after 31 retries
May 31 09:31:28 pm-cal-56-02 pvestatd[3589]: VM 134 qmp command failed - VM 134 qmp command 'query-proxmox-support' failed - unable to connect to VM 134 qmp socket - timeout after 31 retries
May 31 09:31:33 pm-cal-56-02 pvestatd[3589]: got timeout
May 31 09:31:33 pm-cal-56-02 pvestatd[3589]: status update time (14.259 seconds)
May 31 09:31:42 pm-cal-56-02 pvestatd[3589]: VM 138 qmp command failed - VM 138 qmp command 'query-proxmox-support' failed - got timeout
May 31 09:31:45 pm-cal-56-02 pvestatd[3589]: VM 134 qmp command failed - VM 134 qmp command 'query-proxmox-support' failed - unable to connect to VM 134 qmp socket - timeout after 31 retries
May 31 09:31:45 pm-cal-56-02 pvestatd[3589]: status update time (12.259 seconds)
May 31 09:31:54 pm-cal-56-02 pvestatd[3589]: VM 138 qmp command failed - VM 138 qmp command 'query-proxmox-support' failed - unable to connect to VM 138 qmp socket - timeout after 31 retries
May 31 09:31:57 pm-cal-56-02 pvestatd[3589]: VM 134 qmp command failed - VM 134 qmp command 'query-proxmox-support' failed - unable to connect to VM 134 qmp socket - timeout after 31 retries
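For reference, the QMP socket that pvestatd is timing out on can also be checked by hand. A rough test (assuming socat is installed; the socket lives at the usual /var/run/qemu-server/<vmid>.qmp path) looks like this:

root@pm-cal-56-02:~# socat - UNIX-CONNECT:/var/run/qemu-server/134.qmp
{"execute": "qmp_capabilities"}
{"execute": "query-status"}

A responsive QEMU answers the greeting and both commands immediately; a hung one behaves the same way towards socat as it does towards pvestatd.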


root@pm-cal-56-02:~# pveversion -v
proxmox-ve: 6.4-1 (running kernel: 5.4.106-1-pve)
pve-manager: 6.4-4 (running version: 6.4-4/337d6701)
pve-kernel-5.4: 6.4-1
pve-kernel-helper: 6.4-1
pve-kernel-5.4.106-1-pve: 5.4.106-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph-fuse: 14.2.20-pve1
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.0.8
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.4-1
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-2
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-1
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.5-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.5-3
pve-cluster: 6.4-1
pve-container: 3.3-5
pve-docs: 6.4-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.2-2
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-1
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.4-pve1

cat /etc/pve/qemu-server/134.conf

#Ubuntu Focal Fossa LTS
agent: 1
boot: order=scsi0
cores: 21
cpu: Cascadelake-Server-noTSX,flags=+spec-ctrl;+ssbd
ide2: none,media=cdrom
localtime: 0
memory: 131072
name: **********
net0: virtio=********,bridge=vmbr10,firewall=1,tag=226
numa: 0
ostype: l26
scsi0: pm-ce1-ssd-ec42:vm-134-disk-0,cache=writeback,size=932G
scsi1: pm-ce1-ec42:vm-134-disk-1,cache=writeback,size=1397G
scsi2: pm-ce1-ec42:vm-134-disk-2,cache=writeback,size=1863G
scsihw: virtio-scsi-pci
smbios1: uuid=c9f3d864-76a1-444f-b054-bbc239aff0db
sockets: 2
tablet: 0
vga: qxl
vmgenid: 1961b584-14f1-4b75-8651-008eaf663ca7

any ideas?

WBR
Mike
 
Same thing here, any feedback would be appreciated.
Could this be related to a Ceph issue?
 
@MikeAndreev Not sure if this really did the trick, but I had the same problem, and since I applied your suggested change it hasn't happened anymore. Thanks for sharing your suggestion.
 
Hi,

I have triggered this bug on my Ceph cluster, with a VM that has 10 disks on a 100-OSD cluster.

Doing sequential reads of all disks opens connections to each OSD (so 10x100 connections just for the network, plus QEMU's internal file descriptors).

After reaching the limit, I got random block access timeouts (since it was not possible to open new connections).

The default max open files soft limit is 1024, which is really too low.
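To check whether a running VM is actually close to that limit, you can compare the number of open file descriptors of its kvm process against its nofile limit, e.g. (using VM 134's pidfile as an example):

VMPID=$(cat /var/run/qemu-server/134.pid)
ls /proc/$VMPID/fd | wc -l        # file descriptors currently open by the kvm process
prlimit --pid $VMPID --nofile     # current soft/hard NOFILE limits of that process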

I'll try to see if we can increase the default value in PVE 8.
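In the meantime, a possible workaround (just a sketch, not an official recommendation) is to raise the limit on an already running kvm process with prlimit:

prlimit --pid $VMPID --nofile=65536:65536     # raise soft and hard NOFILE limit to 65536 for this process

This only affects the running process; newly started VMs inherit their limits from the parent process that spawns them, so the change has to be reapplied per VM until the default itself is raised.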
 
