When VMs are backed up by Proxmox Backup Server (PBS), some of them frequently end up with QMP errors.
Every VM that shows QMP errors stops working sooner or later. Graceful reboot/shutdown and the console no longer work. The only remedy is a hard power-off, followed by a boot and a migration.
Log:
Code:
Mar 22 00:53:08 2670-010 pvedaemon[3114999]: VM 170 qmp command failed - VM 170 qmp command 'query-machines' failed - unable to connect to VM 170 qmp socket - timeout after 31 retries
Mar 22 00:53:15 2670-010 pvestatd[2203]: VM 170 qmp command failed - VM 170 qmp command 'query-proxmox-support' failed - unable to connect to VM 170 qmp socket - timeout after 31 retries
Mar 22 00:53:16 2670-010 pvedaemon[3115109]: VM 170 qmp command failed - VM 170 qmp command 'quit' failed - unable to connect to VM 170 qmp socket - timeout after 31 retries
Mar 22 00:53:18 2670-010 pvedaemon[2972890]: VM 170 qmp command failed - VM 170 qmp command 'query-proxmox-support' failed - unable to connect to VM 170 qmp socket - timeout after 31 retries
Mar 22 00:53:25 2670-010 pvestatd[2203]: VM 170 qmp command failed - VM 170 qmp command 'query-proxmox-support' failed - unable to connect to VM 170 qmp socket - timeout after 31 retries
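To get a quick overview of which VMs are affected, something like the following could be used. It is only a sketch: the function name `qmp_timeouts_per_vm` is made up for illustration, and it assumes the log lines have the same wording as the excerpt above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Proxmox): reads syslog-style lines on stdin
# and prints "<vmid> <count>" for every VM with QMP socket timeouts.
# Assumes the message wording shown in the log excerpt above.
qmp_timeouts_per_vm() {
    # Keep only the socket-timeout messages, extract the VM ID,
    # then count occurrences per ID.
    grep -F 'unable to connect to VM' \
        | sed -n 's/.*unable to connect to VM \([0-9][0-9]*\) qmp socket.*/\1/p' \
        | sort -n \
        | uniq -c \
        | awk '{ print $2, $1 }'
}
```

On a node it could be fed from the journal, for example `journalctl -u pvedaemon -u pvestatd --since today | qmp_timeouts_per_vm`, assuming the messages are logged under those two units as in the excerpt.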
This is extremely critical. Similar problems occurred on previous versions and were always reported as fixed; see:
https://forum.proxmox.com/threads/w...vm-100-socket-timeout-after-31-retries.20350/
https://forum.proxmox.com/threads/error-vm-103-qmp-command-query-backup-failed-got-timeout.76554/
https://forum.proxmox.com/threads/certain-vms-from-a-cluster-cannot-be-backed-up-and-managed.57016/
A real, long-term bug fix is urgently needed, as VMs keep crashing randomly at night and cannot be migrated by HA because of the same problem.
A working virtualization cluster cannot currently be provided with Proxmox in this setup. The problem occurs sporadically and unpredictably in several independent clusters.
Bash:
root@pvenode1:~# pveversion -v
proxmox-ve: 6.3-1 (running kernel: 5.4.98-1-pve)
pve-manager: 6.3-4 (running version: 6.3-4/0a38c56f)
pve-kernel-5.4: 6.3-5
pve-kernel-helper: 6.3-5
pve-kernel-5.4.98-1-pve: 5.4.98-1
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.78-1-pve: 5.4.78-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph: 15.2.8-pve2
ceph-fuse: 15.2.8-pve2
corosync: 3.1.0-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.0.7
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-4
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
openvswitch-switch: 2.12.3-1
proxmox-backup-client: 1.0.8-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-5
pve-cluster: 6.2-1
pve-container: 3.3-4
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.2-2
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.2.0-1
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-5
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.3-pve1
Information about the cluster/nodes:
3-node cluster with Ceph; exclusively QEMU/KVM VMs, all stored on Ceph.
The cluster is backed up nightly by the local Proxmox backup and additionally by Proxmox Backup Server.
The backup jobs do not interfere with each other.
Most VMs have the QEMU guest agent installed and enabled.