[SOLVED] ERROR: VM 104 qmp command 'backup' failed - got timeout

Hi,
I have a Proxmox/Ceph cluster, and a PBS virtual machine on another Proxmox system that backs up all the virtual machines of the Proxmox/Ceph cluster.
Proxmox and PBS have been updated this past weekend to the latest versions.

I have scheduled nightly snapshot backups to the PBS for all virtual machines on the Proxmox nodes.

The backups of some virtual machines fail with the following errors:
Code:
INFO: Starting Backup of VM 104 (qemu)
INFO: Backup started at 2021-02-02 03:02:42
INFO: status = running
INFO: VM Name: NAME
INFO: include disk 'scsi0' 'cephDATA01:vm-104-disk-0' 30G
INFO: include disk 'scsi1' 'cephVMs01:vm-104-disk-0' 50G
INFO: include disk 'scsi2' 'cephVMs01:vm-104-disk-1' 60G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: snapshots found (not included into backup)
INFO: creating Proxmox Backup Server archive 'vm/104/2021-02-02T02:02:42Z'
ERROR: VM 104 qmp command 'backup' failed - got timeout
INFO: aborting backup job
ERROR: VM 104 qmp command 'backup-cancel' failed - got timeout
ERROR: Backup of VM 104 failed - VM 104 qmp command 'backup' failed - got timeout
INFO: Failed at 2021-02-02 03:18:34

Some of them fail with only this error:
ERROR: VM 104 qmp command 'backup' failed - got timeout

But others fail with these two errors:
ERROR: VM 104 qmp command 'backup' failed - got timeout
ERROR: VM 104 qmp command 'backup-cancel' failed - got timeout


When the backup fails with both errors, the virtual machine freezes: it can no longer be accessed, and I have to hard stop it to restore its functionality.
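
To find which VMs hit the second, more serious case, the task logs can be filtered for the 'backup-cancel' timeout. This is only a minimal sketch: it assumes the log lines look exactly like the excerpt above, and the function name frozen_candidates is made up for illustration.

```shell
# Sketch: list the VMIDs whose backup log contains a 'backup-cancel' timeout,
# i.e. the case where the guest was reported to freeze.
# Reads vzdump task log text on stdin; the line format is taken from the log excerpt above.
frozen_candidates() {
    sed -n "s/^ERROR: VM \([0-9][0-9]*\) qmp command 'backup-cancel' failed - got timeout\$/\1/p" \
        | sort -un
}
```

It would be fed with a saved task log, for example: frozen_candidates < backup-task.log (the file name is a placeholder).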

Moreover, an endless spinner appears in the PBS, and it cannot be removed until the virtual machine is hard stopped.
[Screenshot attachment: Captura de pantalla 2021-02-02 a las 9.01.59.png]

I think this problem is a consequence of high I/O load on the PBS server, but that shouldn't cause the virtual machine to freeze.
Why do these errors cause the virtual machines to freeze?

Is it possible to increase the timeout for PBS servers under high I/O load?
How?

Thank you very much for your time and help.
 
Can you post the output of 'pveversion -v'? I know you mentioned you had updated your systems recently, but we rolled out a fix for a very similar issue with pve-qemu-kvm version 5.1.0-8 - note that the fix will only be applied to VMs that have been restarted at least once since the update to the latest version.

High IO load can very well be the cause for a failing backup, however, with the latest version that should not lead to a full VM freeze. Once we roll out QEMU 5.2 it shouldn't even lead to a temporary one anymore.
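
A quick way to check whether a node already has the fixed package could look like the sketch below. It assumes a Debian-based node with dpkg-query available. The helper version_at_least is a made-up name, and it uses GNU sort -V only as an approximation of Debian version ordering (dpkg --compare-versions is the exact tool for that).

```shell
# Returns 0 if version $1 is at least version $2.
# Approximation: relies on GNU `sort -V`, not on dpkg's own comparison rules.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Only query the package database where dpkg-query exists (e.g. on a PVE node).
if command -v dpkg-query >/dev/null 2>&1; then
    installed=$(dpkg-query -W -f='${Version}' pve-qemu-kvm 2>/dev/null || true)
    if [ -n "$installed" ]; then
        if version_at_least "$installed" "5.1.0-8"; then
            echo "pve-qemu-kvm $installed: fix is installed (VMs still need a restart)"
        else
            echo "pve-qemu-kvm $installed: older than 5.1.0-8"
        fi
    fi
fi
```

Note that this only looks at the installed package, not at the binary each running VM was started with.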
 
I have refreshed the list of pending updates on my Proxmox nodes, and the pve-qemu-kvm 5.1.0-8 package you mentioned in your post does not appear in it.

Here is the output of the pveversion -v command:
Code:
proxmox-ve: 6.3-1 (running kernel: 5.4.78-2-pve)
pve-manager: 6.3-3 (running version: 6.3-3/eee5f901)
pve-kernel-5.4: 6.3-3
pve-kernel-helper: 6.3-3
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.60-1-pve: 5.4.60-2
pve-kernel-5.4.41-1-pve: 5.4.41-1
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.18-2-pve: 5.3.18-2
pve-kernel-5.3.10-1-pve: 5.3.10-1
ceph: 14.2.16-pve1
ceph-fuse: 14.2.16-pve1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.7
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-2
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-3
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.6-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.3-2
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-7
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-2
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1
 
Ah sorry, it's not in the enterprise repositories yet. You can manually install it from community, or wait until it becomes available.
 
I have refreshed the upgrade package list and it appears now. Great! :D

After the upgrade, I must stop and start the virtual machines to make them use the new pve-qemu-kvm version, is that right?
 
Yes. Alternatively, a VM reboot (from the PVE interface, not from inside the VM!) has the same effect.
 
If I upgrade all the Proxmox cluster nodes and live migrate the VMs between nodes, is it still necessary to reboot a VM to make sure that it is running the new pve-qemu-kvm version?

That is, if a VM is running on cluster node1 with pve-qemu-kvm 5.1.0-7, and I upgrade cluster node2 to pve-qemu-kvm 5.1.0-8 and live migrate the VM from node1 to node2, it will run on pve-qemu-kvm 5.1.0-8 on node2. This way I avoid interrupting the VM's service.
Is that correct?
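
The migration plan described here could be sketched roughly as follows. The function only prints one `qm migrate ... --online` command per running VM instead of executing anything. The name migration_plan and the target node name are assumptions for illustration, and the column layout of `qm list` (VMID first, STATUS third) is assumed to match the default output.

```shell
# Sketch: print (not run) one live-migration command per running VM.
# Input: `qm list` output on stdin; $1: name of the already upgraded target node.
migration_plan() {
    target=$1
    awk -v target="$target" \
        'NR > 1 && $3 == "running" { print "qm migrate " $1 " " target " --online" }'
}
```

On the source node it would be used as: qm list | migration_plan node2, and the printed commands can be reviewed before running them by hand.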
 
Great! I'll do it this way to avoid interrupting VM service.

One more thing: how can I mark this issue as solved?

Thank you very much for your great product and support service.
 
Glad it helped! You can edit the first post and select the SOLVED prefix in the dropdown menu next to the title. I'll spare you the hassle and mark it myself for now, feel free to reopen the thread should the issue not be fixed by the update or arise again.
 
Sorry, one more thing.
Is it possible to find out which version of pve-qemu-kvm a VM is running with?

This way I can verify that all VMs are running the latest one.
 
No way to see the exact version currently, I'm afraid.

Here's a little one-liner that should at least show you which running VMs use a version of QEMU that is *not* the currently installed one:
Code:
qm list | tail -n+2 | awk '{print $6}' | grep -v '^0' | xargs -d'\n' -n1 -I{} sh -c "if readlink /proc/{}/exe | grep -q deleted; then echo PID {} is outdated; fi"
use at your own risk ;)
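
For readability, the same idea can be spelled out as a small script that also reports the VMID. This is a sketch under the same assumptions as the one-liner: the PID is the sixth column of `qm list`, and a /proc/<pid>/exe link ending in "(deleted)" means the binary on disk was replaced after the VM started. The function names are made up for illustration.

```shell
# Reads `qm list` output on stdin, prints "VMID PID" for VMs with a non-zero PID.
running_vm_pids() {
    awk 'NR > 1 && $6 != 0 { print $1, $6 }'
}

# Reads "VMID PID" pairs on stdin and reports those whose executable was
# replaced on disk since the process started.
report_outdated() {
    while read -r vmid pid; do
        case "$(readlink "/proc/$pid/exe" 2>/dev/null || true)" in
            *"(deleted)") echo "VM $vmid (PID $pid) is running an outdated QEMU binary" ;;
        esac
    done
}

# Only call the real tool where it exists (i.e. on a PVE node).
if command -v qm >/dev/null 2>&1; then
    qm list | running_vm_pids | report_outdated
fi
```

Like the one-liner, this only tells whether the binary changed, not which exact version a VM is running.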
 
