qmp socket timeout / backup

Dadsun

Apr 28, 2016
Europe
Good day Sirs,

A few days ago we upgraded our enterprise installation from 6.2-6 to 6.2-15.
Since then, the following happens whenever a backup runs (I don't use Proxmox Backup Server):
Code:
Nov 23 08:03:23 larry-14 pvedaemon[11175]: <root@example.org> starting task UPID:larry-14:00006E95:059E4F49:5FBB5EBB:qmreset:100:root@example.org:
Nov 23 08:03:26 larry-14 pvedaemon[28309]: VM 100 qmp command failed - VM 100 qmp command 'system_reset' failed - unable to connect to VM 100 qmp socket - timeout after 31 retries
Nov 23 08:03:26 larry-14 pvedaemon[28309]: VM 100 qmp command 'system_reset' failed - unable to connect to VM 100 qmp socket - timeout after 31 retries
Nov 23 08:03:26 larry-14 pvedaemon[11175]: <root@example.org> end task UPID:larry-14:00006E95:059E4F49:5FBB5EBB:qmreset:100:root@example.org: VM 100 qmp command 'system_reset' failed - unable to connect to VM 100 qmp socket - timeout after 31 retries
Nov 23 08:03:34 larry-14 pvedaemon[28511]: stop VM 100: UPID:larry-14:00006F5F:059E538B:5FBB5EC6:qmstop:100:root@example.org:
Nov 23 08:03:34 larry-14 pvedaemon[11175]: <root@example.org> starting task UPID:larry-14:00006F5F:059E538B:5FBB5EC6:qmstop:100:root@example.org:

I think this is the important part:
qmp command 'system_reset' failed - unable to connect to VM 100 qmp socket - timeout after 31 retries
This only happens when a backup is triggered.
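To check whether the failures really cluster around backup runs, a small script can pull every QMP failure out of the syslog or the backup task log. This is only a minimal Python sketch: it assumes the `VM <id> qmp command '<cmd>' failed - <reason>` wording seen in the log lines above, and `qmp_failures` and `summarize` are hypothetical helper names, not part of Proxmox.

```python
import re
from collections import Counter

# Assumed message format, taken from the log lines in this thread, e.g.
#   "VM 100 qmp command 'system_reset' failed - unable to connect to VM 100 qmp socket - timeout after 31 retries"
#   "VM 100 qmp command 'cont' failed - got timeout"
QMP_FAIL = re.compile(r"VM (\d+) qmp command '([^']+)' failed - (.+)$")


def qmp_failures(lines):
    """Yield (vmid, command, reason) for each line reporting a failed QMP command."""
    for line in lines:
        match = QMP_FAIL.search(line)
        if match:
            yield int(match.group(1)), match.group(2), match.group(3).strip()


def summarize(lines):
    """Count failures per (vmid, command) pair."""
    return Counter((vmid, cmd) for vmid, cmd, _reason in qmp_failures(lines))
```

Running it over the node's syslog, e.g. `summarize(open("/var/log/syslog"))` (assuming rsyslog writes there on this node), gives a count per VM and command; the matching lines can then be compared by hand against the backup schedule.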

I've seen several other threads here, but none of them seems to fit this particular issue.
Below I've attached the pveversion output and the list of packages that were upgraded right before the problems started.

It has been happening since I upgraded these packages:
Code:
Start-Date: 2020-11-12  10:10:27
Commandline: apt-get dist-upgrade -y
Install: libyaml-libyaml-perl:amd64 (0.76+repack-1, automatic)
Upgrade: proxmox-widget-toolkit:amd64 (2.3-6, 2.3-10), 
libpve-storage-perl:amd64 (6.2-8, 6.2-9),
pve-qemu-kvm:amd64 (5.1.0-2, 5.1.0-6),
proxmox-backup-client:amd64 (0.9.6-1, 1.0.1-1),
pve-manager:amd64 (6.2-12, 6.2-15), 
libpve-common-perl:amd64 (6.2-2, 6.2-4),
qemu-server:amd64 (6.2-14, 6.2-19), 
libproxmox-backup-qemu0:amd64 (0.7.0-1, 0.7.1-1)
End-Date: 2020-11-12  10:10:37

After the upgrade I migrated all machines around the cluster so that they would run on the new pve-qemu-kvm.
I went through Bugzilla reports such as:
https://bugzilla.proxmox.com/show_bug.cgi?id=3043 VM freeze after pbs update - qmp command query-backup failed - got timeout
pve-qemu-kvm:amd64 (5.1.0-2, 5.1.0-6),
proxmox-backup-client:amd64 (0.9.6-1, 1.0.1-1),

So my questions are:
Is this fix already included in the current package?
Is it even related?

Some threads suggested that the discard= option has something to do with this, but I see no correlation.
The backup storage is connected via a bond of dedicated 10 Gbit interfaces.

pveversion details below.
I'm open to suggestions.

Kind Regards
D.



This upgrade moved pveversion from 6.2-6 to 6.2-15
Code:
# pveversion 
pve-manager/6.2-15/48bd51b6 (running kernel: 5.4.65-1-pve)
-------------------------------------------------------------
# pveversion -v
proxmox-ve: 6.2-2 (running kernel: 5.4.65-1-pve)
pve-manager: 6.2-15 (running version: 6.2-15/48bd51b6)
pve-kernel-5.4: 6.2-7
pve-kernel-helper: 6.2-7
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.65-1-pve: 5.4.65-1
pve-kernel-5.4.44-1-pve: 5.4.44-1
pve-kernel-4.15: 5.4-12
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-4.15.18-24-pve: 4.15.18-52
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-4
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.2-9
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.1-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.3-10
pve-cluster: 6.2-1
pve-container: 3.2-2
pve-docs: 6.2-6
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-6
pve-xtermjs: 4.7.0-2
qemu-server: 6.2-19
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.4-pve2

Code:
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '1aad3f06-59d8-4786-95c0-26da57c45674'
INFO: resuming VM again
ERROR: VM 100 qmp command 'cont' failed - got timeout
INFO: aborting backup job
ERROR: VM 100 qmp command 'backup-cancel' failed - got timeout
ERROR: Backup of VM 100 failed - VM 100 qmp command 'cont' failed - got timeout
INFO: Failed at 2020-11-23 13:10:56
INFO: Backup job finished with errors
 
We are still investigating this. For future reference:

When the backup client runs and the target storage becomes full at that very moment, the VM sometimes gets stuck.
I'm still trying to pin down exactly where this happens before I open an issue on Bugzilla.
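Since the hang seems to coincide with the backup target running full, one cheap mitigation while the root cause is unclear is to check the free space on the target before a backup starts and skip the run if it is too low. This is a minimal Python sketch, not something Proxmox provides: the helper name `backup_target_has_room`, the 5 % safety reserve and any path passed in are assumptions to adapt to your own setup.

```python
import shutil


def backup_target_has_room(path, required_bytes, reserve_fraction=0.05):
    """Return True if the filesystem holding `path` can take `required_bytes`
    while still keeping `reserve_fraction` of its total size free.

    `reserve_fraction` is an assumed safety margin, not a Proxmox setting.
    """
    usage = shutil.disk_usage(path)  # named tuple (total, used, free), in bytes
    reserve = int(usage.total * reserve_fraction)
    return usage.free - reserve >= required_bytes
```

For example, with the backup storage mounted at a hypothetical `/mnt/pve/backup` and an estimated backup size of 100 GiB, `backup_target_has_room("/mnt/pve/backup", 100 * 1024**3)` returning `False` would be the signal not to start the job, e.g. from a wrapper script that runs before the backup.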
 
