PVE7 / PBS2 - Backup Timeout (qmp command 'cont' failed - got timeout)

adoII

Active Member
Jan 28, 2010
166
14
38
I have changed
Code:
      } else {
            $timeout = 3; # default
to
      } else {
            $timeout = 8; # default
in line 134 of /usr/share/perl5/PVE/QMPClient.pm, restarted the PVE daemons
Code:
for service in pvedaemon.service pveproxy.service pvestatd.service; do
    echo "systemctl restart $service"
    systemctl restart "$service"
done
and now the backups to Proxmox Backup Server are working.
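For anyone repeating this edit by hand, here is a minimal sketch that applies it while keeping a backup copy. The function name `patch_qmp_timeout` and the `.bak` suffix are illustrative choices, not part of Proxmox. It assumes the file still contains the exact line `$timeout = 3; # default`, which may differ between qemu-server versions. A package upgrade will most likely overwrite the file, so the change would then have to be reapplied.

```shell
#!/bin/sh
# Hypothetical helper: raise the default QMP timeout in QMPClient.pm.
# Usage: patch_qmp_timeout FILE NEW_TIMEOUT
patch_qmp_timeout() {
    file="$1"
    new="$2"

    # Accept digits only for the new timeout (in seconds).
    case "$new" in
        ''|*[!0-9]*) echo "timeout must be a whole number" >&2; return 2 ;;
    esac

    # Refuse to touch the file if the expected line is not there.
    if ! grep -q '\$timeout = 3; # default' "$file"; then
        echo "expected default line not found in $file" >&2
        return 1
    fi

    # Keep the original as FILE.bak, then replace only the default value.
    sed -i.bak 's/\$timeout = 3; # default/$timeout = '"$new"'; # default/' "$file"
}

# Example (run as root on the PVE node, then restart the daemons):
# patch_qmp_timeout /usr/share/perl5/PVE/QMPClient.pm 8
```

If the expected line is missing, the function leaves the file untouched and returns an error, which is a hint that the installed version already differs from the one discussed here.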
 
Oct 19, 2021
4
0
1
45
Looks like this works for me as well. (I've set the timeout to 30 to be on the safe side.)
 

leen15

New Member
Oct 9, 2021
3
1
3
33
I can confirm that with @adoII's suggestion it works for me as well (30s timeout).

Let's hope that a Proxmox engineer will see this topic...
 

henrikh1998

New Member
Jun 29, 2020
2
0
1
24
I'm not using PBS, but I can report that I had the same issue described here with NFS backups.
The fix from @adoII finally resolved my backup problems.

Maybe it should be a permanent change. I can't think of a drawback to increasing the default timeout a bit.
 

fabian

Proxmox Staff Member
Staff member
Jan 7, 2016
7,893
1,516
164
see the linked bug report, and the patch linked there ;)
 

adoII

Active Member
Jan 28, 2010
166
14
38
I applied the patch, now I get another error message.
Yes, the Backup Server at Hetzner is a little slower than my servers in my own datacenter. But it is not really slow.

Code:
INFO: starting new backup job: vzdump 104 --node zw-pm-1 --mode snapshot --storage backups21 --remove 0
INFO: Starting Backup of VM 104 (qemu)
INFO: Backup started at 2021-11-02 10:06:22
INFO: status = running
INFO: VM Name: zwv05
INFO: include disk 'scsi0' 'local-btrfs:104/vm-104-disk-0.raw' 140G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/104/2021-11-02T09:06:22Z'
INFO: started backup task '30f2a01b-556b-4ec5-867c-ab0f37d1cc48'
INFO: resuming VM again
ERROR: VM 104 qmp command 'query-pbs-bitmap-info' failed - got timeout
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 104 failed - VM 104 qmp command 'query-pbs-bitmap-info' failed - got timeout
INFO: Failed at 2021-11-02 10:06:37
INFO: Backup job finished with errors
TASK ERROR: job errors
 

sztanpet

New Member
Oct 18, 2021
4
2
3
The problem still occurs with the newest backup client/server, but now the error is:

INFO: resuming VM again
ERROR: VM 103 qmp command 'query-pbs-bitmap-info' failed - got timeout

Code:
proxmox-backup-client-dbgsym/stable,stable,now 2.0.13-1 amd64 [installed]
proxmox-backup-client/stable,stable,now 2.0.13-1 amd64 [installed]
proxmox-backup-docs/stable,now 2.0.13-1 all [installed,automatic]
proxmox-backup-file-restore/stable,now 2.0.13-1 amd64 [installed,automatic]
proxmox-backup-restore-image/stable,now 0.3.1 amd64 [installed,automatic]
proxmox-backup-server-dbgsym/stable,now 2.0.13-1 amd64 [installed]
proxmox-backup-server/stable,now 2.0.13-1 amd64 [installed]
 

sztanpet

New Member
Oct 18, 2021
4
2
3
With the latest version (2.0.14-1) it now seems to be working for me (or at least it has for two days straight now).
EDIT: 4 days and going.
 

raistlinkell

New Member
Nov 21, 2020
7
2
3
52
Hello All
I upgraded my Proxmox cluster of 3 PVE nodes from v6.13 to v7.1.5, then the Proxmox Backup Server from v1.x to v2.0.14. I'm now seeing a high number of VM and CT backups fail. Before the upgrades, backup failures were almost nonexistent.

3 x PVE nodes: Proxmoxbox1, Proxmoxbox2, Proxmoxbox3. Running on the 3 nodes are 5 VMs and 2 CTs. Each PVE node hosts its own 6 TB ZFS storage. There is also an NFS share which is used for non-critical and test VMs.

It appears that backups fail for any VM or CT that has one or more disks on the NFS share. The NFS share is an old Thecus NAS. The NFS shares were defined on the cluster using the v6.13 default NFS type.

Code:
pveversion -v  (same for all 3 nodes)

proxmox-ve: 7.1-1 (running kernel: 5.13.19-1-pve)
pve-manager: 7.1-5 (running version: 7.1-5/6fe299a0)
pve-kernel-5.13: 7.1-4
pve-kernel-helper: 7.1-4
pve-kernel-5.4: 6.4-7
pve-kernel-5.13.19-1-pve: 5.13.19-2
pve-kernel-5.4.143-1-pve: 5.4.143-1
pve-kernel-5.4.140-1-pve: 5.4.140-1
pve-kernel-5.4.106-1-pve: 5.4.106-1
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-14
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.0-3
libpve-storage-perl: 7.0-15
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.14-1
proxmox-backup-file-restore: 2.0.14-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.4-2
pve-cluster: 7.1-2
pve-container: 4.1-2
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-3
pve-ha-manager: 3.3-1
pve-i18n: 2.6-1
pve-qemu-kvm: 6.1.0-2
pve-xtermjs: 4.12.0-1
qemu-server: 7.1-3
smartmontools: 7.2-pve2
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3


Copy of the (truncated) logs from one node below (I've added a little formatting to make it more readable).
Node Proxmoxbox3 hosts 1 VM only. This VM has its storage on an NFS share.

email subject: vzdump backup status (proxmoxbox3.xxxxxxxx.lan) : backup failed

Code:
vzdump --storage ProxBUP-14Tb --mailto systemadmin@xxxxxxxx.org --quiet 1 --pool TestVMs --mailnotification failure --mode snapshot

301: 2021-11-19 12:00:02 INFO: Starting Backup of VM 301 (qemu)
301: 2021-11-19 12:00:02 INFO: status = running
301: 2021-11-19 12:00:02 INFO: VM Name: UbuUPS
301: 2021-11-19 12:00:02 INFO: include disk 'scsi0' 'ThecusRAID5:301/vm-301-disk-0.raw' 12G
301: 2021-11-19 12:00:02 INFO: include disk 'scsi1' 'ThecusRAID5:301/vm-301-disk-1.raw' 12G
301: 2021-11-19 12:00:02 INFO: backup mode: snapshot
301: 2021-11-19 12:00:02 INFO: ionice priority: 7
301: 2021-11-19 12:00:02 INFO: creating Proxmox Backup Server archive 'vm/301/2021-11-19T04:00:02Z'
301: 2021-11-19 12:00:02 INFO: issuing guest-agent 'fs-freeze' command
301: 2021-11-19 12:00:04 INFO: issuing guest-agent 'fs-thaw' command
301: 2021-11-19 12:00:06 INFO: started backup task '088b0b0e-f952-4575-8978-44244530b3ab'
301: 2021-11-19 12:00:06 INFO: resuming VM again
301: 2021-11-19 12:00:13 ERROR: VM 301 qmp command 'query-pbs-bitmap-info' failed - got timeout
301: 2021-11-19 12:00:13 INFO: aborting backup job
301: 2021-11-19 12:00:21 INFO: resuming VM again
301: 2021-11-19 12:00:21 ERROR: Backup of VM 301 failed - VM 301 qmp command 'query-pbs-bitmap-info' failed - got timeout

All VMs and CTs with local ZFS storage backed up successfully.
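To check which guests are affected by a pattern like this, one option is to search the guest config files for the storage ID. The sketch below is only an illustration: `guests_on_storage` is a made-up name, and it assumes disk lines of the usual `scsi0: STORAGE:volume,...` form in `/etc/pve/qemu-server` and `/etc/pve/lxc`, which only cover guests on the local node.

```shell
#!/bin/sh
# Hypothetical helper: list guest config files that reference a storage ID.
# Usage: guests_on_storage STORAGE_ID [CONFIG_DIR...]
guests_on_storage() {
    storage="$1"
    shift
    # Default to the local node's VM and container config directories.
    [ "$#" -gt 0 ] || set -- /etc/pve/qemu-server /etc/pve/lxc

    for dir in "$@"; do
        [ -d "$dir" ] || continue
        # Disk lines look roughly like "scsi0: ThecusRAID5:301/vm-301-disk-0.raw,...".
        # -l prints each matching file once; -F treats the pattern literally.
        grep -lF -- ": ${storage}:" "$dir"/*.conf 2>/dev/null
    done
    return 0
}

# Example: guests_on_storage ThecusRAID5
```

Disk lines inside snapshot sections of a config file also match, so a guest may be listed even if only an old snapshot still refers to the storage.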

If any other information is needed, I'll update ASAP!
 

raistlinkell

New Member
Nov 21, 2020
7
2
3
52
I've completed migrating all my VM and CT storage devices from the NFS share to the Local ZFS storage and all backups function without error.
 

adoII

Active Member
Jan 28, 2010
166
14
38
I still have the query-pbs-bitmap-info error even with my timeout modification and the newest 2.0.14 packages.
My backup server is also not that slow: it is a ZFS RAID 10 of 8 disks in 4 mirrors and even has ZFS special devices on NVMe.
Also, the server is idle during backup time and not doing anything else.
The failing backup is one problem. The second problem is that the VMs sometimes crash at the start of the backup because the backup client halts I/O on the VM for too long.
Any ideas what I can try? Is the problem known, and will someone work on it?
And yes, backup to local storage is no problem; it is only the PBS server that is too slow for what the Proxmox backup client expects.
The backup output is:

Code:
INFO: starting new backup job: vzdump 104 --storage backups21 --mode snapshot --node zw-pm-1 --remove 0
INFO: Starting Backup of VM 104 (qemu)
INFO: Backup started at 2021-11-22 10:56:11
INFO: status = running
INFO: VM Name: xxxxx
INFO: include disk 'scsi0' 'local-btrfs:104/vm-104-disk-0.raw' 140G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/104/2021-11-22T09:56:11Z'
INFO: started backup task '125bc1c8-b078-4f83-9e64-ac7c9e29b587'
INFO: resuming VM again
ERROR: VM 104 qmp command 'query-pbs-bitmap-info' failed - got timeout
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 104 failed - VM 104 qmp command 'query-pbs-bitmap-info' failed - got timeout
INFO: Failed at 2021-11-22 10:56:28
INFO: Backup job finished with errors
TASK ERROR: job errors
 

Taledo

Member
Nov 20, 2020
33
4
8
51
I'm on the latest version available from the repos for both my PBS and my Proxmox nodes, and I'm still getting errors:

Code:
INFO: include disk 'scsi0' 'local:20314/vm-20314-disk-0.qcow2' 100G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/20314/2021-11-24T03:03:34Z'
ERROR: VM 20314 qmp command 'backup' failed - got timeout
INFO: aborting backup job
ERROR: VM 20314 qmp command 'backup-cancel' failed - unable to connect to VM 20314 qmp socket - timeout after 5980 retries
INFO: resuming VM again
ERROR: Backup of VM 20314 failed - VM 20314 qmp command 'cont' failed - unable to connect to VM 20314 qmp socket - timeout after 450 retries
INFO: Failed at 2021-11-24 04:16:24
INFO: Backup job finished with errors
TASK ERROR: job errors
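Since different QMP commands time out in this thread ('cont', 'backup', 'backup-cancel', 'query-pbs-bitmap-info'), it can help to tally them when comparing logs. A small sketch, assuming the log lines contain the phrase `qmp command '...' failed` as in the output above; `qmp_failures` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: count failed QMP commands in a vzdump task log on stdin.
# Prints one "COMMAND COUNT" pair per line, sorted by command name.
qmp_failures() {
    grep -o "qmp command '[^']*' failed" \
        | cut -d"'" -f2 \
        | sort \
        | uniq -c \
        | awk '{ print $2, $1 }'
}

# Example: qmp_failures < saved-task.log
```

The counts are per log line, not per backup job: vzdump often reports the same failure twice, once as the error itself and once in the "Backup of VM ... failed" summary line.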

The issue with that is that it froze the VM and I had to hard reboot it.

PVE version : pve-manager/7.1-5/6fe299a0 (running kernel: 5.13.19-1-pve)
PBS version : proxmox-backup-server 2.1.2-1 running version: 2.1.2

Both the PVE nodes and the PBS run on Dell servers with physical RAID. I am NOT running ZFS on RAID; I'm using ext4 partitions.

I haven't encountered this issue on servers running ZFS directly, but there are other factors that could affect it (performance, networking...)
 

SDEc

New Member
Dec 2, 2021
1
0
1
52
Same issue here after upgrading from 6.3.x to 7.0.x (now running 7.1-6 and Backup Server 2.1-2): some backups are failing due to "timeout" issues.

All Proxmox servers have FC NVMe storage; the Backup Server writes to an NFS mount (QNAP).

Note: there was no issue at all on Proxmox 6.x with this setup.

Code:
pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.11.22-7-pve)
pve-manager: 7.1-6 (running version: 7.1-6/4e61e21c)
pve-kernel-5.13: 7.1-4
pve-kernel-helper: 7.1-4
pve-kernel-5.11: 7.0-10
pve-kernel-5.13.19-1-pve: 5.13.19-3
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.11.22-4-pve: 5.11.22-9
ceph: 16.2.6-pve2
ceph-fuse: 16.2.6-pve2
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-14
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.0-3
libpve-storage-perl: 7.0-15
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.1.2-1
proxmox-backup-file-restore: 2.1.2-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.4-3
pve-cluster: 7.1-2
pve-container: 4.1-2
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-3
pve-ha-manager: 3.3-1
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.0-2
pve-xtermjs: 4.12.0-1
qemu-server: 7.1-4
smartmontools: 7.2-1
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3

Failure reported on Proxmox:
Code:
INFO: Starting Backup of VM 183 (qemu)
INFO: Backup started at 2021-12-02 03:00:27
INFO: status = running
INFO: VM Name: xxx-yyyy
INFO: include disk 'scsi0' 'datastore_pve01:vm-183-disk-0' 20G
INFO: include disk 'scsi1' 'datastore_pve01:vm-183-disk-1' 100G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/183/2021-12-02T02:00:27Z'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
ERROR: VM 183 qmp command 'backup' failed - got timeout
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 183 failed - VM 183 qmp command 'backup' failed - got timeout
INFO: Failed at 2021-12-02 03:04:02

Failure reported on PBS:
Code:
2021-12-02T03:00:31+01:00: starting new backup on datastore 'PVE1Backup': "vm/183/2021-12-02T02:00:27Z"
2021-12-02T03:00:31+01:00: download 'index.json.blob' from previous backup.
2021-12-02T03:03:29+01:00: register chunks in 'drive-scsi0.img.fidx' from previous backup.
2021-12-02T03:03:29+01:00: download 'drive-scsi0.img.fidx' from previous backup.
2021-12-02T03:03:29+01:00: created new fixed index 1 ("vm/183/2021-12-02T02:00:27Z/drive-scsi0.img.fidx")
2021-12-02T03:03:43+01:00: register chunks in 'drive-scsi1.img.fidx' from previous backup.
2021-12-02T03:03:43+01:00: download 'drive-scsi1.img.fidx' from previous backup.
2021-12-02T03:03:50+01:00: created new fixed index 2 ("vm/183/2021-12-02T02:00:27Z/drive-scsi1.img.fidx")
2021-12-02T03:04:02+01:00: add blob "/mnt/nfs/pve1/vm/183/2021-12-02T02:00:27Z/qemu-server.conf.blob" (302 bytes, comp: 302)
2021-12-02T03:04:02+01:00: backup ended and finish failed: backup ended but finished flag is not set.
2021-12-02T03:04:02+01:00: removing unfinished backup
2021-12-02T03:04:02+01:00: TASK ERROR: removing backup snapshot "/mnt/nfs/pve1/vm/183/2021-12-02T02:00:27Z" failed - Directory not empty (os error 39)

(What was it waiting for between 03:00:31 and 03:03:29? I assume this is the "timeout" of ~180 s. On the QNAP, the above directory has been created and is empty.)
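To spot stalls like this without reading timestamps by eye, the gaps between consecutive log lines can be computed. This is a rough sketch that assumes each line starts with an ISO 8601 timestamp followed by a colon, as in the PBS task log above, and that GNU `date` is available (as on Debian-based PBS and PVE hosts). `log_gaps` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: report gaps of at least THRESHOLD seconds between
# consecutive timestamped lines read from stdin.
# Usage: log_gaps THRESHOLD < task.log
log_gaps() {
    threshold="$1"
    prev_epoch=""
    prev_stamp=""
    while read -r stamp _rest; do
        # The first field is "2021-12-02T03:00:31+01:00:"; drop the trailing colon.
        stamp="${stamp%:}"
        [ -n "$stamp" ] || continue
        # Skip lines whose first field is not a parseable timestamp.
        epoch=$(date -d "$stamp" +%s 2>/dev/null) || continue
        if [ -n "$prev_epoch" ]; then
            gap=$((epoch - prev_epoch))
            if [ "$gap" -ge "$threshold" ]; then
                echo "${gap}s between $prev_stamp and $stamp"
            fi
        fi
        prev_epoch="$epoch"
        prev_stamp="$stamp"
    done
}
```

Run against the PBS log above with a threshold of 60, this reports a single gap of 178 seconds between 03:00:31 and 03:03:29, which is where the previous backup's index was being downloaded from the NFS-backed datastore.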


From syslog:
Code:
.
.
Dec  2 03:03:50 pvebackup proxmox-backup-proxy[672]: created new fixed index 2 ("vm/183/2021-12-02T02:00:27Z/drive-scsi1.img.fidx")
Dec  2 03:04:02 pvebackup proxmox-backup-proxy[672]: add blob "/mnt/nfs/pve1/vm/183/2021-12-02T02:00:27Z/qemu-server.conf.blob" (302 bytes, comp: 302)
Dec  2 03:04:02 pvebackup proxmox-backup-proxy[672]: backup ended and finish failed: backup ended but finished flag is not set.
Dec  2 03:04:02 pvebackup proxmox-backup-proxy[672]: removing unfinished backup
Dec  2 03:04:02 pvebackup proxmox-backup-proxy[672]: removing backup snapshot "/mnt/nfs/pve1/vm/183/2021-12-02T02:00:27Z"
Dec  2 03:04:02 pvebackup proxmox-backup-proxy[672]: TASK ERROR: removing backup snapshot "/mnt/nfs/pve1/vm/183/2021-12-02T02:00:27Z" failed - Directory not empty (os error 39)
 

adoII

Active Member
Jan 28, 2010
166
14
38
I can confirm that the same problem, qmp command 'query-pbs-bitmap-info' failed - got timeout, still occurs here as well.
 

felipe

Active Member
Oct 28, 2013
221
6
38
Should the error "qmp command 'cont' failed - got timeout" be fixed by now (in Proxmox 7.1)?
I know it occurs only on heavily loaded storage backends, but that was not a problem in Proxmox 6.4.
 

ioanv

Active Member
Dec 11, 2014
44
3
28
Bug still present with latest version.
NFS storage on a Synology drive.
Increasing from 3 to 8 seconds did NOT solve the problem.
Increasing to 30 seconds DID solve the problem.

Kernel Version Linux 5.13.19-2-pve #1 SMP PVE 5.13.19-4 (Mon, 29 Nov 2021 12:10:09 +0100)
PVE Manager Version pve-manager/7.1-8/5b267f33
 

sztanpet

New Member
Oct 18, 2021
4
2
3
This has also re-occurred for me, but now with the "qmp command 'query-pbs-bitmap-info' failed - got timeout" error.
I noticed that when it occurs the iowait is through the roof, which it shouldn't be since the target drive is on NVMe. Based on pidstat -dl 5, the culprits are arc_prune kernel threads, and googling that took me to https://github.com/openzfs/zfs/issues/6223
So in my case the root cause is a ZFS issue, and it causes long delays that PBS isn't expecting.
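When checking whether a host is in a similar state, the ARC counters can be read next to the pidstat output. The sketch below only reads values from the kstat file that ZFS on Linux exposes; `arc_stat` is a hypothetical name, and which counters matter for the linked issue is left as an assumption for the reader to verify.

```shell
#!/bin/sh
# Hypothetical helper: print one value from a ZFS arcstats file.
# Usage: arc_stat NAME [FILE]
arc_stat() {
    name="$1"
    file="${2:-/proc/spl/kstat/zfs/arcstats}"
    # Data rows have three columns: name, type, value.
    awk -v n="$name" '$1 == n { print $3 }' "$file"
}

# Example: compare the current ARC size with its configured maximum.
# echo "ARC size: $(arc_stat size) of $(arc_stat c_max) bytes"
```

An unknown counter name simply prints nothing, so the output should be checked for being empty before it is used in arithmetic.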
 

Taledo

Member
Nov 20, 2020
33
4
8
51
I'm also seeing the "query-pbs-bitmap-info" issue now.


We're not running ZFS so that can't be our issue.

The PVE is running on 5.11.22-3-pve.
 
