PVE7 / PBS2 - Backup Timeout (qmp command 'cont' failed - got timeout)

Same here, brtfs, latest available version of proxmox.

it randomly failed for some of vm's previously. but now one VM always fails to backup

pveversion
Code:
pve-manager/7.1-10/6ddebafe (running kernel: 5.13.19-4-pve)

qm status 120 --verbose
Code:
cpus: 2
disk: 0
diskread: 0
diskwrite: 0
maxdisk: 34359738368
maxmem: 3670016000
mem: 0
name: tf
netin: 0
netout: 0
qmpstatus: stopped
status: stopped
uptime: 0
vmid: 120

pveversion -v
Code:
proxmox-ve: 7.1-1 (running kernel: 5.13.19-4-pve)
pve-manager: 7.1-10 (running version: 7.1-10/6ddebafe)
pve-kernel-helper: 7.1-12
pve-kernel-5.13: 7.1-7
pve-kernel-5.13.19-4-pve: 5.13.19-9
pve-kernel-5.13.19-3-pve: 5.13.19-7
pve-kernel-5.13.19-2-pve: 5.13.19-4
ceph-fuse: 15.2.15-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.1
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-6
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-3
libpve-guest-common-perl: 4.1-1
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.1-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.3.0-2
openvswitch-switch: 2.15.0+ds1-2
proxmox-backup-client: 2.1.5-1
proxmox-backup-file-restore: 2.1.5-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-6
pve-cluster: 7.1-3
pve-container: 4.1-4
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-5
pve-ha-manager: 3.3-3
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.1-2
pve-xtermjs: 4.16.0-1
qemu-server: 7.1-4
smartmontools: 7.2-1
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1

* AMD Ryzen 9 3900
* 128GB ECC Ram
* Enterprice NVMe SSD
* BTRFS raid 1
 
Last edited:
Hi,

Same issue for VM 132 & 133.

Code:
132    xxx    FAILED    00:01:21    VM 132 qmp command 'query-pbs-bitmap-info' failed - got timeout
133    xxx    FAILED    00:01:03    VM 133 qmp command 'query-pbs-bitmap-info' failed - got timeout

Hardware :

Lenovo SR635
128 Go RAM ECC
SSD NVME
Datastore NFS v3

PVE Version :

Code:
pve-manager/7.1-10/6ddebafe (running kernel: 5.15.19-1-pve)

VM 133 :

Code:
balloon: 6442450944
balloon_min: 62914560
ballooninfo:
        actual: 6442450944
        free_mem: 2501373952
        last_update: 1645438560
        major_page_faults: 2384393
        max_mem: 6442450944
        mem_swapped_in: 42773762048
        mem_swapped_out: 1039425536
        minor_page_faults: 741021483
        total_mem: 6441914368
blockstat:
        ide0:
                account_failed: 1
                account_invalid: 1
                failed_flush_operations: 0
                failed_rd_operations: 0
                failed_unmap_operations: 0
                failed_wr_operations: 0
                flush_operations: 3
                flush_total_time_ns: 201127
                idle_time_ns: 1036404297
                invalid_flush_operations: 0
                invalid_rd_operations: 0
                invalid_unmap_operations: 0
                invalid_wr_operations: 0
                rd_bytes: 76563288576
                rd_merged: 0
                rd_operations: 4300837
                rd_total_time_ns: 19881218589498
                timed_stats:
                unmap_bytes: 102215426048
                unmap_merged: 0
                unmap_operations: 335060
                unmap_total_time_ns: 4914692373
                wr_bytes: 127317865472
                wr_highest_offset: 42107231232
                wr_merged: 0
                wr_operations: 5515428
                wr_total_time_ns: 10650550564552
        ide2:
                account_failed: 0
                account_invalid: 0
                failed_flush_operations: 0
                failed_rd_operations: 0
                failed_unmap_operations: 0
                failed_wr_operations: 0
                flush_operations: 0
                flush_total_time_ns: 0
                invalid_flush_operations: 0
                invalid_rd_operations: 0
                invalid_unmap_operations: 0
                invalid_wr_operations: 0
                rd_bytes: 0
                rd_merged: 0
                rd_operations: 0
                rd_total_time_ns: 0
                timed_stats:
                unmap_bytes: 0
                unmap_merged: 0
                unmap_operations: 0
                unmap_total_time_ns: 0
                wr_bytes: 0
                wr_highest_offset: 0
                wr_merged: 0
                wr_operations: 0
                wr_total_time_ns: 0
cpus: 4
disk: 0
diskread: 76563288576
diskwrite: 127317865472
freemem: 2501373952
maxdisk: 53687091200
maxmem: 6442450944
mem: 3940540416
name: fr-leh-eipu-rec
netin: 81689056
netout: 66108404
nics:
        tap133i0:
                netin: 81689056
                netout: 66108404
pid: 655215
proxmox-support:
        pbs-dirty-bitmap: 1
        pbs-dirty-bitmap-migration: 1
        pbs-dirty-bitmap-savevm: 1
        pbs-library-version: 1.2.0 (6e555bc73a7dcfb4d0b47355b958afd101ad27b5)
        pbs-masterkey: 1
        query-bitmap-info: 1
qmpstatus: running
running-machine: pc-i440fx-5.1+pve0
running-qemu: 6.1.1
shares: 1000
status: running
uptime: 952615
vmid: 133

VM 132 :

Code:
balloon: 4294967296
ballooninfo:
        actual: 4294967296
        free_mem: 448761856
        last_update: 1645438458
        major_page_faults: 3942898
        max_mem: 4294967296
        mem_swapped_in: 70168485888
        mem_swapped_out: 2019561472
        minor_page_faults: 522275242
        total_mem: 4294430720
blockstat:
        ide0:
                account_failed: 1
                account_invalid: 1
                failed_flush_operations: 0
                failed_rd_operations: 0
                failed_unmap_operations: 0
                failed_wr_operations: 2
                flush_operations: 3
                flush_total_time_ns: 145382
                idle_time_ns: 319466243
                invalid_flush_operations: 0
                invalid_rd_operations: 0
                invalid_unmap_operations: 0
                invalid_wr_operations: 0
                rd_bytes: 113206216704
                rd_merged: 0
                rd_operations: 6816641
                rd_total_time_ns: 23401913901550
                timed_stats:
                unmap_bytes: 104575729664
                unmap_merged: 0
                unmap_operations: 434267
                unmap_total_time_ns: 5867540199
                wr_bytes: 126291296256
                wr_highest_offset: 49075250176
                wr_merged: 0
                wr_operations: 5510221
                wr_total_time_ns: 8968544029292
        ide2:
                account_failed: 0
                account_invalid: 0
                failed_flush_operations: 0
                failed_rd_operations: 0
                failed_unmap_operations: 0
                failed_wr_operations: 0
                flush_operations: 0
                flush_total_time_ns: 0
                invalid_flush_operations: 0
                invalid_rd_operations: 0
                invalid_unmap_operations: 0
                invalid_wr_operations: 0
                rd_bytes: 0
                rd_merged: 0
                rd_operations: 0
                rd_total_time_ns: 0
                timed_stats:
                unmap_bytes: 0
                unmap_merged: 0
                unmap_operations: 0
                unmap_total_time_ns: 0
                wr_bytes: 0
                wr_highest_offset: 0
                wr_merged: 0
                wr_operations: 0
                wr_total_time_ns: 0
cpus: 4
disk: 0
diskread: 113206216704
diskwrite: 126291296256
freemem: 448761856
maxdisk: 53687091200
maxmem: 4294967296
mem: 3845668864
name: fr-leh-eipu-dev
netin: 654985789
netout: 181796139
nics:
        tap132i0:
                netin: 654985789
                netout: 181796139
pid: 1268625
proxmox-support:
        pbs-dirty-bitmap: 1
        pbs-dirty-bitmap-migration: 1
        pbs-dirty-bitmap-savevm: 1
        pbs-library-version: 1.2.0 (6e555bc73a7dcfb4d0b47355b958afd101ad27b5)
        pbs-masterkey: 1
        query-bitmap-info: 1
qmpstatus: running
running-machine: pc-i440fx-5.1+pve0
running-qemu: 6.1.1
status: running
uptime: 896018
vmid: 132


Package Version :

Code:
proxmox-ve: 7.1-1 (running kernel: 5.15.19-1-pve)
pve-manager: 7.1-10 (running version: 7.1-10/6ddebafe)
pve-kernel-helper: 7.1-12
pve-kernel-5.15: 7.1-11
pve-kernel-5.13: 7.1-7
pve-kernel-5.15.19-2-pve: 5.15.19-2
pve-kernel-5.15.19-1-pve: 5.15.19-1
pve-kernel-5.13.19-4-pve: 5.13.19-9
pve-kernel-5.13.19-2-pve: 5.13.19-4
ceph-fuse: 15.2.15-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.1
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-6
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-3
libpve-guest-common-perl: 4.1-1
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.1-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.3.0-2
proxmox-backup-client: 2.1.5-1
proxmox-backup-file-restore: 2.1.5-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-6
pve-cluster: 7.1-3
pve-container: 4.1-4
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-5
pve-ha-manager: 3.3-3
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.1-2
pve-xtermjs: 4.16.0-1
qemu-server: 7.1-4
smartmontools: 7.2-1
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.2-pve1
 
Last edited:
  • Like
Reactions: evgenij
Good morning people,

I have the same problem, we have a physical host where PBS is installed and a few more physical hosts with VMs, all backing up to PBS, but on days and on random VMs, some backups have the same error reported by all of you.
 
I'm experiencing this issue as well:


  • Issue happens for 4 out of about 50VMs but always the same 4.
  • The 4 of issue are templates which are stored on an NFS v4 share.
  • Other templates and VMs from that same NFS storage backup without issue.
  • Manually initiating a backup results in the same issue, even when there is no load on the PVE nor PBS machines
  • Moving the problematic VM from the NFS storage to local dir storage resolves the issue. Moving it back to NFS re-introduces the issue
  • Following the steps outlined here to increase the timeout (I used 15sec) and restarting the services on the PVE host resolves the issue
  • Issue began after the following changes were introduced:
    • Migrated PVE from 6.x -> 7.x
    • Migrated PBS from 1.x -> 2.x
    • Added Prune & GC schedule

PBS
Version 2.1-5
2x 900GB SSD ZFS mirror
Code:
Linux pbs 5.13.19-4-pve #1 SMP PVE 5.13.19-9 (Mon, 07 Feb 2022 11:01:14 +0100) x86_64 GNU/Linux

PVE
Code:
proxmox-ve: 7.1-1 (running kernel: 5.13.19-3-pve)
pve-manager: 7.1-10 (running version: 7.1-10/6ddebafe)
pve-kernel-helper: 7.1-8
pve-kernel-5.13: 7.1-6
pve-kernel-5.4: 6.4-12
pve-kernel-5.13.19-3-pve: 5.13.19-7
pve-kernel-5.4.162-1-pve: 5.4.162-2
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.1
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-6
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-2
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.0-15
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.3.0-1
proxmox-backup-client: 2.1.5-1
proxmox-backup-file-restore: 2.1.5-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-5
pve-cluster: 7.1-3
pve-container: 4.1-3
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-4
pve-ha-manager: 3.3-3
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.1-1
pve-xtermjs: 4.16.0-1
qemu-server: 7.1-4
smartmontools: 7.2-pve2
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.2-pve1

Client Side Log
Code:
INFO: starting new backup job: vzdump 523 --node pve3 --storage pbs --mode snapshot --remove 0
INFO: Starting Backup of VM 523 (qemu)
INFO: Backup started at 2022-02-27 10:59:01
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: VM Name: win2003R2x64
INFO: include disk 'virtio0' 'truenas-nfs:523/base-523-disk-1.qcow2' 32G
INFO: creating Proxmox Backup Server archive 'vm/523/2022-02-27T15:59:01Z'
INFO: starting kvm to execute backup task
INFO: started backup task '6250a3d8-402c-483f-b3ec-99021a14ff1e'
ERROR: VM 523 qmp command 'query-pbs-bitmap-info' failed - got timeout
INFO: aborting backup job
INFO: stopping kvm after backup task
ERROR: Backup of VM 523 failed - VM 523 qmp command 'query-pbs-bitmap-info' failed - got timeout
INFO: Failed at 2022-02-27 10:59:08
INFO: Backup job finished with errors
TASK ERROR: job errors

Server Side Log
Code:
2022-02-27T10:59:02-05:00: starting new backup on datastore 'pbs-storage': "vm/523/2022-02-27T15:59:01Z"
2022-02-27T10:59:02-05:00: download 'index.json.blob' from previous backup.
2022-02-27T10:59:02-05:00: register chunks in 'drive-virtio0.img.fidx' from previous backup.
2022-02-27T10:59:02-05:00: download 'drive-virtio0.img.fidx' from previous backup.
2022-02-27T10:59:02-05:00: created new fixed index 1 ("vm/523/2022-02-27T15:59:01Z/drive-virtio0.img.fidx")
2022-02-27T10:59:02-05:00: add blob "/mnt/datastore/pbs-storage/vm/523/2022-02-27T15:59:01Z/qemu-server.conf.blob" (360 bytes, comp: 360)
2022-02-27T10:59:07-05:00: backup ended and finish failed: backup ended but finished flag is not set.
2022-02-27T10:59:07-05:00: removing unfinished backup
2022-02-27T10:59:07-05:00: TASK ERROR: backup ended but finished flag is not set.

VM Info
Code:
m status 523 --verbose
cpus: 2
disk: 0
diskread: 0
diskwrite: 0
maxdisk: 34359738368
maxmem: 2147483648
mem: 0
name: win2003R2x64
netin: 0
netout: 0
qmpstatus: stopped
status: stopped
template: 1
uptime: 0
vmid: 523
 
Last edited:
For one of VM's backup for one of VM hard drive now i'm getting: ERROR: job failed with err -5 - Input/output error
 
When i tried to move disk to another storage

drive-virtio3: transferred 9.7 GiB of 100.0 GiB (9.65%) in 1m 36s
drive-virtio3: Cancelling block job
drive-virtio3: Done.
Logical volume "vm-100-disk-2" successfully removed
TASK ERROR: storage migration failed: block job (mirror) error: drive-virtio3: 'mirror' has been cancelled
 
We're seeing the timeouts too with a local PBS backup server for many standalone Proxmox servers (no NFS). We think it's because when the backups run on the PBS, the server can be slow to respond (GUI sometimes takes 15-20 seconds to load a new page), and this is tripping that 3 second timeout in /usr/share/perl5/PVE/QMPClient.pm.

We've edited the timeout to 30s, and we'll see if that fixes these seemingly random timeout errors.
 
  • Like
Reactions: itNGO

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!