Error with backup (when backing up) - qmp command 'query-backup' failed - got wrong command id

DerDieDas

Hi fellow Proxmoxers

Two days ago, our nightly backup raised an error with one VM:
ERROR: Backup of VM 160 failed - VM 160 qmp command 'query-backup' failed - got wrong command id '12299:46449189' (expected 27831:3521)

160: 2021-04-19 23:51:05 INFO: Starting Backup of VM 160 (qemu)
160: 2021-04-19 23:51:05 INFO: status = running
160: 2021-04-19 23:51:05 INFO: VM Name: XXX
160: 2021-04-19 23:51:05 INFO: include disk 'ide0' 'ironVM:160/vm-160-disk-0.qcow2' 1000G
160: 2021-04-19 23:51:05 INFO: exclude disk 'ide1' 'ironNFS:160/vm-160-disk-1.qcow2' (backup=no)
160: 2021-04-19 23:51:05 INFO: backup mode: snapshot
160: 2021-04-19 23:51:05 INFO: ionice priority: 7
160: 2021-04-19 23:51:05 INFO: skip unused drive 'ironNFS:160/vm-160-disk-0.qcow2' (not included into backup)
160: 2021-04-19 23:51:05 INFO: creating vzdump archive '/mnt/pve/ironNFS/dump/vzdump-qemu-160-2021_04_19-23_51_04.vma.zst'
160: 2021-04-19 23:51:05 INFO: issuing guest-agent 'fs-freeze' command
160: 2021-04-19 23:51:06 INFO: issuing guest-agent 'fs-thaw' command
160: 2021-04-19 23:51:06 INFO: started backup task 'd2a5692c-f11e-4227-939f-6688a456cbf5'
160: 2021-04-19 23:51:06 INFO: resuming VM again
160: 2021-04-19 23:51:09 INFO: 0% (634.4 MiB of 1000.0 GiB) in 3s, read: 211.5 MiB/s, write: 167.7 MiB/s
160: 2021-04-19 23:52:20 INFO: 1% (10.1 GiB of 1000.0 GiB) in 1m 14s, read: 136.8 MiB/s, write: 134.5 MiB/s
160: 2021-04-19 23:53:38 INFO: 2% (20.0 GiB of 1000.0 GiB) in 2m 32s, read: 130.1 MiB/s, write: 128.5 MiB/s
160: 2021-04-19 23:54:53 INFO: 3% (30.5 GiB of 1000.0 GiB) in 3m 47s, read: 143.1 MiB/s, write: 137.2 MiB/s
160: 2021-04-19 23:56:02 INFO: 4% (40.1 GiB of 1000.0 GiB) in 4m 56s, read: 142.5 MiB/s, write: 140.9 MiB/s
160: 2021-04-19 23:57:20 INFO: 5% (50.1 GiB of 1000.0 GiB) in 6m 14s, read: 131.5 MiB/s, write: 129.9 MiB/s
160: 2021-04-19 23:58:43 INFO: 6% (60.1 GiB of 1000.0 GiB) in 7m 37s, read: 123.1 MiB/s, write: 121.7 MiB/s
160: 2021-04-19 23:59:55 INFO: 7% (70.1 GiB of 1000.0 GiB) in 8m 49s, read: 143.0 MiB/s, write: 141.0 MiB/s
160: 2021-04-20 00:01:08 INFO: 8% (80.1 GiB of 1000.0 GiB) in 10m 2s, read: 139.4 MiB/s, write: 137.7 MiB/s
160: 2021-04-20 00:02:19 INFO: 9% (90.0 GiB of 1000.0 GiB) in 11m 13s, read: 143.0 MiB/s, write: 141.3 MiB/s
160: 2021-04-20 00:03:34 INFO: 10% (100.1 GiB of 1000.0 GiB) in 12m 28s, read: 138.2 MiB/s, write: 134.5 MiB/s
160: 2021-04-20 00:04:50 INFO: 11% (110.0 GiB of 1000.0 GiB) in 13m 44s, read: 133.6 MiB/s, write: 131.9 MiB/s
160: 2021-04-20 00:06:00 INFO: 12% (120.0 GiB of 1000.0 GiB) in 14m 54s, read: 146.3 MiB/s, write: 143.7 MiB/s
160: 2021-04-20 00:07:10 INFO: 13% (130.0 GiB of 1000.0 GiB) in 16m 4s, read: 145.8 MiB/s, write: 144.5 MiB/s
160: 2021-04-20 00:07:15 INFO: 14% (140.1 GiB of 1000.0 GiB) in 16m 9s, read: 2.0 GiB/s, write: 108.9 MiB/s
160: 2021-04-20 00:07:18 INFO: 15% (157.2 GiB of 1000.0 GiB) in 16m 12s, read: 5.7 GiB/s, write: 21.3 KiB/s
160: 2021-04-20 00:07:35 INFO: 16% (160.1 GiB of 1000.0 GiB) in 16m 29s, read: 172.0 MiB/s, write: 122.3 MiB/s
160: 2021-04-20 00:08:48 INFO: 17% (170.1 GiB of 1000.0 GiB) in 17m 42s, read: 140.2 MiB/s, write: 138.9 MiB/s
160: 2021-04-20 00:10:06 INFO: 18% (180.1 GiB of 1000.0 GiB) in 19m, read: 131.6 MiB/s, write: 130.5 MiB/s
160: 2021-04-20 00:11:17 INFO: 19% (190.0 GiB of 1000.0 GiB) in 20m 11s, read: 143.2 MiB/s, write: 136.8 MiB/s
160: 2021-04-20 00:12:37 INFO: 20% (200.1 GiB of 1000.0 GiB) in 21m 31s, read: 129.1 MiB/s, write: 127.2 MiB/s
160: 2021-04-20 00:13:54 INFO: 21% (210.1 GiB of 1000.0 GiB) in 22m 48s, read: 132.5 MiB/s, write: 131.4 MiB/s
160: 2021-04-20 00:15:13 INFO: 22% (220.0 GiB of 1000.0 GiB) in 24m 7s, read: 128.8 MiB/s, write: 127.1 MiB/s
160: 2021-04-20 00:16:22 INFO: 23% (230.1 GiB of 1000.0 GiB) in 25m 16s, read: 150.1 MiB/s, write: 149.6 MiB/s
160: 2021-04-20 00:17:43 INFO: 24% (240.2 GiB of 1000.0 GiB) in 26m 37s, read: 127.5 MiB/s, write: 125.8 MiB/s
160: 2021-04-20 00:18:23 ERROR: VM 160 qmp command 'query-backup' failed - got wrong command id '12299:46449189' (expected 27831:3521)
160: 2021-04-20 00:18:23 INFO: aborting backup job
160: 2021-04-20 00:18:28 ERROR: VM 160 qmp command 'backup-cancel' failed - client closed connection
160: 2021-04-20 00:18:38 ERROR: Backup of VM 160 failed - VM 160 qmp command 'query-backup' failed - got wrong command id '12299:46449189' (expected 27831:3521)

We tried to back up the VM manually and it failed. When we set the backup mode from snapshot to stop, it worked fine.
Then, in the night from yesterday to today, the same error occurred on multiple VMs:

168: 2021-04-21 21:10:37 INFO: Starting Backup of VM 168 (qemu)
168: 2021-04-21 21:10:37 INFO: status = running
168: 2021-04-21 21:10:37 INFO: VM Name: XXX
168: 2021-04-21 21:10:37 INFO: include disk 'scsi0' 'ironVM:168/vm-168-disk-0.qcow2' 32G
168: 2021-04-21 21:10:37 INFO: exclude disk 'scsi1' 'ironNFS:168/vm-168-disk-1.qcow2' (backup=no)
168: 2021-04-21 21:10:37 INFO: backup mode: snapshot
168: 2021-04-21 21:10:37 INFO: ionice priority: 7
168: 2021-04-21 21:10:37 INFO: skip unused drive 'ironNFS:168/vm-168-disk-0.qcow2' (not included into backup)
168: 2021-04-21 21:10:37 INFO: creating vzdump archive '/mnt/pve/ironNFS/dump/vzdump-qemu-168-2021_04_21-21_10_37.vma.zst'
168: 2021-04-21 21:10:37 INFO: issuing guest-agent 'fs-freeze' command
168: 2021-04-21 21:10:37 INFO: issuing guest-agent 'fs-thaw' command
168: 2021-04-21 21:10:38 INFO: started backup task '3c48b6d7-1488-4c40-848c-4d66eb977e30'
168: 2021-04-21 21:10:38 INFO: resuming VM again
168: 2021-04-21 21:10:41 INFO: 1% (456.8 MiB of 32.0 GiB) in 3s, read: 152.2 MiB/s, write: 136.5 MiB/s
168: 2021-04-21 21:10:44 INFO: 2% (866.4 MiB of 32.0 GiB) in 6s, read: 136.5 MiB/s, write: 133.2 MiB/s
168: 2021-04-21 21:10:47 INFO: 3% (1.2 GiB of 32.0 GiB) in 9s, read: 120.8 MiB/s, write: 117.4 MiB/s
168: 2021-04-21 21:10:50 INFO: 5% (1.7 GiB of 32.0 GiB) in 12s, read: 161.9 MiB/s, write: 161.3 MiB/s
168: 2021-04-21 21:10:53 INFO: 6% (2.1 GiB of 32.0 GiB) in 15s, read: 161.9 MiB/s, write: 144.0 MiB/s
168: 2021-04-21 21:10:56 INFO: 8% (2.8 GiB of 32.0 GiB) in 18s, read: 219.9 MiB/s, write: 214.5 MiB/s
168: 2021-04-21 21:10:59 INFO: 10% (3.3 GiB of 32.0 GiB) in 21s, read: 187.3 MiB/s, write: 185.8 MiB/s
168: 2021-04-21 21:11:02 INFO: 12% (3.9 GiB of 32.0 GiB) in 24s, read: 180.0 MiB/s, write: 178.5 MiB/s
168: 2021-04-21 21:11:06 INFO: 13% (4.4 GiB of 32.0 GiB) in 28s, read: 125.1 MiB/s, write: 117.7 MiB/s
168: 2021-04-21 21:11:09 INFO: 15% (4.9 GiB of 32.0 GiB) in 31s, read: 189.7 MiB/s, write: 189.5 MiB/s
168: 2021-04-21 21:11:12 INFO: 17% (5.5 GiB of 32.0 GiB) in 34s, read: 190.9 MiB/s, write: 190.9 MiB/s
168: 2021-04-21 21:11:15 INFO: 18% (6.0 GiB of 32.0 GiB) in 37s, read: 193.3 MiB/s, write: 184.4 MiB/s
168: 2021-04-21 21:11:18 INFO: 20% (6.4 GiB of 32.0 GiB) in 40s, read: 139.0 MiB/s, write: 136.5 MiB/s
168: 2021-04-21 21:11:21 INFO: 21% (6.9 GiB of 32.0 GiB) in 43s, read: 142.6 MiB/s, write: 136.4 MiB/s
168: 2021-04-21 21:11:26 INFO: 23% (7.6 GiB of 32.0 GiB) in 48s, read: 142.8 MiB/s, write: 141.0 MiB/s
168: 2021-04-21 21:11:29 INFO: 24% (8.0 GiB of 32.0 GiB) in 51s, read: 147.0 MiB/s, write: 131.0 MiB/s
168: 2021-04-21 21:11:32 INFO: 26% (8.4 GiB of 32.0 GiB) in 54s, read: 138.1 MiB/s, write: 125.6 MiB/s
168: 2021-04-21 21:11:36 INFO: 28% (9.0 GiB of 32.0 GiB) in 58s, read: 145.0 MiB/s, write: 131.6 MiB/s
168: 2021-04-21 21:11:39 INFO: 29% (9.3 GiB of 32.0 GiB) in 1m 1s, read: 128.1 MiB/s, write: 118.1 MiB/s
168: 2021-04-21 21:11:42 INFO: 30% (9.7 GiB of 32.0 GiB) in 1m 4s, read: 122.0 MiB/s, write: 120.5 MiB/s
168: 2021-04-21 21:11:46 INFO: 32% (10.2 GiB of 32.0 GiB) in 1m 8s, read: 139.6 MiB/s, write: 114.4 MiB/s
168: 2021-04-21 21:11:49 INFO: 33% (10.6 GiB of 32.0 GiB) in 1m 11s, read: 131.7 MiB/s, write: 128.1 MiB/s
168: 2021-04-21 21:11:52 INFO: 34% (11.0 GiB of 32.0 GiB) in 1m 14s, read: 114.8 MiB/s, write: 109.9 MiB/s
168: 2021-04-21 21:11:56 INFO: 35% (11.5 GiB of 32.0 GiB) in 1m 18s, read: 135.0 MiB/s, write: 134.4 MiB/s
168: 2021-04-21 21:11:59 INFO: 36% (11.8 GiB of 32.0 GiB) in 1m 21s, read: 113.6 MiB/s, write: 113.4 MiB/s
168: 2021-04-21 21:12:02 INFO: 38% (12.2 GiB of 32.0 GiB) in 1m 24s, read: 129.3 MiB/s, write: 118.2 MiB/s
168: 2021-04-21 21:12:06 INFO: 39% (12.7 GiB of 32.0 GiB) in 1m 28s, read: 128.7 MiB/s, write: 128.4 MiB/s
168: 2021-04-21 21:12:09 INFO: 40% (13.1 GiB of 32.0 GiB) in 1m 31s, read: 124.5 MiB/s, write: 123.9 MiB/s
168: 2021-04-21 21:12:12 INFO: 41% (13.4 GiB of 32.0 GiB) in 1m 34s, read: 116.0 MiB/s, write: 115.6 MiB/s
168: 2021-04-21 21:12:16 INFO: 43% (14.0 GiB of 32.0 GiB) in 1m 38s, read: 145.9 MiB/s, write: 143.2 MiB/s
168: 2021-04-21 21:12:19 INFO: 45% (14.5 GiB of 32.0 GiB) in 1m 41s, read: 172.8 MiB/s, write: 164.2 MiB/s
168: 2021-04-21 21:12:22 INFO: 46% (14.9 GiB of 32.0 GiB) in 1m 44s, read: 125.7 MiB/s, write: 125.5 MiB/s
168: 2021-04-21 21:12:26 ERROR: VM 168 qmp command 'query-backup' failed - got wrong command id '12299:49138754' (expected 73836:1299)
168: 2021-04-21 21:12:26 INFO: aborting backup job
168: 2021-04-21 21:12:27 ERROR: Backup of VM 168 failed - VM 168 qmp command 'query-backup' failed - got wrong command id '12299:49138754' (expected 73836:1299)

Clues:
- There's a QEMU guest agent installed on all the above suspects/VMs (Windows & Mac).
- All are on the same host. The other hosts are not affected (yet).
- The same VM from day 1 had the same error on day 2.
- If we manually back up in snapshot mode, the same error occurs; if we manually back up in stop mode, there are no errors (see the sketch below this list).
- OS independent: some are Windows Servers, some Linux.
- It seems to spread: on day 1 one VM was affected, on day 2 five VMs were affected.
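
For completeness, the manual runs in both modes look roughly like this (a sketch; VMID 160 and the storage name ironNFS are taken from the log above):
Bash:
# snapshot mode - fails with the 'query-backup' error
vzdump 160 --mode snapshot --compress zstd --storage ironNFS
# stop mode - completes without errors
vzdump 160 --mode stop --compress zstd --storage ironNFS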

Just rebooting is not an option, as there are a couple dozen VMs running on the host.

And the pve versions:
Proxmox Virtual Environment 6.3-6, node 'XY'
CPU usage: 9.54% of 96 CPU(s)
IO delay: 0.01%
Load average: 12.03, 12.39, 13.20
RAM usage: 79.50% (1.17 TiB of 1.48 TiB)
KSM sharing: 61.61 GiB
HD space (root): 3.26% (7.01 GiB of 215.05 GiB)
SWAP usage: N/A
CPU(s): 96 x Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (2 Sockets)
Kernel Version: Linux 5.4.103-1-pve #1 SMP PVE 5.4.103-1 (Sun, 07 Mar 2021 15:55:09 +0100)
PVE Manager Version: pve-manager/6.3-6/2184247e

pveversion -v:
proxmox-ve: 6.3-1 (running kernel: 5.4.103-1-pve)
pve-manager: 6.3-6 (running version: 6.3-6/2184247e)
pve-kernel-5.4: 6.3-7
pve-kernel-helper: 6.3-7
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.103-1-pve: 5.4.103-1
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.65-1-pve: 5.4.65-1
pve-kernel-5.4.60-1-pve: 5.4.60-2
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.10-1-pve: 5.3.10-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.0-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: not correctly installed
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.0.7
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-5
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.10-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-6
pve-cluster: 6.2-1
pve-container: 3.3-4
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.2-2
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.2.0-4
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-8
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.3-pve2

Any help or hints from the hive are appreciated :)

Best,
Andy
 
Hi,

Please try to update to the latest version of PVE (apt update && apt full-upgrade) and try again. If the issue occurs again, check the journal during the VM backup for any errors.
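For example, something along these lines (a sketch; adjust the VMID and time window to the affected backup):
Bash:
# follow the host journal live while the backup runs
journalctl -f
# or review what was logged around a failed run afterwards
journalctl --since "2021-04-19 23:50" --until "2021-04-20 00:20" | grep -Ei "qmp|backup|vzdump"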
 
Hi Moayad,

thanks for your reply.
I think we'll need to reboot after that to take effect, is that correct? That is not possible until our mid-May maintenance window, I'm afraid :-(
 
I think we'll need to reboot after that to take effect, is that correct?
In general, with a cluster you can live-migrate machines to already updated nodes before rebooting the node to avoid VM downtime.

Anyway, I do not think just applying the latest updates will help here.
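For reference, a single guest can be moved from the CLI roughly like this (a sketch; VMID 160 and the target node name pve2 are only placeholders):
Bash:
# live-migrate VM 160 to an already-updated node named "pve2" without downtime
qm migrate 160 pve2 --online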
All are on the same host. The other hosts are not affected (yet).

What is different on the other hosts? Older/newer versions, other storage, other OS in the VMs, ...?
 
In general, with a cluster you can live-migrate machines to already updated nodes before rebooting the node to avoid VM downtime.

Anyway, I do not think just applying the latest updates will help here.


What is different on the other hosts? Older/newer versions, other storage, other OS in the VMs, ...?

Thanks, Thomas, for your input.
Both hosts should be the same; they were updated almost at the same time.

proxmox-ve: 6.3-1 (running kernel: 5.4.103-1-pve)
pve-manager: 6.3-6 (running version: 6.3-6/2184247e)
pve-kernel-5.4: 6.3-7
pve-kernel-helper: 6.3-7
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.103-1-pve: 5.4.103-1
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.65-1-pve: 5.4.65-1
pve-kernel-5.4.60-1-pve: 5.4.60-2
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.10-1-pve: 5.3.10-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.0-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: not correctly installed
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.0.7
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-5
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.10-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-6
pve-cluster: 6.2-1
pve-container: 3.3-4
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.2-2
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.2.0-4
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-8
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.3-pve2

proxmox-ve: 6.3-1 (running kernel: 5.4.103-1-pve)
pve-manager: 6.3-6 (running version: 6.3-6/2184247e)
pve-kernel-5.4: 6.3-7
pve-kernel-helper: 6.3-7
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.103-1-pve: 5.4.103-1
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.65-1-pve: 5.4.65-1
pve-kernel-5.4.60-1-pve: 5.4.60-2
pve-kernel-5.4.44-2-pve: 5.4.44-2
pve-kernel-5.4.34-1-pve: 5.4.34-2
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.18-2-pve: 5.3.18-2
pve-kernel-5.3.10-1-pve: 5.3.10-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.0-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: not correctly installed
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.0.7
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-5
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.10-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-6
pve-cluster: 6.2-1
pve-container: 3.3-4
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.2-2
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.2.0-4
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-8
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.3-pve2

They back up to the same NFS storage. The VMs mostly run on local storage (fast NVMe disks; we had speed issues when the system disks were on NFS), with ("slow") data disks on the NFS storage. So live migration could be tricky.

Best,
Andy
 
Hello, I'm adding here some information Fabian asked for in my post on the PBS forum, where I see almost the same error.

pveversion:
Code:
proxmox-ve: 6.3-1 (running kernel: 5.4.106-1-pve)
pve-manager: 6.3-6 (running version: 6.3-6/2184247e)
pve-kernel-5.4: 6.3-8
pve-kernel-helper: 6.3-8
pve-kernel-5.4.106-1-pve: 5.4.106-1
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.0-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.0.8
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-5
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-8
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.1-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-9
pve-cluster: 6.2-1
pve-container: 3.3-4
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.2-2
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-5
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-10
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.4-pve1

qmp status:
I tried it on all the Proxmox nodes and got this :( :
Code:
root@proxmox06:~# qmp status VMID --verbose
-bash: qmp: command not found
path error ??
 
Quick update to this:
We solved the issue *fingers-crossed*. After further investigation, we saw in the log that one VM was doing wonky stuff:
Code:
Apr 22 10:32:07 HOST pvedaemon[78720]: VM 172 qmp command failed - VM 172 qmp command 'guest-ping' failed - unable to connect to VM 172 qga socket - timeout after 31 retries
Apr 22 10:32:11 HOST pvestatd[12299]: VM 172 qmp command failed - VM 172 qmp command 'query-proxmox-support' failed - unable to connect to VM 172 qmp socket - timeout after 31 retries
We stopped the VM and rebooted it (yesterday afternoon). Last night, all backups went smoothly. We'll observe over the weekend if any further problems arise.
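In case it helps others: a quick way to spot a guest in this state is to grep the host journal for failing QMP/QGA commands (a sketch; adjust the time range as needed):
Bash:
# list all QMP/QGA command failures logged since yesterday
journalctl --since yesterday | grep "qmp command .* failed"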

Here is, however, the qm status output for one of the affected VMs from my original post.
# qm status 160 --verbose
blockstat:
ide0:
account_failed: 1
account_invalid: 1
failed_flush_operations: 0
failed_rd_operations: 0
failed_unmap_operations: 0
failed_wr_operations: 0
flush_operations: 24227
flush_total_time_ns: 18024212483
idle_time_ns: 820360395
invalid_flush_operations: 0
invalid_rd_operations: 0
invalid_unmap_operations: 0
invalid_wr_operations: 0
rd_bytes: 6945859072
rd_merged: 0
rd_operations: 376620
rd_total_time_ns: 24821503322
timed_stats:
unmap_bytes: 6384287744
unmap_merged: 0
unmap_operations: 25446
unmap_total_time_ns: 232171207
wr_bytes: 2985208320
wr_highest_offset: 637455032320
wr_merged: 0
wr_operations: 223096
wr_total_time_ns: 26504464191
ide1:
account_failed: 1
account_invalid: 1
failed_flush_operations: 0
failed_rd_operations: 0
failed_unmap_operations: 0
failed_wr_operations: 0
flush_operations: 12
flush_total_time_ns: 1632576
idle_time_ns: 3370354721774
invalid_flush_operations: 0
invalid_rd_operations: 0
invalid_unmap_operations: 0
invalid_wr_operations: 0
rd_bytes: 369911808
rd_merged: 0
rd_operations: 2936
rd_total_time_ns: 5114223325
timed_stats:
unmap_bytes: 0
unmap_merged: 0
unmap_operations: 0
unmap_total_time_ns: 0
wr_bytes: 94208
wr_highest_offset: 3238039552
wr_merged: 0
wr_operations: 21
wr_total_time_ns: 25029890
cpus: 16
disk: 0
diskread: 7315770880
diskwrite: 2985302528
maxdisk: 1073741824000
maxmem: 206158430208
mem: 203427142355
name: XXX
netin: 3074167904
netout: 122488445
nics:
tap160i0:
netin: 60
netout: 0
tap160i1:
netin: 3074167844
netout: 122488445
pid: 43436
proxmox-support:
pbs-dirty-bitmap: 1
pbs-dirty-bitmap-migration: 1
pbs-dirty-bitmap-savevm: 1
pbs-library-version: 1.0.3 (8de935110ed4cab743f6c9437357057f9f9f08ea)
pbs-masterkey: 1
query-bitmap-info: 1
qmpstatus: running
running-machine: pc-i440fx-5.1+pve0
running-qemu: 5.2.0
status: running
template:
uptime: 90593
vmid: 160

Thanks and we'll keep you updated.
Andy
 
As recommended by fabian in the thread
https://forum.proxmox.com/threads/pve-shows-failed-pbs-shows-ok.88045/

pveversion -v
Code:
proxmox-ve: 6.3-1 (running kernel: 5.4.103-1-pve)
pve-manager: 6.3-6 (running version: 6.3-6/2184247e)
pve-kernel-5.4: 6.3-7
pve-kernel-helper: 6.3-7
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.103-1-pve: 5.4.103-1
pve-kernel-5.3.18-3-pve: 5.3.18-3
ceph-fuse: 14.2.16-pve1
corosync: 3.1.0-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.0.7
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-5
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
openvswitch-switch: 2.12.3-1
proxmox-backup-client: 1.0.10-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-6
pve-cluster: 6.2-1
pve-container: 3.3-4
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.2-2
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.2.0-3
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-8
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.3-pve2

qm status VMID --verbose:
Code:
balloon: 8589934592
ballooninfo:
        actual: 8589934592
        last_update: 1617372209
        max_mem: 8589934592
blockstat:
        ide0:
                account_failed: 0
                account_invalid: 0
                failed_flush_operations: 0
                failed_rd_operations: 0
                failed_unmap_operations: 0
                failed_wr_operations: 0
                flush_operations: 0
                flush_total_time_ns: 0
                invalid_flush_operations: 0
                invalid_rd_operations: 0
                invalid_unmap_operations: 0
                invalid_wr_operations: 0
                rd_bytes: 0
                rd_merged: 0
                rd_operations: 0
                rd_total_time_ns: 0
                timed_stats:
                unmap_bytes: 0
                unmap_merged: 0
                unmap_operations: 0
                unmap_total_time_ns: 0
                wr_bytes: 0
                wr_highest_offset: 0
                wr_merged: 0
                wr_operations: 0
                wr_total_time_ns: 0
        scsi0:
                account_failed: 1
                account_invalid: 1
                failed_flush_operations: 0
                failed_rd_operations: 0
                failed_unmap_operations: 0
                failed_wr_operations: 0
                flush_operations: 2
                flush_total_time_ns: 162690
                idle_time_ns: 1508553556
                invalid_flush_operations: 0
                invalid_rd_operations: 0
                invalid_unmap_operations: 0
                invalid_wr_operations: 0
                rd_bytes: 515251768320
                rd_merged: 0
                rd_operations: 3597254
                rd_total_time_ns: 96737500989145
                timed_stats:
                unmap_bytes: 441322471424
                unmap_merged: 0
                unmap_operations: 722633
                unmap_total_time_ns: 8053953062
                wr_bytes: 254920104960
                wr_highest_offset: 39878246400
                wr_merged: 0
                wr_operations: 9721192
                wr_total_time_ns: 309139099627586
cpus: 2
disk: 0
diskread: 515251768320
diskwrite: 254920104960
maxdisk: 107374182400
maxmem: 8589934592
mem: 7155977840
name: DC001-10000
netin: 25474267010
netout: 129401102361
nics:
        tap134i0:
                netin: 25328863003
                netout: 129399821298
        tap134i1:
                netin: 145404007
                netout: 1281063
pid: 981043
proxmox-support:
        pbs-dirty-bitmap: 1
        pbs-dirty-bitmap-migration: 1
        pbs-dirty-bitmap-savevm: 1
        pbs-library-version: 1.0.3 (8de935110ed4cab743f6c9437357057f9f9f08ea)
        pbs-masterkey: 1
        query-bitmap-info: 1
qmpstatus: running
running-machine: pc-i440fx-5.2+pve0
running-qemu: 5.2.0
status: running
template:
uptime: 1789276
vmid: 134
 
We stopped the VM and rebooted it (yesterday afternoon). Last night, all backups went smoothly. We'll observe over the weekend if any further problems arise.
So the guest agent seems to be a possible correlation; what version is in use inside the VM?

As recommended by fabian in the thread
Is the QEMU guest agent enabled and started for the VM where this happened?
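One quick way to check both sides (a sketch; 160 is just an example VMID, and the in-guest commands assume a systemd-based Linux guest):
Bash:
# on the PVE host: is the agent option enabled in the VM config?
qm config 160 | grep agent
# inside the guest: is the agent installed and running, and which version?
systemctl status qemu-guest-agent
qemu-ga --version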
 
So the guest agent seems to be a possible correlation; what version is in use inside the VM?

The person responsible just told me that they did not (re-)install the guest-agent.
It's a Fedora CoreOS VM that's provisioned via a JSON file. We told the person to install the guest agent, and they did for a couple of months. During the latest iteration, the guest agent somehow did not make it into the JSON and thus was not installed, even though we had set the option in the WebGUI for the VM to Guest Agent: Enabled. Maybe this was the root of the problem?
 
During the latest iteration, the guest agent somehow did not make it into the JSON and thus was not installed, even though we had set the option in the WebGUI for the VM to Guest Agent: Enabled. Maybe this was the root of the problem?
Hmm, enabling the flag causes more QMP commands, which may potentially wait a bit for the QGA to reply. I cannot say for sure, but it seems possible to me that this correlates with the issue in the log.
 
Quick update to this:
We solved the issue *fingers-crossed*. After further investigation, we saw in the log that one VM was doing wonky stuff:
Code:
Apr 22 10:32:07 HOST pvedaemon[78720]: VM 172 qmp command failed - VM 172 qmp command 'guest-ping' failed - unable to connect to VM 172 qga socket - timeout after 31 retries
Apr 22 10:32:11 HOST pvestatd[12299]: VM 172 qmp command failed - VM 172 qmp command 'query-proxmox-support' failed - unable to connect to VM 172 qmp socket - timeout after 31 retries
We stopped the VM and rebooted it (yesterday afternoon). Last night, all backups went smoothly. We'll observe over the weekend if any further problems arise.


Update 2: The backups still run successfully during the night. I'll post a 3rd and last update Monday.
 
Bash:
qm status VMID --verbose
well.....
Code:
root@proxmox06:~# qm status VMID --verbose
400 Parameter verification failed.
vmid: type check ('integer') failed - got 'VMID'
qm status <vmid> [OPTIONS]

But I will follow the general idea here and start checking the guest agent on the VMs to see if it improves our situation.
 
well.....
Code:
root@proxmox06:~# qm status VMID --verbose
400 Parameter verification failed.
vmid: type check ('integer') failed - got 'VMID'
qm status <vmid> [OPTIONS]

But I will follow the general idea here and start checking the guest agent on the VMs to see if it improves our situation.
The VMID string in the proposed command needs to be replaced with the actual, well, VMID integer... (Not that this command's output matters too much at this point; consider it a more general hint about commands proposed here in the forum.)
 
lol my bad :)
Note to myself: never do that kind of thing when you've just woken up!
That VM (a Windows one) was the one first at fault, but the backup was fine for the last 3 updates... I updated the guest agent 3 days ago!
I have other VMs, under Windows or Linux, with problems since...
Code:
qm status 135 --verbose
balloon: 4294967296
ballooninfo:
        actual: 4294967296
        last_update: 1619081215
        max_mem: 4294967296
blockstat:
        ide2:
                account_failed: 0
                account_invalid: 0
                failed_flush_operations: 0
                failed_rd_operations: 0
                failed_unmap_operations: 0
                failed_wr_operations: 0
                flush_operations: 0
                flush_total_time_ns: 0
                idle_time_ns: 334108294692552
                invalid_flush_operations: 0
                invalid_rd_operations: 0
                invalid_unmap_operations: 0
                invalid_wr_operations: 0
                rd_bytes: 6641664
                rd_merged: 0
                rd_operations: 134
                rd_total_time_ns: 18975203
                timed_stats:
                unmap_bytes: 0
                unmap_merged: 0
                unmap_operations: 0
                unmap_total_time_ns: 0
                wr_bytes: 0
                wr_highest_offset: 0
                wr_merged: 0
                wr_operations: 0
                wr_total_time_ns: 0
        scsi0:
                account_failed: 1
                account_invalid: 1
                failed_flush_operations: 0
                failed_rd_operations: 0
                failed_unmap_operations: 0
                failed_wr_operations: 0
                flush_operations: 457269
                flush_total_time_ns: 1976373659460
                idle_time_ns: 3103567877
                invalid_flush_operations: 0
                invalid_rd_operations: 0
                invalid_unmap_operations: 0
                invalid_wr_operations: 0
                rd_bytes: 29193788416
                rd_merged: 0
                rd_operations: 2220228
                rd_total_time_ns: 19801108766323
                timed_stats:
                unmap_bytes: 66889007104
                unmap_merged: 0
                unmap_operations: 507492
                unmap_total_time_ns: 4315829478
                wr_bytes: 45227201024
                wr_highest_offset: 42711834624
                wr_merged: 0
                wr_operations: 1938014
                wr_total_time_ns: 47882991946401
        scsi1:
                account_failed: 1
                account_invalid: 1
                failed_flush_operations: 0
                failed_rd_operations: 0
                failed_unmap_operations: 0
                failed_wr_operations: 0
                flush_operations: 30537
                flush_total_time_ns: 1892590479857
                idle_time_ns: 15974996
                invalid_flush_operations: 0
                invalid_rd_operations: 0
                invalid_unmap_operations: 0
                invalid_wr_operations: 0
                rd_bytes: 1136801896960
                rd_merged: 0
                rd_operations: 72085166
                rd_total_time_ns: 616783874018798
                timed_stats:
                unmap_bytes: 287629680640
                unmap_merged: 0
                unmap_operations: 337896
                unmap_total_time_ns: 2474629584
                wr_bytes: 234423429632
                wr_highest_offset: 832179945472
                wr_merged: 0
                wr_operations: 1482804
                wr_total_time_ns: 203476290452260
        scsi2:
                account_failed: 1
                account_invalid: 1
                failed_flush_operations: 0
                failed_rd_operations: 0
                failed_unmap_operations: 0
                failed_wr_operations: 0
                flush_operations: 6456
                flush_total_time_ns: 44239405553
                idle_time_ns: 470929158937
                invalid_flush_operations: 0
                invalid_rd_operations: 0
                invalid_unmap_operations: 0
                invalid_wr_operations: 0
                rd_bytes: 75067018752
                rd_merged: 0
                rd_operations: 14061111
                rd_total_time_ns: 37126767993682
                timed_stats:
                unmap_bytes: 65538916352
                unmap_merged: 0
                unmap_operations: 1069
                unmap_total_time_ns: 11164009
                wr_bytes: 739126784
                wr_highest_offset: 191244529664
                wr_merged: 0
                wr_operations: 47293
                wr_total_time_ns: 1033540779471
        scsi3:
                account_failed: 1
                account_invalid: 1
                failed_flush_operations: 0
                failed_rd_operations: 0
                failed_unmap_operations: 0
                failed_wr_operations: 0
                flush_operations: 13213
                flush_total_time_ns: 53866857850
                idle_time_ns: 470929048660
                invalid_flush_operations: 0
                invalid_rd_operations: 0
                invalid_unmap_operations: 0
                invalid_wr_operations: 0
                rd_bytes: 81918363648
                rd_merged: 0
                rd_operations: 16116767
                rd_total_time_ns: 43680478409049
                timed_stats:
                unmap_bytes: 91718246400
                unmap_merged: 0
                unmap_operations: 1872
                unmap_total_time_ns: 19623100
                wr_bytes: 2205566464
                wr_highest_offset: 349709815808
                wr_merged: 0
                wr_operations: 94525
                wr_total_time_ns: 2464309642553
        scsi4:
                account_failed: 1
                account_invalid: 1
                failed_flush_operations: 0
                failed_rd_operations: 0
                failed_unmap_operations: 0
                failed_wr_operations: 0
                flush_operations: 28935
                flush_total_time_ns: 349267067351
                idle_time_ns: 470928915964
                invalid_flush_operations: 0
                invalid_rd_operations: 0
                invalid_unmap_operations: 0
                invalid_wr_operations: 0
                rd_bytes: 314985014272
                rd_merged: 0
                rd_operations: 59540065
                rd_total_time_ns: 166830923209953
                timed_stats:
                unmap_bytes: 163209981952
                unmap_merged: 0
                unmap_operations: 8562
                unmap_total_time_ns: 75138275
                wr_bytes: 5571679744
                wr_highest_offset: 796294840320
                wr_merged: 0
                wr_operations: 337059
                wr_total_time_ns: 6972040260376
        scsi5:
                account_failed: 1
                account_invalid: 1
                failed_flush_operations: 0
                failed_rd_operations: 0
                failed_unmap_operations: 0
                failed_wr_operations: 0
                flush_operations: 3014
                flush_total_time_ns: 31898424186
                idle_time_ns: 470928750931
                invalid_flush_operations: 0
                invalid_rd_operations: 0
                invalid_unmap_operations: 0
                invalid_wr_operations: 0
                rd_bytes: 167021452288
                rd_merged: 0
                rd_operations: 16453481
                rd_total_time_ns: 45290633737565
                timed_stats:
                unmap_bytes: 131431297024
                unmap_merged: 0
                unmap_operations: 950
                unmap_total_time_ns: 9518377
                wr_bytes: 3648072192
                wr_highest_offset: 715365871616
                wr_merged: 0
                wr_operations: 42639
                wr_total_time_ns: 2439583614113
        scsi6:
                account_failed: 1
                account_invalid: 1
                failed_flush_operations: 0
                failed_rd_operations: 0
                failed_unmap_operations: 0
                failed_wr_operations: 0
                flush_operations: 2526
                flush_total_time_ns: 12835315621
                idle_time_ns: 470928566343
                invalid_flush_operations: 0
                invalid_rd_operations: 0
                invalid_unmap_operations: 0
                invalid_wr_operations: 0
                rd_bytes: 77020182016
                rd_merged: 0
                rd_operations: 16895865
                rd_total_time_ns: 35497109641982
                timed_stats:
                unmap_bytes: 82838622208
                unmap_merged: 0
                unmap_operations: 1470
                unmap_total_time_ns: 14171114
                wr_bytes: 479986176
                wr_highest_offset: 213524103168
                wr_merged: 0
                wr_operations: 21650
                wr_total_time_ns: 408426552942
cpus: 2
disk: 0
diskread: 1882014358016
diskwrite: 292295062016
maxdisk: 42949672960
maxmem: 4294967296
mem: 1831761355
name: SRV-DATA
netin: 446612179472
netout: 470522125219
nics:
        tap135i0:
                netin: 79189495
                netout: 884408
        tap135i1:
                netin: 446532989977
                netout: 470521240811
pid: 14319
proxmox-support:
        pbs-dirty-bitmap: 1
        pbs-dirty-bitmap-migration: 1
        pbs-dirty-bitmap-savevm: 1
        pbs-library-version: 1.0.3 (8de935110ed4cab743f6c9437357057f9f9f08ea)
        pbs-masterkey: 1
        query-bitmap-info: 1
qmpstatus: running
running-machine: pc-i440fx-5.1+pve0
running-qemu: 5.2.0
status: running
template:
uptime: 801020
vmid: 135

The last VM with the error (Linux, FreeBSD):
Code:
qm status 147 --verbose
balloon: 4294967296
ballooninfo:
        actual: 4294967296
        free_mem: 123244544
        last_update: 1619416028
        major_page_faults: 17203
        max_mem: 4294967296
        mem_swapped_in: 87683072
        mem_swapped_out: 236388352
        minor_page_faults: 130826608
        total_mem: 4129189888
blockstat:
        ide2:
                account_failed: 0
                account_invalid: 0
                failed_flush_operations: 0
                failed_rd_operations: 0
                failed_unmap_operations: 0
                failed_wr_operations: 0
                flush_operations: 0
                flush_total_time_ns: 0
                idle_time_ns: 504587435963672
                invalid_flush_operations: 0
                invalid_rd_operations: 0
                invalid_unmap_operations: 0
                invalid_wr_operations: 0
                rd_bytes: 316
                rd_merged: 0
                rd_operations: 10
                rd_total_time_ns: 236998
                timed_stats:
                unmap_bytes: 0
                unmap_merged: 0
                unmap_operations: 0
                unmap_total_time_ns: 0
                wr_bytes: 0
                wr_highest_offset: 0
                wr_merged: 0
                wr_operations: 0
                wr_total_time_ns: 0
        scsi0:
                account_failed: 1
                account_invalid: 1
                failed_flush_operations: 0
                failed_rd_operations: 0
                failed_unmap_operations: 0
                failed_wr_operations: 0
                flush_operations: 109508
                flush_total_time_ns: 207364045458
                idle_time_ns: 20220196740
                invalid_flush_operations: 0
                invalid_rd_operations: 0
                invalid_unmap_operations: 0
                invalid_wr_operations: 0
                rd_bytes: 5972670464
                rd_merged: 0
                rd_operations: 110773
                rd_total_time_ns: 2044911958448
                timed_stats:
                unmap_bytes: 0
                unmap_merged: 0
                unmap_operations: 0
                unmap_total_time_ns: 0
                wr_bytes: 5856745472
                wr_highest_offset: 34345172992
                wr_merged: 0
                wr_operations: 474029
                wr_total_time_ns: 11497600075094
cpus: 2
disk: 0
diskread: 5972670780
diskwrite: 5856745472
freemem: 123244544
maxdisk: 34359738368
maxmem: 4294967296
mem: 4005945344
name: SRV-ALCASAR
netin: 2743762969
netout: 3239182506
nics:
        tap147i0:
                netin: 1767842313
                netout: 137386569
        tap147i1:
                netin: 975920656
                netout: 3101795937
pid: 4197
proxmox-support:
        pbs-dirty-bitmap: 1
        pbs-dirty-bitmap-migration: 1
        pbs-dirty-bitmap-savevm: 1
        pbs-library-version: 1.0.3 (8de935110ed4cab743f6c9437357057f9f9f08ea)
        pbs-masterkey: 1
        query-bitmap-info: 1
qmpstatus: running
running-machine: pc-i440fx-5.2+pve0
running-qemu: 5.2.0
status: running
template:
uptime: 801653
vmid: 147
 
Hello,
on my side, 3 VMs hit the same error this morning, all of them with the latest guest agent... It doesn't happen every morning, so it's more problematic than catastrophic :)
 
I get the same error on only one of my VMs. It always occurs at a different point of the backup process; the QEMU guest agent runs on the VM. Deactivating it made no difference, and neither does changing the backup mode from snapshot to suspend. My Proxmox version is 7.0-10.

Code:
INFO: starting new backup job: vzdump 201 --compress zstd --storage Proxmox-Backup --node pve --mode snapshot --remove 0
INFO: Starting Backup of VM 201 (qemu)
INFO: Backup started at 2021-07-17 10:08:35
INFO: status = running
INFO: VM Name: Nextcloud
INFO: include disk 'scsi0' 'Proxmox-Backup:201/vm-201-disk-0.qcow2' 450G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: skip unused drive 'local-zfs:vm-201-disk-1' (not included into backup)
INFO: creating vzdump archive '/media/usb/Backup5TB1/Proxmox-Backup/dump/vzdump-qemu-201-2021_07_17-10_08_35.vma.zst'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task 'f6831c35-0b3b-453b-89c7-e4ec81df5660'
INFO: resuming VM again
INFO: 0% (39.9 MiB of 450.0 GiB) in 3s, read: 13.3 MiB/s, write: 10.6 MiB/s
....
INFO: 74% (333.2 GiB of 450.0 GiB) in 2h 37m 17s, read: 115.4 MiB/s, write: 0 B/s
ERROR: VM 201 qmp command 'query-backup' failed - got wrong command id '1515:37904' (expected 2127233:14738)
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 201 failed - VM 201 qmp command 'query-backup' failed - got wrong command id '1515:37904' (expected 2127233:14738)
INFO: Failed at 2021-07-17 12:46:13
INFO: Backup job finished with errors
TASK ERROR: job errors

Any tips for troubleshooting?
 
Sorry for the delay on this issue. We're starting to get more reports of it now, and are working on it, but so far have been unable to reproduce the issue locally.

We've now put up a debug build for 'qemu-server'. It would be very helpful if you could install the packages and, if the problem appears again, post the full Task Log as well as 'journalctl -e' output from the time it happened.

http://download.proxmox.com/temp/bad-command-id-debug-output/
Code:
sha256sum:
c7d41f080a80d3c313a02bb758deaebe69d05349219dd473e3a96186f317a772  qemu-server_7.0-13+debug_amd64.deb
a5c475dd6dce3c60bbc80da0c0624443a112e12b01fdd37e73098d249a6d4e4a  qemu-server-dbgsym_7.0-13+debug_amd64.deb

The package is a regular 'qemu-server 7.0-13' with added logging around the error.
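Installation could look roughly like this (a sketch, assuming the files sit directly under the linked directory; verify the checksums before installing):
Bash:
wget http://download.proxmox.com/temp/bad-command-id-debug-output/qemu-server_7.0-13+debug_amd64.deb
sha256sum qemu-server_7.0-13+debug_amd64.deb   # compare against the value above
apt install ./qemu-server_7.0-13+debug_amd64.deb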

Thanks!
 
