Error with backup - qmp command 'backup' failed - got wrong command id

Oct 1, 2019
9
0
6
45
Hello,
I don't know if it is more a PBS problem or a Proxmox VE one but we get this problem since we passed form NFS save to PBS :

At the beggining of the save of the same VM (the biggest one we have) we get this :
Code:
135: 2021-04-22 00:15:02 INFO: Starting Backup of VM 135 (qemu)
135: 2021-04-22 00:15:02 INFO: status = running
135: 2021-04-22 00:15:02 INFO: VM Name: SRV-DATA
135: 2021-04-22 00:15:02 INFO: include disk 'scsi0' 'cephploermel:vm-135-disk-0' 40G
135: 2021-04-22 00:15:02 INFO: include disk 'scsi1' 'cephploermel:vm-135-disk-1' 780G
135: 2021-04-22 00:15:02 INFO: include disk 'scsi2' 'cephploermel:vm-135-disk-2' 230G
135: 2021-04-22 00:15:02 INFO: include disk 'scsi3' 'cephploermel:vm-135-disk-3' 400G
135: 2021-04-22 00:15:02 INFO: include disk 'scsi4' 'cephploermel:vm-135-disk-4' 750G
135: 2021-04-22 00:15:02 INFO: include disk 'scsi5' 'cephploermel:vm-135-disk-5' 690G
135: 2021-04-22 00:15:02 INFO: include disk 'scsi6' 'cephploermel:vm-135-disk-6' 220G
135: 2021-04-22 00:15:03 INFO: backup mode: snapshot
135: 2021-04-22 00:15:03 INFO: ionice priority: 7
135: 2021-04-22 00:15:03 INFO: creating Proxmox Backup Server archive 'vm/135/2021-04-21T22:15:02Z'
135: 2021-04-22 00:17:09 ERROR: VM 135 qmp command 'backup' failed - got timeout
135: 2021-04-22 00:17:09 INFO: aborting backup job
135: 2021-04-22 00:18:12 ERROR: VM 135 qmp command 'backup-cancel' failed - got wrong command id '28117:6' (expected 28117:7)
135: 2021-04-22 00:18:12 ERROR: Backup of VM 135 failed - VM 135 qmp command 'backup' failed - got timeout

Until today it was only this VM (and this is not every day :( ) but a new VM started to have the same error las night.
Do you have an idea about this ?
Thanks
 
Hello I respond in this thread again because I discovered that in fact these error messages are not real errors :(
Some days we have 0 error message, some days we have 3 to 12 VM not saved if we believe only in the error message.

But I checked on the VM directly and almost every VM have all their backup, most of them anyway.
like this one, we received the error message on 5 of the last 7 backup but in proxmox :

1620639084797.png

131: 2021-05-08 22:38:53 INFO: Starting Backup of VM 131 (qemu)
131: 2021-05-08 22:38:53 INFO: status = running
131: 2021-05-08 22:38:53 INFO: VM Name: SRV-POPULATION
131: 2021-05-08 22:38:53 INFO: include disk 'scsi0' 'cephploermel:vm-131-disk-0' 65G
131: 2021-05-08 22:38:53 INFO: include disk 'scsi1' 'cephploermel:vm-131-disk-1' 5G
131: 2021-05-08 22:38:54 INFO: backup mode: snapshot
131: 2021-05-08 22:38:54 INFO: ionice priority: 7
131: 2021-05-08 22:38:54 INFO: pending configuration changes found (not included into backup)
131: 2021-05-08 22:38:54 INFO: creating Proxmox Backup Server archive 'vm/131/2021-05-08T20:38:53Z'
131: 2021-05-08 22:38:57 INFO: skipping guest-agent 'fs-freeze', agent configured but not running?
131: 2021-05-08 22:41:02 ERROR: VM 131 qmp command 'backup' failed - got timeout
131: 2021-05-08 22:41:02 INFO: aborting backup job
131: 2021-05-08 22:41:34 ERROR: VM 131 qmp command 'backup-cancel' failed - got wrong command id '14790:1042' (expected 14790:1043)
131: 2021-05-08 22:41:34 INFO: resuming VM again
131: 2021-05-08 22:41:35 ERROR: Backup of VM 131 failed - VM 131 qmp command 'backup' failed - got timeout

142: 2021-05-08 22:41:35 INFO: Starting Backup of VM 142 (qemu)
142: 2021-05-08 22:41:35 INFO: status = running
142: 2021-05-08 22:41:35 INFO: VM Name: ETHERCALC
142: 2021-05-08 22:41:35 INFO: include disk 'scsi0' 'cephploermel:vm-142-disk-0' 8G
142: 2021-05-08 22:41:35 INFO: backup mode: snapshot
142: 2021-05-08 22:41:35 INFO: ionice priority: 7
142: 2021-05-08 22:41:35 INFO: creating Proxmox Backup Server archive 'vm/142/2021-05-08T20:41:35Z'
142: 2021-05-08 22:41:35 INFO: issuing guest-agent 'fs-freeze' command
142: 2021-05-08 22:43:23 INFO: issuing guest-agent 'fs-thaw' command
142: 2021-05-08 22:43:24 INFO: started backup task '02e918eb-98a7-42ae-b9d6-ad2db2527a6c'
142: 2021-05-08 22:43:24 INFO: resuming VM again
142: 2021-05-08 22:43:24 INFO: scsi0: dirty-bitmap status: OK (84.0 MiB of 8.0 GiB dirty)
142: 2021-05-08 22:43:24 INFO: using fast incremental mode (dirty-bitmap), 84.0 MiB dirty of 8.0 GiB total
142: 2021-05-08 22:43:27 INFO: 76% (64.0 MiB of 84.0 MiB) in 3s, read: 21.3 MiB/s, write: 20.0 MiB/s
142: 2021-05-08 22:43:30 INFO: 100% (84.0 MiB of 84.0 MiB) in 6s, read: 6.7 MiB/s, write: 6.7 MiB/s
142: 2021-05-08 22:44:41 INFO: backup was done incrementally, reused 7.92 GiB (99%)
142: 2021-05-08 22:44:41 INFO: transferred 84.00 MiB in 77 seconds (1.1 MiB/s)
142: 2021-05-08 22:44:41 INFO: Finished Backup of VM 142 (00:03:06)

And I don't understand since the backup was done, how the backup of the next VM could start only 3mn after the one in false error !

So it's great because we have almost all our VM on backup, but is there a way not to get this false alarm ?
Thanks
 
Hello
got the same problem last night :

146: 2021-11-02 23:59:32 INFO: Starting Backup of VM 146 (qemu) 146: 2021-11-02 23:59:32 INFO: status = running 146: 2021-11-02 23:59:32 INFO: VM Name: freepbx-macetpc 146: 2021-11-02 23:59:32 INFO: include disk 'virtio0' 'ceph_storage:vm-146-disk-0' 20G 146: 2021-11-02 23:59:33 INFO: backup mode: snapshot 146: 2021-11-02 23:59:33 INFO: ionice priority: 7 146: 2021-11-02 23:59:33 INFO: creating Proxmox Backup Server archive 'vm/146/2021-11-02T22:59:32Z' 146: 2021-11-03 00:01:38 ERROR: VM 146 qmp command 'backup' failed - got timeout 146: 2021-11-03 00:01:38 INFO: aborting backup job 146: 2021-11-03 00:02:39 ERROR: VM 146 qmp command 'backup-cancel' failed - got wrong command id '24225:3654' (expected 24225:3655) 146: 2021-11-03 00:02:39 INFO: resuming VM again 146: 2021-11-03 00:02:39 ERROR: Backup of VM 146 failed - VM 146 qmp command 'backup' failed - got timeout

but the backup seems correct

# pveversion pve-manager/6.4-8/185e14db (running kernel: 5.4.119-1-pve) # qm status 146 --verbose blockstat: ide2: account_failed: 0 account_invalid: 0 failed_flush_operations: 0 failed_rd_operations: 0 failed_unmap_operations: 0 failed_wr_operations: 0 flush_operations: 0 flush_total_time_ns: 0 invalid_flush_operations: 0 invalid_rd_operations: 0 invalid_unmap_operations: 0 invalid_wr_operations: 0 rd_bytes: 0 rd_merged: 0 rd_operations: 0 rd_total_time_ns: 0 timed_stats: unmap_bytes: 0 unmap_merged: 0 unmap_operations: 0 unmap_total_time_ns: 0 wr_bytes: 0 wr_highest_offset: 0 wr_merged: 0 wr_operations: 0 wr_total_time_ns: 0 virtio0: account_failed: 1 account_invalid: 1 failed_flush_operations: 0 failed_rd_operations: 0 failed_unmap_operations: 0 failed_wr_operations: 0 flush_operations: 12878925 flush_total_time_ns: 1569877565950 idle_time_ns: 1211259780 invalid_flush_operations: 0 invalid_rd_operations: 0 invalid_unmap_operations: 0 invalid_wr_operations: 0 rd_bytes: 138941284352 rd_merged: 528476 rd_operations: 6422526 rd_total_time_ns: 13487784820719 timed_stats: unmap_bytes: 0 unmap_merged: 0 unmap_operations: 0 unmap_total_time_ns: 0 wr_bytes: 563466985472 wr_highest_offset: 20847198208 wr_merged: 163729 wr_operations: 37074585 wr_total_time_ns: 129735364280058 cpus: 2 disk: 0 diskread: 138941284352 diskwrite: 563466985472 maxdisk: 21474836480 maxmem: 2147483648 mem: 1236480167 name: freepbx-macetpc netin: 900292116678 netout: 21221354833 nics: tap146i0: netin: 895131779884 netout: 19912683368 tap146i1: netin: 5160336794 netout: 1308671465 pid: 5904 proxmox-support: pbs-dirty-bitmap: 1 pbs-dirty-bitmap-migration: 1 pbs-dirty-bitmap-savevm: 1 pbs-library-version: 1.0.3 (8de935110ed4cab743f6c9437357057f9f9f08ea) pbs-masterkey: 1 query-bitmap-info: 1 qmpstatus: running running-machine: pc-i440fx-2.12+pve0 running-qemu: 5.2.0 status: running template: uptime: 12628792 vmid: 146
Regards
 
that one is a different one, already fixed in PVE 7