qmp command 'cont' failed - got timeout

drbeam

Member
Apr 5, 2021
6
0
21
26
Hi all,

i build a new cluster on PVE 7.0-11 and PBS 2.0-7 and now i get this error mentioned above.

I was able to narrow down the problems on Disks that are stored on an external SMB share. If i remove the backup mark for the smb disks the backup works perfectly fine. Does anyone knows what i'm doing wrong?
Code:
INFO: starting new backup job: vzdump 103 --remove 0 --node pve02 --storage pbs02_disk1_data1 --mode snapshot
INFO: Starting Backup of VM 103 (qemu)
INFO: Backup started at 2021-08-18 11:07:51
INFO: status = running
INFO: VM Name: docker02
INFO: include disk 'scsi0' 'local-zfs:vm-103-disk-0' 65G
INFO: include disk 'scsi1' 'nas542-smb:103/vm-103-disk-0.qcow2' 250G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: pending configuration changes found (not included into backup)
INFO: creating Proxmox Backup Server archive 'vm/103/2021-08-18T09:07:51Z'
INFO: enabling encryption
INFO: started backup task 'fa7040ec-f00d-4a8e-b402-95b1069d2f12'
INFO: resuming VM again
INFO: scsi0: dirty-bitmap status: existing bitmap was invalid and has been cleared
INFO: scsi1: dirty-bitmap status: existing bitmap was invalid and has been cleared
ERROR: VM 103 qmp command 'query-backup' failed - got wrong command id '994779:4200' (expected 2201423:2195)
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 103 failed - VM 103 qmp command 'query-backup' failed - got wrong command id '994779:4200' (expected 2201423:2195)
INFO: Failed at 2021-08-18 11:08:03
INFO: Backup job finished with errors
TASK ERROR: job errors

I already tried another smb share on a different NAS, still the same error.
 
Hi, this is a known error we are currently working towards fixing. So far, we've been unable to reproduce it.

If possible, it would help a lot to install the following debug packages:
http://download.proxmox.com/temp/bad-command-id-debug-output/
Code:
sha256sum:
c7d41f080a80d3c313a02bb758deaebe69d05349219dd473e3a96186f317a772  qemu-server_7.0-13+debug_amd64.deb
a5c475dd6dce3c60bbc80da0c0624443a112e12b01fdd37e73098d249a6d4e4a  qemu-server-dbgsym_7.0-13+debug_amd64.deb
and post the task log and 'journalctl -e' output if the error occurs again.

The package is a regular 'qemu-server 7.0-13' with added logging around this error. Please ensure your system is up-to-date before attempting to install it.

Thanks!
 
Hi, this is a known error we are currently working towards fixing. So far, we've been unable to reproduce it.

If possible, it would help a lot to install the following debug packages:
http://download.proxmox.com/temp/bad-command-id-debug-output/
Code:
sha256sum:
c7d41f080a80d3c313a02bb758deaebe69d05349219dd473e3a96186f317a772  qemu-server_7.0-13+debug_amd64.deb
a5c475dd6dce3c60bbc80da0c0624443a112e12b01fdd37e73098d249a6d4e4a  qemu-server-dbgsym_7.0-13+debug_amd64.deb
and post the task log and 'journalctl -e' output if the error occurs again.

The package is a regular 'qemu-server 7.0-13' with added logging around this error. Please ensure your system is up-to-date before attempting to install it.

Thanks!
Hi,

will do this tomorrow and let you know.
 
Hi Stefan,

here are the logfiles.
Task Log:
Code:
INFO: starting new backup job: vzdump 100 --node pve01 --remove 0 --storage pbs01_disk1_data1 --mode snapshot
INFO: Starting Backup of VM 100 (qemu)
INFO: Backup started at 2021-08-20 20:36:02
INFO: status = running
INFO: VM Name: docker01
INFO: include disk 'scsi0' 'local-zfs:vm-100-disk-0' 50G
INFO: include disk 'scsi1' 'nas542-smb:100/vm-100-disk-1.qcow2' 80G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: skip unused drive 'nas542-smb:100/vm-100-disk-0.qcow2' (not included into backup)
INFO: creating Proxmox Backup Server archive 'vm/100/2021-08-20T18:36:02Z'
INFO: issuing guest-agent 'fs-freeze' command
INFO: enabling encryption
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '990f1f84-a2ca-4eb5-bb94-dd6ea5180fbb'
INFO: resuming VM again
ERROR: VM 100 qmp command 'cont' failed - got timeout
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 100 failed - VM 100 qmp command 'cont' failed - got timeout
INFO: Failed at 2021-08-20 20:36:10
INFO: Backup job finished with errors
TASK ERROR: job errors
Note: i know that vm-disk-0.qcow2 is not used :)

Journalctl -e:
Code:
Aug 20 20:36:11 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:11 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:13 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:13 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:13 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:59 pve01 systemd[1]: Starting Check_MK (10.0.1.3:51274)...
Aug 20 20:37:00 pve01 systemd[1]: Starting Proxmox VE replication runner...
Aug 20 20:37:00 pve01 systemd[1]: pvesr.service: Succeeded.
Aug 20 20:37:00 pve01 systemd[1]: Finished Proxmox VE replication runner.

Do you know at the moment, if this issue is also there using NFS storage? For now i could switch to a NFS based system.
 
This seems to be a seperate issue from your first? ("got timeout" vs. "got wrong command id") Those logs don't tell us anything about the original problem unfortunately...

In this case though it could be a stuck or super slow storage target, so switching to NFS might be a short-time solution for you.
 
This seems to be a seperate issue from your first? ("got timeout" vs. "got wrong command id") Those logs don't tell us anything about the original problem unfortunately...

In this case though it could be a stuck or super slow storage target, so switching to NFS might be a short-time solution for you.
Hi Stefan,

NFS solved it for now, but i got the same error a few times with NFS based storage.
Can i provide you any more logs to find this bug?