qmp command 'cont' failed - got timeout

drbeam

Member
Apr 5, 2021
6
0
6
26
Hi all,

i build a new cluster on PVE 7.0-11 and PBS 2.0-7 and now i get this error mentioned above.

I was able to narrow down the problems on Disks that are stored on an external SMB share. If i remove the backup mark for the smb disks the backup works perfectly fine. Does anyone knows what i'm doing wrong?
Code:
INFO: starting new backup job: vzdump 103 --remove 0 --node pve02 --storage pbs02_disk1_data1 --mode snapshot
INFO: Starting Backup of VM 103 (qemu)
INFO: Backup started at 2021-08-18 11:07:51
INFO: status = running
INFO: VM Name: docker02
INFO: include disk 'scsi0' 'local-zfs:vm-103-disk-0' 65G
INFO: include disk 'scsi1' 'nas542-smb:103/vm-103-disk-0.qcow2' 250G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: pending configuration changes found (not included into backup)
INFO: creating Proxmox Backup Server archive 'vm/103/2021-08-18T09:07:51Z'
INFO: enabling encryption
INFO: started backup task 'fa7040ec-f00d-4a8e-b402-95b1069d2f12'
INFO: resuming VM again
INFO: scsi0: dirty-bitmap status: existing bitmap was invalid and has been cleared
INFO: scsi1: dirty-bitmap status: existing bitmap was invalid and has been cleared
ERROR: VM 103 qmp command 'query-backup' failed - got wrong command id '994779:4200' (expected 2201423:2195)
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 103 failed - VM 103 qmp command 'query-backup' failed - got wrong command id '994779:4200' (expected 2201423:2195)
INFO: Failed at 2021-08-18 11:08:03
INFO: Backup job finished with errors
TASK ERROR: job errors

I already tried another smb share on a different NAS, still the same error.
 
Hi, this is a known error we are currently working towards fixing. So far, we've been unable to reproduce it.

If possible, it would help a lot to install the following debug packages:
http://download.proxmox.com/temp/bad-command-id-debug-output/
Code:
sha256sum:
c7d41f080a80d3c313a02bb758deaebe69d05349219dd473e3a96186f317a772  qemu-server_7.0-13+debug_amd64.deb
a5c475dd6dce3c60bbc80da0c0624443a112e12b01fdd37e73098d249a6d4e4a  qemu-server-dbgsym_7.0-13+debug_amd64.deb
and post the task log and 'journalctl -e' output if the error occurs again.

The package is a regular 'qemu-server 7.0-13' with added logging around this error. Please ensure your system is up-to-date before attempting to install it.

Thanks!
 
Hi, this is a known error we are currently working towards fixing. So far, we've been unable to reproduce it.

If possible, it would help a lot to install the following debug packages:
http://download.proxmox.com/temp/bad-command-id-debug-output/
Code:
sha256sum:
c7d41f080a80d3c313a02bb758deaebe69d05349219dd473e3a96186f317a772  qemu-server_7.0-13+debug_amd64.deb
a5c475dd6dce3c60bbc80da0c0624443a112e12b01fdd37e73098d249a6d4e4a  qemu-server-dbgsym_7.0-13+debug_amd64.deb
and post the task log and 'journalctl -e' output if the error occurs again.

The package is a regular 'qemu-server 7.0-13' with added logging around this error. Please ensure your system is up-to-date before attempting to install it.

Thanks!
Hi,

will do this tomorrow and let you know.
 
Hi Stefan,

here are the logfiles.
Task Log:
Code:
INFO: starting new backup job: vzdump 100 --node pve01 --remove 0 --storage pbs01_disk1_data1 --mode snapshot
INFO: Starting Backup of VM 100 (qemu)
INFO: Backup started at 2021-08-20 20:36:02
INFO: status = running
INFO: VM Name: docker01
INFO: include disk 'scsi0' 'local-zfs:vm-100-disk-0' 50G
INFO: include disk 'scsi1' 'nas542-smb:100/vm-100-disk-1.qcow2' 80G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: skip unused drive 'nas542-smb:100/vm-100-disk-0.qcow2' (not included into backup)
INFO: creating Proxmox Backup Server archive 'vm/100/2021-08-20T18:36:02Z'
INFO: issuing guest-agent 'fs-freeze' command
INFO: enabling encryption
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '990f1f84-a2ca-4eb5-bb94-dd6ea5180fbb'
INFO: resuming VM again
ERROR: VM 100 qmp command 'cont' failed - got timeout
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 100 failed - VM 100 qmp command 'cont' failed - got timeout
INFO: Failed at 2021-08-20 20:36:10
INFO: Backup job finished with errors
TASK ERROR: job errors
Note: i know that vm-disk-0.qcow2 is not used :)

Journalctl -e:
Code:
Aug 20 20:36:11 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:11 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:12 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:13 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:13 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:13 pve01 proxmox-backup-proxy[1850]: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.
Aug 20 20:36:59 pve01 systemd[1]: Starting Check_MK (10.0.1.3:51274)...
Aug 20 20:37:00 pve01 systemd[1]: Starting Proxmox VE replication runner...
Aug 20 20:37:00 pve01 systemd[1]: pvesr.service: Succeeded.
Aug 20 20:37:00 pve01 systemd[1]: Finished Proxmox VE replication runner.

Do you know at the moment, if this issue is also there using NFS storage? For now i could switch to a NFS based system.
 
This seems to be a seperate issue from your first? ("got timeout" vs. "got wrong command id") Those logs don't tell us anything about the original problem unfortunately...

In this case though it could be a stuck or super slow storage target, so switching to NFS might be a short-time solution for you.
 
This seems to be a seperate issue from your first? ("got timeout" vs. "got wrong command id") Those logs don't tell us anything about the original problem unfortunately...

In this case though it could be a stuck or super slow storage target, so switching to NFS might be a short-time solution for you.
Hi Stefan,

NFS solved it for now, but i got the same error a few times with NFS based storage.
Can i provide you any more logs to find this bug?
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!