Error with backup (when backing up) - qmp command 'query-backup' failed - got wrong command id

Thank you for looking into this!

Task Log:
Code:
INFO: starting new backup job: vzdump 201 --mailto x@y.de.de --compress 0 --node pve --quiet 1 --storage Proxmox-Backup --mode snapshot --all 0 --mailnotification always
INFO: Starting Backup of VM 201 (qemu)
INFO: Backup started at 2021-08-19 22:26:44
INFO: status = running
INFO: VM Name: Nextcloud
INFO: include disk 'scsi0' 'local-zfs:vm-201-disk-1' 450G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: skip unused drive 'Proxmox-Backup:201/vm-201-disk-0.qcow2' (not included into backup)
INFO: creating vzdump archive '/media/usb/Backup5TB1/Proxmox-Backup/dump/vzdump-qemu-201-2021_08_19-22_26_43.vma'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '83f5b3b9-e043-4e66-906f-8bc59c12afb2'
INFO: resuming VM again
INFO: 0% (1.4 GiB of 450.0 GiB) in 5s, read: 292.2 MiB/s, write: 273.4 MiB/s
INFO: 1% (4.6 GiB of 450.0 GiB) in 40s, read: 93.6 MiB/s, write: 88.4 MiB/s
INFO: 2% (9.2 GiB of 450.0 GiB) in 1m 34s, read: 86.3 MiB/s, write: 82.8 MiB/s
INFO: 3% (13.6 GiB of 450.0 GiB) in 2m 29s, read: 82.1 MiB/s, write: 81.9 MiB/s
INFO: 4% (18.5 GiB of 450.0 GiB) in 3m 29s, read: 83.3 MiB/s, write: 83.2 MiB/s
INFO: 5% (22.6 GiB of 450.0 GiB) in 4m 19s, read: 83.7 MiB/s, write: 83.7 MiB/s
INFO: 6% (27.4 GiB of 450.0 GiB) in 5m 20s, read: 82.0 MiB/s, write: 82.0 MiB/s
INFO: 7% (31.9 GiB of 450.0 GiB) in 6m 16s, read: 81.8 MiB/s, write: 81.8 MiB/s
INFO: 8% (36.3 GiB of 450.0 GiB) in 7m 10s, read: 83.7 MiB/s, write: 83.7 MiB/s
INFO: 9% (41.0 GiB of 450.0 GiB) in 8m 8s, read: 83.3 MiB/s, write: 83.3 MiB/s
INFO: 10% (45.4 GiB of 450.0 GiB) in 9m 4s, read: 79.9 MiB/s, write: 79.9 MiB/s
Expected: $VAR1 = { 'id' => '304921:158', 'arguments' => {}, 'execute' => 'qmp_capabilities' };
Received: $VAR1 = { 'error' => {}, 'return' => {}, 'id' => '837591:1' };
ERROR: VM 201 qmp command 'query-backup' failed - got wrong command id '837591:1' (expected 304921:158)
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 201 failed - VM 201 qmp command 'query-backup' failed - got wrong command id '837591:1' (expected 304921:158)
INFO: Failed at 2021-08-19 22:36:07
INFO: Backup job finished with errors
TASK ERROR: job errors

journalctl:
https://pastebin.com/fj8U64CB
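
If a shorter excerpt is more useful than the full pastebin, the journal around the failure window can be pulled out along these lines (times taken from the task log above; adjust as needed):

Code:
# Journal excerpt around the failed backup of VM 201
journalctl --since "2021-08-19 22:26" --until "2021-08-19 22:40" > journal-vm201.txt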
 
Same error occurring for me.

I backed up and did fresh installs of all nodes to Proxmox 7 last week. All nodes are fully updated. This issue has been plaguing me ever since the upgrade, some days worse than others. I have to redo the backups and have even had to restart nodes to get a backup to complete successfully. There is definitely some bug in this new version of QEMU. On Proxmox 6.x backing up to Backup Server 1.x I never had this problem. Any updates/fixes would be much appreciated!

INFO: 64% (257.2 GiB of 400.0 GiB) in 31m 2s, read: 663.3 MiB/s, write: 2.0 MiB/s
ERROR: VM 9500 qmp command 'query-backup' failed - got wrong command id '2314:394661' (expected 1999705:1134)
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 9500 failed - VM 9500 qmp command 'query-backup' failed - got wrong command id '2314:394661' (expected 1999705:1134)
INFO: Failed at 2021-08-19 21:32:49

Other backups fail with this right at the beginning:

VM 9327 qmp command 'cont' failed - got timeout

When this happens, the machine is often forced into read-only storage and has to be rebooted, sometimes with the need to run fsck. This is making unattended backups worrisome and almost makes me contemplate a downgrade to 6.x if there isn't a solution soon.

Thanks for your help!
 
When this happens, the machine is often forced into read-only storage and has to be rebooted, sometimes with the need to run fsck. This is making unattended backups worrisome and almost makes me contemplate a downgrade to 6.x if there isn't a solution soon.
This issue also happens in PVE 6.x, not only in PVE 7.x; we are working on it.
 
I'm having the same issue; like @Rawthink, only since upgrading to PVE 7.x.

Backing up with vzdump to CIFS/NFS storage uncompressed

Code:
INFO: started backup task 'c38fde0f-2299-4146-a862-e76a9ecbd5ab'
INFO: resuming VM again after 6 seconds
INFO:  16% (3.3 GiB of 20.0 GiB) in 3s, read: 1.1 GiB/s, write: 316.7 MiB/s
INFO:  41% (8.4 GiB of 20.0 GiB) in 7s, read: 1.3 GiB/s, write: 149.9 MiB/s
INFO:  62% (12.4 GiB of 20.0 GiB) in 10s, read: 1.4 GiB/s, write: 50.9 MiB/s
INFO:  94% (18.8 GiB of 20.0 GiB) in 14s, read: 1.6 GiB/s, write: 49.2 MiB/s
ERROR: VM 1001 qmp command 'query-backup' failed - got timeout
INFO: aborting backup job
ERROR: VM 1001 qmp command 'backup-cancel' failed - unable to connect to VM 1001 qmp socket - timeout after 5975 retries
INFO: resuming VM again
ERROR: Backup of VM 1001 failed - VM 1001 qmp command 'cont' failed - unable to connect to VM 1001 qmp socket - timeout after 31 retries
INFO: Failed at 2021-08-21 01:16:30

Also, another failed backup from a different job:

Code:
INFO:  82% (32.9 GiB of 40.0 GiB) in 4m 24s, read: 97.9 MiB/s, write: 97.9 MiB/s
INFO:  83% (33.6 GiB of 40.0 GiB) in 4m 31s, read: 97.9 MiB/s, write: 97.6 MiB/s
ERROR: VM 100004 qmp command 'query-backup' failed - got wrong command id '5532:3234293' (expected 4021887:382)
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 100004 failed - VM 100004 qmp command 'query-backup' failed - got wrong command id '5532:3234293' (expected 4021887:382)
INFO: Failed at 2021-08-21 01:37:36

And with a single manual backup of an already stopped VM:

Code:
INFO:  14% (29.7 GiB of 210.0 GiB) in 7m 23s, read: 673.3 MiB/s, write: 622.3 MiB/s
INFO:  15% (31.7 GiB of 210.0 GiB) in 7m 27s, read: 489.4 MiB/s, write: 456.9 MiB/s
INFO:  16% (34.1 GiB of 210.0 GiB) in 8m 45s, read: 31.5 MiB/s, write: 31.3 MiB/s
INFO:  17% (36.2 GiB of 210.0 GiB) in 8m 48s, read: 727.3 MiB/s, write: 663.8 MiB/s
ERROR: VM 700500 qmp command 'query-backup' failed - got wrong command id '5532:3590173' (expected 2207068:1132)
INFO: aborting backup job
INFO: stopping kvm after backup task
ERROR: Backup of VM 700500 failed - VM 700500 qmp command 'query-backup' failed - got wrong command id '5532:3590173' (expected 2207068:1132)
INFO: Failed at 2021-08-21 16:29:10
INFO: Backup job finished with errors
TASK ERROR: job errors
 
@xbb, if you are running the latest versions (with 'qemu-server 7.0-13'), would it be possible for you to install the debug version and follow Stefan's previous instructions [1]? This would really help us to further understand the issue.

[1] https://forum.proxmox.com/threads/e...failed-got-wrong-command-id.88017/post-411861
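
(To check whether 'qemu-server 7.0-13' and the matching pve-qemu-kvm are installed, something along these lines should work:)

Code:
# Show the installed qemu-server and pve-qemu-kvm versions
pveversion -v | grep -E 'qemu-server|pve-qemu-kvm'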

Code:
INFO: starting new backup job: vzdump 700500 --storage nas-backups --compress 0 --node pve --mode stop --remove 0
INFO: Starting Backup of VM 700500 (qemu)
INFO: Backup started at 2021-08-23 12:26:39
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: VM Name: Test
INFO: include disk 'scsi0' 'ssdrz:vm-700500-disk-1' 210G
INFO: include disk 'efidisk0' 'ssdrz:vm-700500-disk-0' 1M
INFO: creating vzdump archive '/mnt/pve/nas-backups/dump/vzdump-qemu-700500-2021_08_23-12_26_39.vma'
INFO: starting kvm to execute backup task
INFO: started backup task 'ac701cf3-65f9-4c7d-a2e4-b246e3f4b9f6'
INFO:   1% (2.8 GiB of 210.0 GiB) in 4s, read: 705.2 MiB/s, write: 632.8 MiB/s
INFO:   2% (4.3 GiB of 210.0 GiB) in 7s, read: 516.7 MiB/s, write: 504.2 MiB/s
INFO:   3% (6.6 GiB of 210.0 GiB) in 26s, read: 126.5 MiB/s, write: 122.5 MiB/s
INFO:   4% (8.9 GiB of 210.0 GiB) in 49s, read: 102.8 MiB/s, write: 100.9 MiB/s
INFO:   5% (10.9 GiB of 210.0 GiB) in 1m 8s, read: 106.3 MiB/s, write: 104.6 MiB/s
INFO:   6% (12.8 GiB of 210.0 GiB) in 1m 27s, read: 100.0 MiB/s, write: 97.9 MiB/s
INFO:   7% (15.2 GiB of 210.0 GiB) in 1m 52s, read: 100.5 MiB/s, write: 99.6 MiB/s
INFO:   8% (17.1 GiB of 210.0 GiB) in 2m 11s, read: 101.3 MiB/s, write: 99.7 MiB/s
INFO:   9% (19.1 GiB of 210.0 GiB) in 2m 32s, read: 98.6 MiB/s, write: 97.6 MiB/s
INFO:  10% (21.0 GiB of 210.0 GiB) in 2m 51s, read: 103.4 MiB/s, write: 102.4 MiB/s
INFO:  11% (23.2 GiB of 210.0 GiB) in 3m 13s, read: 99.5 MiB/s, write: 97.8 MiB/s
INFO:  12% (25.7 GiB of 210.0 GiB) in 3m 39s, read: 101.6 MiB/s, write: 99.2 MiB/s
INFO:  13% (27.4 GiB of 210.0 GiB) in 3m 56s, read: 101.9 MiB/s, write: 96.4 MiB/s
INFO:  14% (29.6 GiB of 210.0 GiB) in 4m 16s, read: 110.4 MiB/s, write: 102.2 MiB/s
INFO:  15% (31.9 GiB of 210.0 GiB) in 4m 39s, read: 105.0 MiB/s, write: 98.7 MiB/s
INFO:  16% (33.9 GiB of 210.0 GiB) in 4m 59s, read: 102.2 MiB/s, write: 101.9 MiB/s
Expected: $VAR1 = {
          'arguments' => {},
          'id' => '3999187:114',
          'execute' => 'qmp_capabilities'
        };

Received: $VAR1 = {
          'return' => {},
          'error' => {},
          'id' => '5532:3410'
        };


#### LONG PAUSE HERE: it seems the backup was still working in the background and writing at ~100MB/s to the end ####


ERROR: VM 700500 qmp command 'query-backup' failed - got wrong command id '5532:3410' (expected 3999187:114)
INFO: aborting backup job
INFO: stopping kvm after backup task
ERROR: Backup of VM 700500 failed - VM 700500 qmp command 'query-backup' failed - got wrong command id '5532:3410' (expected 3999187:114)
INFO: Failed at 2021-08-23 12:46:44
INFO: Backup job finished with errors
TASK ERROR: job errors

Another try

Code:
INFO:  19% (40.4 GiB of 210.0 GiB) in 6m 3s, read: 104.9 MiB/s, write: 101.2 MiB/s
INFO:  20% (42.4 GiB of 210.0 GiB) in 6m 22s, read: 104.0 MiB/s, write: 101.4 MiB/s
Expected: $VAR1 = {
          'id' => '1311189:155',
          'execute' => 'qmp_capabilities',
          'arguments' => {}
        };

Received: $VAR1 = {
          'id' => '5532:27666',
          'error' => {},
          'return' => {}
        };
...

Also, I have successfully backed up the VM three times in a row so far when using compression (LZO), so the error seems to happen only with uncompressed backups in my case.
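
For reference, the comparison boils down to runs like these (VMID and storage name are placeholders, not my exact command lines):

Code:
# Uncompressed run - this is the one that fails for me
vzdump <vmid> --storage <nfs-or-cifs-storage> --mode snapshot --compress 0
# LZO-compressed run - this has completed three times in a row
vzdump <vmid> --storage <nfs-or-cifs-storage> --mode snapshot --compress lzo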
 
So, the update of the PVE kernel to 5.11.22-3-pve, along with yesterday's updates to proxmox-backup-server and proxmox-backup-client, seems to have improved things.

I have been able to get through VM backups mostly without errors.

However, on a Windows Server 2019 VM, the following error has been happening every time since the update to Proxmox 7.

INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
ERROR: VM 6801 qmp command 'guest-fsfreeze-thaw' failed - got timeout

The backup finishes normally and the VM stays running, but about 1 in 4 times it kills the QEMU guest agent, and the service can't be killed, so the machine has to be rebooted to get the guest agent back.
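
A quick way to see whether the agent is still responsive after a backup is to ping it from the host; something like the following (VMID 6801 taken from the log above), where a timeout indicates the agent in the guest is hung:

Code:
# Ping the guest agent from the PVE host
qm guest cmd 6801 ping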

EDIT
Also, I got this on one VM in a job just now. So, the frequency of occurrence seems improved, but it's not fixed.

INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '18a4b50c-a919-4a2b-a97d-b1106bee410b'
INFO: resuming VM again
ERROR: VM 9350 qmp command 'cont' failed - got timeout
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 9350 failed - VM 9350 qmp command 'cont' failed - got timeout
 
The update to 5.11.22-3-pve also seems to have improved things for me. The one backup that always used to fail completed for the first time.
 
I am continuing to have tons of errors with backups. Despite my earlier report, I can't say there's any improvement at this point.

Below are two VMs that failed just today. It actually broke both CentOS VMs, which jumped to huge CPU usage and required a stop and reboot.

It's also often causing the QEMU guest agent to fail on Windows Server 2016 and 2019, requiring reboots.

9350: 2021-08-28 21:00:02 INFO: issuing guest-agent 'fs-freeze' command
9350: 2021-08-28 21:00:04 INFO: issuing guest-agent 'fs-thaw' command
9350: 2021-08-28 21:00:14 ERROR: VM 9350 qmp command 'guest-fsfreeze-thaw' failed - got timeout
9350: 2021-08-28 21:00:14 INFO: started backup task '446bc502-d219-4d42-ae0c-db1404d1c9e9'
9350: 2021-08-28 21:00:14 INFO: resuming VM again
9350: 2021-08-28 21:00:17 ERROR: VM 9350 qmp command 'cont' failed - unable to connect to VM 9350 qmp socket - timeout after 31 retries
9350: 2021-08-28 21:00:17 INFO: aborting backup job
9350: 2021-08-28 21:00:24 INFO: resuming VM again
9350: 2021-08-28 21:00:24 ERROR: Backup of VM 9350 failed - VM 9350 qmp command 'cont' failed - unable to connect to VM 9350 qmp socket - timeout after 31 retries

9701: 2021-08-28 21:00:24 INFO: issuing guest-agent 'fs-freeze' command
9701: 2021-08-28 21:00:31 INFO: issuing guest-agent 'fs-thaw' command
9701: 2021-08-28 21:00:41 ERROR: VM 9701 qmp command 'guest-fsfreeze-thaw' failed - got timeout
9701: 2021-08-28 21:00:41 INFO: started backup task '22840474-db54-4b8e-a840-f2de1856db96'
9701: 2021-08-28 21:00:41 INFO: resuming VM again
9701: 2021-08-28 21:00:44 ERROR: VM 9701 qmp command 'cont' failed - got timeout
9701: 2021-08-28 21:00:44 INFO: aborting backup job
9701: 2021-08-28 21:00:45 INFO: resuming VM again
9701: 2021-08-28 21:00:45 ERROR: Backup of VM 9701 failed - VM 9701 qmp command 'cont' failed - got timeout

[Screenshot attached]
 
I am continuing to have tons of errors with backups. Despite my earlier report, I can't say there's any improvement at this point.

Below are two VMs that failed just today. It actually broke both CentOS VMs, which jumped to huge CPU usage and required a stop and reboot.

It's also often causing the QEMU guest agent to fail on Windows Server 2016 and 2019, requiring reboots.
The logs you posted do not contain the "got wrong command id" error, which makes me think this is a different issue than the one discussed in this thread. It appears as though QEMU starts to hang at the guest agent interaction. Does this error also happen with VMs that do not use the agent? What is the load situation on that machine?
 
@Stefan_R I believe you are right, which is why I noticed some improvement: that error was fixed, while this one has not been.

Load was around 2-6 at the time of the errors. The server is under very light utilization and these are small incremental backups. It can occur on virtual machines with or without the QEMU agent. Sometimes the backups finish with no issue, other times they fail. Again, this is new to Proxmox 7 and did not occur in 6.x. One other thing: this is occurring on multiple Proxmox hosts, backing up to multiple Proxmox Backup Servers, so it's definitely something specific to Proxmox 7.

As you can see in this example, the VMs are backing up quickly; some just fail with one of these qmp errors.

Code:
Status  Duration  Size      Snapshot
FAILED  0:00:17   -         VM 9327 qmp command 'cont' failed - got timeout
OK      0:05:52   400.00GB  vm/9500/2021-08-30T12:45:14Z
OK      0:00:47   1.46TB    vm/9650/2021-08-30T12:51:06Z
OK      0:01:43   400.00GB  vm/9704/2021-08-30T12:51:53Z
OK      0:05:27   140.00GB  vm/13451/2021-08-30T12:53:36Z
OK      0:00:04   32.00GB   vm/13501/2021-08-30T12:59:03Z
OK      0:00:07   32.00GB   vm/13502/2021-08-30T12:59:07Z

Also note that this did not break VM 9327 like yesterday's error did.

Instead, when I just retried 9327, I got this:

INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task 'f6c9919a-cf14-4d98-9cda-e86ed7a2d4d0'
INFO: resuming VM again
INFO: virtio0: dirty-bitmap status: existing bitmap was invalid and has been cleared

This often happens after this error and the bitmap has to be rebuilt.
 
Just got the following on a backup, and the CentOS VM got stuck:

INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '419428ab-7767-4fa4-ae1e-27bd5a539bcf'
INFO: resuming VM again
ERROR: VM 9327 qmp command 'query-pbs-bitmap-info' failed - got timeout
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 9327 failed - VM 9327 qmp command 'query-pbs-bitmap-info' failed - got timeout

[Screenshot attached]


And as requested:
root@v2:~# VM_PID=189159
root@v2:~# gdb attach $VM_PID -ex='thread apply all bt' -ex='quit'
GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
attach: No such file or directory.
Attaching to process 189159
[New LWP 189160]
[New LWP 189209]
[New LWP 189210]
[New LWP 189211]
[New LWP 189212]
[New LWP 189215]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f20f6bde4f6 in __ppoll (fds=0x55ec816a6710, nfds=78, timeout=<optimized out>, timeout@entry=0x7ffd7a22d660,
sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
44 ../sysdeps/unix/sysv/linux/ppoll.c: No such file or directory.

Thread 7 (Thread 0x7f2076dff700 (LWP 189215) "kvm"):
#0 futex_wait_cancelable (private=0, expected=0, futex_word=0x55ec81cd9e8c) at ../sysdeps/nptl/futex-internal.h:186
#1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x55ec81cd9e98, cond=0x55ec81cd9e60) at pthread_cond_wait.c:508
#2 __pthread_cond_wait (cond=cond@entry=0x55ec81cd9e60, mutex=mutex@entry=0x55ec81cd9e98) at pthread_cond_wait.c:638
#3 0x000055ec8030f44b in qemu_cond_wait_impl (cond=0x55ec81cd9e60, mutex=0x55ec81cd9e98, file=0x55ec804256e1 "../ui/vnc-jobs.c", line=248) at ../util/qemu-thread-posix.c:174
#4 0x000055ec7ffb71e3 in vnc_worker_thread_loop (queue=0x55ec81cd9e60) at ../ui/vnc-jobs.c:248
#5 0x000055ec7ffb7ea8 in vnc_worker_thread (arg=arg@entry=0x55ec81cd9e60) at ../ui/vnc-jobs.c:361
#6 0x000055ec8030ed09 in qemu_thread_start (args=0x7f2076dfa3f0) at ../util/qemu-thread-posix.c:521
#7 0x00007f20f6cb8ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#8 0x00007f20f6be8def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 6 (Thread 0x7f20e9abc700 (LWP 189212) "kvm"):
#0 0x00007f20f6bdfcc7 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1 0x000055ec801bcc87 in kvm_vcpu_ioctl (cpu=cpu@entry=0x55ec8108ebc0, type=type@entry=44672) at ../accel/kvm/kvm-all.c:2630
#2 0x000055ec801bcdf1 in kvm_cpu_exec (cpu=cpu@entry=0x55ec8108ebc0) at ../accel/kvm/kvm-all.c:2467
#3 0x000055ec800fe2dd in kvm_vcpu_thread_fn (arg=arg@entry=0x55ec8108ebc0) at ../accel/kvm/kvm-accel-ops.c:49
#4 0x000055ec8030ed09 in qemu_thread_start (args=0x7f20e9ab73f0) at ../util/qemu-thread-posix.c:521
#5 0x00007f20f6cb8ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007f20f6be8def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 5 (Thread 0x7f20ea2bd700 (LWP 189211) "kvm"):
#0 0x00007f20f6bdfcc7 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1 0x000055ec801bcc87 in kvm_vcpu_ioctl (cpu=cpu@entry=0x55ec810810f0, type=type@entry=44672) at ../accel/kvm/kvm-all.c:2630
#2 0x000055ec801bcdf1 in kvm_cpu_exec (cpu=cpu@entry=0x55ec810810f0) at ../accel/kvm/kvm-all.c:2467
#3 0x000055ec800fe2dd in kvm_vcpu_thread_fn (arg=arg@entry=0x55ec810810f0) at ../accel/kvm/kvm-accel-ops.c:49
--Type <RET> for more, q to quit, c to continue without paging--
#4 0x000055ec8030ed09 in qemu_thread_start (args=0x7f20ea2b83f0) at ../util/qemu-thread-posix.c:521
#5 0x00007f20f6cb8ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007f20f6be8def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 (Thread 0x7f20eaabe700 (LWP 189210) "kvm"):
#0 0x00007f20f6bdfcc7 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1 0x000055ec801bcc87 in kvm_vcpu_ioctl (cpu=cpu@entry=0x55ec81072d40, type=type@entry=44672) at ../accel/kvm/kvm-all.c:2630
#2 0x000055ec801bcdf1 in kvm_cpu_exec (cpu=cpu@entry=0x55ec81072d40) at ../accel/kvm/kvm-all.c:2467
#3 0x000055ec800fe2dd in kvm_vcpu_thread_fn (arg=arg@entry=0x55ec81072d40) at ../accel/kvm/kvm-accel-ops.c:49
#4 0x000055ec8030ed09 in qemu_thread_start (args=0x7f20eaab93f0) at ../util/qemu-thread-posix.c:521
#5 0x00007f20f6cb8ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007f20f6be8def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7f20ebdc2700 (LWP 189209) "kvm"):
#0 0x00007f20f6bdfcc7 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1 0x000055ec801bcc87 in kvm_vcpu_ioctl (cpu=cpu@entry=0x55ec8103ce30, type=type@entry=44672) at ../accel/kvm/kvm-all.c:2630
#2 0x000055ec801bcdf1 in kvm_cpu_exec (cpu=cpu@entry=0x55ec8103ce30) at ../accel/kvm/kvm-all.c:2467
#3 0x000055ec800fe2dd in kvm_vcpu_thread_fn (arg=arg@entry=0x55ec8103ce30) at ../accel/kvm/kvm-accel-ops.c:49
#4 0x000055ec8030ed09 in qemu_thread_start (args=0x7f20ebdbd3f0) at ../util/qemu-thread-posix.c:521
#5 0x00007f20f6cb8ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007f20f6be8def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7f20ec6c4700 (LWP 189160) "kvm"):
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1 0x000055ec8030fc9a in qemu_futex_wait (val=<optimized out>, f=<optimized out>) at /build/pve-qemu-kvm-eK5ihW/pve-qemu-kvm-6.0.0/include/qemu/futex.h:29
#2 qemu_event_wait (ev=ev@entry=0x55ec808a4208 <rcu_call_ready_event>) at ../util/qemu-thread-posix.c:460
#3 0x000055ec8033a84a in call_rcu_thread (opaque=opaque@entry=0x0) at ../util/rcu.c:258
#4 0x000055ec8030ed09 in qemu_thread_start (args=0x7f20ec6bf3f0) at ../util/qemu-thread-posix.c:521
#5 0x00007f20f6cb8ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007f20f6be8def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7f20ec8261c0 (LWP 189159) "kvm"):
#0 0x00007f20f6bde4f6 in __ppoll (fds=0x55ec816a6710, nfds=78, timeout=<optimized out>, timeout@entry=0x7ffd7a22d660, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
#1 0x000055ec80328b81 in ppoll (__ss=0x0, __timeout=0x7ffd7a22d660, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=529870355) at ../util/qemu-timer.c:348
#3 0x000055ec80309d25 in os_host_main_loop_wait (timeout=529870355) at ../util/main-loop.c:249
#4 main_loop_wait (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:530
#5 0x000055ec80183361 in qemu_main_loop () at ../softmmu/runstate.c:725
#6 0x000055ec7fe2cc0e in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../softmmu/main.c:50
A debugging session is active.

Inferior 1 [process 189159] will be detached.
 
If the VM gets stuck completely again, could you try to capture a gdb trace as described here: https://forum.proxmox.com/threads/all-vms-locking-up-after-latest-pve-update.85397/post-377434 ?
Also, this occurred on one VM. It had been stuck with one core at 100% since a backup at 19:00 last night, and when I ran another backup set this morning, it dropped back to normal utilization; it never totally froze. When the machines freeze, it's almost always one core stuck at full utilization as well.

Another machine, which does not have the guest agent, also got "qmp command 'cont' failed - got timeout" but did not freeze.

[Screenshot attached]
 
If the VM gets stuck completely again, could you try to capture a gdb trace as described here: https://forum.proxmox.com/threads/all-vms-locking-up-after-latest-pve-update.85397/post-377434 ?
This just caused an Ubuntu 20.04.3 server to go into read-only access. It hasn't caused a CPU spike in this case. The machine will have to be rebooted and fsck will have to be run manually to get it running again.

Also, the servers were freshly upgraded to 5.11.22-4-pve, and this particular VM was actually rebooted just before the backup was attempted. Everything is stable except for backups, and to repeat, this all started after the nodes were freshly installed with Proxmox 7. This was never an issue with 6.x on the same hardware...

INFO: issuing guest-agent 'fs-thaw' command
ERROR: VM 9650 qmp command 'guest-fsfreeze-thaw' failed - got timeout
INFO: started backup task '750420db-5963-46df-86a7-7681945e95d2'
INFO: resuming VM again
ERROR: VM 9650 qmp command 'cont' failed - got timeout
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 9650 failed - VM 9650 qmp command 'cont' failed - got timeout

[Screenshot attached]

Below is the trace:
root@v2:~# VM_PID=3697
root@v2:~# gdb attach $VM_PID -ex='thread apply all bt' -ex='quit'
GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
attach: No such file or directory.
Attaching to process 3697
[New LWP 3698]
[New LWP 3731]
[New LWP 3732]
[New LWP 3733]
[New LWP 3734]
[New LWP 3738]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007facfc30e4f6 in __ppoll (fds=0x561fd1049360, nfds=78, timeout=<optimized out>, timeout@entry=0x7ffda66bacb0,
sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
44 ../sysdeps/unix/sysv/linux/ppoll.c: No such file or directory.

Thread 7 (Thread 0x7fab8edff700 (LWP 3738) "kvm"):
#0 futex_wait_cancelable (private=0, expected=0, futex_word=0x561fd108a438) at ../sysdeps/nptl/futex-internal.h:186
#1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x561fd108a448, cond=0x561fd108a410) at pthread_cond_wait.c:508
#2 __pthread_cond_wait (cond=cond@entry=0x561fd108a410, mutex=mutex@entry=0x561fd108a448) at pthread_cond_wait.c:638
#3 0x0000561fcf90444b in qemu_cond_wait_impl (cond=0x561fd108a410, mutex=0x561fd108a448, file=0x561fcfa1a6e1 "../ui/vnc-jobs.c", line=248) at ../util/qemu-thread-posix.c:174
#4 0x0000561fcf5ac1e3 in vnc_worker_thread_loop (queue=0x561fd108a410) at ../ui/vnc-jobs.c:248
#5 0x0000561fcf5acea8 in vnc_worker_thread (arg=arg@entry=0x561fd108a410) at ../ui/vnc-jobs.c:361
#6 0x0000561fcf903d09 in qemu_thread_start (args=0x7fab8edfa3f0) at ../util/qemu-thread-posix.c:521
#7 0x00007facfc3e8ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#8 0x00007facfc318def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 6 (Thread 0x7fab955fe700 (LWP 3734) "kvm"):
#0 0x00007facfc30fcc7 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1 0x0000561fcf7b1c87 in kvm_vcpu_ioctl (cpu=cpu@entry=0x561fd1065bc0, type=type@entry=44672) at ../accel/kvm/kvm-all.c:2630
#2 0x0000561fcf7b1df1 in kvm_cpu_exec (cpu=cpu@entry=0x561fd1065bc0) at ../accel/kvm/kvm-all.c:2467
#3 0x0000561fcf6f32dd in kvm_vcpu_thread_fn (arg=arg@entry=0x561fd1065bc0) at ../accel/kvm/kvm-accel-ops.c:49
#4 0x0000561fcf903d09 in qemu_thread_start (args=0x7fab955f93f0) at ../util/qemu-thread-posix.c:521
#5 0x00007facfc3e8ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007facfc318def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 5 (Thread 0x7fab95dff700 (LWP 3733) "kvm"):
#0 0x00007facfc30fcc7 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1 0x0000561fcf7b1c87 in kvm_vcpu_ioctl (cpu=cpu@entry=0x561fd10580f0, type=type@entry=44672) at ../accel/kvm/kvm-all.c:2630
#2 0x0000561fcf7b1df1 in kvm_cpu_exec (cpu=cpu@entry=0x561fd10580f0) at ../accel/kvm/kvm-all.c:2467
#3 0x0000561fcf6f32dd in kvm_vcpu_thread_fn (arg=arg@entry=0x561fd10580f0) at ../accel/kvm/kvm-accel-ops.c:49
--Type <RET> for more, q to quit, c to continue without paging--c
#4 0x0000561fcf903d09 in qemu_thread_start (args=0x7fab95dfa3f0) at ../util/qemu-thread-posix.c:521
#5 0x00007facfc3e8ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007facfc318def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 (Thread 0x7facf0ce9700 (LWP 3732) "kvm"):
#0 0x00007facfc30fcc7 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1 0x0000561fcf7b1c87 in kvm_vcpu_ioctl (cpu=cpu@entry=0x561fd1049920, type=type@entry=44672) at ../accel/kvm/kvm-all.c:2630
#2 0x0000561fcf7b1df1 in kvm_cpu_exec (cpu=cpu@entry=0x561fd1049920) at ../accel/kvm/kvm-all.c:2467
#3 0x0000561fcf6f32dd in kvm_vcpu_thread_fn (arg=arg@entry=0x561fd1049920) at ../accel/kvm/kvm-accel-ops.c:49
#4 0x0000561fcf903d09 in qemu_thread_start (args=0x7facf0ce43f0) at ../util/qemu-thread-posix.c:521
#5 0x00007facfc3e8ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007facfc318def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7facf14ea700 (LWP 3731) "kvm"):
#0 0x00007facfc30fcc7 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1 0x0000561fcf7b1c87 in kvm_vcpu_ioctl (cpu=cpu@entry=0x561fd10121b0, type=type@entry=44672) at ../accel/kvm/kvm-all.c:2630
#2 0x0000561fcf7b1df1 in kvm_cpu_exec (cpu=cpu@entry=0x561fd10121b0) at ../accel/kvm/kvm-all.c:2467
#3 0x0000561fcf6f32dd in kvm_vcpu_thread_fn (arg=arg@entry=0x561fd10121b0) at ../accel/kvm/kvm-accel-ops.c:49
#4 0x0000561fcf903d09 in qemu_thread_start (args=0x7facf14e53f0) at ../util/qemu-thread-posix.c:521
#5 0x00007facfc3e8ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007facfc318def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7facf1dec700 (LWP 3698) "kvm"):
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1 0x0000561fcf904c9a in qemu_futex_wait (val=<optimized out>, f=<optimized out>) at /build/pve-qemu-kvm-eK5ihW/pve-qemu-kvm-6.0.0/include/qemu/futex.h:29
#2 qemu_event_wait (ev=ev@entry=0x561fcfe99208 <rcu_call_ready_event>) at ../util/qemu-thread-posix.c:460
#3 0x0000561fcf92f84a in call_rcu_thread (opaque=opaque@entry=0x0) at ../util/rcu.c:258
#4 0x0000561fcf903d09 in qemu_thread_start (args=0x7facf1de73f0) at ../util/qemu-thread-posix.c:521
#5 0x00007facfc3e8ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007facfc318def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7facf1f561c0 (LWP 3697) "kvm"):
#0 0x00007facfc30e4f6 in __ppoll (fds=0x561fd1049360, nfds=78, timeout=<optimized out>, timeout@entry=0x7ffda66bacb0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
#1 0x0000561fcf91db81 in ppoll (__ss=0x0, __timeout=0x7ffda66bacb0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=993367) at ../util/qemu-timer.c:348
#3 0x0000561fcf8fed25 in os_host_main_loop_wait (timeout=993367) at ../util/main-loop.c:249
#4 main_loop_wait (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:530
#5 0x0000561fcf778361 in qemu_main_loop () at ../softmmu/runstate.c:725
#6 0x0000561fcf421c0e in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../softmmu/main.c:50
A debugging session is active.

Inferior 1 [process 3697] will be detached.

Quit anyway? (y or n) y
Detaching from program: /usr/bin/qemu-system-x86_64, process 3697
[Inferior 1 (process 3697) detached]
 
Here's one more from a Server 2016 VM that isn't stuck, but is unusable.

[Screenshot attached]

root@v2:~# VM_PID=21653
root@v2:~# gdb attach $VM_PID -ex='thread apply all bt' -ex='quit'
GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
attach: No such file or directory.
Attaching to process 21653
[New LWP 21654]
[New LWP 21683]
[New LWP 21684]
[New LWP 21685]
[New LWP 21686]
[New LWP 21688]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f69b89cc4f6 in __ppoll (fds=0x560b72523630, nfds=79, timeout=<optimized out>, timeout@entry=0x7ffe305c1180,
sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
44 ../sysdeps/unix/sysv/linux/ppoll.c: No such file or directory.

Thread 7 (Thread 0x7f68669ff700 (LWP 21688) "kvm"):
#0 futex_wait_cancelable (private=0, expected=0, futex_word=0x560b724dcb1c) at ../sysdeps/nptl/futex-internal.h:186
#1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x560b724dcb28, cond=0x560b724dcaf0) at pthread_cond_wait.c:508
#2 __pthread_cond_wait (cond=cond@entry=0x560b724dcaf0, mutex=mutex@entry=0x560b724dcb28) at pthread_cond_wait.c:638
#3 0x0000560b718fb44b in qemu_cond_wait_impl (cond=0x560b724dcaf0, mutex=0x560b724dcb28, file=0x560b71a116e1 "../ui/vnc-jobs.c", line=248) at ../util/qemu-thread-posix.c:174
#4 0x0000560b715a31e3 in vnc_worker_thread_loop (queue=0x560b724dcaf0) at ../ui/vnc-jobs.c:248
#5 0x0000560b715a3ea8 in vnc_worker_thread (arg=arg@entry=0x560b724dcaf0) at ../ui/vnc-jobs.c:361
#6 0x0000560b718fad09 in qemu_thread_start (args=0x7f68669fa3f0) at ../util/qemu-thread-posix.c:521
#7 0x00007f69b8aa6ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#8 0x00007f69b89d6def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 6 (Thread 0x7f69a63ff700 (LWP 21686) "kvm"):
#0 0x00007f69b89cdcc7 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1 0x0000560b717a8c87 in kvm_vcpu_ioctl (cpu=cpu@entry=0x560b72514df0, type=type@entry=44672) at ../accel/kvm/kvm-all.c:2630
#2 0x0000560b717a8df1 in kvm_cpu_exec (cpu=cpu@entry=0x560b72514df0) at ../accel/kvm/kvm-all.c:2467
#3 0x0000560b716ea2dd in kvm_vcpu_thread_fn (arg=arg@entry=0x560b72514df0) at ../accel/kvm/kvm-accel-ops.c:49
#4 0x0000560b718fad09 in qemu_thread_start (args=0x7f69a63fa3f0) at ../util/qemu-thread-posix.c:521
#5 0x00007f69b8aa6ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007f69b89d6def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 5 (Thread 0x7f69a6fff700 (LWP 21685) "kvm"):
#0 0x00007f69b89cdcc7 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1 0x0000560b717a8c87 in kvm_vcpu_ioctl (cpu=cpu@entry=0x560b72507340, type=type@entry=44672) at ../accel/kvm/kvm-all.c:2630
#2 0x0000560b717a8df1 in kvm_cpu_exec (cpu=cpu@entry=0x560b72507340) at ../accel/kvm/kvm-all.c:2467
#3 0x0000560b716ea2dd in kvm_vcpu_thread_fn (arg=arg@entry=0x560b72507340) at ../accel/kvm/kvm-accel-ops.c:49
--Type <RET> for more, q to quit, c to continue without paging--
#4 0x0000560b718fad09 in qemu_thread_start (args=0x7f69a6ffa3f0) at ../util/qemu-thread-posix.c:521
#5 0x00007f69b8aa6ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007f69b89d6def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 (Thread 0x7f69a7bff700 (LWP 21684) "kvm"):
#0 0x00007f69b89cdcc7 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1 0x0000560b717a8c87 in kvm_vcpu_ioctl (cpu=cpu@entry=0x560b724f91d0, type=type@entry=44672) at ../accel/kvm/kvm-all.c:2630
#2 0x0000560b717a8df1 in kvm_cpu_exec (cpu=cpu@entry=0x560b724f91d0) at ../accel/kvm/kvm-all.c:2467
#3 0x0000560b716ea2dd in kvm_vcpu_thread_fn (arg=arg@entry=0x560b724f91d0) at ../accel/kvm/kvm-accel-ops.c:49
#4 0x0000560b718fad09 in qemu_thread_start (args=0x7f69a7bfa3f0) at ../util/qemu-thread-posix.c:521
#5 0x00007f69b8aa6ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007f69b89d6def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7f69adba8700 (LWP 21683) "kvm"):
#0 0x00007f69b89cdcc7 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1 0x0000560b717a8c87 in kvm_vcpu_ioctl (cpu=cpu@entry=0x560b724c37f0, type=type@entry=44672) at ../accel/kvm/kvm-all.c:2630
#2 0x0000560b717a8df1 in kvm_cpu_exec (cpu=cpu@entry=0x560b724c37f0) at ../accel/kvm/kvm-all.c:2467
#3 0x0000560b716ea2dd in kvm_vcpu_thread_fn (arg=arg@entry=0x560b724c37f0) at ../accel/kvm/kvm-accel-ops.c:49
#4 0x0000560b718fad09 in qemu_thread_start (args=0x7f69adba33f0) at ../util/qemu-thread-posix.c:521
#5 0x00007f69b8aa6ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007f69b89d6def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7f69ae4aa700 (LWP 21654) "kvm"):
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1 0x0000560b718fbc9a in qemu_futex_wait (val=<optimized out>, f=<optimized out>) at /build/pve-qemu-kvm-eK5ihW/pve-qemu-kvm-6.0.0/include/qemu/futex.h:29
#2 qemu_event_wait (ev=ev@entry=0x560b71e90208 <rcu_call_ready_event>) at ../util/qemu-thread-posix.c:460
#3 0x0000560b7192684a in call_rcu_thread (opaque=opaque@entry=0x0) at ../util/rcu.c:258
#4 0x0000560b718fad09 in qemu_thread_start (args=0x7f69ae4a53f0) at ../util/qemu-thread-posix.c:521
#5 0x00007f69b8aa6ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007f69b89d6def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7f69ae6141c0 (LWP 21653) "kvm"):
#0 0x00007f69b89cc4f6 in __ppoll (fds=0x560b72523630, nfds=79, timeout=<optimized out>, timeout@entry=0x7ffe305c1180, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
#1 0x0000560b71914b81 in ppoll (__ss=0x0, __timeout=0x7ffe305c1180, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=61410485) at ../util/qemu-timer.c:348
#3 0x0000560b718f5d25 in os_host_main_loop_wait (timeout=61410485) at ../util/main-loop.c:249
#4 main_loop_wait (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:530
#5 0x0000560b7176f361 in qemu_main_loop () at ../softmmu/runstate.c:725
#6 0x0000560b71418c0e in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../softmmu/main.c:50
A debugging session is active.

Inferior 1 [process 21653] will be detached.

Quit anyway? (y or n)
Please answer y or n.
A debugging session is active.

Inferior 1 [process 21653] will be detached.

Quit anyway? (y or n) y
Detaching from program: /usr/bin/qemu-system-x86_64, process 21653
[Inferior 1 (process 21653) detached]
 
Sorry for the delay on this issue. We're starting to get more reports of it now, and are working on it, but so far have been unable to reproduce the issue locally.

We've now put up a debug build for 'qemu-server'. It would be very helpful if you could install the packages and, if the problem appears again, post the full Task Log as well as 'journalctl -e' output from the time it happened.

http://download.proxmox.com/temp/bad-command-id-debug-output/
Code:
sha256sum:
c7d41f080a80d3c313a02bb758deaebe69d05349219dd473e3a96186f317a772  qemu-server_7.0-13+debug_amd64.deb
a5c475dd6dce3c60bbc80da0c0624443a112e12b01fdd37e73098d249a6d4e4a  qemu-server-dbgsym_7.0-13+debug_amd64.deb

The package is a regular 'qemu-server 7.0-13' with added logging around the error.
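
Installation is the usual apt routine; a minimal sketch, assuming the .deb files sit directly under the directory linked above:

Code:
wget http://download.proxmox.com/temp/bad-command-id-debug-output/qemu-server_7.0-13+debug_amd64.deb
sha256sum qemu-server_7.0-13+debug_amd64.deb    # compare against the checksum above
apt install ./qemu-server_7.0-13+debug_amd64.deb
# after the next failed backup, collect the journal from around that time:
journalctl -e > journal.txt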

Thanks!
Hi, I have the same problem with one of my VMs.
But after some testing, it seems it only happens when using ZSTD compression.

ERROR: VM 201 qmp command 'query-backup' failed - got wrong command id '853670:484' (expected 854836:776)
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 201 failed - VM 201 qmp command 'query-backup' failed - got wrong command id '853670:484' (expected 854836:776)
INFO: Failed at 2021-09-04 17:50:12
INFO: Backup job finished with errors

Best regards,
Michael
 
Hi,
Same problem here: fresh installs of PVE and PBS, both fully up to date.
I don't have the problem on other hardware and clusters. Any ideas?
Ubuntu/Nextcloud is up to date, with the QEMU agent on.

It fails at 31% or 34%, seemingly at random:

INFO: 31% (527.6 GiB of 1.7 TiB) in 2h 3m 37s, read: 153.9 MiB/s, write: 0 B/s
ERROR: VM 100 qmp command 'query-backup' failed - got wrong command id '947:18195' (expected 52484:16389)
INFO: aborting backup job
INFO: stopping kvm after backup task
ERROR: Backup of VM 100 failed - VM 100 qmp command 'query-backup' failed - got wrong command id '947:18195' (expected 52484:16389)
INFO: Failed at 2021-09-09 02:22:38
INFO: Backup job finished with errors
TASK ERROR: job errors

INFO: 34% (578.4 GiB of 1.7 TiB) in 2h 9m 17s, read: 155.9 MiB/s, write: 0 B/s
ERROR: VM 100 qmp command 'query-backup' failed - got wrong command id '947:24166' (expected 71105:14275)
INFO: aborting backup job
INFO: stopping kvm after backup task
ERROR: Backup of VM 100 failed - VM 100 qmp command 'query-backup' failed - got wrong command id '947:24166' (expected 71105:14275)
INFO: Failed at 2021-09-09 04:32:29
INFO: Backup job finished with errors
 
