storage migration failed: block job (mirror) error

Jul 11, 2020
Hello,

We are getting a lot of these errors when we try to move VM disks between nodes or from NFS storage to local storage:

Logical volume "vm-305-disk-0" successfully removed
TASK ERROR: storage migration failed: block job (mirror) error: VM 305 qmp command 'query-block-jobs' failed - got wrong command id


We have a cluster of 18 nodes, all with SSDs for local storage and NFS for shared storage. Every node has a 2x10 Gbit link, and the NFS storage has 4x10 Gbit.
This started to happen on every node after we began upgrading from PVE 6.2 to version 6.4.

Vzdump backups are working fine, with no problems at all.

Has anyone else faced this problem?

Kind Regards,


Syslog during disk migration:
Jun 20 15:36:17 cloud09 pvedaemon[37143]: <xxx@pve> move disk VM 305: move --disk scsi0 --storage storage-cloud09
Jun 20 15:36:17 cloud09 pvedaemon[37143]: <xxx@pve> starting task UPID:cloud09:00001D32:032A3AC8:60CF5261:qmmove:305:xxx@pve:
Jun 20 15:36:24 cloud09 pvedaemon[37143]: VM 305 qmp command failed - VM 305 qmp command 'query-proxmox-support' failed - got timeout
Jun 20 15:36:32 cloud09 pvestatd[2481]: VM 305 qmp command failed - VM 305 qmp command 'query-proxmox-support' failed - got timeout
Jun 20 15:36:34 cloud09 pvestatd[2481]: status update time (7.637 seconds)
Jun 20 15:36:42 cloud09 pvestatd[2481]: VM 305 qmp command failed - VM 305 qmp command 'query-proxmox-support' failed - got timeout
Jun 20 15:36:43 cloud09 pvestatd[2481]: status update time (7.586 seconds)
Jun 20 15:36:44 cloud09 pvedaemon[37143]: VM 305 qmp command failed - VM 305 qmp command 'query-proxmox-support' failed - got timeout
Jun 20 15:36:52 cloud09 pvestatd[2481]: VM 305 qmp command failed - VM 305 qmp command 'query-proxmox-support' failed - got timeout
Jun 20 15:36:53 cloud09 pvedaemon[7474]: VM 305 qmp command failed - VM 305 qmp command 'query-block-jobs' failed - got wrong command id '2481:352584' (expected 7474:2559)
Jun 20 15:36:54 cloud09 pvestatd[2481]: status update time (7.600 seconds)
Jun 20 15:36:59 cloud09 pvedaemon[7474]: storage migration failed: block job (mirror) error: VM 305 qmp command 'query-block-jobs' failed - got wrong command id '2481:352584' (expected 7474:2559)
Jun 20 15:36:59 cloud09 pvedaemon[37143]: <xxx@pve> end task UPID:cloud09:00001D32:032A3AC8:60CF5261:qmmove:305:xxx@pve: storage migration failed: block job (mirror) error: VM 305 qmp command 'query-block-jobs' failed - got wrong command id '2481:352584' (expected 7474:2559)



PVE Manager Version
pve-manager/6.4-8/185e14db

Package versions:

proxmox-ve: 6.4-1 (running kernel: 5.4.114-1-pve)
pve-manager: 6.4-8 (running version: 6.4-8/185e14db)
pve-kernel-5.4: 6.4-2
pve-kernel-helper: 6.4-2
pve-kernel-5.4.114-1-pve: 5.4.114-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.4-1
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-3
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.8-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.5-6
pve-cluster: 6.4-1
pve-container: 3.3-5
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.2-4
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.4-pve1


qm status --verbose

qm status 305 --verbose
balloon: 34359738368
ballooninfo: actual: 34359738368 free_mem: 3080478720 last_update: 1624194848 major_page_faults: 75094 max_mem: 34359738368 mem_swapped_in: 45056 mem_swapped_out: 163840 minor_page_faults: 1295835297 total_mem: 33566240768
blockstat:
    ide2: account_failed: 0 account_invalid: 0 failed_flush_operations: 0 failed_rd_operations: 0 failed_unmap_operations: 0 failed_wr_operations: 0 flush_operations: 0 flush_total_time_ns: 0 idle_time_ns: 2307235711896 invalid_flush_operations: 0 invalid_rd_operations: 0 invalid_unmap_operations: 0 invalid_wr_operations: 0 rd_bytes: 1122 rd_merged: 0 rd_operations: 23 rd_total_time_ns: 16647859228 timed_stats: unmap_bytes: 0 unmap_merged: 0 unmap_operations: 0 unmap_total_time_ns: 0 wr_bytes: 0 wr_highest_offset: 0 wr_merged: 0 wr_operations: 0 wr_total_time_ns: 0
    scsi0: account_failed: 1 account_invalid: 1 failed_flush_operations: 0 failed_rd_operations: 0 failed_unmap_operations: 0 failed_wr_operations: 0 flush_operations: 513441 flush_total_time_ns: 282256053153 idle_time_ns: 162519128 invalid_flush_operations: 0 invalid_rd_operations: 0 invalid_unmap_operations: 0 invalid_wr_operations: 0 rd_bytes: 21010677248 rd_merged: 0 rd_operations: 857810 rd_total_time_ns: 763615449425 timed_stats: unmap_bytes: 0 unmap_merged: 0 unmap_operations: 0 unmap_total_time_ns: 0 wr_bytes: 30570020864 wr_highest_offset: 749906018304 wr_merged: 0 wr_operations: 2677513 wr_total_time_ns: 17310025635816
cpus: 8
disk: 0
diskread: 21010678370
diskwrite: 30570020864
freemem: 3080478720
maxdisk: 751619276800
maxmem: 34359738368
mem: 30485762048
name: xxxxx
netin: 951061967
netout: 9943821373
nics:
    tap305i0: netin: 862358015 netout: 6560856795
    tap305i1: netin: 88703952 netout: 3382964578
pid: 16976
proxmox-support: pbs-dirty-bitmap: 1 pbs-dirty-bitmap-migration: 1 pbs-dirty-bitmap-savevm: 1 pbs-library-version: 1.0.3 (8de935110ed4cab743f6c9437357057f9f9f08ea) pbs-masterkey: 1 query-bitmap-info: 1
qmpstatus: running
running-machine: pc-i440fx-5.2+pve0
running-qemu: 5.2.0
status: running
template:
uptime: 48443
vmid: 305
 
We have done more tests and can confirm that this issue only happens when we try to move a VM disk from NFS storage to local storage on nodes running PVE 6.3 and 6.4.
 
Hi,
I have the same problem now.
Moving from NFS to local ZFS I get this error:

drive mirror is starting for drive-scsi2
drive-scsi2: Cancelling block job
drive-scsi2: Done.
TASK ERROR: storage migration failed: block job (mirror) error: drive-scsi2: 'mirror' has been cancelled


I was able to migrate machines until Friday. I don't understand this error.
 
1. If you run (adjust the path)
$ ls /nfs/images/vmid/Image.format
before the migration, does the migration then work?

2. What is the local storage backend? ext4/ZFS?
 
Hi, is no one else seeing this problem? Is it possible that it's only us?

At the moment it is almost impossible to migrate VMs between nodes (local storage) or to move VM disks from NFS storage to local storage.

All with this error:
TASK ERROR: storage migration failed: block job (mirror) error: VM 265 qmp command 'query-block-jobs' failed - got wrong command id '2738:6778596' (expected 43352:1447)

Node version (all nodes run the same version):

Code:
proxmox-ve: 6.4-1 (running kernel: 5.4.128-1-pve)
pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
pve-kernel-5.4: 6.4-5
pve-kernel-helper: 6.4-5
pve-kernel-5.4.128-1-pve: 5.4.128-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-3
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.12-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-1
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.2-4
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.5-pve1~bpo10+1
 
Virtual Environment 7.0-11

Moving a 4 TB disk from NFS to NFS:
drive-scsi0: transferred 1.1 TiB of 4.0 TiB (26.75%) in 3h 38m 50s
drive-scsi0: Cancelling block job
drive-scsi0: Done.
TASK ERROR: storage migration failed: block job (mirror) error: VM 241 qmp command 'query-block-jobs' failed - got wrong command id '1448:303790072' (expected 1594521:5263)
 
On 6.4.14, while trying to move a qcow2 from NFS storage to local storage:

Code:
drive-scsi1: transferred 44.5 GiB of 300.0 GiB (14.82%) in 31m 21s
drive-scsi1: transferred 44.5 GiB of 300.0 GiB (14.83%) in 31m 22s
drive-scsi1: transferred 44.5 GiB of 300.0 GiB (14.84%) in 31m 24s
drive-scsi1: Cancelling block job
drive-scsi1: Done.
TASK ERROR: storage migration failed: block job (mirror) error: VM 198 qmp command 'query-block-jobs' failed - got wrong command id '3422341:778' (expected 3417768:2875)
 
I am also facing this today when trying to move VMs from one host to another (a local-to-LVM live migration); this was never a problem before.

In the output I see the following:
Code:
2022-05-04 10:05:11 found local disk 'local:207/vm-207-disk-0.qcow2' (in current VM config)
2022-05-04 10:05:11 starting VM 207 on remote node 'chpve01'
2022-05-04 10:05:13 volume 'local:207/vm-207-disk-0.qcow2' is 'vmstor1:vm-207-disk-0' on the target
2022-05-04 10:05:13 start remote tunnel
2022-05-04 10:05:14 ssh tunnel ver 1
2022-05-04 10:05:14 starting storage migration
2022-05-04 10:05:14 sata0: start migration to nbd:unix:/run/qemu-server/207_nbd.migrate:exportname=drive-sata0
drive mirror is starting for drive-sata0
drive-sata0: Cancelling block job
drive-sata0: Done.
2022-05-04 10:05:14 ERROR: online migrate failure - block job (mirror) error: drive-sata0: 'mirror' has been cancelled
2022-05-04 10:05:14 aborting phase 2 - cleanup resources
2022-05-04 10:05:14 migrate_cancel
2022-05-04 10:05:16 ERROR: migration finished with problems (duration 00:00:05)
TASK ERROR: migration problems

I've checked the machine's integrity and everything seems fine. Migration from chpve01 to chpve02 (the node I'm trying to migrate from...) works flawlessly.

Any suggestions? The last thing I remember doing before this started was updating the packages installed on all hosts.
 
I am also facing this today when trying to move VMs from one host to another (a local-to-LVM live migration); this was never a problem before.

In the output I see the following:
Code:
2022-05-04 10:05:11 found local disk 'local:207/vm-207-disk-0.qcow2' (in current VM config)
2022-05-04 10:05:11 starting VM 207 on remote node 'chpve01'
2022-05-04 10:05:13 volume 'local:207/vm-207-disk-0.qcow2' is 'vmstor1:vm-207-disk-0' on the target
2022-05-04 10:05:13 start remote tunnel
2022-05-04 10:05:14 ssh tunnel ver 1
2022-05-04 10:05:14 starting storage migration
2022-05-04 10:05:14 sata0: start migration to nbd:unix:/run/qemu-server/207_nbd.migrate:exportname=drive-sata0
drive mirror is starting for drive-sata0
drive-sata0: Cancelling block job
drive-sata0: Done.
2022-05-04 10:05:14 ERROR: online migrate failure - block job (mirror) error: drive-sata0: 'mirror' has been cancelled
2022-05-04 10:05:14 aborting phase 2 - cleanup resources
2022-05-04 10:05:14 migrate_cancel
2022-05-04 10:05:16 ERROR: migration finished with problems (duration 00:00:05)
TASK ERROR: migration problems

I've checked the machine's integrity and everything seems fine. Migration from chpve01 to chpve02 (the node I'm trying to migrate from...) works flawlessly.

Any suggestions? The last thing I remember doing before this started was updating the packages installed on all hosts.
Interesting fact: these are VMs I've pulled out of Azure, and it seems they're the only ones showing this weird behavior.
Maybe this will be useful for someone else in the future. I didn't check further; I cleaned the Azure VM Agent off the VMs and so on, but... no dice.
 
Hi,
I am also facing this today when trying to move VMs from one host to another (a local-to-LVM live migration); this was never a problem before.

In the output I see the following:
Code:
2022-05-04 10:05:11 found local disk 'local:207/vm-207-disk-0.qcow2' (in current VM config)
2022-05-04 10:05:11 starting VM 207 on remote node 'chpve01'
2022-05-04 10:05:13 volume 'local:207/vm-207-disk-0.qcow2' is 'vmstor1:vm-207-disk-0' on the target
2022-05-04 10:05:13 start remote tunnel
2022-05-04 10:05:14 ssh tunnel ver 1
2022-05-04 10:05:14 starting storage migration
2022-05-04 10:05:14 sata0: start migration to nbd:unix:/run/qemu-server/207_nbd.migrate:exportname=drive-sata0
drive mirror is starting for drive-sata0
drive-sata0: Cancelling block job
drive-sata0: Done.
2022-05-04 10:05:14 ERROR: online migrate failure - block job (mirror) error: drive-sata0: 'mirror' has been cancelled
2022-05-04 10:05:14 aborting phase 2 - cleanup resources
2022-05-04 10:05:14 migrate_cancel
2022-05-04 10:05:16 ERROR: migration finished with problems (duration 00:00:05)
TASK ERROR: migration problems

I've checked the machine's integrity and everything seems fine. Migration from chpve01 to chpve02 (the node I'm trying to migrate from...) works flawlessly.

Any suggestions? The last thing I remember doing before this started was updating the packages installed on all hosts.
how big is the virtual disk (check with qemu-img info /var/lib/vz/images/207/vm-207-disk-0.qcow2)? Since QEMU 6.0, drive-mirror needs the same exact size on source and target. LVM is usually aligned to 4MiB, so if the disk on the source isn't aligned to that too, mirroring will fail. If you resize the source disk to be aligned to 4MiB, it might work around the issue.
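For anyone who wants to check this before retrying: below is a rough shell sketch of that size check and workaround, not an official procedure. It assumes the setup from the log above (VM 207, disk sata0, qcow2 image on the default 'local' storage); adjust the variables for your VM, and verify the size syntax your version of qm resize accepts before running it.

Code:
# Check whether the source image's virtual size is a multiple of 4 MiB and,
# if not, grow it to the next 4 MiB boundary before retrying the move.
IMG=/var/lib/vz/images/207/vm-207-disk-0.qcow2   # path to the source image
VMID=207                                         # VM to resize
DISK=sata0                                       # affected disk
ALIGN=$((4 * 1024 * 1024))                       # 4 MiB, the usual LVM extent size

SIZE=$(qemu-img info --output=json "$IMG" | grep -oP '"virtual-size":\s*\K[0-9]+')

if [ $((SIZE % ALIGN)) -ne 0 ]; then
    NEW=$(( (SIZE / ALIGN + 1) * ALIGN ))        # next 4 MiB multiple, in bytes
    echo "virtual size $SIZE is not 4 MiB aligned, growing to $NEW bytes"
    qm resize "$VMID" "$DISK" "$(( NEW / 1024 / 1024 ))M"
else
    echo "virtual size $SIZE is already 4 MiB aligned"
fi

Note that qm resize can only grow a disk, never shrink it, so rounding up is the only option here; once the size is aligned, retry the move or migration.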
 
Hi,

how big is the virtual disk (check with qemu-img info /var/lib/vz/images/207/vm-207-disk-0.qcow2)? Since QEMU 6.0, drive-mirror needs the same exact size on source and target. LVM is usually aligned to 4MiB, so if the disk on the source isn't aligned to that too, mirroring will fail. If you resize the source disk to be aligned to 4MiB, it might work around the issue.
Oops... the machine is already gone; I was in a bit of a rush and decided to just redeploy it from scratch, taking the opportunity to upgrade the VM's OS at the same time.

Sorry :|
 
Hi,

how big is the virtual disk (check with qemu-img info /var/lib/vz/images/207/vm-207-disk-0.qcow2)? Since QEMU 6.0, drive-mirror needs the same exact size on source and target. LVM is usually aligned to 4MiB, so if the disk on the source isn't aligned to that too, mirroring will fail. If you resize the source disk to be aligned to 4MiB, it might work around the issue.

Thank you for pointing out this detail about LVM alignment. I couldn't figure out why I was unable to migrate a disk image from NFS to iSCSI/LVM, and with this I was able to continue with the migration process.

Since I had images imported from VMware, the sizes were not aligned correctly, so after setting the size via the CLI with "qm disk resize", I was able to continue transferring the images.
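For reference, the resize itself is a one-liner; the VMID, disk and size below are made-up values, the only requirement being that the target size is the next 4 MiB multiple at or above the image's current virtual size:

Code:
# Hypothetical example: an imported disk reports 32213303296 bytes (30721 MiB,
# not a multiple of 4 MiB), so grow it to 30724 MiB, the next 4 MiB multiple.
qm disk resize 123 scsi1 30724M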

Regards
 
