storage migration failed: block job (mirror) error

Jul 11, 2020
Hello,

We are getting a lot of these errors when we try to move VM disks between nodes or from NFS storage to local storage:

Logical volume "vm-305-disk-0" successfully removed
TASK ERROR: storage migration failed: block job (mirror) error: VM 305 qmp command 'query-block-jobs' failed - got wrong command id


We have a cluster of 18 nodes, all with SSDs for local storage and NFS for shared storage. Every node has a 2x10 Gbit link, and the NFS storage has 4x10 Gbit.
This started to happen on every node after we began upgrading from PVE 6.2 to version 6.4.

Vzdump backups are working fine, with no problems at all.

Has anyone else faced this problem?

Kind Regards,


Syslog during disk migration:
Jun 20 15:36:17 cloud09 pvedaemon[37143]: <xxx@pve> move disk VM 305: move --disk scsi0 --storage storage-cloud09
Jun 20 15:36:17 cloud09 pvedaemon[37143]: <xxx@pve> starting task UPID:cloud09:00001D32:032A3AC8:60CF5261:qmmove:305:xxx@pve:
Jun 20 15:36:24 cloud09 pvedaemon[37143]: VM 305 qmp command failed - VM 305 qmp command 'query-proxmox-support' failed - got timeout
Jun 20 15:36:32 cloud09 pvestatd[2481]: VM 305 qmp command failed - VM 305 qmp command 'query-proxmox-support' failed - got timeout
Jun 20 15:36:34 cloud09 pvestatd[2481]: status update time (7.637 seconds)
Jun 20 15:36:42 cloud09 pvestatd[2481]: VM 305 qmp command failed - VM 305 qmp command 'query-proxmox-support' failed - got timeout
Jun 20 15:36:43 cloud09 pvestatd[2481]: status update time (7.586 seconds)
Jun 20 15:36:44 cloud09 pvedaemon[37143]: VM 305 qmp command failed - VM 305 qmp command 'query-proxmox-support' failed - got timeout
Jun 20 15:36:52 cloud09 pvestatd[2481]: VM 305 qmp command failed - VM 305 qmp command 'query-proxmox-support' failed - got timeout
Jun 20 15:36:53 cloud09 pvedaemon[7474]: VM 305 qmp command failed - VM 305 qmp command 'query-block-jobs' failed - got wrong command id '2481:352584' (expected 7474:2559)
Jun 20 15:36:54 cloud09 pvestatd[2481]: status update time (7.600 seconds)
Jun 20 15:36:59 cloud09 pvedaemon[7474]: storage migration failed: block job (mirror) error: VM 305 qmp command 'query-block-jobs' failed - got wrong command id '2481:352584' (expected 7474:2559)
Jun 20 15:36:59 cloud09 pvedaemon[37143]: <xxx@pve> end task UPID:cloud09:00001D32:032A3AC8:60CF5261:qmmove:305:xxx@pve: storage migration failed: block job (mirror) error: VM 305 qmp command 'query-block-jobs' failed - got wrong command id '2481:352584' (expected 7474:2559)



PVE Manager Version
pve-manager/6.4-8/185e14db

Package versions:

proxmox-ve: 6.4-1 (running kernel: 5.4.114-1-pve)
pve-manager: 6.4-8 (running version: 6.4-8/185e14db)
pve-kernel-5.4: 6.4-2
pve-kernel-helper: 6.4-2
pve-kernel-5.4.114-1-pve: 5.4.114-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.4-1
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-3
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.8-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.5-6
pve-cluster: 6.4-1
pve-container: 3.3-5
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.2-4
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.4-pve1


qm status --verbose

qm status 305 --verbose
balloon: 34359738368
ballooninfo: actual: 34359738368 free_mem: 3080478720 last_update: 1624194848 major_page_faults: 75094 max_mem: 34359738368 mem_swapped_in: 45056 mem_swapped_out: 163840 minor_page_faults: 1295835297 total_mem: 33566240768
blockstat:
    ide2: account_failed: 0 account_invalid: 0 failed_flush_operations: 0 failed_rd_operations: 0 failed_unmap_operations: 0 failed_wr_operations: 0 flush_operations: 0 flush_total_time_ns: 0 idle_time_ns: 2307235711896 invalid_flush_operations: 0 invalid_rd_operations: 0 invalid_unmap_operations: 0 invalid_wr_operations: 0 rd_bytes: 1122 rd_merged: 0 rd_operations: 23 rd_total_time_ns: 16647859228 timed_stats: unmap_bytes: 0 unmap_merged: 0 unmap_operations: 0 unmap_total_time_ns: 0 wr_bytes: 0 wr_highest_offset: 0 wr_merged: 0 wr_operations: 0 wr_total_time_ns: 0
    scsi0: account_failed: 1 account_invalid: 1 failed_flush_operations: 0 failed_rd_operations: 0 failed_unmap_operations: 0 failed_wr_operations: 0 flush_operations: 513441 flush_total_time_ns: 282256053153 idle_time_ns: 162519128 invalid_flush_operations: 0 invalid_rd_operations: 0 invalid_unmap_operations: 0 invalid_wr_operations: 0 rd_bytes: 21010677248 rd_merged: 0 rd_operations: 857810 rd_total_time_ns: 763615449425 timed_stats: unmap_bytes: 0 unmap_merged: 0 unmap_operations: 0 unmap_total_time_ns: 0 wr_bytes: 30570020864 wr_highest_offset: 749906018304 wr_merged: 0 wr_operations: 2677513 wr_total_time_ns: 17310025635816
cpus: 8
disk: 0
diskread: 21010678370
diskwrite: 30570020864
freemem: 3080478720
maxdisk: 751619276800
maxmem: 34359738368
mem: 30485762048
name: xxxxx
netin: 951061967
netout: 9943821373
nics:
    tap305i0: netin: 862358015 netout: 6560856795
    tap305i1: netin: 88703952 netout: 3382964578
pid: 16976
proxmox-support: pbs-dirty-bitmap: 1 pbs-dirty-bitmap-migration: 1 pbs-dirty-bitmap-savevm: 1 pbs-library-version: 1.0.3 (8de935110ed4cab743f6c9437357057f9f9f08ea) pbs-masterkey: 1 query-bitmap-info: 1
qmpstatus: running
running-machine: pc-i440fx-5.2+pve0
running-qemu: 5.2.0
status: running
template:
uptime: 48443
vmid: 305
 
We have done more tests and can confirm that this issue only happens when we try to move a VM disk from NFS storage to local storage on nodes running PVE 6.3 and 6.4.
 
Hi,
I have the same problem now.
Moving from NFS to local ZFS I get this error:

drive mirror is starting for drive-scsi2
drive-scsi2: Cancelling block job
drive-scsi2: Done.
TASK ERROR: storage migration failed: block job (mirror) error: drive-scsi2: 'mirror' has been cancelled


I was able to migrate machines until Friday. I don't understand this error.
 
1. If you run (adjust the path)
$ ls /nfs/images/vmid/Image.format
before the migration, does the migration then work?

2. What is the local storage backend? ext4/ZFS?
 
Hi, is no one else seeing this problem? Is it possible that it's only us?

At the moment it is almost impossible to migrate VMs between nodes (local storage) or to move VM disks from NFS storage to local storage.

All with this error:
TASK ERROR: storage migration failed: block job (mirror) error: VM 265 qmp command 'query-block-jobs' failed - got wrong command id '2738:6778596' (expected 43352:1447)

Node version (all nodes run the same version):

Code:
proxmox-ve: 6.4-1 (running kernel: 5.4.128-1-pve)
pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
pve-kernel-5.4: 6.4-5
pve-kernel-helper: 6.4-5
pve-kernel-5.4.128-1-pve: 5.4.128-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-3
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.12-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-1
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.2-4
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.5-pve1~bpo10+1
 
Virtual Environment 7.0-11

Moving a 4 TB disk from NFS to NFS:
drive-scsi0: transferred 1.1 TiB of 4.0 TiB (26.75%) in 3h 38m 50s
drive-scsi0: Cancelling block job
drive-scsi0: Done.
TASK ERROR: storage migration failed: block job (mirror) error: VM 241 qmp command 'query-block-jobs' failed - got wrong command id '1448:303790072' (expected 1594521:5263)
 
On 6.4.14, while trying to move a qcow2 from NFS storage to local storage:

Code:
drive-scsi1: transferred 44.5 GiB of 300.0 GiB (14.82%) in 31m 21s
drive-scsi1: transferred 44.5 GiB of 300.0 GiB (14.83%) in 31m 22s
drive-scsi1: transferred 44.5 GiB of 300.0 GiB (14.84%) in 31m 24s
drive-scsi1: Cancelling block job
drive-scsi1: Done.
TASK ERROR: storage migration failed: block job (mirror) error: VM 198 qmp command 'query-block-jobs' failed - got wrong command id '3422341:778' (expected 3417768:2875)
 
I am also facing this today when trying to move VMs from one host to another (a local-to-LVM live migration); this was never a problem before.

In the output I see the following:
Code:
2022-05-04 10:05:11 found local disk 'local:207/vm-207-disk-0.qcow2' (in current VM config)
2022-05-04 10:05:11 starting VM 207 on remote node 'chpve01'
2022-05-04 10:05:13 volume 'local:207/vm-207-disk-0.qcow2' is 'vmstor1:vm-207-disk-0' on the target
2022-05-04 10:05:13 start remote tunnel
2022-05-04 10:05:14 ssh tunnel ver 1
2022-05-04 10:05:14 starting storage migration
2022-05-04 10:05:14 sata0: start migration to nbd:unix:/run/qemu-server/207_nbd.migrate:exportname=drive-sata0
drive mirror is starting for drive-sata0
drive-sata0: Cancelling block job
drive-sata0: Done.
2022-05-04 10:05:14 ERROR: online migrate failure - block job (mirror) error: drive-sata0: 'mirror' has been cancelled
2022-05-04 10:05:14 aborting phase 2 - cleanup resources
2022-05-04 10:05:14 migrate_cancel
2022-05-04 10:05:16 ERROR: migration finished with problems (duration 00:00:05)
TASK ERROR: migration problems

I've checked the machine's integrity and everything seems fine. Migration from chpve01 to chpve02 (the node I'm trying to migrate from...) works flawlessly.

Any suggestions? The last thing I remember doing before this started was updating the packages installed on all hosts.
 
I am also facing this today when trying to move VMs from one host to another (a local-to-LVM live migration); this was never a problem before.

In the output I see the following:
Code:
2022-05-04 10:05:11 found local disk 'local:207/vm-207-disk-0.qcow2' (in current VM config)
2022-05-04 10:05:11 starting VM 207 on remote node 'chpve01'
2022-05-04 10:05:13 volume 'local:207/vm-207-disk-0.qcow2' is 'vmstor1:vm-207-disk-0' on the target
2022-05-04 10:05:13 start remote tunnel
2022-05-04 10:05:14 ssh tunnel ver 1
2022-05-04 10:05:14 starting storage migration
2022-05-04 10:05:14 sata0: start migration to nbd:unix:/run/qemu-server/207_nbd.migrate:exportname=drive-sata0
drive mirror is starting for drive-sata0
drive-sata0: Cancelling block job
drive-sata0: Done.
2022-05-04 10:05:14 ERROR: online migrate failure - block job (mirror) error: drive-sata0: 'mirror' has been cancelled
2022-05-04 10:05:14 aborting phase 2 - cleanup resources
2022-05-04 10:05:14 migrate_cancel
2022-05-04 10:05:16 ERROR: migration finished with problems (duration 00:00:05)
TASK ERROR: migration problems

I've checked the machine's integrity and everything seems fine. Migration from chpve01 to chpve02 (the node I'm trying to migrate from...) works flawlessly.

Any suggestions? The last thing I remember doing before this started was updating the packages installed on all hosts.
Interesting fact: these are VMs I've pulled out of Azure, and it seems they're the only ones showing this weird behavior.
Maybe this will be useful for someone else in the future. I didn't check further; I cleaned the Azure VM Agent off the VMs and so on, but... no dice.
 
Hi,
I am also facing this today when trying to move VMs from one host to another (a local-to-LVM live migration); this was never a problem before.

In the output I see the following:
Code:
2022-05-04 10:05:11 found local disk 'local:207/vm-207-disk-0.qcow2' (in current VM config)
2022-05-04 10:05:11 starting VM 207 on remote node 'chpve01'
2022-05-04 10:05:13 volume 'local:207/vm-207-disk-0.qcow2' is 'vmstor1:vm-207-disk-0' on the target
2022-05-04 10:05:13 start remote tunnel
2022-05-04 10:05:14 ssh tunnel ver 1
2022-05-04 10:05:14 starting storage migration
2022-05-04 10:05:14 sata0: start migration to nbd:unix:/run/qemu-server/207_nbd.migrate:exportname=drive-sata0
drive mirror is starting for drive-sata0
drive-sata0: Cancelling block job
drive-sata0: Done.
2022-05-04 10:05:14 ERROR: online migrate failure - block job (mirror) error: drive-sata0: 'mirror' has been cancelled
2022-05-04 10:05:14 aborting phase 2 - cleanup resources
2022-05-04 10:05:14 migrate_cancel
2022-05-04 10:05:16 ERROR: migration finished with problems (duration 00:00:05)
TASK ERROR: migration problems

I've checked the machine's integrity and everything seems fine. Migration from chpve01 to chpve02 (the node I'm trying to migrate from...) works flawlessly.

Any suggestions? The last thing I remember doing before this started was updating the packages installed on all hosts.
how big is the virtual disk (check with qemu-img info /var/lib/vz/images/207/vm-207-disk-0.qcow2)? Since QEMU 6.0, drive-mirror needs the same exact size on source and target. LVM is usually aligned to 4MiB, so if the disk on the source isn't aligned to that too, mirroring will fail. If you resize the source disk to be aligned to 4MiB, it might work around the issue.
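For anyone who wants to check this before retrying: below is a rough shell sketch of that size check and workaround, not an official procedure. It assumes the setup from the log above (VM 207, disk sata0, qcow2 image on the default 'local' storage); adjust the variables for your VM, and verify the size syntax your version of qm resize accepts before running it.

Code:
# Check whether the source image's virtual size is a multiple of 4 MiB and,
# if not, grow it to the next 4 MiB boundary before retrying the move.
IMG=/var/lib/vz/images/207/vm-207-disk-0.qcow2   # path to the source image
VMID=207                                         # VM to resize
DISK=sata0                                       # affected disk
ALIGN=$((4 * 1024 * 1024))                       # 4 MiB, the usual LVM extent size

SIZE=$(qemu-img info --output=json "$IMG" | grep -oP '"virtual-size":\s*\K[0-9]+')

if [ $((SIZE % ALIGN)) -ne 0 ]; then
    NEW=$(( (SIZE / ALIGN + 1) * ALIGN ))        # next 4 MiB multiple, in bytes
    echo "virtual size $SIZE is not 4 MiB aligned, growing to $NEW bytes"
    qm resize "$VMID" "$DISK" "$(( NEW / 1024 / 1024 ))M"
else
    echo "virtual size $SIZE is already 4 MiB aligned"
fi

Note that qm resize can only grow a disk, never shrink it, so rounding up is the only option here; once the size is aligned, retry the move or migration.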
 
Hi,

how big is the virtual disk (check with qemu-img info /var/lib/vz/images/207/vm-207-disk-0.qcow2)? Since QEMU 6.0, drive-mirror needs the same exact size on source and target. LVM is usually aligned to 4MiB, so if the disk on the source isn't aligned to that too, mirroring will fail. If you resize the source disk to be aligned to 4MiB, it might work around the issue.
Oops... the machine is already gone; I was in a bit of a rush and decided to just redeploy it from scratch, taking the opportunity to upgrade the VM's OS at the same time.

Sorry :|
 
Hi,

how big is the virtual disk (check with qemu-img info /var/lib/vz/images/207/vm-207-disk-0.qcow2)? Since QEMU 6.0, drive-mirror needs the same exact size on source and target. LVM is usually aligned to 4MiB, so if the disk on the source isn't aligned to that too, mirroring will fail. If you resize the source disk to be aligned to 4MiB, it might work around the issue.

Thank you for pointing out this detail about LVM alignment. I couldn't figure out why I was unable to migrate a disk image from NFS to iSCSI/LVM, and with this I was able to continue with the migration process.

Since I had images imported from VMware, the sizes were not aligned correctly, so after setting the size via the CLI with "qm disk resize", I was able to continue transferring the images.
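For reference, the resize itself is a one-liner; the VMID, disk and size below are made-up values, the only requirement being that the target size is the next 4 MiB multiple at or above the image's current virtual size:

Code:
# Hypothetical example: an imported disk reports 32213303296 bytes (30721 MiB,
# not a multiple of 4 MiB), so grow it to 30724 MiB, the next 4 MiB multiple.
qm disk resize 123 scsi1 30724M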

Regards
 
