ERROR: online migrate failure - Failed to complete storage migration: block job (mirror) error: drive-efidisk0: Input/output error (io-status: ok)

herzkerl

I've been giving "remote migration" a try for the first time today, moving machines live from a single host running ZFS to a new cluster running Ceph. It worked tremendously well, without issues and on the first try, for all VMs but one, which always fails with the errors below.

I tried quite a few things:
• Migrating to a different remote host
• Migrating to local-zfs instead of Ceph
• Changing the machine version from 7.1 to 8.2

I've read quite a few threads regarding these issues, but to no avail. Looking forward to any suggestions you might have!
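
For reference, the call I used was along these lines (just a sketch of qm remote-migrate; the API token secret and fingerprint are placeholders, while the target IP, bridge, and storage names match the log further down):

Code:
# experimental remote migration via the qm CLI; placeholder token/fingerprint values
qm remote-migrate 101 101 \
  'host=192.168.100.12,apitoken=PVEAPIToken=root@pam!migration=<SECRET>,fingerprint=<TARGET-FINGERPRINT>' \
  --target-bridge vmbr0 --target-storage data --online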

Here's the config from that VM:

Code:
agent: 1,fstrim_cloned_disks=1
bios: ovmf
boot: order=ide2;scsi0
cores: 8
cpu: x86-64-v3
efidisk0: local-zfs:vm-101-disk-0,efitype=4m,pre-enrolled-keys=1,size=528K
ide2: none,media=cdrom
machine: pc-i440fx-8.1
memory: 16384
name: W2019-DC
net0: virtio=7A:48:81:5E:B1:14,bridge=vmbr0
numa: 1
onboot: 1
ostype: win10
protection: 1
scsi0: local-zfs:vm-101-disk-1,discard=on,iothread=1,size=150G,ssd=1
scsi1: local-zfs:vm-101-disk-2,discard=on,iothread=1,size=300G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=80426df5-91a8-4be1-b1d1-99fd144cfda0
sockets: 1
vmgenid: c4d0dd5a-ac6a-4009-ae34-3cd2cf455626

For comparison, here are the configs of very similar machines (all running Windows Server 2019) that I migrated successfully:

Code:
agent: 1,fstrim_cloned_disks=1
bios: ovmf
boot: order=ide2;scsi0
cores: 6
efidisk0: local-zfs:vm-102-disk-2,efitype=4m,pre-enrolled-keys=1,size=528K
ide2: none,media=cdrom
lock: migrate
machine: pc-i440fx-8.1
memory: 65536
name: W2019-MX
net0: virtio=D2:C0:FE:A5:43:65,bridge=vmbr0
numa: 1
onboot: 1
ostype: win10
protection: 1
scsi0: local-zfs:vm-102-disk-0,discard=on,iothread=1,size=250G,ssd=1
scsi1: local-zfs:vm-102-disk-1,discard=on,iothread=1,size=150G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=83c92ef2-f8e2-4cc9-9c43-024c5380f0a7
sockets: 2
vmgenid: 6d0ec721-dbba-4a37-99b1-6fcafa9152e3

Code:
agent: 1,fstrim_cloned_disks=1
balloon: 32768
bios: ovmf
boot: order=ide2;scsi0
cores: 8
efidisk0: local-zfs:vm-103-disk-0,efitype=4m,pre-enrolled-keys=1,size=528K
ide2: none,media=cdrom
lock: migrate
machine: pc-i440fx-8.1
memory: 131072
name: W2019-TS
net0: virtio=BA:2D:CA:68:77:CC,bridge=vmbr0
numa: 1
onboot: 1
ostype: win10
protection: 1
scsi0: local-zfs:vm-103-disk-1,discard=on,iothread=1,size=200G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=0eb103f1-1096-4c54-b112-d1779b3116d3
sockets: 2
vmgenid: 792b8cc8-3fd4-49bd-89ad-c9c2cdb554b9

And here's the error log:

Code:
2024-12-27 18:52:44 remote: started tunnel worker 'UPID:pve-r6415-2:000301A6:002701C8:676EE96C:qmtunnel:101:root@pam!migration:'
tunnel: -> sending command "version" to remote
tunnel: <- got reply
2024-12-27 18:52:44 local WS tunnel version: 2
2024-12-27 18:52:44 remote WS tunnel version: 2
2024-12-27 18:52:44 minimum required WS tunnel version: 2
websocket tunnel started
2024-12-27 18:52:44 starting migration of VM 101 to node 'pve-r6415-2' (192.168.100.12)
tunnel: -> sending command "bwlimit" to remote
tunnel: <- got reply
tunnel: -> sending command "bwlimit" to remote
tunnel: <- got reply
tunnel: -> sending command "bwlimit" to remote
tunnel: <- got reply
2024-12-27 18:52:44 found local disk 'local-zfs:vm-101-disk-0' (attached)
2024-12-27 18:52:44 found local disk 'local-zfs:vm-101-disk-1' (attached)
2024-12-27 18:52:44 found local disk 'local-zfs:vm-101-disk-2' (attached)
2024-12-27 18:52:44 mapped: net0 from vmbr0 to vmbr0
2024-12-27 18:52:44 Allocating volume for drive 'scsi0' on remote storage 'data'..
tunnel: -> sending command "disk" to remote
tunnel: <- got reply
2024-12-27 18:52:44 volume 'local-zfs:vm-101-disk-1' is 'data:vm-101-disk-0' on the target
2024-12-27 18:52:44 Allocating volume for drive 'scsi1' on remote storage 'data'..
tunnel: -> sending command "disk" to remote
tunnel: <- got reply
2024-12-27 18:52:44 volume 'local-zfs:vm-101-disk-2' is 'data:vm-101-disk-1' on the target
2024-12-27 18:52:44 Allocating volume for drive 'efidisk0' on remote storage 'data'..
tunnel: -> sending command "disk" to remote
tunnel: <- got reply
2024-12-27 18:52:45 volume 'local-zfs:vm-101-disk-0' is 'data:vm-101-disk-2' on the target
tunnel: -> sending command "config" to remote
tunnel: <- got reply
tunnel: -> sending command "start" to remote
tunnel: <- got reply
2024-12-27 18:52:46 Setting up tunnel for '/run/qemu-server/101.migrate'
2024-12-27 18:52:46 Setting up tunnel for '/run/qemu-server/101_nbd.migrate'
2024-12-27 18:52:46 starting storage migration
2024-12-27 18:52:46 scsi1: start migration to nbd:unix:/run/qemu-server/101_nbd.migrate:exportname=drive-scsi1
drive mirror is starting for drive-scsi1
tunnel: accepted new connection on '/run/qemu-server/101_nbd.migrate'
tunnel: requesting WS ticket via tunnel
tunnel: established new WS for forwarding '/run/qemu-server/101_nbd.migrate'
drive-scsi1: transferred 87.0 MiB of 300.0 GiB (0.03%) in 1s
[...]
drive-scsi1: transferred 300.1 GiB of 300.1 GiB (100.00%) in 50m 16s, ready
all 'mirror' jobs are ready
2024-12-27 19:43:02 efidisk0: start migration to nbd:unix:/run/qemu-server/101_nbd.migrate:exportname=drive-efidisk0
drive mirror is starting for drive-efidisk0
tunnel: accepted new connection on '/run/qemu-server/101_nbd.migrate'
tunnel: requesting WS ticket via tunnel
tunnel: established new WS for forwarding '/run/qemu-server/101_nbd.migrate'
drive-efidisk0: transferred 0.0 B of 528.0 KiB (0.00%) in 0s
drive-efidisk0: transferred 528.0 KiB of 528.0 KiB (100.00%) in 1s, ready
all 'mirror' jobs are ready
2024-12-27 19:43:03 scsi0: start migration to nbd:unix:/run/qemu-server/101_nbd.migrate:exportname=drive-scsi0
drive mirror is starting for drive-scsi0
tunnel: accepted new connection on '/run/qemu-server/101_nbd.migrate'
tunnel: requesting WS ticket via tunnel
tunnel: established new WS for forwarding '/run/qemu-server/101_nbd.migrate'
drive-scsi0: transferred 79.0 MiB of 150.0 GiB (0.05%) in 1s
[...]
drive-scsi0: transferred 150.3 GiB of 150.3 GiB (100.00%) in 25m 1s, ready
all 'mirror' jobs are ready
2024-12-27 20:08:04 switching mirror jobs to actively synced mode
drive-efidisk0: switching to actively synced mode
drive-scsi0: switching to actively synced mode
drive-scsi1: switching to actively synced mode
drive-efidisk0: successfully switched to actively synced mode
drive-scsi0: successfully switched to actively synced mode
drive-scsi1: successfully switched to actively synced mode
2024-12-27 20:08:05 starting online/live migration on unix:/run/qemu-server/101.migrate
2024-12-27 20:08:05 set migration capabilities
tunnel: -> sending command "bwlimit" to remote
tunnel: <- got reply
2024-12-27 20:08:05 migration downtime limit: 100 ms
2024-12-27 20:08:05 migration cachesize: 2.0 GiB
2024-12-27 20:08:05 set migration parameters
2024-12-27 20:08:05 start migrate command to unix:/run/qemu-server/101.migrate
tunnel: accepted new connection on '/run/qemu-server/101.migrate'
tunnel: requesting WS ticket via tunnel
tunnel: established new WS for forwarding '/run/qemu-server/101.migrate'
2024-12-27 20:08:06 migration active, transferred 79.0 MiB of 16.0 GiB VM-state, 122.9 MiB/s
2024-12-27 20:08:06 xbzrle: send updates to 373916 pages in 190.0 MiB encoded memory, cache-miss 17.56%, overflow 31529
[...]
2024-12-27 20:10:55 auto-increased downtime to continue migration: 800 ms
2024-12-27 20:10:56 migration active, transferred 16.6 GiB of 16.0 GiB VM-state, 86.9 MiB/s, VM dirties lots of memory: 128.5 MiB/s
2024-12-27 20:10:56 xbzrle: send updates to 551775 pages in 211.4 MiB encoded memory, cache-miss 33.71%, overflow 32568
tunnel: done handling forwarded connection from '/run/qemu-server/101.migrate'
2024-12-27 20:10:56 average migration speed: 95.9 MiB/s - downtime 303 ms
2024-12-27 20:10:56 migration status: completed
all 'mirror' jobs are ready
drive-efidisk0: Completing block job...
drive-efidisk0: Completed successfully.
drive-scsi0: Completing block job...
drive-scsi0: Completed successfully.
drive-scsi1: Completing block job...
tunnel: done handling forwarded connection from '/run/qemu-server/101_nbd.migrate'
tunnel: done handling forwarded connection from '/run/qemu-server/101_nbd.migrate'
tunnel: done handling forwarded connection from '/run/qemu-server/101_nbd.migrate'
drive-scsi1: Completed successfully.
drive-efidisk0: Cancelling block job
drive-scsi1: Cancelling block job
drive-scsi0: Cancelling block job
drive-efidisk0: Done.
WARN: drive-scsi1: Input/output error (io-status: ok)
drive-scsi1: Done.
drive-scsi0: Done.
2024-12-27 20:10:59 ERROR: online migrate failure - Failed to complete storage migration: block job (mirror) error: drive-efidisk0: Input/output error (io-status: ok)
2024-12-27 20:10:59 aborting phase 2 - cleanup resources
2024-12-27 20:10:59 migrate_cancel
tunnel: -> sending command "stop" to remote
tunnel: <- got reply
tunnel: -> sending command "quit" to remote
tunnel: <- got reply
tunnel: thread 'main' panicked at 'failed printing to stdout: Broken pipe (os error 32)', library/std/src/io/stdio.rs:1009:9
tunnel: note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
CMD websocket tunnel died: command 'proxmox-websocket-tunnel' failed: exit code 101

2024-12-27 20:11:45 ERROR: no reply to command '{"cleanup":1,"cmd":"quit"}': reading from tunnel failed: got timeout
print() on closed filehandle GEN24 at /usr/share/perl5/PVE/Tunnel.pm line 99.
readline() on closed filehandle GEN21 at /usr/share/perl5/PVE/Tunnel.pm line 71.
Use of uninitialized value $res in concatenation (.) or string at /usr/share/perl5/PVE/Tunnel.pm line 117.
2024-12-27 20:12:15 tunnel still running - terminating now with SIGTERM
2024-12-27 20:12:25 tunnel still running - terminating now with SIGKILL
2024-12-27 20:12:26 ERROR: tunnel child process (PID 3022180) couldn't be collected
2024-12-27 20:12:26 ERROR: failed to decode tunnel reply '' (command '{"cleanup":0,"cmd":"quit"}') - malformed JSON string, neither tag, array, object, number, string or atom, at character offset 0 (before "(end of string)") at /usr/share/perl5/PVE/Tunnel.pm line 116.
2024-12-27 20:12:26 ERROR: migration finished with problems (duration 01:19:42)

TASK ERROR: migration problems
 
There may be a read error inside the win10 VM's virtual disk.

What I would recommend is using something like the free Veeam Agent to do a bare-metal backup from inside the VM, and restoring that. Note that some files/dirs may be unrecoverable. You might also want to run chkdsk /f and sfc /scannow in-VM and see how a defrag runs.
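
Those in-guest checks would look something like this, run from an elevated Command Prompt (assuming C: is the system drive):

Code:
:: check and repair the NTFS filesystem (schedules on next reboot if C: is in use)
chkdsk C: /f
:: verify and repair protected Windows system files
sfc /scannow
:: analyze fragmentation on the volume
defrag C: /A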
 
It might have been a network issue after all: we had set up a bond (LACP, hash policy layer3+4), and after switching the old host to a single-NIC config, the migration worked just fine.
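
For anyone comparing setups, the bond was configured roughly like this in /etc/network/interfaces (a sketch; the NIC names and address are placeholders, not our exact values):

Code:
auto bond0
iface bond0 inet manual
        bond-slaves eno1 eno2
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer3+4

auto vmbr0
iface vmbr0 inet static
        address 192.168.100.11/24
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0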

EDIT: It could also be due to the different sizes of the EFI image. When trying to move the disk from local-zfs to Ceph I'm still seeing an error, albeit a different one:

Code:
create full clone of drive efidisk0 (local-zfs:vm-101-disk-2)
drive mirror is starting for drive-efidisk0
drive-efidisk0: Cancelling block job
drive-efidisk0: Done.
Removing image: 100% complete...done.
TASK ERROR: storage migration failed: block job (mirror) error: drive-efidisk0: Source and target image have different sizes (io-status: ok)

I'll try to move the disk with the machine turned off, as suggested here: https://forum.proxmox.com/threads/t...-mirror-has-been-cancelled.102202/post-550688
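
If the offline move works, it would be something along these lines (a sketch; 'data' is the Ceph storage from the log above, and --delete drops the source volume after a successful copy):

Code:
# shut down the VM, move the EFI disk offline, then start it again
qm shutdown 101
qm disk move 101 efidisk0 data --delete
qm start 101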
 
