ERROR: online migrate failure - Failed to complete storage migration: block job (mirror) error: drive-efidisk0: Input/output error (io-status: ok)

herzkerl

Active Member
Mar 18, 2021
104
22
38
I've been giving "remote migration" a try for the first time today, moving machines live from a single host running ZFS to a new cluster running Ceph. It worked tremendously well, without issues and on the first try, for all VMs but one, which always fails with the errors below.

I tried quite a few things:
• Using a different remote host to migrate to
• Migrating to local-zfs instead of CePH
• Changing the machine version from 7.1 to 8.2

I've read quite a few threads regarding these issues, but to no avail. Looking forward to any suggestions you might have!
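
For reference, the migrations were started roughly like this (the API token secret and target fingerprint are placeholders):

Code:
qm remote-migrate 101 101 \
  'host=192.168.100.12,apitoken=PVEAPIToken=root@pam!migration=<secret>,fingerprint=<fingerprint>' \
  --target-bridge vmbr0 --target-storage data --online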

Here's the config from that VM:

Code:
agent: 1,fstrim_cloned_disks=1
bios: ovmf
boot: order=ide2;scsi0
cores: 8
cpu: x86-64-v3
efidisk0: local-zfs:vm-101-disk-0,efitype=4m,pre-enrolled-keys=1,size=528K
ide2: none,media=cdrom
machine: pc-i440fx-8.1
memory: 16384
name: W2019-DC
net0: virtio=7A:48:81:5E:B1:14,bridge=vmbr0
numa: 1
onboot: 1
ostype: win10
protection: 1
scsi0: local-zfs:vm-101-disk-1,discard=on,iothread=1,size=150G,ssd=1
scsi1: local-zfs:vm-101-disk-2,discard=on,iothread=1,size=300G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=80426df5-91a8-4be1-b1d1-99fd144cfda0
sockets: 1
vmgenid: c4d0dd5a-ac6a-4009-ae34-3cd2cf455626

I successfully migrated very similar machines (all running Windows Server 2019), though. Here are their configs for comparison:

Code:
agent: 1,fstrim_cloned_disks=1
bios: ovmf
boot: order=ide2;scsi0
cores: 6
efidisk0: local-zfs:vm-102-disk-2,efitype=4m,pre-enrolled-keys=1,size=528K
ide2: none,media=cdrom
lock: migrate
machine: pc-i440fx-8.1
memory: 65536
name: W2019-MX
net0: virtio=D2:C0:FE:A5:43:65,bridge=vmbr0
numa: 1
onboot: 1
ostype: win10
protection: 1
scsi0: local-zfs:vm-102-disk-0,discard=on,iothread=1,size=250G,ssd=1
scsi1: local-zfs:vm-102-disk-1,discard=on,iothread=1,size=150G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=83c92ef2-f8e2-4cc9-9c43-024c5380f0a7
sockets: 2
vmgenid: 6d0ec721-dbba-4a37-99b1-6fcafa9152e3

Code:
agent: 1,fstrim_cloned_disks=1
balloon: 32768
bios: ovmf
boot: order=ide2;scsi0
cores: 8
efidisk0: local-zfs:vm-103-disk-0,efitype=4m,pre-enrolled-keys=1,size=528K
ide2: none,media=cdrom
lock: migrate
machine: pc-i440fx-8.1
memory: 131072
name: W2019-TS
net0: virtio=BA:2D:CA:68:77:CC,bridge=vmbr0
numa: 1
onboot: 1
ostype: win10
protection: 1
scsi0: local-zfs:vm-103-disk-1,discard=on,iothread=1,size=200G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=0eb103f1-1096-4c54-b112-d1779b3116d3
sockets: 2
vmgenid: 792b8cc8-3fd4-49bd-89ad-c9c2cdb554b9

And here's the error log:

Code:
2024-12-27 18:52:44 remote: started tunnel worker 'UPID:pve-r6415-2:000301A6:002701C8:676EE96C:qmtunnel:101:root@pam!migration:'
tunnel: -> sending command "version" to remote
tunnel: <- got reply
2024-12-27 18:52:44 local WS tunnel version: 2
2024-12-27 18:52:44 remote WS tunnel version: 2
2024-12-27 18:52:44 minimum required WS tunnel version: 2
websocket tunnel started
2024-12-27 18:52:44 starting migration of VM 101 to node 'pve-r6415-2' (192.168.100.12)
tunnel: -> sending command "bwlimit" to remote
tunnel: <- got reply
tunnel: -> sending command "bwlimit" to remote
tunnel: <- got reply
tunnel: -> sending command "bwlimit" to remote
tunnel: <- got reply
2024-12-27 18:52:44 found local disk 'local-zfs:vm-101-disk-0' (attached)
2024-12-27 18:52:44 found local disk 'local-zfs:vm-101-disk-1' (attached)
2024-12-27 18:52:44 found local disk 'local-zfs:vm-101-disk-2' (attached)
2024-12-27 18:52:44 mapped: net0 from vmbr0 to vmbr0
2024-12-27 18:52:44 Allocating volume for drive 'scsi0' on remote storage 'data'..
tunnel: -> sending command "disk" to remote
tunnel: <- got reply
2024-12-27 18:52:44 volume 'local-zfs:vm-101-disk-1' is 'data:vm-101-disk-0' on the target
2024-12-27 18:52:44 Allocating volume for drive 'scsi1' on remote storage 'data'..
tunnel: -> sending command "disk" to remote
tunnel: <- got reply
2024-12-27 18:52:44 volume 'local-zfs:vm-101-disk-2' is 'data:vm-101-disk-1' on the target
2024-12-27 18:52:44 Allocating volume for drive 'efidisk0' on remote storage 'data'..
tunnel: -> sending command "disk" to remote
tunnel: <- got reply
2024-12-27 18:52:45 volume 'local-zfs:vm-101-disk-0' is 'data:vm-101-disk-2' on the target
tunnel: -> sending command "config" to remote
tunnel: <- got reply
tunnel: -> sending command "start" to remote
tunnel: <- got reply
2024-12-27 18:52:46 Setting up tunnel for '/run/qemu-server/101.migrate'
2024-12-27 18:52:46 Setting up tunnel for '/run/qemu-server/101_nbd.migrate'
2024-12-27 18:52:46 starting storage migration
2024-12-27 18:52:46 scsi1: start migration to nbd:unix:/run/qemu-server/101_nbd.migrate:exportname=drive-scsi1
drive mirror is starting for drive-scsi1
tunnel: accepted new connection on '/run/qemu-server/101_nbd.migrate'
tunnel: requesting WS ticket via tunnel
tunnel: established new WS for forwarding '/run/qemu-server/101_nbd.migrate'
drive-scsi1: transferred 87.0 MiB of 300.0 GiB (0.03%) in 1s
[...]
drive-scsi1: transferred 300.1 GiB of 300.1 GiB (100.00%) in 50m 16s, ready
all 'mirror' jobs are ready
2024-12-27 19:43:02 efidisk0: start migration to nbd:unix:/run/qemu-server/101_nbd.migrate:exportname=drive-efidisk0
drive mirror is starting for drive-efidisk0
tunnel: accepted new connection on '/run/qemu-server/101_nbd.migrate'
tunnel: requesting WS ticket via tunnel
tunnel: established new WS for forwarding '/run/qemu-server/101_nbd.migrate'
drive-efidisk0: transferred 0.0 B of 528.0 KiB (0.00%) in 0s
drive-efidisk0: transferred 528.0 KiB of 528.0 KiB (100.00%) in 1s, ready
all 'mirror' jobs are ready
2024-12-27 19:43:03 scsi0: start migration to nbd:unix:/run/qemu-server/101_nbd.migrate:exportname=drive-scsi0
drive mirror is starting for drive-scsi0
tunnel: accepted new connection on '/run/qemu-server/101_nbd.migrate'
tunnel: requesting WS ticket via tunnel
tunnel: established new WS for forwarding '/run/qemu-server/101_nbd.migrate'
drive-scsi0: transferred 79.0 MiB of 150.0 GiB (0.05%) in 1s
[...]
drive-scsi0: transferred 150.3 GiB of 150.3 GiB (100.00%) in 25m 1s, ready
all 'mirror' jobs are ready
2024-12-27 20:08:04 switching mirror jobs to actively synced mode
drive-efidisk0: switching to actively synced mode
drive-scsi0: switching to actively synced mode
drive-scsi1: switching to actively synced mode
drive-efidisk0: successfully switched to actively synced mode
drive-scsi0: successfully switched to actively synced mode
drive-scsi1: successfully switched to actively synced mode
2024-12-27 20:08:05 starting online/live migration on unix:/run/qemu-server/101.migrate
2024-12-27 20:08:05 set migration capabilities
tunnel: -> sending command "bwlimit" to remote
tunnel: <- got reply
2024-12-27 20:08:05 migration downtime limit: 100 ms
2024-12-27 20:08:05 migration cachesize: 2.0 GiB
2024-12-27 20:08:05 set migration parameters
2024-12-27 20:08:05 start migrate command to unix:/run/qemu-server/101.migrate
tunnel: accepted new connection on '/run/qemu-server/101.migrate'
tunnel: requesting WS ticket via tunnel
tunnel: established new WS for forwarding '/run/qemu-server/101.migrate'
2024-12-27 20:08:06 migration active, transferred 79.0 MiB of 16.0 GiB VM-state, 122.9 MiB/s
2024-12-27 20:08:06 xbzrle: send updates to 373916 pages in 190.0 MiB encoded memory, cache-miss 17.56%, overflow 31529
[...]
2024-12-27 20:10:55 auto-increased downtime to continue migration: 800 ms
2024-12-27 20:10:56 migration active, transferred 16.6 GiB of 16.0 GiB VM-state, 86.9 MiB/s, VM dirties lots of memory: 128.5 MiB/s
2024-12-27 20:10:56 xbzrle: send updates to 551775 pages in 211.4 MiB encoded memory, cache-miss 33.71%, overflow 32568
tunnel: done handling forwarded connection from '/run/qemu-server/101.migrate'
2024-12-27 20:10:56 average migration speed: 95.9 MiB/s - downtime 303 ms
2024-12-27 20:10:56 migration status: completed
all 'mirror' jobs are ready
drive-efidisk0: Completing block job...
drive-efidisk0: Completed successfully.
drive-scsi0: Completing block job...
drive-scsi0: Completed successfully.
drive-scsi1: Completing block job...
tunnel: done handling forwarded connection from '/run/qemu-server/101_nbd.migrate'
tunnel: done handling forwarded connection from '/run/qemu-server/101_nbd.migrate'
tunnel: done handling forwarded connection from '/run/qemu-server/101_nbd.migrate'
drive-scsi1: Completed successfully.
drive-efidisk0: Cancelling block job
drive-scsi1: Cancelling block job
drive-scsi0: Cancelling block job
drive-efidisk0: Done.
WARN: drive-scsi1: Input/output error (io-status: ok)
drive-scsi1: Done.
drive-scsi0: Done.
2024-12-27 20:10:59 ERROR: online migrate failure - Failed to complete storage migration: block job (mirror) error: drive-efidisk0: Input/output error (io-status: ok)
2024-12-27 20:10:59 aborting phase 2 - cleanup resources
2024-12-27 20:10:59 migrate_cancel
tunnel: -> sending command "stop" to remote
tunnel: <- got reply
tunnel: -> sending command "quit" to remote
tunnel: <- got reply
tunnel: thread 'main' panicked at 'failed printing to stdout: Broken pipe (os error 32)', library/std/src/io/stdio.rs:1009:9
tunnel: note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
CMD websocket tunnel died: command 'proxmox-websocket-tunnel' failed: exit code 101

2024-12-27 20:11:45 ERROR: no reply to command '{"cleanup":1,"cmd":"quit"}': reading from tunnel failed: got timeout
print() on closed filehandle GEN24 at /usr/share/perl5/PVE/Tunnel.pm line 99.
readline() on closed filehandle GEN21 at /usr/share/perl5/PVE/Tunnel.pm line 71.
Use of uninitialized value $res in concatenation (.) or string at /usr/share/perl5/PVE/Tunnel.pm line 117.
2024-12-27 20:12:15 tunnel still running - terminating now with SIGTERM
2024-12-27 20:12:25 tunnel still running - terminating now with SIGKILL
2024-12-27 20:12:26 ERROR: tunnel child process (PID 3022180) couldn't be collected
2024-12-27 20:12:26 ERROR: failed to decode tunnel reply '' (command '{"cleanup":0,"cmd":"quit"}') - malformed JSON string, neither tag, array, object, number, string or atom, at character offset 0 (before "(end of string)") at /usr/share/perl5/PVE/Tunnel.pm line 116.
2024-12-27 20:12:26 ERROR: migration finished with problems (duration 01:19:42)

TASK ERROR: migration problems
 
There may be a read error inside the Windows VM's virtual disk.

What I would recommend is using something like Veeam Free Agent to do a bare-metal backup from inside the VM, and restoring that. Note that some files/dirs may be unrecoverable. You might also want to run chkdsk /f and sfc /scannow in the guest and see how a defrag runs.
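
Something like this from an elevated prompt inside the guest (chkdsk on the system drive gets scheduled for the next reboot):

Code:
chkdsk C: /f
sfc /scannow
defrag C: /U /V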
 
It might have been a network issue after all: we had set up a bond (LACP, hash policy layer3+4), and after changing to a single-NIC config on the old host, the migration worked just fine.
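
For reference, the bond on the old host looked roughly like this in /etc/network/interfaces (interface names and addresses are placeholders):

Code:
auto bond0
iface bond0 inet manual
    bond-slaves eno1 eno2
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
    bond-miimon 100

auto vmbr0
iface vmbr0 inet static
    address <host-ip>/24
    gateway <gateway-ip>
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0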

EDIT: It could also be due to the different sizes of the EFI image. When trying to move the disk from local-zfs to Ceph I'm still seeing an error, albeit a different one:

Code:
create full clone of drive efidisk0 (local-zfs:vm-101-disk-2)
drive mirror is starting for drive-efidisk0
drive-efidisk0: Cancelling block job
drive-efidisk0: Done.
Removing image: 100% complete...done.
TASK ERROR: storage migration failed: block job (mirror) error: drive-efidisk0: Source and target image have different sizes (io-status: ok)

I'll try to move the disk after turning off the machine, as stated here: https://forum.proxmox.com/threads/t...-mirror-has-been-cancelled.102202/post-550688
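
To compare the actual sizes of the EFI volume on both storages, something along these lines should work (the dataset name assumes the default local-zfs layout, and <pool> is a placeholder for the RBD pool):

Code:
# source side (ZFS)
zfs get volsize,volblocksize rpool/data/vm-101-disk-0
qemu-img info /dev/zvol/rpool/data/vm-101-disk-0

# target side (Ceph)
rbd info <pool>/vm-101-disk-2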
 
I can confirm I'm experiencing the exact same issue since upgrading to Proxmox VE 8.


My setup:


  • Version: pve-manager/8.4.1/2a5fa54a8503f96d
  • Kernel: 6.8.12-9-pve
  • Cluster type: non-shared (each node has its own local storage)
  • Storage: dir type, using qcow2 disks
  • Migration type: Live Migration (online) between nodes

Many live migrations fail with the following error (taken from the task log):


2025-05-09 13:24:10 migration active, transferred 16.1 GiB of 16.1 GiB VM-state, 497.6 MiB/s
2025-05-09 13:24:11 average migration speed: 446.3 MiB/s - downtime 102 ms
2025-05-09 13:24:11 migration completed, transferred 16.7 GiB VM-state
2025-05-09 13:24:11 migration status: completed
all 'mirror' jobs are ready
drive-virtio0: Completing block job...
drive-virtio0: Completed successfully.
drive-virtio0: Cancelling block job
drive-virtio0: Done.
2025-05-09 13:24:12 ERROR: online migrate failure - Failed to complete storage migration: block job (mirror) error: drive-virtio0: Input/output error (io-status: ok)
2025-05-09 13:24:12 aborting phase 2 - cleanup resources
2025-05-09 13:24:12 migrate_cancel
2025-05-09 13:24:17 ERROR: migration finished with problems (duration 00:47:08)
TASK ERROR: migration problems

Everything appears to work fine — even mirror jobs show as ready — but right at the end, when cancelling the block job, it throws an I/O error with no further details.


This never happened with PVE 7, where the same configuration worked perfectly.


I've looked through the forums and the roadmap, but couldn’t find anything specific about this issue.


Is anyone from the team or community aware of this problem or any known workaround?
 
Hi,
please share the system logs/journal on the migration target node from around the time the issue happened. Please also share the VM configuration (qm config <ID>) as well as the output of pveversion -v from both source and target.
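
For example, on the target node (adjust the VM ID and the time window to when the migration failed):

Code:
qm config <ID>
pveversion -v
journalctl --since "2025-05-09 13:20" --until "2025-05-09 13:30"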
 
This also happens on 8.4.1 when trying to move a disk to another storage, in this case from ZFS over iSCSI to a Ceph storage.


Code:
moving disk with snapshots, snapshots will not be moved!
create full clone of drive efidisk0 (asp:vm-106-disk-0)
drive mirror is starting for drive-efidisk0
drive-efidisk0: Cancelling block job
drive-efidisk0: Done.
Removing image: 100% complete...done.
TASK ERROR: storage migration failed: block job (mirror) error: drive-efidisk0: Source and target image have different sizes (io-status: ok)
 
Hi!

I have exactly the same error. The migration was from a host with an Intel(R) Xeon(R) Gold 6338 CPU to one with an Intel(R) Xeon(R) Gold 6240R CPU.
This is the log on the target node from the time the migration failed:
Code:
QEMU[143805]: kvm: Putting registers after init: Failed to set special registers: Invalid argument

Usually, during live migration to a host with a different CPU model, the virtual machine shuts down on the destination host. But here, even that doesn't happen. So, does it mean I have to shut down the virtual machine and migrate it while it's powered off?
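
To see how much the two CPU models differ, one can compare the flag sets on both hosts, e.g.:

Code:
lscpu | grep -E 'Model name|Flags'
# or dump the flags one per line for an easier diff between the two nodes
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | sort -u > /tmp/cpuflags-$(hostname)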

- qm config:
Code:
acpi: 1
agent: 1
autostart: 0
bios: seabios
boot: c
bootdisk: virtio0
cipassword: **********
ciuser: ciuser
cores: 2
cpu: host
cpuunits: 1000
description: description
hotplug: network,disk,usb
ide2: lvm1:vm-ID-cloudinit,media=cdrom,size=4M
ipconfig0: ip=IP,gw=GW
kvm: 1
memory: 4096
meta: creation-qemu=7.2.10,ctime=1737620710
name: name
net0: virtio=MAC,bridge=vmbr
numa: 1
onboot: 1
ostype: l26
scsihw: virtio-scsi-pci
smbios1: uuid=bb708686-ee9b-4166-beb4-d89502ad8767
sockets: 1
sshkeys: sshkey
tablet: 1
virtio0: lvm1:vm-ID-disk-0,format=raw,mbps_rd=75,mbps_rd_max=100,mbps_wr=75,mbps_wr_max=100,size=50G
virtio1: lvm1:vm-ID-disk-1,format=raw,mbps_rd=75,mbps_rd_max=100,mbps_wr=75,mbps_wr_max=100,size=50G
vmgenid: 268ef95b-e2aa-4611-9446-281f138e896a

- source pveversion -v:
Code:
proxmox-ve: 8.3.0 (running kernel: 6.11.11-1-pve)
pve-manager: 8.4.1 (running version: 8.4.1/2a5fa54a8503f96d)
proxmox-kernel-helper: 8.1.1
proxmox-kernel-6.11.11-1-pve: 6.11.11-1
proxmox-kernel-6.8.12-11-pve-signed: 6.8.12-11
proxmox-kernel-6.8: 6.8.12-11
ceph-fuse: 16.2.15+ds-0+deb12u1
corosync: 3.1.9-pve1
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown: residual config
ifupdown2: 3.0.0-1.1
intel-microcode: 3.20250512.1~deb11u1
libjs-extjs: 7.0.0-5
libknet1: 1.30-pve2
libproxmox-acme-perl: 1.6.0
libproxmox-backup-qemu0: 1.5.1
libproxmox-rs-perl: 0.3.5
libpve-access-control: 8.2.2
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.1.1
libpve-cluster-perl: 8.1.1
libpve-common-perl: 8.3.1
libpve-guest-common-perl: 5.2.2
libpve-http-server-perl: 5.2.2
libpve-network-perl: 0.11.2
libpve-rs-perl: 0.9.4
libpve-storage-perl: 8.3.6
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.6.0-2
proxmox-backup-client: 3.4.2-1
proxmox-backup-file-restore: 3.4.2-1
proxmox-firewall: 0.7.1
proxmox-kernel-helper: 8.1.1
proxmox-mail-forward: 0.3.3
proxmox-mini-journalreader: 1.5
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.11
pve-cluster: 8.1.1
pve-container: 5.2.6
pve-docs: 8.4.0
pve-edk2-firmware: not correctly installed
pve-esxi-import-tools: 0.7.4
pve-firewall: 5.1.1
pve-firmware: 3.15-4
pve-ha-manager: 4.0.7
pve-i18n: 3.4.5
pve-qemu-kvm: 9.2.0-5
pve-xtermjs: 5.5.0-2
qemu-server: 8.3.13
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.7-pve2

- target pveversion -v:
Code:
proxmox-ve: 8.3.0 (running kernel: 6.11.11-1-pve)
pve-manager: 8.4.1 (running version: 8.4.1/2a5fa54a8503f96d)
proxmox-kernel-helper: 8.1.1
proxmox-kernel-6.11.11-1-pve: 6.11.11-1
proxmox-kernel-6.8.12-11-pve-signed: 6.8.12-11
proxmox-kernel-6.8: 6.8.12-11
ceph-fuse: 16.2.15+ds-0+deb12u1
corosync: 3.1.9-pve1
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown: residual config
ifupdown2: 3.0.0-1.1
intel-microcode: 3.20250512.1~deb11u1
libjs-extjs: 7.0.0-5
libknet1: 1.30-pve2
libproxmox-acme-perl: 1.6.0
libproxmox-backup-qemu0: 1.5.1
libproxmox-rs-perl: 0.3.5
libpve-access-control: 8.2.2
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.1.1
libpve-cluster-perl: 8.1.1
libpve-common-perl: 8.3.1
libpve-guest-common-perl: 5.2.2
libpve-http-server-perl: 5.2.2
libpve-network-perl: 0.11.2
libpve-rs-perl: 0.9.4
libpve-storage-perl: 8.3.6
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.6.0-2
proxmox-backup-client: 3.4.2-1
proxmox-backup-file-restore: 3.4.2-1
proxmox-firewall: 0.7.1
proxmox-kernel-helper: 8.1.1
proxmox-mail-forward: 0.3.3
proxmox-mini-journalreader: 1.5
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.11
pve-cluster: 8.1.1
pve-container: 5.2.6
pve-docs: 8.4.0
pve-edk2-firmware: not correctly installed
pve-esxi-import-tools: 0.7.4
pve-firewall: 5.1.1
pve-firmware: 3.15-4
pve-ha-manager: 4.0.7
pve-i18n: 3.4.5
pve-qemu-kvm: 9.2.0-5
pve-xtermjs: 5.5.0-2
qemu-server: 8.3.13
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.7-pve2
 
you have different CPUs, so you can't (reliably) use cputype host and live migration - you need to either switch to a different CPU type, or use offline migration.
 
Just adding my findings here, which means I have to disagree with your statement. :-( Obviously you know more than I do about Proxmox, but from my experience: (PS: I found this thread because I have the same issue with one of my VMs.)
I have many hosts and many VMs and have always used Processor Type: host. I've been using Proxmox since 7.4; the hosts have been upgraded to 8.2 and are now on 8.4.1. This is the first time I've had this issue since moving from 8.2, and I have used live migrations many times.

Source host: AMD EPYC 7713P -> Destination host: AMD EPYC 7702P
My Error:
Code:
2025-09-15 12:55:30 ERROR: online migrate failure - Failed to complete storage migration: block job (mirror) error: drive-scsi0: Input/output error (io-status: ok)
2025-09-15 12:55:30 aborting phase 2 - cleanup resources
2025-09-15 12:55:30 migrate_cancel
2025-09-15 12:55:35 ERROR: migration finished with problems (duration 00:04:59)
TASK ERROR: migration problems

Tried the migration with a different VM from the same host to another host and it "succeeded" with this error:
Code:
2025-09-15 13:13:24 stopping NBD storage migration server on target.
2025-09-15 13:13:25 ERROR: tunnel replied 'ERR: resume failed - VM 280 not running' to command 'resume 280'
2025-09-15 13:13:29 ERROR: migration finished with problems (duration 00:10:55)
TASK ERROR: migration problems
With this one, it migrated, but I just had to start the VM again on the target.
Unfortunately these are production VMs, so I can't just try an offline migration.
Still investigating how to resolve my first error.
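
In that case the VM just needs to be started on the target again, e.g.:

Code:
qm start 280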
 
Hi,
migrations with VM CPU type host between different physical CPUs are never guaranteed to work. They might work between specific models if you are lucky, but even then, they might break upon kernel or QEMU changes. See: https://pve.proxmox.com/pve-docs/chapter-qm.html#_cpu_type

The system logs on the target node should contain more information about your errors.
 
Thank you for the info. So I created a new VM on an AMD EPYC 7713P host and migrated it to an AMD EPYC 7702P host and got this error:
Code:
2025-09-15 16:16:33 stopping NBD storage migration server on target.
2025-09-15 16:16:34 ERROR: tunnel replied 'ERR: resume failed - VM 354 not running' to command 'resume 354'
2025-09-15 16:16:37 ERROR: migration finished with problems (duration 00:03:35)
TASK ERROR: migration problems

Then I migrated it back, no issues:
Code:
2025-09-15 16:36:09 stopping NBD storage migration server on target.
2025-09-15 16:36:15 migration finished successfully (duration 00:03:21)
TASK OK

I will have a look at the syslog on the target node, to see if I can get more info.
 
That's because the 7713P (Zen 3) is newer than the 7702P (Zen 2).

When live migrating a VM from a newer CPU to an older one, you basically remove CPU instructions from the VM while it's running -> crash.
The other way around it works, because the VM does not "lose" any CPU instructions when moving to the newer CPU.

The only workaround is to set your VM's CPU type from "host" to the lowest common denominator both physical CPUs can handle. The downside is that you may lose some performance when the VM runs on the newer CPU, because it won't use those newer CPU instructions. In your case that could be "EPYC-Rome", or "EPYC-Rome-v2" for the security mitigations.
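
A sketch of that change (the VM ID is a placeholder; the new CPU type only takes effect after a full stop and start of the VM, not a reboot from inside the guest):

Code:
qm set <vmid> --cpu EPYC-Rome    # or EPYC-Rome-v2, as mentioned above
qm shutdown <vmid>
qm start <vmid>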
 
Interesting thing though: migrating it again from the AMD EPYC 7713P host to the AMD EPYC 7702P host went without a hitch. So if @MarkusKo and @fiona are correct, and I don't mean to imply they are not, we can assume that the first migration failed because of the missing CPU instructions. When migrating back, those CPU instructions don't get added back, hence why the third migration is successful.
Code:
2025-09-15 17:29:08 stopping NBD storage migration server on target.
2025-09-15 17:29:13 migration finished successfully (duration 00:03:31)
TASK OK

Now I just need to figure out why we get this, because I don't think it is related. Will have to review the logs.
Code:
2025-09-15 12:55:30 ERROR: online migrate failure - Failed to complete storage migration: block job (mirror) error: drive-scsi0: Input/output error (io-status: ok)
2025-09-15 12:55:30 aborting phase 2 - cleanup resources
2025-09-15 12:55:30 migrate_cancel
2025-09-15 12:55:35 ERROR: migration finished with problems (duration 00:04:59)
TASK ERROR: migration problems
 
"don't get added back"
When you live migrate, no new CPU instructions can be "added" to the VM, nor removed. When the VM boots, the OS inside checks the CPU and its instruction set; from the OS's perspective, why should that change afterwards? If the OS uses some CPU instruction that is no longer available after you migrated, then the OS obviously has to crash. That's why offline migration mostly works between different CPUs while live migration does not.
 
"Interesting thing though, migrating it again from the AMD EPYC 7713P host to AMD EPYC 7702P went without a hitch"
Not sure about that ...
 
Good evening!

I already had a similar problem six months ago, after updating proxmox from 8.2 to a newer version.

The key problem is in the pve-qemu-kvm package, version 9.0 and above.

If you downgrade to, for example, pve-qemu-kvm=8.2.2-1, live migration works without problems. Someone posted a fixed library on a forum somewhere, but I don't know if it's still available.

http://download.proxmox.com/temp/pv...flush/pve-qemu-kvm_9.0.2-4+fixflush_amd64.deb
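
For reference, such a downgrade would roughly be the following (assuming the old version is still available in the configured repositories; already running VMs keep the QEMU binary they were started with until they are stopped and started again):

Code:
apt install pve-qemu-kvm=8.2.2-1
apt-mark hold pve-qemu-kvm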
 
Hi,
I already had a similar problem six months ago, after updating proxmox from 8.2 to a newer version.
The key problem is in the pve-qemu-kvm package, version 9.0 and above.
what exact problem? Please share the full migration task logs, version information, VM config and also check the system journal on the target.

If you downgrade to, for example, pve-qemu-kvm=8.2.2-1, live migration works without problems. Someone posted a fixed library on a forum somewhere, but I don't know if it's still available.
If, like the other recent posts, you are talking about migrating between different CPU models with VM CPU type host, then the key problem is that this is not supported to begin with. Yes, QEMU updates can break migration in such a case. See my previous post.

If you are talking about a different issue, then please open up a separate thread, since it will just be confusing to have multiple conversations happening at the same time.

That package was built for something completely different to test a specific candidate for a fix and should not be used anymore: https://forum.proxmox.com/threads/proxmox-ve-8-3-live-migration-problems.158072/
 
Hello Fiona, thank you so much for the reply. I missed your answer above; I was trying to help someone with his situation.

P.S. Could specifying "host" instead of a specific CPU type (e.g., x86-64-v3) potentially resolve this live migration issue?