[SOLVED] Cannot Migrate VM to other node (ZFS replicated)

Hi!

I cannot migrate a VM to another node. The migration fails with "unable to parse value of 'efidisk0' - format error" and "online migrate failure - number of replicated disks on source and target node do not match - target node too old?".

I tried removing and re-adding the replication job, but without success. The replication itself is working (green icon). I am using all the latest pve-enterprise updates.
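Before retrying, the replication job state can also be checked from the CLI; a minimal sketch (assuming VM 105 and the job shown in the GUI):

Code:
# list all replication jobs with their last sync time and current state
pvesr status
# show the guest configuration as the cluster currently sees it
qm config 105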

FULL migration LOG:

Code:
task started by HA resource agent
2021-11-03 13:19:55 starting migration of VM 105 to node 'node2'
2021-11-03 13:19:55 found local, replicated disk 'zfs:vm-105-disk-0' (in current VM config)
2021-11-03 13:19:55 found local, replicated disk 'zfs:vm-105-disk-1' (in current VM config)
2021-11-03 13:19:55 scsi0: start tracking writes using block-dirty-bitmap 'repl_scsi0'
2021-11-03 13:19:55 efidisk0: start tracking writes using block-dirty-bitmap 'repl_efidisk0'
2021-11-03 13:19:55 replicating disk images
2021-11-03 13:19:55 start replication job
2021-11-03 13:19:55 guest => VM 105, running => 1015376
2021-11-03 13:19:55 volumes => zfs:vm-105-disk-0,zfs:vm-105-disk-1
2021-11-03 13:19:55 freeze guest filesystem
2021-11-03 13:20:02 create snapshot '__replicate_105-0_1635941995__' on zfs:vm-105-disk-0
2021-11-03 13:20:02 create snapshot '__replicate_105-0_1635941995__' on zfs:vm-105-disk-1
2021-11-03 13:20:02 thaw guest filesystem
2021-11-03 13:20:02 using secure transmission, rate limit: none
2021-11-03 13:20:02 incremental sync 'zfs:vm-105-disk-0' (__replicate_105-0_1635941940__ => __replicate_105-0_1635941995__)
2021-11-03 13:20:03 send from @__replicate_105-0_1635941940__ to rpool/data/vm-105-disk-0@__replicate_105-0_1635941995__ estimated size is 9.42M
2021-11-03 13:20:03 total estimated size is 9.42M
2021-11-03 13:20:03 successfully imported 'zfs:vm-105-disk-0'
2021-11-03 13:20:03 incremental sync 'zfs:vm-105-disk-1' (__replicate_105-0_1635941940__ => __replicate_105-0_1635941995__)
2021-11-03 13:20:04 send from @__replicate_105-0_1635941940__ to rpool/data/vm-105-disk-1@__replicate_105-0_1635941995__ estimated size is 624B
2021-11-03 13:20:04 total estimated size is 624B
2021-11-03 13:20:04 successfully imported 'zfs:vm-105-disk-1'
2021-11-03 13:20:04 delete previous replication snapshot '__replicate_105-0_1635941940__' on zfs:vm-105-disk-0
2021-11-03 13:20:04 delete previous replication snapshot '__replicate_105-0_1635941940__' on zfs:vm-105-disk-1
2021-11-03 13:20:04 (remote_finalize_local_job) delete stale replication snapshot '__replicate_105-0_1635941940__' on zfs:vm-105-disk-0
2021-11-03 13:20:04 (remote_finalize_local_job) delete stale replication snapshot '__replicate_105-0_1635941940__' on zfs:vm-105-disk-1
2021-11-03 13:20:04 end replication job
2021-11-03 13:20:04 starting VM 105 on remote node 'node2'
2021-11-03 13:20:05 [node2] vm 105 - unable to parse value of 'efidisk0' - format error
2021-11-03 13:20:05 [node2] efitype: property is not defined in schema and the schema does not allow additional properties
2021-11-03 13:20:06 [node2] vm 105 - unable to parse value of 'efidisk0' - format error
2021-11-03 13:20:06 [node2] efitype: property is not defined in schema and the schema does not allow additional properties
2021-11-03 13:20:06 volume 'zfs:vm-105-disk-0' is 'zfs:vm-105-disk-0' on the target
2021-11-03 13:20:06 ERROR: online migrate failure - number of replicated disks on source and target node do not match - target node too old?
2021-11-03 13:20:06 aborting phase 2 - cleanup resources
2021-11-03 13:20:06 migrate_cancel
2021-11-03 13:20:06 efidisk0: removing block-dirty-bitmap 'repl_efidisk0'
2021-11-03 13:20:06 scsi0: removing block-dirty-bitmap 'repl_scsi0'
2021-11-03 13:20:06 ERROR: migration finished with problems (duration 00:00:12)
TASK ERROR: migration problems



Code:
root@node1:~# cat /etc/pve/qemu-server/105.conf
agent: 1
balloon: 0
bios: ovmf
boot: order=scsi0
cores: 4
efidisk0: zfs:vm-105-disk-1,efitype=4m,size=528K
hotplug: disk,network,usb
machine: pc-q35-6.0
memory: 6144
name: cust1
net0: virtio=9E:5A:60:3B:FB:D0,bridge=vmbr0,firewall=1
net1: virtio=EA:F0:EA:8B:9C:59,bridge=vnet1,firewall=1,tag=10
numa: 0
onboot: 1
ostype: win10
scsi0: zfs:vm-105-disk-0,discard=on,size=128G
scsihw: virtio-scsi-pci
smbios1: uuid=f46cc7d8-9539-400a-ae78-0e71f23bbe7e
sockets: 1
vmgenid: 4304982c-34d1-4e51-ae61-b7353db96cc9
 
The package versions on the target node are older than those on the source node - this is not guaranteed to work reliably in all cases. Upgrade the target node, then it should work.
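For reference, upgrading the target node from the CLI is roughly the following (assuming the enterprise repository is configured and the subscription is active):

Code:
# on node2: refresh package lists and install all pending PVE updates
apt update
apt dist-upgrade
# verify the installed package versions afterwards
pveversion -v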
 
OK, this is strange:

Both nodes are up to date with the enterprise repos (apt-get update && apt-get dist-upgrade -> nothing to do).

However, the versions differ slightly between the two nodes (only the differing packages are shown):

node1
Code:
proxmox-ve: 7.0-2 (running kernel: 5.11.22-5-pve)
pve-manager: 7.0-13 (running version: 7.0-13/7aa7e488)
pve-edk2-firmware: 3.20210831-1
qemu-server: 7.0-16

node2
Code:
proxmox-ve: 7.0-2 (running kernel: 5.11.22-5-pve)
pve-manager: 7.0-11 (running version: 7.0-11/63d82f4e)
pve-edk2-firmware: 3.20200531-1
qemu-server: 7.0-14

I restarted node2, but it is still running the old packages. How can I get the latest packages? Of course I have subscriptions for both nodes.
 
node1 must have either a non-enterprise repo enabled, or have been installed/upgraded recently using a non-enterprise repo. Neither pve-manager 7.0-13 nor qemu-server 7.0-16 is on pve-enterprise yet.
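To see which repositories are actually enabled on a node and which repository a package came from, something like this works (pve-manager used as an example package):

Code:
# list all configured APT sources on the node
grep -rn '^deb' /etc/apt/sources.list /etc/apt/sources.list.d/
# show installed/candidate versions of pve-manager and their origin repos
apt policy pve-manager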