[SOLVED] Proxmox 6.2 live migration on zfs with replica

OmegaBlue

Member
Jan 16, 2019
I've searched through the forums but can't find the same issue, only similar ones with completely different error messages.

Live migration without replication completes just fine, but when a replica exists the migration fails:

Code:
2020-09-09 13:51:57 starting migration of VM 168 to node 'vm-node-01' (192.168.42.150)
2020-09-09 13:51:57 found local, replicated disk 'zfs-pool:vm-168-disk-1' (in current VM config)
2020-09-09 13:51:57 found local, replicated disk 'zfs-pool:vm-168-disk-2' (in current VM config)
2020-09-09 13:51:57 scsi1: start tracking writes using block-dirty-bitmap 'repl_scsi1'
2020-09-09 13:51:57 scsi0: start tracking writes using block-dirty-bitmap 'repl_scsi0'
2020-09-09 13:51:57 replicating disk images
2020-09-09 13:51:57 start replication job
2020-09-09 13:51:57 guest => VM 168, running => 27747
2020-09-09 13:51:57 volumes => zfs-pool:vm-168-disk-1,zfs-pool:vm-168-disk-2
2020-09-09 13:51:58 freeze guest filesystem
2020-09-09 13:51:58 create snapshot '__replicate_168-0_1599652317__' on zfs-pool:vm-168-disk-1
2020-09-09 13:51:58 create snapshot '__replicate_168-0_1599652317__' on zfs-pool:vm-168-disk-2
2020-09-09 13:51:58 thaw guest filesystem
2020-09-09 13:51:58 using secure transmission, rate limit: none
2020-09-09 13:51:58 incremental sync 'zfs-pool:vm-168-disk-1' (__replicate_168-0_1599651600__ => __replicate_168-0_1599652317__)
2020-09-09 13:51:59 send from @__replicate_168-0_1599651600__ to zfs-pool/vm-168-disk-1@__replicate_168-0_1599652317__ estimated size is 11.2M
2020-09-09 13:51:59 total estimated size is 11.2M
2020-09-09 13:51:59 TIME        SENT   SNAPSHOT zfs-pool/vm-168-disk-1@__replicate_168-0_1599652317__
2020-09-09 13:51:59 zfs-pool/vm-168-disk-1@__replicate_168-0_1599651600__    name    zfs-pool/vm-168-disk-1@__replicate_168-0_1599651600__    -
2020-09-09 13:51:59 successfully imported 'zfs-pool:vm-168-disk-1'
2020-09-09 13:51:59 incremental sync 'zfs-pool:vm-168-disk-2' (__replicate_168-0_1599651600__ => __replicate_168-0_1599652317__)
2020-09-09 13:52:00 send from @__replicate_168-0_1599651600__ to zfs-pool/vm-168-disk-2@__replicate_168-0_1599652317__ estimated size is 3.22G
2020-09-09 13:52:00 total estimated size is 3.22G
2020-09-09 13:52:00 TIME        SENT   SNAPSHOT zfs-pool/vm-168-disk-2@__replicate_168-0_1599652317__
2020-09-09 13:52:00 zfs-pool/vm-168-disk-2@__replicate_168-0_1599651600__    name    zfs-pool/vm-168-disk-2@__replicate_168-0_1599651600__    -
2020-09-09 13:52:01 13:52:01    497M   zfs-pool/vm-168-disk-2@__replicate_168-0_1599652317__
2020-09-09 13:52:02 13:52:02    921M   zfs-pool/vm-168-disk-2@__replicate_168-0_1599652317__
2020-09-09 13:52:03 13:52:03   1.19G   zfs-pool/vm-168-disk-2@__replicate_168-0_1599652317__
2020-09-09 13:52:04 13:52:04   1.64G   zfs-pool/vm-168-disk-2@__replicate_168-0_1599652317__
2020-09-09 13:52:05 13:52:05   1.86G   zfs-pool/vm-168-disk-2@__replicate_168-0_1599652317__
2020-09-09 13:52:06 13:52:06   1.90G   zfs-pool/vm-168-disk-2@__replicate_168-0_1599652317__
2020-09-09 13:52:07 13:52:07   2.29G   zfs-pool/vm-168-disk-2@__replicate_168-0_1599652317__
2020-09-09 13:52:08 13:52:08   2.76G   zfs-pool/vm-168-disk-2@__replicate_168-0_1599652317__
2020-09-09 13:52:09 13:52:09   2.90G   zfs-pool/vm-168-disk-2@__replicate_168-0_1599652317__
2020-09-09 13:52:10 successfully imported 'zfs-pool:vm-168-disk-2'
2020-09-09 13:52:10 delete previous replication snapshot '__replicate_168-0_1599651600__' on zfs-pool:vm-168-disk-1
2020-09-09 13:52:10 delete previous replication snapshot '__replicate_168-0_1599651600__' on zfs-pool:vm-168-disk-2
2020-09-09 13:52:11 (remote_finalize_local_job) delete stale replication snapshot '__replicate_168-0_1599651600__' on zfs-pool:vm-168-disk-1
2020-09-09 13:52:11 (remote_finalize_local_job) delete stale replication snapshot '__replicate_168-0_1599651600__' on zfs-pool:vm-168-disk-2
2020-09-09 13:52:11 end replication job
2020-09-09 13:52:11 copying local disk images
2020-09-09 13:52:11 starting VM 168 on remote node 'vm-node-01'
2020-09-09 13:52:13 start remote tunnel
2020-09-09 13:52:14 ssh tunnel ver 1
2020-09-09 13:52:14 starting storage migration
2020-09-09 13:52:14 scsi1: start migration to nbd:unix:/run/qemu-server/168_nbd.migrate:exportname=drive-scsi1
drive mirror re-using dirty bitmap 'repl_scsi1'
drive mirror is starting for drive-scsi1
drive-scsi1: Cancelling block job
drive-scsi1: Done.
2020-09-09 13:52:14 ERROR: online migrate failure - mirroring error: VM 168 qmp command 'drive-mirror' failed - Parameter 'bitmap' is unexpected
2020-09-09 13:52:14 aborting phase 2 - cleanup resources
2020-09-09 13:52:14 migrate_cancel
2020-09-09 13:52:14 scsi1: removing block-dirty-bitmap 'repl_scsi1'
2020-09-09 13:52:14 scsi0: removing block-dirty-bitmap 'repl_scsi0'
2020-09-09 13:52:15 ERROR: migration finished with problems (duration 00:00:18)
TASK ERROR: migration problems

VM config
Code:
agent: 1
balloon: 0
bootdisk: scsi0
cores: 8
cpu: EPYC
hotplug: disk,network,usb,memory,cpu
ide2: none,media=cdrom
memory: 49152
name: Zabbix
net0: virtio=56:0D:95:6E:3E:1C,bridge=vmbr0,tag=11
net1: virtio=1A:B5:F5:16:7E:2D,bridge=vmbr0,tag=42
numa: 1
onboot: 1
ostype: l26
scsi0: zfs-pool:vm-168-disk-1,discard=on,size=100G
scsi1: zfs-pool:vm-168-disk-2,discard=on,size=600G
scsihw: virtio-scsi-pci
smbios1: uuid=da5f4cbd-952b-4728-9651-e47b85dee190
sockets: 2

Code:
pveversion -v
proxmox-ve: 6.2-1 (running kernel: 5.4.60-1-pve)
pve-manager: 6.2-11 (running version: 6.2-11/22fb4983)
pve-kernel-5.4: 6.2-6
pve-kernel-helper: 6.2-6
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.60-1-pve: 5.4.60-2
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.13-2-pve: 5.3.13-2
pve-kernel-5.3.10-1-pve: 5.3.10-1
ceph: 14.2.11-pve1
ceph-fuse: 14.2.11-pve1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libpve-access-control: 6.1-2
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-1
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.2-6
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
openvswitch-switch: 2.12.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-12
pve-cluster: 6.1-8
pve-container: 3.1-13
pve-docs: 6.2-5
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-2
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-1
pve-qemu-kvm: 4.2.0-1
pve-xtermjs: 4.7.0-2
qemu-server: 6.2-14
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.4-pve1
 
Is the pveversion -v output from the target or the source node? Please provide it for both nodes.
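
For reference, a quick way to collect that output from both nodes in one go (the target hostname is taken from the log above; SOURCE-NODE is a placeholder for your source node, which I don't know):

Code:
# run from any cluster node; 'vm-node-01' is the migration target from the log,
# replace SOURCE-NODE with the actual source node's hostname (placeholder)
for node in vm-node-01 SOURCE-NODE; do
    echo "=== $node ==="
    ssh root@"$node" pveversion -v
done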
 
Looks like your pve-qemu-kvm package is too old (4.2.0). Support for live migration with replicated disks was introduced in pve-qemu-kvm 5.0; older QEMU builds don't understand the 'bitmap' parameter for drive-mirror, which is exactly the error in your log.
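
If it helps, here is a rough way to compare the installed package against what the running VM was actually started with, and then to upgrade. I believe `qm status --verbose` reports a `running-qemu` field on 6.2, but treat that as an assumption:

Code:
# installed package version on this node
pveversion -v | grep pve-qemu-kvm

# QEMU version the guest was actually started with -- a running VM keeps
# using the old binary until it is restarted (or live-migrated away)
qm status 168 --verbose | grep running-qemu

# bring the package up to date, then re-check
apt update && apt install pve-qemu-kvm
pveversion -v | grep pve-qemu-kvm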
 
Eh, mea culpa. I downgraded pve-qemu-kvm to mitigate the EPYC bug and apparently broke this. We can mark this as solved.
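
For anyone hitting the same thing after a manual downgrade: it may be worth checking that the package isn't held or pinned, otherwise a plain upgrade will silently skip it. A sketch using standard apt tooling only, nothing Proxmox-specific:

Code:
# check whether pve-qemu-kvm is held back from upgrades
apt-mark showhold

# installed vs. candidate version, plus any pinning in effect
apt-cache policy pve-qemu-kvm

# release a hold if there is one, then upgrade normally
apt-mark unhold pve-qemu-kvm
apt update && apt install pve-qemu-kvm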
 
I've changed this thread to solved for you. You can mark a thread as [SOLVED] by pressing the three dots (More options) above the first post and selecting Edit thread, then choosing [SOLVED] as the prefix and saving.
 
