This is a 2-node cluster running Proxmox VE 9.1.5 with an additional QDevice.
ZFS replication and migration generally work without problems. If a node fails, VMs with cloud-init are restarted on the remaining node. So far, so good.
However, when the failed node becomes available again and VMs are supposed to be migrated back to it automatically, the migration fails. A new migration attempt is started every 10 seconds, and each one aborts.
If the cloud-init image is deleted on the restarted host, the migration works again.
Using shared storage for the cloud-init image is not an option here.
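For reference, the manual workaround (deleting the leftover cloud-init image on the restarted host) can be sketched as a small script. The VMID and pool name are assumptions taken from this setup, and the script only prints the destroy command instead of running it, so nothing is removed by accident:

```shell
#!/bin/sh
# Sketch of the manual workaround: remove the stale cloud-init zvol on the
# node that just came back, so the failback migration can recreate it.
# VMID and POOL are assumptions taken from this setup (VM 101 on pool "zfs2").
VMID=101
POOL=zfs2
VOL="$POOL/vm-$VMID-cloudinit"
# Print the command instead of executing it; drop the "echo" to actually destroy.
echo "zfs destroy $VOL"
```

This would be run on the restarted node before HA retries the migration back; it is a workaround sketch, not a fix for the underlying behavior.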
Code:
task started by HA resource agent
2026-03-07 10:13:48 conntrack state migration not supported or disabled, active connections might get dropped
2026-03-07 10:13:48 starting migration of VM 101 to node 'n2' (192.168.30.52)
2026-03-07 10:13:48 found generated disk 'zfs2:vm-101-cloudinit' (in current VM config)
2026-03-07 10:13:48 found local, replicated disk 'zfs2:vm-101-disk-0' (attached)
2026-03-07 10:13:48 scsi0: start tracking writes using block-dirty-bitmap 'repl_scsi0'
2026-03-07 10:13:48 replicating disk images
2026-03-07 10:13:48 start replication job
2026-03-07 10:13:48 guest => VM 101, running => 86469
2026-03-07 10:13:48 volumes => zfs2:vm-101-disk-0
2026-03-07 10:13:49 freeze guest filesystem
2026-03-07 10:13:49 create snapshot '__replicate_101-0_1772874828__' on zfs2:vm-101-disk-0
2026-03-07 10:13:49 thaw guest filesystem
2026-03-07 10:13:49 using secure transmission, rate limit: none
2026-03-07 10:13:49 incremental sync 'zfs2:vm-101-disk-0' (__replicate_101-0_1772874818__ => __replicate_101-0_1772874828__)
2026-03-07 10:13:50 send from @__replicate_101-0_1772874818__ to zfs2/vm-101-disk-0@__replicate_101-0_1772874828__ estimated size is 931K
2026-03-07 10:13:50 total estimated size is 931K
2026-03-07 10:13:50 TIME SENT SNAPSHOT zfs2/vm-101-disk-0@__replicate_101-0_1772874828__
2026-03-07 10:13:50 successfully imported 'zfs2:vm-101-disk-0'
2026-03-07 10:13:50 delete previous replication snapshot '__replicate_101-0_1772874818__' on zfs2:vm-101-disk-0
2026-03-07 10:13:51 (remote_finalize_local_job) delete stale replication snapshot '__replicate_101-0_1772874818__' on zfs2:vm-101-disk-0
2026-03-07 10:13:51 end replication job
2026-03-07 10:13:51 copying local disk images
2026-03-07 10:13:51 full send of zfs2/vm-101-cloudinit@__migration__ estimated size is 81.5K
2026-03-07 10:13:51 total estimated size is 81.5K
2026-03-07 10:13:51 TIME SENT SNAPSHOT zfs2/vm-101-cloudinit@__migration__
2026-03-07 10:13:51 volume 'zfs2/vm-101-cloudinit' already exists
send/receive failed, cleaning up snapshot(s)..
2026-03-07 10:13:51 ERROR: storage migration for 'zfs2:vm-101-cloudinit' to storage 'zfs2' failed - command 'set -o pipefail && pvesm export zfs2:vm-101-cloudinit zfs - -with-snapshots 0 -snapshot __migration__ | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=n2' -o 'UserKnownHostsFile=/etc/pve/nodes/n2/ssh_known_hosts' -o 'GlobalKnownHostsFile=none' root@192.168.30.52 -- pvesm import zfs2:vm-101-cloudinit zfs - -with-snapshots 0 -snapshot __migration__ -delete-snapshot __migration__ -allow-rename 0' failed: exit code 255
2026-03-07 10:13:51 aborting phase 1 - cleanup resources
2026-03-07 10:13:51 scsi0: removing block-dirty-bitmap 'repl_scsi0'
2026-03-07 10:13:51 ERROR: migration aborted (duration 00:00:03): storage migration for 'zfs2:vm-101-cloudinit' to storage 'zfs2' failed - command 'set -o pipefail && pvesm export zfs2:vm-101-cloudinit zfs - -with-snapshots 0 -snapshot __migration__ | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=n2' -o 'UserKnownHostsFile=/etc/pve/nodes/n2/ssh_known_hosts' -o 'GlobalKnownHostsFile=none' root@192.168.30.52 -- pvesm import zfs2:vm-101-cloudinit zfs - -with-snapshots 0 -snapshot __migration__ -delete-snapshot __migration__ -allow-rename 0' failed: exit code 255
TASK ERROR: migration
101.conf
Code:
agent: enabled=1
balloon: 1024
boot: c
bootdisk: scsi0
cicustom: vendor=snippets:snippets/ci-vendor-9501.yml
ciupgrade: 1
cores: 1
cpu: host
ipconfig0: ip=dhcp
memory: 2048
meta: creation-qemu=9.0.2,ctime=1724658870
name: ac1.int.example.com
nameserver: 1.1.1.1
net0: virtio=BC:24:11:41:A2:46,bridge=vmbr0
numa: 0
ostype: l26
scsi0: zfs2:vm-101-disk-0,cache=writeback,discard=on,format=raw,size=36352M,ssd=1
scsi2: zfs2:vm-101-cloudinit,media=cdrom,size=4M
scsihw: virtio-scsi-single
serial0: socket
smbios1: uuid=1a6529e2-cf0b-4ce0-a89f-57fa394c2d55
sockets: 1
vga: std
vmgenid: 2517065f-a8d9-4ca7-a6f2-3208a2ace7db