VM failback with cloudinit and ZFS replication fails

ddtlabs

Member
Apr 1, 2023
This is a two-node cluster running PVE 9.1.5 with an additional QDevice.

ZFS synchronization and migration generally work without problems. If a node fails, VMs with cloudinit are restarted on the remaining node. So far, so good.

However, when the failed node becomes available again and VMs are supposed to be automatically migrated back to it, this fails. Every 10 seconds, a new migration attempt is started, which then aborts.

If the cloudinit image is deleted on the restarted host, the migration works again.

Using shared storage for the cloudinit image is not an option here.


Code:
task started by HA resource agent
2026-03-07 10:13:48 conntrack state migration not supported or disabled, active connections might get dropped
2026-03-07 10:13:48 starting migration of VM 101 to node 'n2' (192.168.30.52)
2026-03-07 10:13:48 found generated disk 'zfs2:vm-101-cloudinit' (in current VM config)
2026-03-07 10:13:48 found local, replicated disk 'zfs2:vm-101-disk-0' (attached)
2026-03-07 10:13:48 scsi0: start tracking writes using block-dirty-bitmap 'repl_scsi0'
2026-03-07 10:13:48 replicating disk images
2026-03-07 10:13:48 start replication job
2026-03-07 10:13:48 guest => VM 101, running => 86469
2026-03-07 10:13:48 volumes => zfs2:vm-101-disk-0
2026-03-07 10:13:49 freeze guest filesystem
2026-03-07 10:13:49 create snapshot '__replicate_101-0_1772874828__' on zfs2:vm-101-disk-0
2026-03-07 10:13:49 thaw guest filesystem
2026-03-07 10:13:49 using secure transmission, rate limit: none
2026-03-07 10:13:49 incremental sync 'zfs2:vm-101-disk-0' (__replicate_101-0_1772874818__ => __replicate_101-0_1772874828__)
2026-03-07 10:13:50 send from @__replicate_101-0_1772874818__ to zfs2/vm-101-disk-0@__replicate_101-0_1772874828__ estimated size is 931K
2026-03-07 10:13:50 total estimated size is 931K
2026-03-07 10:13:50 TIME        SENT   SNAPSHOT zfs2/vm-101-disk-0@__replicate_101-0_1772874828__
2026-03-07 10:13:50 successfully imported 'zfs2:vm-101-disk-0'
2026-03-07 10:13:50 delete previous replication snapshot '__replicate_101-0_1772874818__' on zfs2:vm-101-disk-0
2026-03-07 10:13:51 (remote_finalize_local_job) delete stale replication snapshot '__replicate_101-0_1772874818__' on zfs2:vm-101-disk-0
2026-03-07 10:13:51 end replication job
2026-03-07 10:13:51 copying local disk images
2026-03-07 10:13:51 full send of zfs2/vm-101-cloudinit@__migration__ estimated size is 81.5K
2026-03-07 10:13:51 total estimated size is 81.5K
2026-03-07 10:13:51 TIME        SENT   SNAPSHOT zfs2/vm-101-cloudinit@__migration__
2026-03-07 10:13:51 volume 'zfs2/vm-101-cloudinit' already exists
send/receive failed, cleaning up snapshot(s)..
2026-03-07 10:13:51 ERROR: storage migration for 'zfs2:vm-101-cloudinit' to storage 'zfs2' failed - command 'set -o pipefail && pvesm export zfs2:vm-101-cloudinit zfs - -with-snapshots 0 -snapshot __migration__ | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=n2' -o 'UserKnownHostsFile=/etc/pve/nodes/n2/ssh_known_hosts' -o 'GlobalKnownHostsFile=none' root@192.168.30.52 -- pvesm import zfs2:vm-101-cloudinit zfs - -with-snapshots 0 -snapshot __migration__ -delete-snapshot __migration__ -allow-rename 0' failed: exit code 255
2026-03-07 10:13:51 aborting phase 1 - cleanup resources
2026-03-07 10:13:51 scsi0: removing block-dirty-bitmap 'repl_scsi0'
2026-03-07 10:13:51 ERROR: migration aborted (duration 00:00:03): storage migration for 'zfs2:vm-101-cloudinit' to storage 'zfs2' failed - command 'set -o pipefail && pvesm export zfs2:vm-101-cloudinit zfs - -with-snapshots 0 -snapshot __migration__ | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=n2' -o 'UserKnownHostsFile=/etc/pve/nodes/n2/ssh_known_hosts' -o 'GlobalKnownHostsFile=none' root@192.168.30.52 -- pvesm import zfs2:vm-101-cloudinit zfs - -with-snapshots 0 -snapshot __migration__ -delete-snapshot __migration__ -allow-rename 0' failed: exit code 255
TASK ERROR: migration

101.conf
Code:
agent: enabled=1
balloon: 1024
boot: c
bootdisk: scsi0
cicustom: vendor=snippets:snippets/ci-vendor-9501.yml
ciupgrade: 1
cores: 1
cpu: host
ipconfig0: ip=dhcp
memory: 2048
meta: creation-qemu=9.0.2,ctime=1724658870
name: ac1.int.example.com
nameserver: 1.1.1.1
net0: virtio=BC:24:11:41:A2:46,bridge=vmbr0
numa: 0
ostype: l26
scsi0: zfs2:vm-101-disk-0,cache=writeback,discard=on,format=raw,size=36352M,ssd=1
scsi2: zfs2:vm-101-cloudinit,media=cdrom,size=4M
scsihw: virtio-scsi-single
serial0: socket
smbios1: uuid=1a6529e2-cf0b-4ce0-a89f-57fa394c2d55
sockets: 1
vga: std
vmgenid: 2517065f-a8d9-4ca7-a6f2-3208a2ace7db
 
Hi!
2026-03-07 10:13:51 volume 'zfs2/vm-101-cloudinit' already exists
Thanks for the report! I suppose there is an HA node affinity rule that makes the HA resource fail back to its old node. When the node fails, the HA Manager moves the HA resource but does not clean up the cloudinit image on the failed node, as would happen under normal circumstances... I suppose we could forcefully overwrite cloudinit images on the target, since these are auto-generated, but I'll investigate and get back here later.
 
To mitigate this for now, zfs2/vm-101-cloudinit can be removed on the failed node so the VM can be migrated back there.
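A minimal sketch of that cleanup, assuming the pool and dataset names from the log above (zfs2/vm-101-cloudinit); adjust the VMID and pool name to your setup, and make sure you run it on the node that just rejoined, not on the node currently running the VM:

```shell
# Hypothetical cleanup sketch for the stale cloud-init zvol left behind
# after a node failure. Names taken from the migration log in this thread.
VMID=101
POOL=zfs2
DATASET="${POOL}/vm-${VMID}-cloudinit"

# Only destroy the dataset if it actually exists on this node; if the
# zfs tool or the dataset is absent, the guard is skipped harmlessly.
if zfs list -H -o name "$DATASET" >/dev/null 2>&1; then
    zfs destroy "$DATASET"
fi
```

After removing the dataset, the next automatic migration attempt should no longer hit the "volume already exists" error, because the full send of the cloudinit image can then be received cleanly.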
 
Thanks for your answer. Yes, you are right, there is an affinity rule with failback enabled. I will disable it for now.
Will there be a fix that overwrites cloudinit images in the near future?