Hi,
I just did my first cross-cluster live migration via PDM (between two PVE 8.x
clusters, both enrolled in the same PDM instance). It worked, eventually, but I
hit several "obvious in hindsight" prerequisites that aren't all in one place
in the docs.
Posting my checklist here both to share and to ask the community: did I miss
anything? Any other gotcha worth adding?
(checklist coming in the next post — keeping the OP short)
Here's the checklist, refined after a few successful and a couple of failed
migrations. Aimed at someone doing this for the first time on PVE 8.x + current
PDM.
Network prerequisites:
[ ] Source and target clusters can reach each other on the migration network
(default: same network as cluster traffic, but you can configure it).
[ ] Firewall: TCP 22 (SSH) and the migration port range open BOTH WAYS between
every source node and every target node. Not just "from source to target".
[ ] MTU consistent end-to-end. If you have jumbo frames on one side and 1500
on the other, migration will start and hang silently.
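For the firewall item, here's a minimal Python sketch of a reachability check, assuming you run it from each source node against every target node and then again in the other direction. The function names and the example addresses are my own placeholders, not anything PDM ships. It only proves that a TCP connect succeeds, so it will not catch the MTU problem.

```python
import socket
from typing import Callable, List, Tuple


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def unreachable(
    targets: List[str],
    ports: List[int],
    probe: Callable[[str, int], bool] = port_open,
) -> List[Tuple[str, int]]:
    """Return every (host, port) pair that could NOT be reached from this node."""
    return [(h, p) for h in targets for p in ports if not probe(h, p)]
```

Something like `unreachable(["192.0.2.11", "192.0.2.12"], [22])` returns the pairs that failed; an empty list means this direction is fine. MTU needs a separate test, e.g. a don't-fragment ping such as `ping -M do -s 8972 <target>` when you expect a 9000-byte MTU.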
Storage prerequisites:
[ ] Target cluster has a storage with the SAME NAME as the source storage
hosting the VM disks, OR you explicitly map storages in the migration
dialog. Same-name is easier; mapping is more flexible.
[ ] Target storage has enough free space for the VM disks (PDM doesn't
pre-check this aggressively — it'll just fail mid-migration).
[ ] Storage TYPES are compatible. RBD -> RBD works. LVM-thin -> ZFS works
(it converts). RBD -> directory works but you lose features.
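The name and free-space items can be sanity-checked before you click migrate. A rough Python sketch, assuming you have already collected the target storages and their free bytes yourself (from the GUI, `pvesm status`, or the API); the `Storage` class and `plan_storage` function are hypothetical helpers, and it does not check storage type compatibility.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Storage:
    name: str
    avail_bytes: int


def plan_storage(
    source_disks: Dict[str, int],
    target: List[Storage],
    mapping: Optional[Dict[str, str]] = None,
) -> List[str]:
    """source_disks maps source storage name -> total bytes of VM disks on it.

    Returns a list of problems; an empty list means the plan looks OK.
    """
    mapping = mapping or {}
    by_name = {s.name: s for s in target}
    needed: Dict[str, int] = {}
    problems: List[str] = []
    for src_name, size in source_disks.items():
        # Without an explicit mapping, fall back to the same-name rule.
        dst_name = mapping.get(src_name, src_name)
        if dst_name not in by_name:
            problems.append(
                f"no target storage '{dst_name}' for source '{src_name}'"
            )
            continue
        needed[dst_name] = needed.get(dst_name, 0) + size
    for dst_name, size in needed.items():
        avail = by_name[dst_name].avail_bytes
        if size > avail:
            problems.append(
                f"target storage '{dst_name}' needs {size} bytes, has {avail}"
            )
    return problems
```

Sizes here are the nominal disk sizes, so on thin-provisioned storage this is deliberately pessimistic.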
VM prerequisites:
[ ] VM is on a CPU type that exists on the target. "host" CPU type across
different CPU generations = migration will fail or VM will crash post-
migration. Use "x86-64-v2-AES" or similar for portability.
[ ] No local resources tied to the source node: PCI passthrough (hostpci), USB
passthrough, local ISO mounted as CD-ROM, etc. Detach before migrating.
[ ] If using SDN: target cluster has the same SDN zones/vnets configured.
PDM does not auto-sync SDN config (yet).
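The CPU type and local-resource items are easy to check from the VM config. A small Python sketch, assuming the input is the text printed by `qm config <vmid>` (plain `key: value` lines); the function names are mine, and the USB and CD-ROM rules are heuristics rather than the checks PVE itself runs.

```python
import re
from typing import Dict, List


def parse_config(text: str) -> Dict[str, str]:
    """Parse 'key: value' lines as printed by `qm config <vmid>`."""
    cfg: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() and not key.startswith("#"):
            cfg[key.strip()] = value.strip()
    return cfg


def migration_blockers(cfg: Dict[str, str]) -> List[str]:
    """Return human-readable warnings for settings tied to the source node."""
    problems: List[str] = []
    cpu_type = cfg.get("cpu", "").split(",")[0]
    if cpu_type.startswith("cputype="):
        cpu_type = cpu_type[len("cputype="):]
    if cpu_type == "host":
        problems.append("cpu: 'host' is not portable across CPU generations")
    for key, value in cfg.items():
        parts = value.split(",")
        if re.fullmatch(r"hostpci\d+", key):
            problems.append(f"{key}: PCI passthrough device attached")
        elif re.fullmatch(r"usb\d+", key) and not value.startswith("spice"):
            # Heuristic: anything other than SPICE redirection is a host device.
            problems.append(f"{key}: host USB device attached")
        elif re.fullmatch(r"(ide|sata|scsi)\d+", key):
            if "media=cdrom" in parts and parts[0] != "none":
                problems.append(f"{key}: ISO or CD-ROM mounted ({parts[0]})")
    return problems
```

An empty result only means none of these known patterns matched; it does not cover SDN or anything else on the list.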
Cluster prerequisites:
[ ] Same major PVE version on both sides (8.x -> 8.x). Cross-major is not
supported.
[ ] Both clusters enrolled in PDM with tokens that have Sys.Modify and
VM.Migrate at the right paths. (Administrator at "/" covers all of this.)
[ ] Time sync: NTP working on all nodes, both clusters. Skewed clocks cause
weird SSL / token validation failures.
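For the version item, a tiny Python sketch, assuming you have collected the one-line output of `pveversion` from every node on both clusters (it looks like `pve-manager/8.2.4/<hash> (running kernel: ...)`). The helper names are hypothetical, and it does not check tokens or time sync.

```python
import re
from typing import Dict


def pve_major(pveversion_output: str) -> int:
    """Extract the major version from a `pveversion` output line."""
    m = re.search(r"pve-manager/(\d+)\.", pveversion_output)
    if not m:
        raise ValueError(f"unrecognised pveversion output: {pveversion_output!r}")
    return int(m.group(1))


def same_major(outputs: Dict[str, str]) -> bool:
    """outputs maps node name -> pveversion output; True if all majors match."""
    return len({pve_major(o) for o in outputs.values()}) == 1
```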
Migration time tips:
[ ] Migration is slow because PDM has to copy the full disks, and not just the
first time: moving the same VM back and forth isn't incremental, it's a
full copy on every run. Don't expect Storage vMotion / cross-vCenter
vMotion semantics yet.
[ ] You can pre-stage by replicating the storage out-of-band (if both sides
see the same Ceph cluster, for example, the migration is near-instant).
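Since every run is a full copy, it helps to estimate the transfer time up front. A back-of-the-envelope Python sketch; the 0.7 efficiency factor is purely my assumption for protocol and storage overhead, not a measured number.

```python
def estimate_seconds(
    disk_bytes: int, link_gbit: float, efficiency: float = 0.7
) -> float:
    """Rough wall-clock estimate for copying disk_bytes over the migration link.

    link_gbit is the nominal link speed in Gbit/s; efficiency is an assumed
    fraction of that speed actually achieved (0 < efficiency <= 1).
    """
    if link_gbit <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbit must be > 0 and efficiency in (0, 1]")
    usable_bits_per_s = link_gbit * 1e9 * efficiency
    return disk_bytes * 8 / usable_bits_per_s
```

For example, 500 GB of disks over a 1 Gbit/s link at the default efficiency comes out at roughly 95 minutes, which is worth knowing before you start.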
Things that bit me specifically:
- Forgot to update a security group on the migration network and the migration
hung at "establishing tunnel" forever — no clean error.
- Migrated a VM with "host" CPU type from a Skylake node to an Ice Lake node;
the VM booted but Windows blue-screened on first reboot 3 days later, after
some kernel update revealed an instruction set difference.
Open questions for the community:
- Anyone with experience migrating VMs where the guest agent (qemu-ga) is in
active use by backup software? Curious if there are edge cases.
- Is there a clean way to do "evacuate this whole node" across clusters via PDM
yet, or do you still have to script it?
Hope this helps the next person.