migration stalls

Greetings,

Since moving from PVE 8.x to 9.x (now on 9.1.1), live migrations sometimes stall.
The logs show:

Code:
2026-01-06 10:19:29 conntrack state migration not supported or disabled, active connections might get dropped
2026-01-06 10:19:30 use dedicated network address for sending migration traffic (172.25.250.239)
2026-01-06 10:19:30 starting migration of VM 158 to node 'iep-pve-05' (172.25.250.239)
2026-01-06 10:19:30 starting VM 158 on remote node 'iep-pve-05'
2026-01-06 10:19:40 start remote tunnel
2026-01-06 10:19:42 ssh tunnel ver 1
2026-01-06 10:19:42 starting online/live migration on unix:/run/qemu-server/158.migrate
2026-01-06 10:19:42 set migration capabilities
2026-01-06 10:19:42 migration downtime limit: 100 ms
2026-01-06 10:19:42 migration cachesize: 4.0 GiB
2026-01-06 10:19:42 set migration parameters
2026-01-06 10:19:42 start migrate command to unix:/run/qemu-server/158.migrate
2026-01-06 10:19:43 migration active, transferred 2.0 MiB of 32.0 GiB VM-state, 0.0 B/s

and then it no longer progresses.

I also do not understand why it says "conntrack state migration not supported or disabled".

We use a dedicated subnet for migration traffic with secure=true.
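
For reference, our migration settings in /etc/pve/datacenter.cfg look roughly like this (the exact subnet here is inferred from the 172.25.250.239 address in the log above):

Code:
# /etc/pve/datacenter.cfg (excerpt)
# dedicated migration network, encrypted (SSH) migration traffic
migration: type=secure,network=172.25.250.0/24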

Any ideas what could be going on?

Thanks,
Franz STREBEL
 
Try updating the node or manually placing the node in maintenance mode before restarting.
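
If it helps, a rough sketch of those two steps (node name taken from the log above; double-check the commands against your setup before running them):

Code:
# update the node
apt update && apt full-upgrade

# put the node into HA maintenance mode before restarting,
# and take it out of maintenance again afterwards
ha-manager crm-command node-maintenance enable iep-pve-05
ha-manager crm-command node-maintenance disable iep-pve-05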
 
Hi,
migration of HA-managed resources is currently missing the conntrack flag, which explains the warning but not why the migration stalls. The thread @Magnus-mercer mentioned is likely not the same issue (and is already fixed in qemu-server >= 9.1.3), because that thread shows clear migration errors rather than stalls.

@IIEP_IT a few questions to narrow this down:
- Do you see any network traffic from the migration at all while it is stalled, or just very little?
- Is there any unusual CPU load caused by the QEMU instance on either the source or the target node?
- Is the function of the guest itself impacted while the migration is stalled?
- Please share the output of pveversion -v and the excerpt from the system logs/journal around the time of the issue, from both the source and the target node.
- What does qm status ID --verbose (with the numerical ID of the guest) report while it is stalled?
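
For example, something along these lines (VM ID and timestamps taken from the log above; the interface name is only a placeholder for your migration network interface):

Code:
# on both source and target node
pveversion -v
journalctl --since "2026-01-06 10:15" --until "2026-01-06 10:30"

# rough check for traffic counters on the dedicated migration interface
# (replace ens19 with the actual interface on the 172.25.250.0/24 network)
ip -s link show ens19

# CPU usage of the QEMU process of VM 158 on the source node
top -b -n 1 -p "$(cat /run/qemu-server/158.pid)"

# detailed status of the stalled guest, on the source node
qm status 158 --verbose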