Hello,
We are experiencing intermittent issues with live migrations hanging at the very end on Proxmox VE 9.1.2.
The migration starts correctly and runs for a while, but sometimes it never finishes and remains stuck indefinitely.
This happens only at the final stage, after memory synchronization appears to be completed.
Environment
- Proxmox VE: 9.1.2
- Cluster: 8-node production cluster
- VM type: QEMU/KVM
- Storage:
- Shared SAN
- iSCSI (multipathing)
- LVM-thick
- snapshot-as-volume-chain enabled
- Network:
- Dedicated migration network
- High-bandwidth links: 2 x 25 Gb in an LACP bond
- VM disks:
- Mostly QCOW2
- iothread enabled
- Migration type: Live (online)
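For completeness, the dedicated migration network is pinned in /etc/pve/datacenter.cfg along these lines (the exact subnet shown here is illustrative):

```
migration: type=secure,network=10.87.2.0/24
```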
Symptoms
- Migration starts normally
- VM is launched on the destination node
- Memory transfer progresses
- At the final phase, the migration hangs indefinitely
- The task never completes unless manually aborted
Code:
task started by HA resource agent
2025-12-16 17:06:09 conntrack state migration not supported or disabled, active connections might get dropped
2025-12-16 17:06:09 starting migration of VM 152 to node 'APPLIDISCCVPVE01' (10.87.2.3)
2025-12-16 17:06:09 starting VM 152 on remote node 'APPLIDISCCVPVE01'
2025-12-16 17:06:12 start remote tunnel
2025-12-16 17:06:13 ssh tunnel ver 1
2025-12-16 17:06:13 starting online/live migration on unix:/run/qemu-server/152.migrate
2025-12-16 17:06:13 set migration capabilities
2025-12-16 17:06:13 migration downtime limit: 100 ms
2025-12-16 17:06:13 migration cachesize: 4.0 GiB
2025-12-16 17:06:13 set migration parameters
2025-12-16 17:06:13 start migrate command to unix:/run/qemu-server/152.migrate
2025-12-16 17:06:14 migration active, transferred 531.4 MiB of 32.0 GiB VM-state, 553.5 MiB/s
2025-12-16 17:06:15 migration active, transferred 1.1 GiB of 32.0 GiB VM-state, 822.9 MiB/s
2025-12-16 17:06:16 migration active, transferred 1.7 GiB of 32.0 GiB VM-state, 601.0 MiB/s
2025-12-16 17:06:17 migration active, transferred 2.3 GiB of 32.0 GiB VM-state, 835.0 MiB/s
2025-12-16 17:06:18 migration active, transferred 2.6 GiB of 32.0 GiB VM-state, 108.2 MiB/s
2025-12-16 17:06:19 migration active, transferred 2.7 GiB of 32.0 GiB VM-state, 114.8 MiB/s
2025-12-16 17:06:20 migration active, transferred 2.8 GiB of 32.0 GiB VM-state, 105.3 MiB/s
2025-12-16 17:06:21 migration active, transferred 2.9 GiB of 32.0 GiB VM-state, 93.6 MiB/s
2025-12-16 17:06:22 migration active, transferred 3.0 GiB of 32.0 GiB VM-state, 106.9 MiB/s
2025-12-16 17:06:23 migration active, transferred 3.1 GiB of 32.0 GiB VM-state, 101.4 MiB/s
2025-12-16 17:06:24 migration active, transferred 3.2 GiB of 32.0 GiB VM-state, 181.3 MiB/s
2025-12-16 17:06:25 migration active, transferred 3.4 GiB of 32.0 GiB VM-state, 126.2 MiB/s
2025-12-16 17:06:26 migration active, transferred 3.5 GiB of 32.0 GiB VM-state, 107.2 MiB/s
2025-12-16 17:06:27 migration active, transferred 3.6 GiB of 32.0 GiB VM-state, 91.7 MiB/s
2025-12-16 17:06:28 migration active, transferred 3.7 GiB of 32.0 GiB VM-state, 110.4 MiB/s
2025-12-16 17:06:29 migration active, transferred 3.8 GiB of 32.0 GiB VM-state, 101.7 MiB/s
2025-12-16 17:06:30 migration active, transferred 4.0 GiB of 32.0 GiB VM-state, 291.0 MiB/s
2025-12-16 17:06:31 migration active, transferred 4.0 GiB of 32.0 GiB VM-state, 96.1 MiB/s
2025-12-16 17:06:32 migration active, transferred 4.1 GiB of 32.0 GiB VM-state, 97.6 MiB/s
2025-12-16 17:06:33 migration active, transferred 4.2 GiB of 32.0 GiB VM-state, 97.5 MiB/s
2025-12-16 17:06:34 migration active, transferred 4.7 GiB of 32.0 GiB VM-state, 541.3 MiB/s
2025-12-16 17:06:35 migration active, transferred 5.2 GiB of 32.0 GiB VM-state, 653.0 MiB/s
2025-12-16 17:06:36 migration active, transferred 5.8 GiB of 32.0 GiB VM-state, 689.4 MiB/s
2025-12-16 17:06:37 migration active, transferred 6.3 GiB of 32.0 GiB VM-state, 776.8 MiB/s
2025-12-16 17:06:38 migration active, transferred 6.9 GiB of 32.0 GiB VM-state, 678.0 MiB/s
2025-12-16 17:06:39 migration active, transferred 7.4 GiB of 32.0 GiB VM-state, 600.6 MiB/s
2025-12-16 17:06:40 migration active, transferred 7.9 GiB of 32.0 GiB VM-state, 735.6 MiB/s
2025-12-16 17:06:41 migration active, transferred 8.5 GiB of 32.0 GiB VM-state, 665.1 MiB/s
2025-12-16 17:06:42 migration active, transferred 9.0 GiB of 32.0 GiB VM-state, 740.4 MiB/s
2025-12-16 17:06:43 migration active, transferred 9.6 GiB of 32.0 GiB VM-state, 830.2 MiB/s
2025-12-16 17:06:44 migration active, transferred 10.1 GiB of 32.0 GiB VM-state, 627.5 MiB/s
2025-12-16 17:06:45 migration active, transferred 10.7 GiB of 32.0 GiB VM-state, 752.5 MiB/s
2025-12-16 17:06:46 migration active, transferred 11.3 GiB of 32.0 GiB VM-state, 682.1 MiB/s
2025-12-16 17:06:47 migration active, transferred 11.8 GiB of 32.0 GiB VM-state, 716.4 MiB/s
2025-12-16 17:06:48 migration active, transferred 12.3 GiB of 32.0 GiB VM-state, 704.4 MiB/s
2025-12-16 17:06:49 migration active, transferred 12.7 GiB of 32.0 GiB VM-state, 661.1 MiB/s
2025-12-16 17:06:50 migration active, transferred 13.2 GiB of 32.0 GiB VM-state, 649.1 MiB/s
2025-12-16 17:06:51 migration active, transferred 13.8 GiB of 32.0 GiB VM-state, 796.2 MiB/s
2025-12-16 17:06:52 migration active, transferred 14.3 GiB of 32.0 GiB VM-state, 776.8 MiB/s
2025-12-16 17:06:53 migration active, transferred 14.9 GiB of 32.0 GiB VM-state, 846.2 MiB/s
2025-12-16 17:06:54 migration active, transferred 15.4 GiB of 32.0 GiB VM-state, 565.6 MiB/s
2025-12-16 17:06:55 migration active, transferred 15.9 GiB of 32.0 GiB VM-state, 735.5 MiB/s
2025-12-16 17:06:56 migration active, transferred 16.5 GiB of 32.0 GiB VM-state, 829.4 MiB/s
2025-12-16 17:06:57 migration active, transferred 17.0 GiB of 32.0 GiB VM-state, 665.9 MiB/s
2025-12-16 17:06:58 migration active, transferred 17.6 GiB of 32.0 GiB VM-state, 733.1 MiB/s
2025-12-16 17:07:00 migration active, transferred 18.2 GiB of 32.0 GiB VM-state, 614.2 MiB/s
2025-12-16 17:07:01 migration active, transferred 18.7 GiB of 32.0 GiB VM-state, 788.9 MiB/s
2025-12-16 17:07:02 migration active, transferred 19.3 GiB of 32.0 GiB VM-state, 855.8 MiB/s
2025-12-16 17:07:03 migration active, transferred 19.9 GiB of 32.0 GiB VM-state, 831.0 MiB/s
2025-12-16 17:07:04 migration active, transferred 20.4 GiB of 32.0 GiB VM-state, 820.5 MiB/s
2025-12-16 17:07:05 migration active, transferred 21.0 GiB of 32.0 GiB VM-state, 805.9 MiB/s
2025-12-16 17:07:06 migration active, transferred 21.6 GiB of 32.0 GiB VM-state, 813.2 MiB/s
2025-12-16 17:07:07 migration active, transferred 22.1 GiB of 32.0 GiB VM-state, 903.0 MiB/s
2025-12-16 17:07:08 migration active, transferred 22.7 GiB of 32.0 GiB VM-state, 1.2 GiB/s
2025-12-16 17:07:09 migration active, transferred 23.1 GiB of 32.0 GiB VM-state, 112.5 MiB/s
2025-12-16 17:07:10 migration active, transferred 23.2 GiB of 32.0 GiB VM-state, 88.3 MiB/s
2025-12-16 17:07:11 migration active, transferred 23.2 GiB of 32.0 GiB VM-state, 101.4 MiB/s
2025-12-16 17:07:12 migration active, transferred 23.3 GiB of 32.0 GiB VM-state, 106.9 MiB/s
2025-12-16 17:07:13 migration active, transferred 23.3 GiB of 32.0 GiB VM-state, 105.9 MiB/s
2025-12-16 17:07:14 migration active, transferred 23.4 GiB of 32.0 GiB VM-state, 109.0 MiB/s
2025-12-16 17:07:15 migration active, transferred 23.4 GiB of 32.0 GiB VM-state, 183.9 MiB/s
2025-12-16 17:07:16 migration active, transferred 23.5 GiB of 32.0 GiB VM-state, 100.9 MiB/s
2025-12-16 17:07:17 migration active, transferred 23.5 GiB of 32.0 GiB VM-state, 109.2 MiB/s
2025-12-16 17:07:18 migration active, transferred 23.6 GiB of 32.0 GiB VM-state, 112.1 MiB/s
2025-12-16 17:07:19 migration active, transferred 23.7 GiB of 32.0 GiB VM-state, 101.8 MiB/s
2025-12-16 17:07:20 migration active, transferred 23.8 GiB of 32.0 GiB VM-state, 106.9 MiB/s
2025-12-16 17:07:21 migration active, transferred 23.9 GiB of 32.0 GiB VM-state, 89.6 MiB/s
2025-12-16 17:07:22 migration active, transferred 24.0 GiB of 32.0 GiB VM-state, 124.0 MiB/s
2025-12-16 17:07:23 migration active, transferred 24.2 GiB of 32.0 GiB VM-state, 91.0 MiB/s
2025-12-16 17:07:24 migration active, transferred 24.3 GiB of 32.0 GiB VM-state, 94.5 MiB/s
2025-12-16 17:07:25 migration active, transferred 24.4 GiB of 32.0 GiB VM-state, 174.9 MiB/s
2025-12-16 17:07:26 migration active, transferred 24.5 GiB of 32.0 GiB VM-state, 228.9 MiB/s
2025-12-16 17:07:27 migration active, transferred 24.9 GiB of 32.0 GiB VM-state, 638.4 MiB/s
2025-12-16 17:07:28 migration active, transferred 25.3 GiB of 32.0 GiB VM-state, 832.6 MiB/s
2025-12-16 17:07:29 migration active, transferred 25.7 GiB of 32.0 GiB VM-state, 684.2 MiB/s
2025-12-16 17:07:29 xbzrle: send updates to 18728 pages in 4.1 MiB encoded memory, cache-miss 63.25%, overflow 300
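The throughput pattern above (fast bulk phases around 700-800 MiB/s, then plateaus near 100 MiB/s) looks like the classic dirty-page re-transfer phase. To eyeball whether a given task should converge, we parse the task log with a small ad-hoc script (our own helper, not a Proxmox tool): if the guest dirties memory roughly as fast as the tail-end transfer rate, the final sync can never fit inside the 100 ms downtime limit.

```python
import re

# Matches the progress lines that Proxmox writes into the migration task log
LINE = re.compile(
    r"transferred ([\d.]+) (GiB|MiB) of [\d.]+ GiB VM-state, ([\d.]+) (GiB|MiB)/s"
)

def to_mib(value, unit):
    return value * 1024 if unit == "GiB" else value

def parse_progress(log_text):
    """Extract (transferred_mib, rate_mib_s) pairs from a migration task log."""
    return [
        (to_mib(float(m.group(1)), m.group(2)),
         to_mib(float(m.group(3)), m.group(4)))
        for m in LINE.finditer(log_text)
    ]

def converging(samples):
    """Heuristic: the link must move data faster than the guest dirties it,
    otherwise the remaining delta never shrinks below the downtime limit."""
    if len(samples) < 2:
        return True
    (prev_tx, _), (last_tx, rate) = samples[-2], samples[-1]
    dirty_mib_s = max(last_tx - prev_tx, 0.0)  # ~newly dirtied data per second
    return dirty_mib_s < rate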
Observations
- The issue is intermittent
- More likely to occur when:
- Multiple migrations are triggered close together (manually or via ProxLB)
- The VM has significant memory usage
- CPU and network do not appear saturated during the issue
- Storage latency appears normal during the issue
- Aborting the migration leaves the VM running on the source node
Questions
- Is this a known issue with live migration on Proxmox VE 9.x using:
- shared SAN storage
- LVM-thick
- snapshot-as-volume-chain?
- Are there known limitations related to:
- QCOW2 on shared LVM
- The final memory synchronization phase?
- Are there recommended:
- Migration parameters
- Kernel or QEMU tunables
- Limits on concurrent live migrations?
Additional context
We are also using ProxLB, which may trigger migrations back-to-back.
We are currently investigating limiting concurrent migrations, but we would like to understand why the migration can hang at the final stage.
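Next time a task sticks, we plan to query the source QEMU instance directly over its QMP socket to see which migration phase it reports. This is our own sketch, not yet tested against a hung instance; the socket path follows the usual Proxmox /var/run/qemu-server/<vmid>.qmp layout.

```python
import json
import socket

def qmp_round_trip(f, cmd):
    """Send one QMP command dict and read one JSON reply line."""
    f.write(json.dumps(cmd) + "\n")
    f.flush()
    return json.loads(f.readline())

def query_migrate(vmid, sock_dir="/var/run/qemu-server"):
    """Ask a running QEMU instance for its migration status via QMP."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(f"{sock_dir}/{vmid}.qmp")
        f = s.makefile("rw")
        json.loads(f.readline())                            # QMP greeting banner
        qmp_round_trip(f, {"execute": "qmp_capabilities"})  # enter command mode
        return qmp_round_trip(f, {"execute": "query-migrate"}).get("return", {})
```

The interesting fields in the reply are "status" and, while active, the "ram" section (remaining, dirty-pages-rate), which should tell us whether QEMU itself still considers the migration active or whether it completed and the hang is on the Proxmox/tunnel side.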
Thank you in advance for your help.
Any feedback, known bug references, or tuning recommendations would be greatly appreciated.