Hello all,
I'm wondering why a VM migration between nodes of a 3-node cluster in a Ceph environment takes any time at all.
If I understand the underlying technology correctly, Ceph replicates the data across all OSDs and nodes, so it is available at all times on every node.
Nevertheless, when I initiate a VM migration between two nodes that also host the Ceph cluster, the migration looks like this:
Code:
2019-03-18 12:27:33 use dedicated network address for sending migration traffic (10.0.0.102)
2019-03-18 12:27:33 starting migration of VM 110 to node 'srv-pve2' (10.0.0.102)
2019-03-18 12:27:33 copying disk images
2019-03-18 12:27:33 starting VM 110 on remote node 'srv-pve2'
2019-03-18 12:27:36 start remote tunnel
2019-03-18 12:27:37 ssh tunnel ver 1
2019-03-18 12:27:37 starting online/live migration on unix:/run/qemu-server/110.migrate
2019-03-18 12:27:37 migrate_set_speed: 8589934592
2019-03-18 12:27:37 migrate_set_downtime: 0.1
2019-03-18 12:27:37 set migration_caps
2019-03-18 12:27:37 set cachesize: 2147483648
2019-03-18 12:27:37 start migrate command to unix:/run/qemu-server/110.migrate
2019-03-18 12:27:38 migration status: active (transferred 83752885, remaining 17051308032), total 17197506560)
2019-03-18 12:27:38 migration xbzrle cachesize: 2147483648 transferred 0 pages 0 cachemiss 0 overflow 0
2019-03-18 12:27:39 migration status: active (transferred 200636297, remaining 16934047744), total 17197506560)
2019-03-18 12:27:39 migration xbzrle cachesize: 2147483648 transferred 0 pages 0 cachemiss 0 overflow 0
2019-03-18 12:27:40 migration status: active (transferred 317460931, remaining 16817442816), total 17197506560)
2019-03-18 12:27:40 migration xbzrle cachesize: 2147483648 transferred 0 pages 0 cachemiss 0 overflow 0
2019-03-18 12:27:41 migration status: active (transferred 434548203, remaining 16700583936), total 17197506560)
2019-03-18 12:27:41 migration xbzrle cachesize: 2147483648 transferred 0 pages 0 cachemiss 0 overflow 0
2019-03-18 12:27:42 migration status: active (transferred 551685209, remaining 16583454720), total 17197506560)
2019-03-18 12:27:42 migration xbzrle cachesize: 2147483648 transferred 0 pages 0 cachemiss 0 overflow 0
...
2019-03-18 12:30:02 migration status: active (transferred 16869331463, remaining 12906496), total 17197506560)
2019-03-18 12:30:02 migration xbzrle cachesize: 2147483648 transferred 0 pages 0 cachemiss 13565 overflow 0
2019-03-18 12:30:02 migration status: active (transferred 16873607831, remaining 10682368), total 17197506560)
2019-03-18 12:30:02 migration xbzrle cachesize: 2147483648 transferred 0 pages 0 cachemiss 14606 overflow 0
2019-03-18 12:30:02 migration speed: 112.99 MB/s - downtime 278 ms
2019-03-18 12:30:02 migration status: completed
2019-03-18 12:30:05 migration finished successfully (duration 00:02:32)
TASK OK
It took more than 2 minutes 30 seconds to migrate the VM, even though the data should already be available on the destination node thanks to Ceph.
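For what it's worth, here is a quick back-of-the-envelope check of the log numbers; I'm assuming the reported total of 17197506560 bytes is the guest's RAM size and that 112.99 MB/s is the average transfer rate:

Code:
# Rough sanity check of the figures from the migration log above.
total_bytes = 17_197_506_560            # "total" from the log, ~16 GiB (assumed to be guest RAM)
rate_bytes_per_s = 112.99 * 1_000_000   # reported migration speed of 112.99 MB/s

seconds = total_bytes / rate_bytes_per_s
print(f"{total_bytes / 2**30:.1f} GiB at 112.99 MB/s -> {seconds:.0f} s (~{seconds / 60:.1f} min)")
# prints: 16.0 GiB at 112.99 MB/s -> 152 s (~2.5 min), which matches the reported 00:02:32

If that reading is right, the transferred volume corresponds to the VM's memory rather than its disk, but I'd like to confirm that.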
Can someone explain what Proxmox is doing during this time and what data is actually transferred?