[SOLVED] VM Migration on a cluster running Ceph

Le PAH

Member
Oct 17, 2018
France
Hello all,

I'm wondering why VM migration between nodes of a 3-node cluster in a Ceph environment takes any time at all.

If I understand the underlying technology correctly, Ceph replicates the data across the OSDs and nodes so that it is available at all times on every node.

Nevertheless, when I initiate a VM migration between two nodes that are also hosting the Ceph cluster, the migration looks like this:

Code:
2019-03-18 12:27:33 use dedicated network address for sending migration traffic (10.0.0.102)
2019-03-18 12:27:33 starting migration of VM 110 to node 'srv-pve2' (10.0.0.102)
2019-03-18 12:27:33 copying disk images
2019-03-18 12:27:33 starting VM 110 on remote node 'srv-pve2'
2019-03-18 12:27:36 start remote tunnel
2019-03-18 12:27:37 ssh tunnel ver 1
2019-03-18 12:27:37 starting online/live migration on unix:/run/qemu-server/110.migrate
2019-03-18 12:27:37 migrate_set_speed: 8589934592
2019-03-18 12:27:37 migrate_set_downtime: 0.1
2019-03-18 12:27:37 set migration_caps
2019-03-18 12:27:37 set cachesize: 2147483648
2019-03-18 12:27:37 start migrate command to unix:/run/qemu-server/110.migrate
2019-03-18 12:27:38 migration status: active (transferred 83752885, remaining 17051308032), total 17197506560)
2019-03-18 12:27:38 migration xbzrle cachesize: 2147483648 transferred 0 pages 0 cachemiss 0 overflow 0
2019-03-18 12:27:39 migration status: active (transferred 200636297, remaining 16934047744), total 17197506560)
2019-03-18 12:27:39 migration xbzrle cachesize: 2147483648 transferred 0 pages 0 cachemiss 0 overflow 0
2019-03-18 12:27:40 migration status: active (transferred 317460931, remaining 16817442816), total 17197506560)
2019-03-18 12:27:40 migration xbzrle cachesize: 2147483648 transferred 0 pages 0 cachemiss 0 overflow 0
2019-03-18 12:27:41 migration status: active (transferred 434548203, remaining 16700583936), total 17197506560)
2019-03-18 12:27:41 migration xbzrle cachesize: 2147483648 transferred 0 pages 0 cachemiss 0 overflow 0
2019-03-18 12:27:42 migration status: active (transferred 551685209, remaining 16583454720), total 17197506560)
2019-03-18 12:27:42 migration xbzrle cachesize: 2147483648 transferred 0 pages 0 cachemiss 0 overflow 0

...

2019-03-18 12:30:02 migration status: active (transferred 16869331463, remaining 12906496), total 17197506560)
2019-03-18 12:30:02 migration xbzrle cachesize: 2147483648 transferred 0 pages 0 cachemiss 13565 overflow 0
2019-03-18 12:30:02 migration status: active (transferred 16873607831, remaining 10682368), total 17197506560)
2019-03-18 12:30:02 migration xbzrle cachesize: 2147483648 transferred 0 pages 0 cachemiss 14606 overflow 0
2019-03-18 12:30:02 migration speed: 112.99 MB/s - downtime 278 ms
2019-03-18 12:30:02 migration status: completed
2019-03-18 12:30:05 migration finished successfully (duration 00:02:32)
TASK OK

It took more than two and a half minutes (2:32) to migrate the VM, although the data should already be available on the destination node thanks to Ceph.

Can someone explain what Proxmox is doing during this time and what data is actually being transferred?
 
What you see being transferred is the content of the guest's memory, not the disk.
But don't worry: the VM stays up while the memory is being transferred. It is only paused for the very last bit (in your case 0.278 s).
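
To check this against the log, here is a small Python sketch using only lines copied from the first post. It rests on two assumptions that are not stated in the thread: that the "MB/s" in the log means MiB/s, and that the final pause starts once the remaining data fits into bandwidth times the configured maximum downtime (the usual pre-copy stop condition). Treat it as a sanity check, not as the exact logic Proxmox or QEMU runs.

```python
import re

# Lines copied verbatim from the migration log in the first post.
LOG = """\
2019-03-18 12:27:37 migrate_set_downtime: 0.1
2019-03-18 12:30:02 migration status: active (transferred 16869331463, remaining 12906496), total 17197506560)
2019-03-18 12:30:02 migration status: active (transferred 16873607831, remaining 10682368), total 17197506560)
2019-03-18 12:30:02 migration speed: 112.99 MB/s - downtime 278 ms
"""

MIB = 1024 ** 2
GIB = 1024 ** 3

# Pull the numbers out of the log text.
status = re.findall(r"remaining (\d+)\), total (\d+)", LOG)
remaining = [int(r) for r, _ in status]
total_bytes = int(status[-1][1])
max_downtime_s = float(re.search(r"migrate_set_downtime: ([\d.]+)", LOG).group(1))
speed_mib_s = float(re.search(r"migration speed: ([\d.]+) MB/s", LOG).group(1))

# The total is about the size of the guest RAM, far less than a typical disk.
total_gib = total_bytes / GIB

# Assumption: the VM is paused once the rest can be sent within the
# allowed downtime at the measured speed ("MB/s" read as MiB/s).
budget_bytes = speed_mib_s * MIB * max_downtime_s

print(f"total to transfer: {total_gib:.2f} GiB")
print(f"downtime budget:   {budget_bytes / MIB:.1f} MiB")
for r in remaining:
    verdict = "fits, pause and finish" if r <= budget_bytes else "too much, keep copying"
    print(f"remaining {r / MIB:.1f} MiB: {verdict}")
```

The total comes out at roughly 16 GiB, which would match a VM configured with 16 GiB of RAM (an assumption, the VM config is not shown here). The second-to-last "remaining" value is still above the budget and the last one is below it, which fits the point where the log switches to "completed". The reported downtime of 278 ms is higher than the 0.1 s target, so the target looks like a goal rather than a hard limit.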
 
Okay, great, that makes sense!

I've got another question, related to the same subject.

How should I configure Proxmox if I want a VM to be available at all times, even if the underlying node fails?

Replication doesn't seem to work with Ceph storage (no replicable storage found).

I'm not sure whether HA could solve this.

Thanks for your kind advice.
 
HA is your friend. Since you are using shared storage, all hypervisors have access to the VM disks, so you don't need replication. If HA is enabled for the VM and the node it is running on fails, the VM will be restarted on another node in the cluster.
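
As a rough illustration, HA can be enabled for a VM from the command line with `ha-manager add`. The Python sketch below only builds that command and prints it by default. The helper names `ha_add_command` and `enable_ha` are made up for this example, VM ID 110 is taken from the log above, and the restart/relocate values are just the usual defaults, so adjust them to your setup.

```python
import subprocess


def ha_add_command(vmid, state="started", max_restart=1, max_relocate=1):
    """Build the argv for `ha-manager add` (hypothetical helper)."""
    return [
        "ha-manager", "add", f"vm:{vmid}",
        "--state", state,
        "--max_restart", str(max_restart),
        "--max_relocate", str(max_relocate),
    ]


def enable_ha(vmid, dry_run=True):
    """Print the command, or run it on a cluster node when dry_run is False."""
    cmd = ha_add_command(vmid)
    if dry_run:
        print(" ".join(cmd))
    else:
        # Must run as root on a Proxmox VE node that is part of the cluster.
        subprocess.run(cmd, check=True)
    return cmd


enable_ha(110)
```

Afterwards `ha-manager status` on any node should list the VM as an HA resource. The same setting is also available in the GUI under Datacenter > HA.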
 