[SOLVED] Just for understanding, what goes on in a live migration

muekno

Dec 15, 2023
I hope I am in the right forum.
I would like to understand what happens during a live migration, what the risks are, and how it differs from migrating VMs that are shut down.
Situation: a 2 or 3 node cluster with replication active, all cluster nodes up and running normally, and a manually started migration.

I searched the net but did not really find anything.

Thanks for any information

Regards

Rainer
 
Well, it depends. In a nutshell: if you are using shared storage, only the state (memory contents) is 'migrated' to the other host, and the disk image stays put. If you are not using shared storage, I believe from watching the output that it first moves/copies the disk image over (depending on whether you select delete source), then moves the memory contents over, and then the cluster engine updates the location of the VM in the cluster. As far as I know, for VMs the memory transfer is handled by QEMU's built-in migration rather than by something like CRIU.

Now if the VMs are shut down, only the disk is moved and the cluster is updated, since there is no state to preserve on the new PVE host. Hopefully that makes sense.
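
To make the cases above easier to compare, here is a small sketch in Python. It is only an illustration of the description in this post, not Proxmox code: the function name `migration_steps` and the step wording are made up for this example.

```python
def migration_steps(vm_running: bool, shared_storage: bool) -> list[str]:
    """Return the rough order of work for a migration, as described above.

    Hypothetical helper for illustration only; it does not call Proxmox.
    """
    steps = []
    if not shared_storage:
        # Local disks have to be transferred to the target node first.
        steps.append("copy disk image to target node")
    if vm_running:
        # Only a running VM has memory state that must be preserved.
        steps.append("copy memory state to target node")
    # In every case the cluster has to learn where the VM now lives.
    steps.append("update VM location in the cluster")
    return steps


if __name__ == "__main__":
    for running in (True, False):
        for shared in (True, False):
            print(f"running={running}, shared={shared}: "
                  f"{migration_steps(running, shared)}")
```

The point of the sketch is just that the disk copy depends on the storage type, while the memory copy depends on whether the VM is running.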
 
I assume that by shared storage you mean a NAS or something like that, i.e. a common storage that all nodes are connected to.
What I have are cluster nodes with their own hard disks, but the storages are common within the cluster: they have the same name, and the second and third storage were added with the "add storage" checkbox unchecked.
But I assume in a live migration, at some point, the VM has to be stopped at least while copying RAM, otherwise RAM will not be consistent.
 
I assume that by shared storage you mean a NAS or something like that, i.e. a common storage that all nodes are connected to.
Yes. Shared storage means that all nodes have access to the same storage, such as Ceph, GlusterFS (never used that), a NAS, or a "shared nothing" setup like DRBD (not supported out of the box by Proxmox). If shared storage is not available, the disks are mirrored to the target as well. Without shared storage, a live migration can take quite a long time because the disk data has to be replicated, but live migration is still possible.

It also means that direct failover in case of a node outage is not possible, because the data dies with the node.
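
To get a rough feeling for how long the disk copy can take without shared storage, here is a back-of-the-envelope sketch in Python. The function name `estimate_copy_seconds` and the `efficiency` factor are assumptions for this example; real throughput also depends on disk speed, compression, and how much data replication has already synced.

```python
def estimate_copy_seconds(disk_gb: float, link_gbit_per_s: float,
                          efficiency: float = 0.9) -> float:
    """Estimate seconds needed to push `disk_gb` (decimal GB) over a link.

    `efficiency` is an assumed fraction of the nominal link speed that is
    actually usable; it is a guess, not a measured Proxmox value.
    """
    if disk_gb < 0:
        raise ValueError("disk_gb must not be negative")
    if link_gbit_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be > 0 and efficiency in (0, 1]")
    disk_bits = disk_gb * 8e9                       # GB -> bits
    usable_bits_per_s = link_gbit_per_s * 1e9 * efficiency
    return disk_bits / usable_bits_per_s


if __name__ == "__main__":
    for link in (1, 10):
        minutes = estimate_copy_seconds(500, link) / 60
        print(f"500 GB over {link} Gbit/s: about {minutes:.0f} minutes")
```

Even under ideal conditions (efficiency 1.0), 500 GB over a 1 Gbit/s link needs 4000 seconds, which is why the network link matters so much here.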
But I assume in a live migration, at some point, the VM has to be stopped at least while copying RAM, otherwise RAM will not be consistent.
First, disk mirroring happens if necessary. When that is done, RAM copying takes place while the guest VM is running. At the end of the RAM copy, the guest VM is stopped for a very short time (in most cases under 100 ms, but it can also be over a full second, depending on how much activity there is within the guest VM). After that, the final bytes are transferred and the guest VM is immediately resumed on the target node.
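
The RAM copy described above can be pictured as a loop that keeps resending memory the guest has changed in the meantime, until the remainder is small enough to send during a short pause. The following Python sketch is a simplified model of that idea, not the actual QEMU or Proxmox implementation. The function name, the parameters, the constant dirty rate, and the 100 ms pause target are all assumptions for illustration.

```python
def simulate_precopy(ram_mb: float, dirty_mb_per_s: float,
                     bandwidth_mb_per_s: float,
                     max_downtime_ms: float = 100.0,
                     max_rounds: int = 30):
    """Toy model of iterative RAM copying while the guest keeps running.

    Returns (rounds, downtime_ms, transferred_mb). All parameters are
    hypothetical; a real guest does not dirty memory at a constant rate.
    """
    remaining = float(ram_mb)      # memory that still has to be sent
    transferred = 0.0
    rounds = 0
    while rounds < max_rounds:
        # If the guest were paused now, how long would the rest take?
        pause_ms = remaining / bandwidth_mb_per_s * 1000
        if pause_ms <= max_downtime_ms:
            break
        # Send the remainder while the guest is still running ...
        duration_s = remaining / bandwidth_mb_per_s
        transferred += remaining
        # ... and whatever it changed meanwhile has to be sent again.
        remaining = min(float(ram_mb), dirty_mb_per_s * duration_s)
        rounds += 1
    # Final step: pause the guest and send the last changed memory.
    downtime_ms = remaining / bandwidth_mb_per_s * 1000
    transferred += remaining
    return rounds, downtime_ms, transferred


if __name__ == "__main__":
    rounds, downtime, total = simulate_precopy(4096, 100, 1000)
    print(f"rounds={rounds}, pause={downtime:.1f} ms, sent={total:.0f} MB")
```

In this model, a 4096 MB guest that changes 100 MB/s over a 1000 MB/s link needs two copy rounds and a pause of roughly 41 ms. If the guest changes memory as fast as the link can send it, the loop never converges and the pause becomes long, which matches the remark that a busy guest can be stopped for over a second.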

For stable live migration, I recommend dedicated network links for migration data and good-quality, homogeneous cluster hardware.
 
The amazing thing about live migration is that you can be logged in via Remote Desktop to a Windows VM that is being live migrated, keep surfing the web, and not notice when you are moved to a new node. No disconnect, no noticeable delay or anything. I've actually controlled the live migration via the Proxmox web interface from the very VM I was migrating, and it works like a charm!
 