Live migration zero downtime

johnnyb911

New Member
Apr 29, 2016
Hi,

I'm wondering about live migration with zero downtime.

Is it possible with Proxmox/KVM and shared storage (GlusterFS or Ceph)?

Is it real zero-downtime live migration?

Thank you for your feedback or experience.

Regards
 
Yes.
I use Ceph for my shared storage and do live migration fairly regularly with zero downtime. This only works with KVM though; migrating containers does incur a restart.
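
For reference, the same live migration can be kicked off from the CLI as well as the GUI. A minimal sketch, assuming a hypothetical VM ID 100 and a target node named pve2:

    qm migrate 100 pve2 --online

With shared storage like Ceph, only RAM and device state travel over the migration network; the disk image stays where it is.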
 
Zero downtime does not exist. The downtime is on the order of milliseconds. Under one second is OK for most cases, but it depends on how many clients you have and what kind of application they use.
Another idea is to use HAProxy in front of this VM, so in case of a longer downtime, HAProxy can route the clients to another VM (see the sketch below).
Also, I can say that OSPF can help if you can use it, in combination with some OSPF-capable switches ...
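
A minimal sketch of the HAProxy idea, with hypothetical addresses; the second server only receives traffic when the first fails its health checks:

    listen app
        bind *:80
        server vm1 10.0.0.11:80 check
        server vm2 10.0.0.12:80 check backup

During a normal migration blip the health checks usually don't even trip; the backup matters only for a real outage.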
 
Footnote on the thread, for what it might be worth.

- Zero-downtime live migration here in Proxmox is not any less or more downtime than you might see with Xen, VMware, etc. There is always some millisecond-level 'cut' in operation when the old instance is paused and the new one is lit up. Generally the delay is such that you may, or may not, notice it if you are constantly pinging the guest being live-migrated.
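
If you want to watch for that cut yourself, a fast ping against the guest during a migration makes it visible (address hypothetical):

    ping -i 0.2 192.0.2.10

Typically you lose at most one reply right around the cut-over.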

As always, live migration is contingent on the simple reality that
-- activity inside the VM being migrated generates RAM deltas on the source host, and possibly deltas on attached disk(s) as well (depending on whether you are using shared SAN-style disks or not);
-- those changes need to be pushed from source to target before the live-migrate cut-over can happen;
-- if the rate of change inside the VM being migrated exceeds the bandwidth you have between the two physical servers acting as source and target, then your live migration will 'stretch out': the source machine will stay online, and deltas will be copied incrementally/iteratively until the target 'catches up' with the source.

So: if your source machine "is too damn busy" compared to the "available bandwidth for pushing the deltas", then the live migration will in fact never happen. The cut-over time just keeps getting pushed further into the future. I've seen this happen in various environments (OpenStack KVM, Proxmox, etc.); it is a matter of not trying to do the impossible. Since - it is impossible. :-)
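
If you do hit that situation, Proxmox exposes a couple of per-VM knobs worth knowing about. A sketch with a hypothetical VM ID 100; both are documented qm options, but the values here are purely illustrative:

    qm set 100 --migrate_downtime 0.5   # max tolerated pause in seconds at cut-over (default 0.1)
    qm set 100 --migrate_speed 0        # max migration bandwidth in MB/s (0 = no limit, the default)

Tolerating a slightly longer final pause lets QEMU declare the copy "close enough" sooner, but a faster dedicated migration network is the real fix.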

Note that Proxmox, like most KVM implementations now, also allows 'live migration' with non-shared storage. The catch is that you must wait patiently for disk blocks to be copied from source to target first. Some people dislike waiting, so they only use shared storage when they do live migrations. But it is doable; it just requires patience, depending on the speed of the link between the Proxmox nodes and the size of the VM images being pushed through the pipe.
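
A sketch of that non-shared-storage variant from the CLI (VM ID and node name hypothetical again):

    qm migrate 100 pve2 --online --with-local-disks

The --with-local-disks flag tells Proxmox to storage-migrate the local disk images to the target node as part of the move.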


Tim
 
Hi,

I am wondering, for live migration do I need a minimum of three nodes, or just two? Thanks a lot!
 
Otherwise, when one node is off, the cluster has no quorum, and the other node also restarts to "fix" that.
Nitpicking: it does "fence" itself only if "High Availability" is active. "The other" node in a cluster without HA won't do that.
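
(A quick way to check whether HA is in play at all is the standard tool:

    ha-manager status

If no HA resources are configured there, the local resource managers stay idle and the watchdog that triggers self-fencing is never armed.)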
 
I just wanted to chime in: I agree that 2-node cluster live migration is possible, but having 3 nodes, or a 3rd 'witness' QDevice, is nice if possible. But not mandatory.

Tim
 
But not mandatory.
Well... technically you are right.

But: 1) it IS mandatory for High Availability, and 2) you cannot administer the surviving single node without further tricks.

Without quorum it will deny any command. Yes, the workaround "pvecm expected 1" is well known and officially documented, but that is not a usual command and should basically only be used for disaster recovery, not for "normal operation".
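
For context, the documented recovery sequence on the surviving node looks like this:

    pvecm status       # inspect the current quorum state
    pvecm expected 1   # tell corosync to expect a single vote, making the lone node quorate

Both are standard Proxmox tools; as said, the second belongs in a disaster-recovery runbook, not in day-to-day operation.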

Just my 2€¢...
 
Yes, and note that HA is a different thing - a superset - from "a pair of nodes in a cluster" or even "three nodes in a cluster".
The original question was asking about live migration.
:-)

Anyhow, I agree: if doing a 2-node cluster, you really do want a 3rd node, or a quorum-breaker service on a different Linux box nearby,
or just a very "pico" Proxmox node :-)
So there are many ways to make the thing work. And ideally, 'best practice' is not just something people dream of.
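
For the quorum-breaker route, Proxmox has built-in QDevice support. A sketch, assuming corosync-qnetd is already running on a hypothetical third box at 10.0.0.5:

    pvecm qdevice setup 10.0.0.5

Run from one of the cluster nodes, this registers the external vote so a 2-node cluster keeps quorum when one node goes down.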

Tim
 