Live migration zero downtime

johnnyb911

New Member
Apr 29, 2016
Hi,

I'm wondering about live migration with zero downtime.

Is it possible with Proxmox/KVM and shared storage (GlusterFS or Ceph)?

Is it real zero-downtime live migration?

Thank you for your feedback or experience.

Regards
 
Yes.
I use Ceph for my shared storage and do live migration fairly regularly with zero downtime. This only works with KVM though; migrating containers does incur a restart.
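
For reference, the same live migration can be kicked off from the CLI as well as the GUI. A minimal sketch, assuming a hypothetical VM ID 100 and a target node named pve2:

    qm migrate 100 pve2 --online

With shared storage like Ceph, only RAM and device state travel over the migration network; the disk image stays where it is.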
 
Zero downtime does not exist. The downtime is on the order of milliseconds. Under one second is OK for most cases, but it depends on how many clients you have and what kind of application they use.
Another idea is to use HAProxy in front of this VM, so in case of a longer downtime, HAProxy can route the clients to another VM (see the sketch below).
Also, I can say that OSPF can help if you can use it, in combination with some OSPF-capable switches ...
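
A minimal sketch of the HAProxy idea, with hypothetical addresses; the second server only receives traffic when the first fails its health checks:

    listen app
        bind *:80
        server vm1 10.0.0.11:80 check
        server vm2 10.0.0.12:80 check backup

During a normal migration blip the health checks usually don't even trip; the backup matters only for a real outage.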
 
Footnote on the thread, for what it might be worth.

- Zero-downtime live migration here in Proxmox is not any less or more downtime than you might see with Xen, VMware, etc. There is always some millisecond-level 'cut' in operation when the old instance is paused and the new one is lit up. Generally the delay is such that you may, or may not, notice it if you are constantly pinging the guest being live-migrated.
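
If you want to watch for that cut yourself, a fast ping against the guest during a migration makes it visible (address hypothetical):

    ping -i 0.2 192.0.2.10

Typically you lose at most one reply right around the cut-over.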

As always, live migration is contingent on the simple reality that
-- activity inside the VM being migrated generates RAM deltas on the source host, and possibly deltas on attached disk(s) as well (depending on whether you are using shared SAN-style disks or not);
-- those changes need to be pushed from source to target before the live-migrate cut-over can happen;
-- if the rate of change inside the VM being migrated exceeds the bandwidth you have between the two physical servers acting as source and target, then your live migration will 'stretch out': the source machine will stay online, and deltas will be copied incrementally/iteratively until the target 'catches up' with the source.

So: if your source machine "is too damn busy" compared to the "available bandwidth for pushing the deltas", then the live migration will in fact never happen. The cut-over time just keeps getting pushed further into the future. I've seen this happen in various environments (OpenStack KVM, Proxmox, etc.); it is a matter of not trying to do the impossible. Since - it is impossible. :-)
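
If you do hit that situation, Proxmox exposes a couple of per-VM knobs worth knowing about. A sketch with a hypothetical VM ID 100; both are documented qm options, but the values here are purely illustrative:

    qm set 100 --migrate_downtime 0.5   # max tolerated pause in seconds at cut-over (default 0.1)
    qm set 100 --migrate_speed 0        # max migration bandwidth in MB/s (0 = no limit, the default)

Tolerating a slightly longer final pause lets QEMU declare the copy "close enough" sooner, but a faster dedicated migration network is the real fix.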

Note that Proxmox, like most KVM implementations now, also allows 'live migration' with non-shared storage. The catch is that you must wait patiently for disk blocks to be copied from source to target first. Some people dislike waiting, so they only use shared storage when they do live migrations. But it is doable; it just requires patience, depending on the speed of the link between the Proxmox nodes and the size of the VM images being pushed through the pipe.
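
A sketch of that non-shared-storage variant from the CLI (VM ID and node name hypothetical again):

    qm migrate 100 pve2 --online --with-local-disks

The --with-local-disks flag tells Proxmox to storage-migrate the local disk images to the target node as part of the move.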


Tim
 
Hi,

I am wondering, for live migration do I need a minimum of three nodes, or just two? Thanks a lot!
 
Otherwise, when one node is off, the cluster has no quorum, and the other node also restarts to "fix" that.
Nitpicking: it does "fence" itself only if "High Availability" is active. "The other" node in a cluster without HA won't do that.
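
(A quick way to check whether HA is in play at all is the standard tool:

    ha-manager status

If no HA resources are configured there, the local resource managers stay idle and the watchdog that triggers self-fencing is never armed.)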
 
I just wanted to chime in: I agree that 2-node cluster live migration is possible, but having 3 nodes, or a 3rd 'witness' QDevice, is nice if possible. But not mandatory.

Tim
 
But not mandatory.
Well... technically you are right.

But: 1) it IS mandatory for High Availability, and 2) you cannot administer the surviving single node without further tricks.

Without quorum it will deny any command. Yes, the workaround "pvecm expected 1" is well known and officially documented, but that is not a usual command and should basically only be used for disaster recovery, not for "normal operation".
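
For context, the documented recovery sequence on the surviving node looks like this:

    pvecm status       # inspect the current quorum state
    pvecm expected 1   # tell corosync to expect a single vote, making the lone node quorate

Both are standard Proxmox tools; as said, the second belongs in a disaster-recovery runbook, not in day-to-day operation.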

Just my 2€¢...
 
Yes, and note that HA is a different thing - a superset - from "a pair of nodes in a cluster" or even "three nodes in a cluster".
The original question was asking about live migration.
:-)

Anyhow, I agree: if doing a 2-node cluster, you really do want a 3rd node, or a quorum-breaker service on a different Linux box nearby,
or just a very "pico" Proxmox node :-)
So there are many ways to make the thing work. And ideally, 'best practice' is not just something people dream of.
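
For the quorum-breaker route, Proxmox has built-in QDevice support. A sketch, assuming corosync-qnetd is already running on a hypothetical third box at 10.0.0.5:

    pvecm qdevice setup 10.0.0.5

Run from one of the cluster nodes, this registers the external vote so a 2-node cluster keeps quorum when one node goes down.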

Tim
 