Improvement: Reduce migration downtime to seconds with two step transfer

carsten2 · Jul 7, 2020

The near-online migration of a running container could be drastically improved.

Currently the actions are:
1) Shutdown locally
2) Migrate volume
3) Startup remote

The data replication can take some time (minutes or hours), during which the container is down.

A very simple but drastic improvement would be to replicate in two steps:

1) Snapshot and send snapshot to remote (Container keeps running)
2) Shutdown locally
3) Snapshot and send differential snapshot to remote
4) Startup remote
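
The four steps above can be sketched for ZFS-backed storage as follows. This is only an illustration, not how Proxmox does it: the function name, the snapshot names `migrate-base` and `migrate-final`, and the example dataset are made up, and the code assumes root SSH access to the target. It only builds the command list and runs nothing. Moving the container config to the target node is left out, and the incremental `zfs receive` may need `-F` if the target dataset was touched in between.

```python
import shlex
from typing import List


def two_phase_plan(vmid: int, dataset: str, target: str) -> List[str]:
    """Return the shell commands for a two-phase ZFS transfer, in order.

    Nothing is executed; the list is meant to be reviewed by a human first.
    Snapshot names are hypothetical.
    """
    base = shlex.quote(f"{dataset}@migrate-base")
    final = shlex.quote(f"{dataset}@migrate-final")
    ds = shlex.quote(dataset)
    ssh = f"ssh root@{shlex.quote(target)}"
    return [
        # Phase 1: full copy while the container keeps running.
        f"zfs snapshot {base}",
        f"zfs send {base} | {ssh} zfs receive {ds}",
        # Downtime starts here.
        f"pct shutdown {vmid}",
        # Phase 2: send only what changed since the base snapshot.
        f"zfs snapshot {final}",
        f"zfs send -i {base} {final} | {ssh} zfs receive {ds}",
        # Downtime ends once the container is up on the target
        # (assumes its config has already been moved there).
        f"{ssh} pct start {vmid}",
    ]


if __name__ == "__main__":
    for cmd in two_phase_plan(100, "rpool/data/subvol-100-disk-0", "pve02"):
        print(cmd)
```

The point of the ordering is that the large transfer happens before `pct shutdown`, so only the small incremental send falls inside the downtime window.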

This way, the downtime comes down to seconds instead of minutes or even hours.

The same two-step transfer should be used for containers that are replicated periodically, e.g. once a day. In this case most of the data is already on the remote end, but it might still take several minutes to replicate all changes, during which the container is down. With the two-step replication process this can also be reduced to seconds.

The same process should also be used for VMs.

Rares · Jul 7, 2020

I am using ZFS replication for my containers to replicate every night to another server. Before migration I issue a manual run of the replication task and only then click Migrate to reduce downtime.

LnxBil · Jul 7, 2020

Or use highly available storage, so that it is just a local shutdown and a remote startup.

jayg30 · Jul 7, 2020

I don't migrate containers or VMs all that often, so maybe I'm missing something.
The proposal would only fit local storage using storage migration, correct?
And it would only be possible with filesystems that support snapshots, correct? So ZFS and...?

The few times I tested "online live migration" in 6.2 between nodes with ZFS local storage, I thought this was the behavior. I had already set up an hourly replication job between the nodes, though, and ping only dropped once during the migration. I assumed it was using the last replication (if one existed) to determine the changed data, transmit it, then quickly shut down/suspend and migrate the VM. The whole thing seemed to happen very quickly. Perhaps I'm missing something.

carsten2 · Jul 8, 2020

The suggestion was just the very simple change to always migrate in two steps no matter if there is a replication or not. The effect would be nearly the same, as if you would use the manual workaround:

1) Create a replication job by hand.
2) Start the replication and wait until it is finished.
3) Immediately Migrate the container
4) Remove the replication job.
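
The manual workaround above could be scripted roughly like this. Treat it as a sketch under assumptions: it assumes the `pvesr` subcommands `create-local-job`, `schedule-now` and `delete` and the restart mode of `pct migrate` as found in PVE 6.x (check `man pvesr` and `man pct` on your version), and the job number `0` is arbitrary. The function only returns the commands; waiting for the replication to finish (step 2) is not automated here.

```python
from typing import List


def replicate_then_migrate(vmid: int, target: str, job_num: int = 0) -> List[List[str]]:
    """Return the argv lists for the manual workaround, in order.

    Replication job IDs have the form <vmid>-<number>; job_num is an
    arbitrary choice for this illustration.
    """
    job_id = f"{vmid}-{job_num}"
    return [
        # 1) Create a replication job by hand.
        ["pvesr", "create-local-job", job_id, target],
        # 2) Start it now; the caller must wait until `pvesr status`
        #    reports the job as finished before continuing.
        ["pvesr", "schedule-now", job_id],
        # 3) Migrate immediately; only the differential snapshot is sent.
        ["pct", "migrate", str(vmid), target, "--restart", "1"],
        # 4) Remove the replication job again.
        ["pvesr", "delete", job_id],
    ]


if __name__ == "__main__":
    for argv in replicate_then_migrate(100, "pve02"):
        print(" ".join(argv))
```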

But why have a manual workaround when this should always be done by the Proxmox task? Even compared with the workaround, my suggested two-step replication would be faster for a container in use, because the gap between the end of the replication job and the migration step would be only a couple of milliseconds instead of seconds or minutes (depending on how long you sit in front of the Proxmox UI watching the progress). It would also be similar to VM migration, which is likewise done in several steps to minimize the interruption in the last transfer step.

jayg30 · Jul 8, 2020

carsten2 said:
The suggestion was just the very simple change to always migrate in two steps no matter if there is a replication or not. The effect would be nearly the same, as if you would use the manual workaround:

1) Create a replication job by hand.
2) Start the replication and wait until it is finished.
3) Immediately Migrate the container
4) Remove the replication job.

But why have a manual workaround when this should always be done by the Proxmox task? Even compared with the workaround, my suggested two-step replication would be faster for a container in use, because the gap between the end of the replication job and the migration step would be only a couple of milliseconds instead of seconds or minutes (depending on how long you sit in front of the Proxmox UI watching the progress). It would also be similar to VM migration, which is likewise done in several steps to minimize the interruption in the last transfer step.

So if I understand you correctly, the current behavior is that when migrating a container or VM from local storage to another node's local storage, the container or VM goes down for an extended period while the data is being migrated, unless there is already a replication job that has been running?

I'm just curious where this issue exactly lies. Thanks

EDIT:
I don't think I'm seeing the same behavior. Maybe I'm just not hitting the exact scenario you are? I have an Ubuntu 20.04 KVM virtual machine with the guest agent tools installed. It's running on node pve02. I'm migrating it to pve01. Both nodes use ZFS local storage. When I initiate the migration it tells me the "method" is ONLINE. The process begins with drive mirroring. The VM continues to run without issue during the process. It also looks like a RAM transfer takes place. Once complete, there was a <1 sec drop in the console connection. I believe this migration uses the QEMU tools (not ZFS). What am I missing here? Does it behave differently for a container?

Note that, at least in this case, when the VM is powered down and then migrated, it uses ZFS snapshots and replication to migrate the data. The data transfer is faster. However, I suspect they don't want to use this for online migration for data consistency reasons.

2020-07-08 09:15:17 starting migration of VM 104 to node 'pve01' (192.x,x,x)
2020-07-08 09:15:17 found local disk 'local-zfs:vm-104-disk-0' (in current VM config)
2020-07-08 09:15:17 copying local disk images
2020-07-08 09:15:17 starting VM 104 on remote node 'pve01'
2020-07-08 09:15:20 start remote tunnel
2020-07-08 09:15:21 ssh tunnel ver 1
2020-07-08 09:15:21 starting storage migration
2020-07-08 09:15:21 scsi0: start migration to nbd:unix:/run/qemu-server/104_nbd.migrate:exportname=drive-scsi0
drive mirror is starting for drive-scsi0
drive-scsi0: transferred: 99614720 bytes remaining: 21375221760 bytes total: 21474836480 bytes progression: 0.46 % busy: 1 ready: 0
drive-scsi0: transferred: 214958080 bytes remaining: 21259878400 bytes total: 21474836480 bytes progression: 1.00 % busy: 1 ready: 0
drive-scsi0: transferred: 343932928 bytes remaining: 21130903552 bytes total: 21474836480 bytes progression: 1.60 % busy: 1 ready: 0
drive-scsi0: transferred: 456130560 bytes remaining: 21018705920 bytes total: 21474836480 bytes progression: 2.12 % busy: 1 ready: 0
drive-scsi0: transferred: 578813952 bytes remaining: 20896022528 bytes total: 21474836480 bytes progression: 2.70 % busy: 1 ready: 0
...
...
...
drive-scsi0: transferred: 21024997376 bytes remaining: 453705728 bytes total: 21478703104 bytes progression: 97.89 % busy: 1 ready: 0
drive-scsi0: transferred: 21153972224 bytes remaining: 324730880 bytes total: 21478703104 bytes progression: 98.49 % busy: 1 ready: 0
drive-scsi0: transferred: 21267218432 bytes remaining: 211484672 bytes total: 21478703104 bytes progression: 99.02 % busy: 1 ready: 0
drive-scsi0: transferred: 21381513216 bytes remaining: 97189888 bytes total: 21478703104 bytes progression: 99.55 % busy: 1 ready: 0
drive-scsi0: transferred: 21478703104 bytes remaining: 0 bytes total: 21478703104 bytes progression: 100.00 % busy: 1 ready: 0
drive-scsi0: transferred: 21478703104 bytes remaining: 0 bytes total: 21478703104 bytes progression: 100.00 % busy: 0 ready: 1
all mirroring jobs are ready
2020-07-08 09:18:39 volume 'local-zfs:vm-104-disk-0' is 'local-zfs:vm-104-disk-0' on the target
2020-07-08 09:18:39 starting online/live migration on unix:/run/qemu-server/104.migrate
2020-07-08 09:18:39 set migration_caps
2020-07-08 09:18:39 migration speed limit: 8589934592 B/s
2020-07-08 09:18:39 migration downtime limit: 100 ms
2020-07-08 09:18:39 migration cachesize: 268435456 B
2020-07-08 09:18:39 set migration parameters
2020-07-08 09:18:39 start migrate command to unix:/run/qemu-server/104.migrate
2020-07-08 09:18:40 migration status: active (transferred 118668942, remaining 2015191040), total 2156732416)
2020-07-08 09:18:40 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
2020-07-08 09:18:41 migration status: active (transferred 236372021, remaining 1878986752), total 2156732416)
2020-07-08 09:18:41 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
2020-07-08 09:18:42 migration status: active (transferred 354018265, remaining 1753739264), total 2156732416)
2020-07-08 09:18:42 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
2020-07-08 09:18:43 migration status: active (transferred 471943950, remaining 1629908992), total 2156732416)
2020-07-08 09:18:43 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
...
...
...
2020-07-08 09:18:52 migration xbzrle cachesize: 268435456 transferred 3443695 pages 4839 cachemiss 45555 overflow 19
2020-07-08 09:18:52 migration status: active (transferred 1500989729, remaining 93634560), total 2156732416)
2020-07-08 09:18:52 migration xbzrle cachesize: 268435456 transferred 3600948 pages 5026 cachemiss 48472 overflow 19
2020-07-08 09:18:52 migration status: active (transferred 1513120503, remaining 70545408), total 2156732416)
2020-07-08 09:18:52 migration xbzrle cachesize: 268435456 transferred 3685287 pages 5164 cachemiss 51403 overflow 19
2020-07-08 09:18:53 migration status: active (transferred 1525126066, remaining 49360896), total 2156732416)
2020-07-08 09:18:53 migration xbzrle cachesize: 268435456 transferred 3757571 pages 5459 cachemiss 54309 overflow 19
2020-07-08 09:18:53 migration status: active (transferred 1537255947, remaining 10854400), total 2156732416)
2020-07-08 09:18:53 migration xbzrle cachesize: 268435456 transferred 3787989 pages 5626 cachemiss 57246 overflow 21
2020-07-08 09:18:53 migration speed: 9.66 MB/s - downtime 72 ms
2020-07-08 09:18:53 migration status: completed
drive-scsi0: transferred: 21478965248 bytes remaining: 0 bytes total: 21478965248 bytes progression: 100.00 % busy: 0 ready: 1
all mirroring jobs are ready
drive-scsi0: Completing block job...
drive-scsi0: Completed successfully.
drive-scsi0 : finished
2020-07-08 09:18:54 # /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve01' root@192.168.0.244 pvesr set-state 104 \''{}'\'
2020-07-08 09:18:54 stopping NBD storage migration server on target.
2020-07-08 09:19:00 migration finished successfully (duration 00:03:43)
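
To make the difference between total migration time and actual downtime in the log above explicit, here is a small sketch that pulls both numbers out of the task log. The regular expressions are based only on the two log lines shown here and may not match other Proxmox versions; the function name is made up.

```python
import re
from typing import Optional, Tuple

# Patterns derived from the two log lines quoted in this post only.
DOWNTIME_RE = re.compile(r"downtime (\d+) ms")
DURATION_RE = re.compile(r"\(duration (\d+):(\d+):(\d+)\)")


def summarize(log: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (downtime in ms, total duration in seconds); None if absent."""
    downtime = DOWNTIME_RE.search(log)
    duration = DURATION_RE.search(log)
    downtime_ms = int(downtime.group(1)) if downtime else None
    total_s = None
    if duration:
        hours, minutes, seconds = (int(x) for x in duration.groups())
        total_s = hours * 3600 + minutes * 60 + seconds
    return downtime_ms, total_s


sample = """\
2020-07-08 09:18:39 migration downtime limit: 100 ms
2020-07-08 09:18:53 migration speed: 9.66 MB/s - downtime 72 ms
2020-07-08 09:19:00 migration finished successfully (duration 00:03:43)
"""

if __name__ == "__main__":
    print(summarize(sample))  # (72, 223)
```

So the whole VM migration took 223 seconds, but the guest was only paused for 72 ms.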

guletz · Jul 8, 2020

jayg30 said:
I'm just curious where this issue exactly lies. Thanks

Hi,

@jayg30, the original poster is discussing container migration, and you are talking about VM migration...

Good luck /Bafta !

carsten2 · Jul 9, 2020

Yes, I am talking about container migration. With VMs the migration is now live, but with container migration there is an unnecessarily extended downtime, because the container is shut down at the START of the migration instead of being kept running until 99% of the data has been transferred. With the suggested simple change to a two-step process, there would be almost no downtime (like with VMs).

guletz · Jul 9, 2020

carsten2 said:
Yes, I am talking about container migration. With VMs the migration is now live, but with container migration there is an unnecessarily extended downtime, because the container is shut down at the START of the migration instead of being kept running until 99% of the data has been transferred. With the suggested simple change to a two-step process, there would be almost no downtime (like with VMs).

Sorry, the message was for @jayg30! My mistake!!

Your idea is OK!

jayg30 · Jul 9, 2020

Okay, that was what I was asking: what exact scenarios lead to this behavior. So it is specific to containers. I don't use containers much, which is probably why I don't see this.

I support the proposed goal. I wonder how to get there in a way that the Proxmox team would actually support.

I know that for KVM the proxmox team didn't want to simply leverage ZFS snapshot/replication or other snapshot capability for live migration of local storage. They were waiting for Qemu to provide the functionality at that level. I wonder if this is a similar case. They're looking for functionality provided by LXC for live migration.

In just some 5 minutes of googling and reading I'm seeing this information;
https://discuss.linuxcontainers.org/t/enable-live-migration/2261
https://www.youtube.com/watch?v=ol85OJxDaHc

And the underlying theme I've read "so far" is that any live migration of LXC/LXD containers would be handled by CRIU. There are also comments about allowing it without CRIU, through renaming, as a workaround for now.

Then I read this:
https://www.diva-portal.org/smash/get/diva2:1085809/FULLTEXT02.pdf
On pages 15/16 it mentions that live migration is supported in LXC using CRIU and is also supported in LXD.

Very confusing.

jayg30 · Jul 9, 2020

Well, it looks like this was mentioned and commented on previously by the Proxmox staff here:

https://forum.proxmox.com/threads/lxc-and-live-migration.35908/

So it would seem like the request here would be to just "force" a snap/replication of a running container if using ZFS (or other supported file system).

ca_maer · Jul 9, 2020

As carsten2 mentioned, you can get a nearly instant transfer of an LXC container when using a replication job. If a replication job to the other node is already configured when you hit Migrate, the CT is shut down, only a differential snapshot is sent, and the CT is restarted on the other node. Right now live migration is not supported for LXC, only for VMs.

Assuming you have 2 nodes:

A 100 GB CT is running on Node1.
Create a replication job for the CT to Node2. The CT stays online during this time and replication happens in the background.
Once the replication job is OK, select Migrate to Node2.
The CT will shut down and a differential snapshot will be sent, which is normally only a couple of MB.
The CT will start on Node2 and the replication job will now point to Node1.

This way, your migration will only take a couple of seconds.

fabian · Jul 10, 2020

such a two-phase migration is indeed possible (either if the storage underneath supports incremental sending, such as ZFS, or via rsync of the contents). we use a similar mechanism for the 'suspend' mode for containers in vzdump. it's just not yet implemented for container migration..
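
For storage without incremental send, the rsync variant mentioned above might look roughly like the following. This is a hedged sketch, not the vzdump or Proxmox implementation: the function names and directory paths are hypothetical, it assumes the container root filesystem is reachable as a plain directory on both nodes, and it only builds the commands without running them.

```python
from typing import List


def rsync_pass(src_root: str, target: str, dst_root: str) -> List[str]:
    """Build one rsync invocation that mirrors src_root to the target host."""
    return [
        "rsync",
        "-aHAX",          # archive mode plus hard links, ACLs and xattrs
        "--numeric-ids",  # keep raw uid/gid values (matters for containers)
        "--delete",       # remove files on the target that vanished locally
        src_root.rstrip("/") + "/",
        f"root@{target}:{dst_root.rstrip('/')}/",
    ]


def two_pass_plan(vmid: int, src_root: str, target: str, dst_root: str) -> List[List[str]]:
    """First pass while running, second pass after shutdown."""
    sync = rsync_pass(src_root, target, dst_root)
    return [
        sync,                            # pass 1: bulk copy, container still up
        ["pct", "shutdown", str(vmid)],  # downtime starts
        sync,                            # pass 2: only changed files are sent
    ]


if __name__ == "__main__":
    # Paths below are placeholders for illustration only.
    for argv in two_pass_plan(100, "/mnt/ct100/rootfs", "pve02", "/mnt/ct100/rootfs"):
        print(" ".join(argv))
```

The second pass still has to walk the whole tree, so it is usually slower than an incremental ZFS send, but it transfers only the files that changed during the first pass.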

carsten2 · Jul 19, 2020

fabian said:
such a two-phase migration is indeed possible (either if the storage underneath supports incremental sending, such as ZFS, or via rsync of the contents). we use a similar mechanism for the 'suspend' mode for containers in vzdump. it's just not yet implemented for container migration..

So I would like to suggest that this be implemented. It seems that almost all of the code is already there, or is even used in other scenarios.

SamTzu · Aug 20, 2020

If I remember correctly this used to be the case with OpenVZ migrations before Proxmox moved to LXC.

carsten2 · Sep 6, 2020

https://bugzilla.proxmox.com/show_bug.cgi?id=2984

carsten2 · Feb 14, 2021

Any news on this subject? Several people have the same problem, and the fix should not be too complicated.
