Online migration

shocker

Active Member
Jun 21, 2016
Hello,
I'm trying to live migrate VMs from one node to another.
If I manually live migrate a single VM (right click on it and Migrate) I can tick the Online checkbox and it works.
When I try to migrate all VMs from one node to another (with 10 parallel jobs) it doesn't work: the VMs go into a shutdown phase first, so the migration is not online.

Is there a way to migrate all VMs online from one server to another in order to perform maintenance?

Another thing: after a node recovery the migration is not online either. If the faulty node comes back and it has a higher group priority, the migration process starts automatically to move the VM back to the highest-priority node, but this is not done online either. Is there a way to enable this somehow?

Thanks,
Alex
 
Reading the roadmap I've found "Linux Containers (LXC) live-migration (experimental)".
Testing again, and indeed for VMs it works, but for CTs it does not, even though I have the option for live migration; the container goes into shutdown immediately. How can I activate the "experimental" feature? :)

The other question remains: if a host recovers and a VM (not a CT) is moved back to the higher-priority node, the migration is not done online. Is there a way to enable this?

Thanks,
Alex
 
First of all: What is your shared storage?

Normally we do mass migration via the command line with the qm tool:

Code:
for vm in $(qm list | grep running | awk '{print $1}' ); do qm migrate $vm proxmox1 -online; done
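If you want the migrations to actually run in parallel (the original question mentions 10 parallel jobs), an untested variant of the same loop using GNU xargs -P might look like this, again assuming the target node is proxmox1:

Code:
# migrate all running VMs, at most 10 at a time
qm list | awk '/running/ {print $1}' | xargs -P 10 -I{} qm migrate {} proxmox1 -online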
 
I think the migration back was not online due to some updates that had been applied. After I restarted all nodes, everything is working fine now.

The shared storage is NFS. The VM disks are qcow2.
Now I'm on the right track and preparing to move into production. Proxmox is just awesome! :)

The only remaining question is LXC (with experimental live migration) vs. nested OpenVZ via KVM. Which would be best?
Does anyone know how to enable the experimental LXC live migration?

Thanks,
Alex
 
Any update on the progress of this? We still seem to be worse off with LXC than with OpenVZ-based containers. This was a BIG step back for us. Thanks!
 
I'm running my LXC containers on ZFS storage. Having some offline time is not such a big problem; the problem is how long the containers are down, and the long part of the migration is the ZFS snapshot transfer.

In theory, it is feasible to:
  1. zfs snapshot the running container
  2. pre-send that snapshot to the new node (takes a long time, but the container keeps running)
  3. shut down the container
  4. re-snapshot
  5. then do the real migration (which should only need to send the delta since the previous snapshot)
  6. then restart the container on the new node.

Downtime would be minimal.

I can probably fool around and do it with a script (a rough sketch follows below); just wondering if it's been done before.
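A rough, untested sketch of such a script, assuming a typical PVE ZFS layout (dataset rpool/data/subvol-101-disk-0, CT ID 101) and a target node called node2; the container config still has to be moved by hand:

Code:
#!/bin/bash
# Sketch only: pre-seed a ZFS-backed CT on the target node, then send just the delta.
CT=101                                # placeholder CT ID
DS=rpool/data/subvol-${CT}-disk-0     # placeholder dataset name
TARGET=node2                          # placeholder target node

# steps 1+2: snapshot while the CT is running and pre-send it (slow, but the CT stays up)
zfs snapshot ${DS}@presend
zfs send -R ${DS}@presend | ssh root@${TARGET} zfs recv -u ${DS}

# steps 3+4: stop the container and take a second snapshot
pct shutdown ${CT}
zfs snapshot ${DS}@final

# step 5: send only the delta since @presend, so downtime stays short
zfs send -I ${DS}@presend ${DS}@final | ssh root@${TARGET} zfs recv -uF ${DS}

# step 6: move the CT config to the target node (under /etc/pve/nodes/<target>/lxc/)
# and start it there with pct start; left as a manual step in this sketch.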
 
WARNING!! I AM NOT A PERL DEVELOPER, THIS IS MY FIRST ATTEMPT AT HACKING PERL.

So I made a crude implementation of what I would call a "quick stop" (or suspend) pseudo live migration of LXC. Here's the patch (I will also send it to the mailing list):

Code:
Index: /usr/share/perl5/PVE/LXC/Migrate.pm
===================================================================
--- /usr/share/perl5/PVE/LXC/Migrate.pm
+++ /usr/share/perl5/PVE/LXC/Migrate.pm
@@ -31,12 +31,36 @@ sub prepare {
     PVE::LXC::Config->check_lock($conf);
+    # test ssh connection
+    my $cmd = [ @{$self->{rem_ssh}}, '/bin/true' ];
+    eval { $self->cmd_quiet($cmd); };
+    die "Can't connect to destination address using public key\n" if $@;
+
+    PVE::LXC::Config->foreach_mountpoint($conf, sub {
+    my ($ms, $mountpoint, $snapname) = @_;
+
+    my $volid = $mountpoint->{volume};
+        my ($sid, $volname) = PVE::Storage::parse_volume_id($volid);
+        my $scfg =  PVE::Storage::storage_config($self->{storecfg}, $sid);
+        if ($scfg->{type} eq 'zfspool') {
+        $self->log('info', "Sending a current snapshot of CT $vmid to remote node '$self->{node}'");
+        eval { PVE::Storage::storage_zfs_pre_migrate($self->{storecfg}, $volid, $self->{nodeip}, $sid); };
+        die "__pre_migration__ failed" if $@;
+        $self->log('info', "current snapshot of CT $vmid done. Moving on with actual migration");
+        }
+    });
+
     my $running = 0;
+    my $wasRunning = 0;
     if (PVE::LXC::check_running($vmid)) {
-    die "lxc live migration is currently not implemented\n";
-
-    die "can't migrate running container without --online\n" if !$online;
-    $running = 1;
+    $self->log('info', "Shutting down CT $vmid...");
+    eval { $self->cmd_quiet([ 'lxc-stop', '-n', "$vmid" ]); };
+    die "Could not lxc-stop the container\n" if $@;
+
+    # make sure container is stopped
+    eval { $self->cmd_quiet([ 'lxc-wait', '-n', "$vmid", '-s', 'STOPPED' ]); };
+    die "lxc-wait failed\n" if $@;
+    $wasRunning = 1;
     }
     my $force = $self->{opts}->{force} // 0;
@@ -78,12 +102,7 @@ sub prepare {
     # todo: test if VM uses local resources
-    # test ssh connection
-    my $cmd = [ @{$self->{rem_ssh}}, '/bin/true' ];
-    eval { $self->cmd_quiet($cmd); };
-    die "Can't connect to destination address using public key\n" if $@;
-
-    return $running;
+    return $wasRunning; # need to return this for phase2 (restart on other node) to occur
}
sub phase1 {
@@ -246,8 +265,8 @@ sub phase1 {
     my $conffile = PVE::LXC::Config->config_file($vmid);
     my $newconffile = PVE::LXC::Config->config_file($vmid, $self->{node});
-    if ($self->{running}) {
-    die "implement me";
+    if (PVE::LXC::check_running($vmid)) {
+    die "Full live LXC migration is not supported.";
     }
     # make sure everything on (shared) storage is unmounted
@@ -280,6 +299,16 @@ sub phase1_cleanup {
     }
}
+sub phase2 {
+    my ($self, $vmid) = @_;
+
+    $self->log('info', "starting CT $vmid on remote node '$self->{node}'");
+
+    my $cmd = [@{$self->{rem_ssh}}, 'pct', 'start', $vmid, '-skiplock' ];
+    eval { $self->cmd($cmd); };
+    $self->log('err', "CT is migrated to '$self->{node}', however it failed to start. Review error and fix yourself...") if $@;
+}
+
sub phase3 {
     my ($self, $vmid) = @_;
Index: /usr/share/perl5/PVE/Storage.pm
===================================================================
--- /usr/share/perl5/PVE/Storage.pm
+++ /usr/share/perl5/PVE/Storage.pm
@@ -578,17 +578,17 @@ sub storage_migrate {
         my $snap = ['zfs', 'snapshot', "$zfspath\@__migration__"];
-        my $send = [['zfs', 'send', '-Rpv', "$zfspath\@__migration__"], ['ssh', "root\@$target_host",
-            'zfs', 'recv', $zfspath]];
+        my $send = [['zfs', 'send', '-pvI', "$zfspath\@__pre_migration__", "$zfspath\@__migration__"], ['ssh', "root\@$target_host",
+            'zfs', 'recv', '-uF', $zfspath]];
-        my $destroy_target = ['ssh', "root\@$target_host", 'zfs', 'destroy', "$zfspath\@__migration__"];
+        my $destroy_target = ['ssh', "root\@$target_host", 'zfs', 'destroy', "$zfspath\@__pre_migration__\%__migration__"];
          run_command($snap);
         eval{
         run_command($send);
         };
         my $err;
         if ($err = $@){
-        run_command(['zfs', 'destroy', "$zfspath\@__migration__"]);
+        run_command(['zfs', 'destroy', "$zfspath\@__pre_migration__\%__migration__"]);
         die $err;
         }
         run_command($destroy_target);
@@ -633,6 +633,58 @@ sub storage_migrate {
     }
}
+sub storage_zfs_pre_migrate {
+    my ($cfg, $volid, $target_host, $target_storeid, $target_volname) = @_;
+
+    my ($storeid, $volname) = parse_volume_id($volid);
+    $target_volname = $volname if !$target_volname;
+
+    my $scfg = storage_config($cfg, $storeid);
+
+    # no need to migrate shared content
+    return if $storeid eq $target_storeid && $scfg->{shared};
+
+    my $tcfg = storage_config($cfg, $target_storeid);
+
+    my $target_volid = "${target_storeid}:${target_volname}";
+
+    my $errstr = "unable to migrate '$volid' to '${target_volid}' on host '$target_host'";
+
+    die "$errstr - source type '$scfg->{type}' not implemented\n" if ($scfg->{type} ne 'zfspool');
+
+    die "$errstr - pool on target does not have the same name as on source!"
+    if $tcfg->{pool} ne $scfg->{pool};
+
+    my $sshoptions = "-o 'BatchMode=yes'";
+    my $ssh = "/usr/bin/ssh $sshoptions";
+
+    #local $ENV{RSYNC_RSH} = $ssh;
+
+    my (undef, $zfsDataset) = parse_volname($cfg, $volid);
+
+    my $zfspath = "$scfg->{pool}\/$zfsDataset";
+
+    my $snap = ['zfs', 'snapshot', "$zfspath\@__pre_migration__"];
+
+    my $send = [['zfs', 'send', '-Rpv', "$zfspath\@__pre_migration__"], ['ssh', "root\@$target_host",
+        'zfs', 'recv', $zfspath]];
+
+    run_command($snap);
+    eval{
+    run_command($send);
+    };
+    my $err;
+    my $pre_migration_snapshot_destroy_cmd = ['zfs', 'destroy', "$zfspath\@__pre_migration__"];
+    if ($err = $@){
+    run_command($pre_migration_snapshot_destroy_cmd);
+    die $err;
+    }
+
+    # return cmds to execute in case to cleanup what was done here
+    return ( $pre_migration_snapshot_destroy_cmd,
+             ['ssh', "root\@$target_host", "zfs", "destroy", "-r", "$zfspath"] );
+}
+
sub vdisk_clone {
     my ($cfg, $volid, $vmid, $snap) = @_;

This is on Debian Jessie, pve-... 1.0-75

Basically, in the "prepare" phase I take and send a __pre_migration__ snapshot. In phase1, when I get to sending the snapshot, I expect that snapshot to already exist and only send an incremental stream from it. I also note whether the CT was running and use the new phase2 hook to restart the container on the target ASAP.

It works well, but it certainly doesn't contain all the error handling and cleanup it deserves. For example, one has to manually delete the __pre_migration__ snapshots on the source and the whole ZFS dataset on the target in case things fail before phase1 completes.
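For reference, assuming a dataset name like rpool/data/subvol-101-disk-0, that manual cleanup amounts to something like the following (mirroring the cleanup commands returned by storage_zfs_pre_migrate in the patch above):

Code:
# on the source node: remove the leftover pre-migration snapshot
zfs destroy rpool/data/subvol-101-disk-0@__pre_migration__
# against the target node: remove the partially received dataset
ssh root@<target> zfs destroy -r rpool/data/subvol-101-disk-0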

I hope this helps someone. Here's where I maintain the patch (and how I apply it onto /usr after each apt run):
https://github.com/jeff-dagenais/proxpatches/tree/master/patches

Enjoy!
 
FYI, this never got implemented. However, the following not-so-obvious steps somewhat alleviate the original problem, which is the long downtime while the transfer is occurring (a rough CLI sketch follows below):
  1. establish a "replication" task to the target node
  2. let this replication run at least once (this happens while the container is online)
  3. do the migration (restart mode); a shutdown still occurs, but the replication has already transferred almost all the data, so only a quick snapshot update is needed, the migration finishes quickly, and the container is restarted
  4. the replication direction is also automatically inverted
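A rough CLI sketch of those steps; the job ID (101-0), CT ID, schedule, and target node name are example values, and the same can be set up from the GUI:

Code:
# 1. create a replication job for CT 101 towards node2, running every 15 minutes
pvesr create-local-job 101-0 node2 --schedule '*/15'
# 2. optionally trigger it once right away and check that it completed
pvesr schedule-now 101-0
pvesr status
# 3.+4. restart-mode migration: the CT is shut down, only the delta since the last
#       replication run is sent, and the job direction is inverted afterwards
pct migrate 101 node2 --restart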
Good job guys.
 
