No online migration possible to upgraded cluster nodes

woodstock

Hi all,

The latest package updates also include a new kernel, and today I observed this:

While upgrading all nodes in our 8-node cluster (version 4.1), I was unable to migrate guests online to nodes that had already been upgraded.
This applied to HA and non-HA VMs and also to containers.
Offline migrations did work.

Questions:
Is this expected or did I miss something?
How can we do a rolling upgrade of all cluster nodes without downtime?

Thanks.

Versions before upgrade:
Code:
proxmox-ve: 4.1-34 (running kernel: 4.2.6-1-pve)
pve-manager: 4.1-5 (running version: 4.1-5/f910ef5c)
pve-kernel-4.2.6-1-pve: 4.2.6-34
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 0.17.2-1
pve-cluster: 4.0-30
qemu-server: 4.0-46
pve-firmware: 1.1-7
libpve-common-perl: 4.0-43
libpve-access-control: 4.0-11
libpve-storage-perl: 4.0-38
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.4-21
pve-container: 1.0-37
pve-firewall: 2.0-15
pve-ha-manager: 1.0-18
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-5
lxcfs: 0.13-pve3
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve7~jessie

Versions after upgrade:
Code:
proxmox-ve: 4.1-37 (running kernel: 4.2.8-1-pve)
pve-manager: 4.1-13 (running version: 4.1-13/cfb599fb)
pve-kernel-4.2.6-1-pve: 4.2.6-36
pve-kernel-4.2.8-1-pve: 4.2.8-37
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-32
qemu-server: 4.0-55
pve-firmware: 1.1-7
libpve-common-perl: 4.0-48
libpve-access-control: 4.0-11
libpve-storage-perl: 4.0-40
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-5
pve-container: 1.0-44
pve-firewall: 2.0-17
pve-ha-manager: 1.0-21
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-7
lxcfs: 0.13-pve3
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve7~jessie
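
To make the difference between the two listings easier to see, something like the following could be used. This is only a rough sketch: it assumes both listings were saved to files (for example with `pveversion -v > before.txt`), and the function name `changed_packages` and the file names are made up for illustration.

```shell
# Print every package whose version differs between two saved listings.
# $1: listing taken before the upgrade, $2: listing taken after it.
# (changed_packages is a hypothetical helper name, not a Proxmox tool.)
changed_packages() {
    awk '
        {
            # Split each line at the first ": " into package name and version.
            i = index($0, ": ")
            if (i == 0) next
            name = substr($0, 1, i - 1)
            ver  = substr($0, i + 2)
        }
        FILENAME == ARGV[1] { before[name] = ver; next }   # remember old versions
        !(name in before)   { printf "%s: (new) -> %s\n", name, ver; next }
        before[name] != ver { printf "%s: %s -> %s\n", name, before[name], ver }
    ' "$1" "$2"
}
```

Called as `changed_packages before.txt after.txt`, it prints one line per changed or newly installed package, so a jump such as the one in pve-qemu-kvm stands out without comparing the lists by eye.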
 
Sorry, I forgot to test from the console, so all I have are the task details from the GUI.
Not sure if that helps:

Code:
task started by HA resource agent
Feb 22 09:55:43 starting migration of VM 101 to node 'pve***2' (xxx.yyy.zzz.12)
Feb 22 09:55:43 copying disk images
Feb 22 09:55:43 starting VM 101 on remote node 'pve***2'
Feb 22 09:55:46 starting ssh migration tunnel
Feb 22 09:55:46 starting online/live migration on localhost:60000
Feb 22 09:55:46 migrate_set_speed: 8589934592
Feb 22 09:55:46 migrate_set_downtime: 0.1
Feb 22 09:55:48 ERROR: online migrate failure - aborting
Feb 22 09:55:48 aborting phase 2 - cleanup resources
Feb 22 09:55:48 migrate_cancel
Feb 22 09:55:49 ERROR: migration finished with problems (duration 00:00:06)
TASK ERROR: migration problems

Code:
Feb 22 09:32:12 starting migration of VM 103 to node 'pve***3' (xxx.yyy.zzz.13)
Feb 22 09:32:12 copying disk images
Feb 22 09:32:12 starting VM 103 on remote node 'pve***3'
Feb 22 09:32:13 starting ssh migration tunnel
Feb 22 09:32:14 starting online/live migration on localhost:60000
Feb 22 09:32:14 migrate_set_speed: 8589934592
Feb 22 09:32:14 migrate_set_downtime: 0.1
Feb 22 09:32:16 ERROR: online migrate failure - aborting
Feb 22 09:32:16 aborting phase 2 - cleanup resources
Feb 22 09:32:16 migrate_cancel
Feb 22 09:32:17 ERROR: migration finished with problems (duration 00:00:05)
TASK ERROR: migration problems
 
OK, there is a chance that this is related to the QEMU upgrade from 2.4 to 2.5.

We found a bug recently. It affected migrating back from QEMU 2.5 to 2.4, but it may be related.
Can you try installing this package on the target node:
http://odisoweb1.odiso.net:/pve-qemu-kvm_2.5-7_amd64.deb

and test again?



Before doing that, if you want to help debug this:

Check the kvm command line on the source host (replace vmid with the ID of the VM):

ps aux | grep kvm | grep vmid

Then, on the target host, paste that kvm command line and add at the end

-machine pc-i440fx-2.4 -incoming tcp:hostipaddress:60000 -S
and remove "-daemonize"
and add PVE_MIGRATED_FROM="yourfirstnodehostname" at the beginning, like this:


PVE_MIGRATED_FROM="node1" kvm ........ -machine pc-i440fx-2.4 -incoming tcp:hostipaddress:60000 -S


This should start the VM in paused mode, waiting for the incoming migration. (Keep your SSH console open here.)


Then, on the source node, open the VM's monitor in the GUI and send
"migrate tcp:targethostip:60000"

This should start the migration. If something goes wrong, the error should show up on the target, in the output of the process when it crashes.
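
The edits to the command line described above can be scripted, which avoids mistakes when the kvm command is very long. This is only a sketch under some assumptions: `build_incoming_cmd` is a made-up helper name, the kvm command line is passed in as one string copied from the ps output, and the result is printed for review rather than executed. Note that ps does not preserve shell quoting, so arguments containing spaces may need to be re-quoted by hand.

```shell
# Build the command to run on the target node from the source node's
# kvm command line.
# $1: hostname of the source node
# $2: IP address of the target node to listen on
# $3: kvm command line copied from the source node, as a single string
# (build_incoming_cmd is a hypothetical helper, not part of Proxmox.)
build_incoming_cmd() {
    src_node=$1
    listen_ip=$2
    # Drop "-daemonize" so the process stays in the foreground and its
    # error output remains visible in the terminal.
    kvm_cmd=$(printf '%s\n' "$3" | sed -e 's/ -daemonize / /' -e 's/ -daemonize$//')
    # Prefix the environment variable and append the incoming-migration options.
    printf 'PVE_MIGRATED_FROM="%s" %s -machine pc-i440fx-2.4 -incoming tcp:%s:60000 -S\n' \
        "$src_node" "$kvm_cmd" "$listen_ip"
}
```

For example, `build_incoming_cmd node1 hostipaddress "$(cat kvm-cmdline.txt)"` prints the full command, where kvm-cmdline.txt is a placeholder for wherever the copied command line was saved. Check the printed line before pasting it into the shell on the target node.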
 
Can you try editing this file on the source node:
/usr/share/perl5/PVE/QemuServer.pm

Find this sub:
Code:
sub qemu_machine_pxe {
    my ($vmid, $conf, $machine) = @_;

    $machine =  PVE::QemuServer::get_current_qemu_machine($vmid) if !$machine;

    foreach my $opt (keys %$conf) {
        next if $opt !~ m/^net(\d+)$/;
        my $net = PVE::QemuServer::parse_net($conf->{$opt});
        next if !$net;
        my $romfile = PVE::QemuServer::vm_mon_cmd_nocheck($vmid, 'qom-get', path => $opt, property => 'romfile');
        return $machine.".pxe" if $romfile =~ m/pxe/;
        last;
    }

}

and add "return $machine" at the end, so that the sub still returns the machine type when no PXE ROM is found:

Code:
sub qemu_machine_pxe {
    my ($vmid, $conf, $machine) = @_;

    $machine =  PVE::QemuServer::get_current_qemu_machine($vmid) if !$machine;

    foreach my $opt (keys %$conf) {
        next if $opt !~ m/^net(\d+)$/;
        my $net = PVE::QemuServer::parse_net($conf->{$opt});
        next if !$net;
        my $romfile = PVE::QemuServer::vm_mon_cmd_nocheck($vmid, 'qom-get', path => $opt, property => 'romfile');
        return $machine.".pxe" if $romfile =~ m/pxe/;
        last;
    }
    return $machine;
}

Then restart pvedaemon:

/etc/init.d/pvedaemon restart

and start the migration again.
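
Before restarting the daemon, it may be worth confirming that the edit really ended up inside the right sub. A small sketch of such a check follows. `has_fallback_return` is a made-up name, and the check assumes the sub is laid out as in the listing above, with its closing brace in the first column.

```shell
# Exit status 0 if the qemu_machine_pxe sub in file $1 contains a
# "return $machine" line of its own before the closing brace, 1 otherwise.
# (has_fallback_return is a hypothetical helper, not part of Proxmox.)
has_fallback_return() {
    awk '
        /^sub qemu_machine_pxe/                      { in_sub = 1; next }
        in_sub && /^[ \t]*return \$machine;?[ \t]*$/ { found = 1 }
        in_sub && /^}/                               { exit }   # end of the sub
        END                                          { exit (found ? 0 : 1) }
    ' "$1"
}
```

It could be used like `has_fallback_return /usr/share/perl5/PVE/QemuServer.pm && echo patched`. The existing `return $machine.".pxe"` line inside the loop does not count, because the pattern only accepts a line that returns `$machine` alone.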
 
