live migration between 4.3-71 and 4.3-72

Hi,
after upgrading to the current pve-enterprise repository I can't live-migrate VMs from an "old" node to an up-to-date one.
Code:
qm migrate 106 pve03 --online
Dec 02 16:56:38 starting migration of VM 106 to node 'pve03' (10.1.1.13)
Dec 02 16:56:38 copying disk images
Dec 02 16:56:38 starting VM 106 on remote node 'pve03'
Dec 02 16:56:41 starting online/live migration on tcp:localhost:60000
Dec 02 16:56:41 migrate_set_speed: 8589934592
Dec 02 16:56:41 migrate_set_downtime: 0.1
Dec 02 16:56:41 set migration_caps
Dec 02 16:56:41 set cachesize: 214748364
Dec 02 16:56:41 start migrate command to tcp:localhost:60000
Dec 02 16:56:43 migration status error: failed
Dec 02 16:56:43 ERROR: online migrate failure - aborting
Dec 02 16:56:43 aborting phase 2 - cleanup resources
Dec 02 16:56:43 migrate_cancel
Dec 02 16:56:44 ERROR: migration finished with problems (duration 00:00:06)
migration problems
Are there any tricks to live-migrate a running VM during the upgrade process?

Udo
 
Can you please also post the VM config? What type of storage do you use?
Hi Dietmar,
here is the config (but it happens with all VMs). All VMs use Ceph storage.
Code:
boot: cdn
bootdisk: virtio0
cores: 1
ide2: none,media=cdrom
memory: 2048
name: ubuntu
net0: virtio=01:CB:FF:86:D7:AB,bridge=vmbr0,tag=4
ostype: l26
scsihw: virtio-scsi-pci
sockets: 1
virtio0: pve-ceph:vm-106-disk-1,size=10G
virtio1: pve-ceph:vm-106-disk-2,backup=0,size=20G
But perhaps it has to do with a version mix, because I switched from pve-no-subscription during the installation to pve-enterprise some weeks ago:
Code:
pveversion -v
proxmox-ve: 4.3-71 (running kernel: 4.4.21-1-pve)
pve-manager: 4.3-9 (running version: 4.3-9/f7c6f0cd)
pve-kernel-4.4.21-1-pve: 4.4.21-71
pve-kernel-4.4.19-1-pve: 4.4.19-66
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-46
qemu-server: 4.0-92
pve-firmware: 1.1-10
libpve-common-perl: 4.0-79
libpve-access-control: 4.0-19
libpve-storage-perl: 4.0-68
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-docs: 4.3-12
pve-qemu-kvm: 2.7.0-4
pve-container: 1.0-80
pve-firewall: 2.0-31
pve-ha-manager: 1.0-35
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 2.0.5-1
lxcfs: 2.0.4-pve2
criu: 1.6.0-1
novnc-pve: 0.5-8
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.8-pve13~bpo80
openvswitch-switch: 2.5.0-1
ceph: 0.94.9-1~bpo80+1
Udo
 
This is a VM config of mine:
Code:
#eth0%3A 172.16.1.11
agent: 1
balloon: 512
boot: cdn
bootdisk: scsi0
cores: 1
cpu: Opteron_G5
ide2: none,media=cdrom
memory: 1024
name: git
net0: virtio=D2:C8:FF:5C:E0:61,bridge=vmbr10
numa: 0
onboot: 1
ostype: l26
scsi0: omnios_ib:vm-156-disk-1,size=4G
scsi1: omnios_ib:vm-156-disk-2,size=10G
scsihw: virtio-scsi-single
smbios1: uuid=4644e129-02f6-4ada-afab-b40982265406
sockets: 1
startup: order=7
vga: qxl

In my case there was no version-mix issue, as I upgraded strictly from the previous pve-enterprise release to the latest one. Storage is ZFS over iSCSI.
 
Do you use 'migration_unsecure' in /etc/pve/datacenter.cfg ?
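
For reference, unsecure migration is enabled by a line like this in /etc/pve/datacenter.cfg (the value shown is just an illustration of the setting discussed in this thread):
Code:
migration_unsecure: 1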

Can you also try updating the qemu-server package on the "old" node and setting the migration type directly, on a test VM:
Code:
qm migrate VMID NODE --online --migration_type TYPE
where TYPE is "insecure" or "secure", depending on what you want; probably it's "insecure".
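
For example, with the VM and target node from the first post, and assuming the insecure type (since migration_unsecure is set on this cluster), that would look like:
Code:
qm migrate 106 pve03 --online --migration_type insecure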

You should be able to solve this by updating qemu-server and pve-cluster on the "old" node in advance, then doing a normal migration.
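
A minimal sketch of that update step on the "old" node (package names as discussed in this thread, pulled from the configured enterprise repository):
Code:
apt-get update
apt-get install qemu-server pve-cluster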

It seems like a bug which triggers when migration_unsecure is set and a migration runs from a node with qemu-server <= 4.0-92 to a node with qemu-server >= 4.0-93 :/
 
Look at my mail on the developer list. I tried both with and without migration_unsecure, and both failed:
1) With migration_unsecure, the migration failed completely.
2) Without migration_unsecure, the migration succeeded, but the VM was not able to start on the new node.
 
1) With migration_unsecure, the migration failed completely.

This is the bug I could reproduce and for which I sent also a fix to the pve-devel list.

2) Without migration_unsecure, the migration succeeded, but the VM was not able to start on the new node.

Code:
[...]
Dec 01 22:42:33 migration xbzrle cachesize: 67108864 transferred 0 pages 0 cachemiss 0 overflow 0
Dec 01 22:42:35 migration speed: 256.00 MB/s - downtime 48 ms
Dec 01 22:42:35 migration status: completed
Dec 01 22:42:35 migration status: completed
Dec 01 22:42:37 ERROR: VM 156 not running
[...]

This looks like another problem; I cannot reproduce it currently. I tried it from a node with qemu-server 4.0-91 to another node with 4.0-96, and secure migration works here.
Can you please try it with another test VM if you have not already done this?
 
For the record, my entire cluster is upgraded. On the migration problem: with migration_unsecure disabled, migration has a success rate of about 25%, so it did not fail in a reproducible way, and there was nothing in common between the VMs that succeeded and those that did not :-(
 
Using migration_unsecure: 1 in datacenter.cfg.

Tried to patch just qemu-server from the enterprise repo on an "old" node, n2, but migration to a "new" node, n7, still fails with:

Use of uninitialized value $migration_type in string eq at /usr/share/perl5/PVE/QemuServer.pm line 4478.

What would be best to do so not to break future enterprise updates?

Old node:
root@n2:~# pveversion --verbose
proxmox-ve: 4.3-71 (running kernel: 4.4.21-1-pve)
pve-manager: 4.3-9 (running version: 4.3-9/f7c6f0cd)
pve-kernel-4.4.21-1-pve: 4.4.21-71
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-46
qemu-server: 4.0-96
pve-firmware: 1.1-10
libpve-common-perl: 4.0-79
libpve-access-control: 4.0-19
libpve-storage-perl: 4.0-68
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-docs: 4.3-12
pve-qemu-kvm: 2.7.0-4
pve-container: 1.0-80
pve-firewall: 2.0-31
pve-ha-manager: 1.0-35
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 2.0.5-1
lxcfs: 2.0.4-pve2
criu: 1.6.0-1
novnc-pve: 0.5-8
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.8-pve13~bpo80
openvswitch-switch: 2.5.0-1

New node:
root@n7:~# pveversion --verbose
proxmox-ve: 4.3-72 (running kernel: 4.4.24-1-pve)
pve-manager: 4.3-12 (running version: 4.3-12/6894c9d9)
pve-kernel-4.4.21-1-pve: 4.4.21-71
pve-kernel-4.4.24-1-pve: 4.4.24-72
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-47
qemu-server: 4.0-96
pve-firmware: 1.1-10
libpve-common-perl: 4.0-83
libpve-access-control: 4.0-19
libpve-storage-perl: 4.0-68
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-docs: 4.3-17
pve-qemu-kvm: 2.7.0-8
pve-container: 1.0-85
pve-firewall: 2.0-31
pve-ha-manager: 1.0-38
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 2.0.6-1
lxcfs: 2.0.5-pve1
criu: 1.6.0-1
novnc-pve: 0.5-8
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.8-pve13~bpo80
openvswitch-switch: 2.6.0-2
 
Any other suggestions to get out of this mess where live migration doesn't work?

Currently I can migrate off a node with qemu-server 4.0-100 to nodes with qemu-server 4.0-92 (though not the reverse), but also not from 4.0-100 to node n7 with the latest qemu-server 4.0-96 and pve-cluster 4.0-47.
 
Reverting to secure migration worked, so I commented out this line in /etc/pve/datacenter.cfg:
#migration_unsecure: 1
(or set it to '0').
After all nodes got to 4.3-73 with qemu-server 4.0-96, unsecure migration seems to work again :)
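
A quick way to double-check that the relevant packages match on every node before re-enabling unsecure migration (just the command already used in this thread, filtered):
Code:
# run on each node; the reported versions should agree across the cluster
pveversion -v | grep -E 'proxmox-ve|qemu-server|pve-cluster'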
 