Proxmox Cluster 4.4 to 5.1 upgrade - VM migration problem

I may have jinxed myself in the previous forum post... ;)

We have been busy upgrading a Proxmox Ceph cluster from v4.4 to v5.1. The first system is up and running v5.1 and Ceph is consistent again. We now want to live migrate VMs from v4.4 to v5.1 so that we can work on the next node.

When attempting to migrate VMs from kvm2 (v4.4) to kvm1 (v5.1) we receive the following error:
Code:
task started by HA resource agent
Nov 15 14:32:58 starting migration of VM 101 to node 'kvm1' (192.10.5.37)
Nov 15 14:32:58 copying disk images
Nov 15 14:32:58 starting VM 101 on remote node 'kvm1'
Nov 15 14:33:02 start remote tunnel
Nov 15 14:33:03 starting online/live migration on unix:/run/qemu-server/101.migrate
Nov 15 14:33:03 migrate_set_speed: 8589934592
Nov 15 14:33:03 migrate_set_downtime: 0.1
Nov 15 14:33:03 set migration_caps
Nov 15 14:33:03 set cachesize: 429496729
Nov 15 14:33:03 start migrate command to unix:/run/qemu-server/101.migrate
Nov 15 14:33:05 migration status error: failed
Nov 15 14:33:05 ERROR: online migrate failure - aborting
Nov 15 14:33:05 aborting phase 2 - cleanup resources
Nov 15 14:33:05 migrate_cancel
Nov 15 14:33:07 ERROR: migration finished with problems (duration 00:00:09)
TASK ERROR: migration problems
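
For reference, the same migration can also be requested from the source node's shell; this is only a sketch using the VM ID and target node from the log above, and since the task was started by the HA resource agent the HA command is the relevant one here:

Code:
# VM 101 is HA-managed ("task started by HA resource agent"), so the
# migration is normally requested through the HA stack:
ha-manager migrate vm:101 kvm1
# For a non-HA guest the direct equivalent would be:
qm migrate 101 kvm1 --online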

PS: We really, really want to avoid having to schedule downtime on the individual VMs. And yes, we did add 'vga: cirrus' to the top of the VM configuration file, but the file was rewritten as follows when the VM was migrated from kvm1 to kvm2 (when both were still on v4.4):
Code:
bootdisk: virtio0
cores: 2
ide2: file=none,media=cdrom
memory: 4096
name: gayatricpt-vip
net0: virtio=00:16:3e:5f:00:05,bridge=vmbr0
numa: 1
onboot: 1
ostype: win8
protection: 1
scsihw: virtio-scsi-pci
smbios1: uuid=3a04b7a5-084a-4c2e-81b3-7675e5327b48
sockets: 1
startup: order=2
vga: cirrus
virtio0: virtuals:vm-101-disk-1,cache=writeback,size=80G

[PENDING]
localtime: 1
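
As an aside, instead of editing the configuration file by hand, the display type can also be set from the CLI on the node currently hosting the guest (a minimal sketch using the VM ID from above; vga is not hot-pluggable, so the change normally only applies once the VM is started again):

Code:
# Set the display type for VM 101 to cirrus
qm set 101 --vga cirrus
# Verify the resulting configuration
qm config 101 | grep vga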
 
This appears to be requested fairly often, so proactively supplied:

kvm1 (v5.1):
Code:
[root@kvm1 ceph]# pveversion -v
proxmox-ve: 5.1-26 (running kernel: 4.13.4-1-pve)
pve-manager: 5.1-36 (running version: 5.1-36/131401db)
pve-kernel-4.13.4-1-pve: 4.13.4-26
libpve-http-server-perl: 2.0-6
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-15
qemu-server: 5.0-17
pve-firmware: 2.0-3
libpve-common-perl: 5.0-20
libpve-guest-common-perl: 2.0-13
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-16
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.1-12
pve-qemu-kvm: 2.9.1-2
pve-container: 2.0-17
pve-firewall: 3.0-3
pve-ha-manager: 2.0-3
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.0-2
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.3-pve1~bpo9

kvm2 (v4.4):
Code:
[root@kvm2 ~]# pveversion -v
proxmox-ve: 4.4-96 (running kernel: 4.4.83-1-pve)
pve-manager: 4.4-18 (running version: 4.4-18/ef2610e8)
pve-kernel-4.4.83-1-pve: 4.4.83-96
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-2~pve4+1
libqb0: 1.0.1-1
pve-cluster: 4.0-53
qemu-server: 4.0-113
pve-firmware: 1.1-11
libpve-common-perl: 4.0-96
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-2
pve-docs: 4.4-4
pve-qemu-kvm: 2.9.0-5~pve4
pve-container: 1.0-101
pve-firewall: 2.0-33
pve-ha-manager: 1.0-41
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-4
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.9-pve15~bpo80
 
Just to be clear about the problem:
  • We successfully migrated VMs from v4.4 to another v4.4 host prior to initiating updates to v5.1
  • We could not migrate VMs on a v4.4 host to a v5.1 host
  • We resorted to shutting down the VMs and doing an offline migration, which worked
  • We could subsequently migrate VMs from v5.1 to another v5.1 host

This was a small cluster; our main cluster runs approximately 200 VMs, and we would like this quirk to be resolved to avoid having to book extended maintenance windows.

Summary: We are not able to live migrate VMs from v4.4 to v5.1 once a node has been upgraded.
 
Did you verify that the VMs actually use the cirrus parameter?
AFAIR, win8 machines use std by default (not cirrus), so that might be the issue here.
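
One way to cross-check this (a sketch, not from the thread): compare what the current configuration would start the guest with against what the running instance was actually started with, since a VM started before the config change would still be using the old display device. The pid file path below is the usual Proxmox location and is an assumption here:

Code:
# Display device the current config would start the guest with:
qm showcmd 101 | grep -o -- '-vga [a-z]*'
# Display device the running instance was actually started with:
ps -ww -p "$(cat /var/run/qemu-server/101.pid)" -o args= | grep -o -- '-vga [a-z]*'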
 
Same problem confirmed on our second cluster. We have upgraded Ceph to Luminous on our third (primary) cluster and won't be able to proceed with the Proxmox 5.1 upgrade unless we either schedule an extended maintenance window (to shut down and start over 200 VMs) or hear from Proxmox that the live migration problem from v4.4 to v5.1 has been resolved.
 
Any news on that? Is live migrating from 4.4 to 5.1 possible?
We would like to do rolling upgrades on some of our clusters and it would be nice to do so with zero VM downtime.
 
Any news on that? Is live migrating from 4.4 to 5.1 possible?
We would like to do rolling upgrades on some of our clusters and it would be nice to do so with zero VM downtime.

I did an in-place upgrade of a 4.4 cluster with Ceph (3 nodes) and live migrated all VMs step by step; no downtime for me.
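
For anyone planning the same kind of rolling upgrade, the per-node procedure boils down to draining the node, upgrading it, and moving on. A minimal sketch of the drain step, run on the node about to be upgraded (the target node name is an example, and it assumes every guest can be migrated online; HA-managed guests would go through ha-manager instead):

Code:
# Live-migrate every running VM off this node to kvm1 (example target).
# The header line of "qm list" is skipped because its third column is "STATUS".
for vmid in $(qm list | awk '$3 == "running" {print $1}'); do
    qm migrate "$vmid" kvm1 --online
done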
 
