Proxmox Cluster 4.4 to 5.1 upgrade - VM migration problem

I may have jinxed myself in the previous forum post... ;)

We have been busy upgrading a Proxmox Ceph cluster from v4.4 to v5.1. The first system is up and running v5.1 and Ceph is consistent again. We now want to live migrate VMs from v4.4 to v5.1 so that we can work on the next node.

When attempting to migrate VMs from kvm2 (v4.4) to kvm1 (v5.1) we receive the following error:
Code:
task started by HA resource agent
Nov 15 14:32:58 starting migration of VM 101 to node 'kvm1' (192.10.5.37)
Nov 15 14:32:58 copying disk images
Nov 15 14:32:58 starting VM 101 on remote node 'kvm1'
Nov 15 14:33:02 start remote tunnel
Nov 15 14:33:03 starting online/live migration on unix:/run/qemu-server/101.migrate
Nov 15 14:33:03 migrate_set_speed: 8589934592
Nov 15 14:33:03 migrate_set_downtime: 0.1
Nov 15 14:33:03 set migration_caps
Nov 15 14:33:03 set cachesize: 429496729
Nov 15 14:33:03 start migrate command to unix:/run/qemu-server/101.migrate
Nov 15 14:33:05 migration status error: failed
Nov 15 14:33:05 ERROR: online migrate failure - aborting
Nov 15 14:33:05 aborting phase 2 - cleanup resources
Nov 15 14:33:05 migrate_cancel
Nov 15 14:33:07 ERROR: migration finished with problems (duration 00:00:09)
TASK ERROR: migration problems
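
For reference, the same migration can also be requested from the source node's shell; this is only a sketch using the VM ID and target node from the log above, and since the task was started by the HA resource agent the HA command is the relevant one here:

Code:
# VM 101 is HA-managed ("task started by HA resource agent"), so the
# migration is normally requested through the HA stack:
ha-manager migrate vm:101 kvm1
# For a non-HA guest the direct equivalent would be:
qm migrate 101 kvm1 --online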

PS: We really, really want to avoid having to schedule downtime on the individual VMs. And yes, we did add 'vga: cirrus' to the top of the VM configuration file, but the file was rewritten as follows when the VM was migrated from kvm1 to kvm2 (when both were still on v4.4):
Code:
bootdisk: virtio0
cores: 2
ide2: file=none,media=cdrom
memory: 4096
name: gayatricpt-vip
net0: virtio=00:16:3e:5f:00:05,bridge=vmbr0
numa: 1
onboot: 1
ostype: win8
protection: 1
scsihw: virtio-scsi-pci
smbios1: uuid=3a04b7a5-084a-4c2e-81b3-7675e5327b48
sockets: 1
startup: order=2
vga: cirrus
virtio0: virtuals:vm-101-disk-1,cache=writeback,size=80G

[PENDING]
localtime: 1
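
As an aside, instead of editing the configuration file by hand, the display type can also be set from the CLI on the node currently hosting the guest (a minimal sketch using the VM ID from above; vga is not hot-pluggable, so the change normally only applies once the VM is started again):

Code:
# Set the display type for VM 101 to cirrus
qm set 101 --vga cirrus
# Verify the resulting configuration
qm config 101 | grep vga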
 
This appears to be requested fairly often, so proactively supplied:

kvm1 (v5.1):
Code:
[root@kvm1 ceph]# pveversion -v
proxmox-ve: 5.1-26 (running kernel: 4.13.4-1-pve)
pve-manager: 5.1-36 (running version: 5.1-36/131401db)
pve-kernel-4.13.4-1-pve: 4.13.4-26
libpve-http-server-perl: 2.0-6
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-15
qemu-server: 5.0-17
pve-firmware: 2.0-3
libpve-common-perl: 5.0-20
libpve-guest-common-perl: 2.0-13
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-16
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.1-12
pve-qemu-kvm: 2.9.1-2
pve-container: 2.0-17
pve-firewall: 3.0-3
pve-ha-manager: 2.0-3
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.0-2
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.3-pve1~bpo9

kvm2 (v4.4):
Code:
[root@kvm2 ~]# pveversion -v
proxmox-ve: 4.4-96 (running kernel: 4.4.83-1-pve)
pve-manager: 4.4-18 (running version: 4.4-18/ef2610e8)
pve-kernel-4.4.83-1-pve: 4.4.83-96
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-2~pve4+1
libqb0: 1.0.1-1
pve-cluster: 4.0-53
qemu-server: 4.0-113
pve-firmware: 1.1-11
libpve-common-perl: 4.0-96
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-2
pve-docs: 4.4-4
pve-qemu-kvm: 2.9.0-5~pve4
pve-container: 1.0-101
pve-firewall: 2.0-33
pve-ha-manager: 1.0-41
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-4
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.9-pve15~bpo80
 
Just to be clear about the problem:
  • We successfully migrated VMs from v4.4 to another v4.4 host prior to initiating updates to v5.1
  • We could not migrate VMs on a v4.4 host to a v5.1 host
  • We resorted to shutting down the VMs and doing an offline migration, which worked
  • We could subsequently migrate VMs from v5.1 to another v5.1 host

This was a small cluster; our main cluster runs approximately 200 VMs, and we would like this quirk to be resolved to avoid having to book extended maintenance windows.

Summary: We are not able to live migrate VMs from v4.4 to v5.1 once a node has been upgraded.
 
Did you verify that the VMs actually use the cirrus parameter?
AFAIR, win8 machines use std by default (not cirrus), so that might be the issue here.
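
One way to cross-check this (a sketch, not from the thread): compare what the current configuration would start the guest with against what the running instance was actually started with, since a VM started before the config change would still be using the old display device. The pid file path below is the usual Proxmox location and is an assumption here:

Code:
# Display device the current config would start the guest with:
qm showcmd 101 | grep -o -- '-vga [a-z]*'
# Display device the running instance was actually started with:
ps -ww -p "$(cat /var/run/qemu-server/101.pid)" -o args= | grep -o -- '-vga [a-z]*'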
 
Same problem confirmed on our second cluster. We have upgraded Ceph to Luminous on our third (primary) cluster and won't be able to proceed with the Proxmox 5.1 upgrade unless we either schedule an extended maintenance window (to shut down and start over 200 VMs) or hear from Proxmox that the live migration problem from v4.4 to v5.1 has been resolved.
 
Any news on that? Is live migrating from 4.4 to 5.1 possible?
We would like to do rolling upgrades on some of our clusters and it would be nice to do so with zero VM downtime.
 
Any news on that? Is live migrating from 4.4 to 5.1 possible?
We would like to do rolling upgrades on some of our clusters and it would be nice to do so with zero VM downtime.

I did an in-place upgrade of a 4.4 cluster with Ceph (3 nodes) and live migrated all VMs step by step; no downtime for me.
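
For anyone planning the same kind of rolling upgrade, the per-node procedure boils down to draining the node, upgrading it, and moving on. A minimal sketch of the drain step, run on the node about to be upgraded (the target node name is an example, and it assumes every guest can be migrated online; HA-managed guests would go through ha-manager instead):

Code:
# Live-migrate every running VM off this node to kvm1 (example target).
# The header line of "qm list" is skipped because its third column is "STATUS".
for vmid in $(qm list | awk '$3 == "running" {print $1}'); do
    qm migrate "$vmid" kvm1 --online
done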
 
