Live migration between 4.0 and 4.2 fails in both directions

rkl (New Member)
We skipped Proxmox 4.1, but recently decided to upgrade our 4.0 Proxmox servers (5 machines in a cluster with iSCSI shared storage) to 4.2, mainly because of the GUI changes. It's worth noting that 4.0 -> 4.0 live migration has always worked perfectly for us between any combination of the 4.0 servers.

The upgrade of one of the 4.0 servers to 4.2 seemed to go smoothly (well, once we uninstalled Dell's OMSA and re-installed it afterwards), and starting/stopping a test VM on the upgraded 4.2 server was fine.

However, testing live migration of a CentOS 6 VM between a 4.2 server and a 4.0 server (critical to avoid downtime during upgrades) was a total failure in both directions. Here are the messages from a live migration (4.0 -> 4.2) with the IP obscured as A.B.C.D:

Jun 03 14:06:00 starting migration of VM 143 to node 'proxmox04' (A.B.C.D)
Jun 03 14:06:00 copying disk images
Jun 03 14:06:00 starting VM 143 on remote node 'proxmox04'
Jun 03 14:06:03 ERROR: online migrate failure - unable to detect remote migration address
Jun 03 14:06:03 aborting phase 2 - cleanup resources
Jun 03 14:06:03 migrate_cancel
Jun 03 14:06:04 ERROR: migration finished with problems (duration 00:00:05)
migration problems
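As far as I can tell, the source node starts the VM on the target over SSH and then parses the migration address out of that command's output, so one way to see the raw reply is to run the equivalent command by hand. A rough sketch (the VMID, source node name and machine type below are illustrative, copied from our setup):

# On the source node: run the remote start by hand and inspect its output
ssh -o BatchMode=yes root@A.B.C.D qm start 143 --stateuri tcp --skiplock --migratedfrom proxmox02 --machine pc-i440fx-2.4
# Stop the half-started VM on the target again afterwards
ssh root@A.B.C.D qm stop 143 --skiplock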

And there's a different error message going from 4.2 -> 4.0:

Jun 03 14:20:54 starting migration of VM 143 to node 'proxmox02' (A.B.C.D)
Jun 03 14:20:54 copying disk images
Jun 03 14:20:54 starting VM 143 on remote node 'proxmox02'
Jun 03 14:20:55 start failed: command '/usr/bin/systemd-run --scope --slice qemu --unit 143 -p 'CPUShares=1000' /usr/bin/kvm -id 143 [stuff deleted] -machine 'type=pc-i440fx-2.5' -incoming 'tcp:[localhost]:60000' -S' failed: exit code 1
Jun 03 14:20:55 ERROR: online migrate failure - command '/usr/bin/ssh -o 'BatchMode=yes' root@A.B.C.D qm start 143 --stateuri tcp --skiplock --migratedfrom proxmox04 --machine pc-i440fx-2.5' failed: exit code 255
Jun 03 14:20:55 aborting phase 2 - cleanup resources
Jun 03 14:20:55 migrate_cancel
Jun 03 14:20:55 ERROR: migration finished with problems (duration 00:00:01)
migration problems
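One detail that jumps out of that failing command line is -machine 'type=pc-i440fx-2.5': Proxmox 4.0 ships an older QEMU, which presumably doesn't know the 2.5 machine type at all, so the kvm process on the 4.0 target exits immediately. You can list the machine types a node's QEMU binary actually supports (standard QEMU option):

# On the 4.0 target node: list the machine types this QEMU build knows about
/usr/bin/kvm -machine help | grep i440fx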

This isn't the first time we've hit these issues live-migrating during an upgrade procedure (other users have too), and I suspect it won't be the last. Does anyone else have problems with live migration between different minor 4.x releases? Do the devs test such combinations before a minor update is released? It looks like we're going to have to do offline migrations for dozens of VMs because of this issue - downtime for each and every migration, which is very annoying indeed.
 
There was a bug in qemu-server that requires an upgrade of that package on both sides for (live) migration to work. It should work without needing to restart the VM.
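You can confirm the installed version on each node before and after upgrading, e.g.:

# on every node involved in the migration:
pveversion -v | grep qemu-server
# or:
dpkg -l qemu-server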
 
The snag with updating qemu-server on 4.0 is that a plain "apt-get upgrade" brings in the entire 4.2 release, which is a little risky with running VMs on a 4.0 server. I found that updating just these migration-related packages did the trick on 4.0 though:

apt-get update
apt-get install qemu-server pve-cluster pve-manager libpve-common-perl
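
For anyone as cautious as us: apt-get's -s (simulate) flag shows exactly what that install would pull in without changing anything:

apt-get -s install qemu-server pve-cluster pve-manager libpve-common-perl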

This made the 4.0 -> 4.2 live migration work. The other direction (4.2 -> 4.0) still fails, but luckily we don't need it. Obviously, I'll complete the full 4.0 -> 4.2 upgrade via apt-get once all the VMs have been live-migrated off the 4.0 machine.
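
To drain the remaining VMs off the 4.0 node without clicking through the GUI dozens of times, a loop along these lines should work (untested sketch - it assumes every VM on the node is running and migratable online, and that proxmox04 is the upgraded target):

# On the 4.0 source node: live-migrate every local VM to the 4.2 node
for vmid in $(qm list | awk 'NR>1 {print $1}'); do
    qm migrate "$vmid" proxmox04 --online
done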
 
