proxmox 5.0 live migration fails

grin

Active Member
This doesn't seem to be the already mentioned ssh problem (which is at 1:7.4p1-10 anyway):

Jul 11 18:34:38 copying disk images
Jul 11 18:34:38 starting VM 103 on remote node 'bowie'
Jul 11 18:34:40 start remote tunnel
Jul 11 18:34:40 starting online/live migration on unix:/run/qemu-server/103.migrate
Jul 11 18:34:40 migrate_set_speed: 8589934592
Jul 11 18:34:40 migrate_set_downtime: 0.1
Jul 11 18:34:40 set migration_caps
Jul 11 18:34:40 set cachesize: 214748364
Jul 11 18:34:40 start migrate command to unix:/run/qemu-server/103.migrate
Jul 11 18:34:42 migration status error: failed
Jul 11 18:34:42 ERROR: online migrate failure - aborting

Jul 11 18:34:42 aborting phase 2 - cleanup resources
Jul 11 18:34:42 migrate_cancel
Jul 11 18:34:44 ERROR: migration finished with problems (duration 00:00:06)
TASK ERROR: migration problems


proxmox-ve: 5.0-15 (running kernel: 4.10.15-1-pve)
pve-manager: 5.0-23 (running version: 5.0-23/af4267bf)
pve-kernel-4.10.15-1-pve: 4.10.15-15
libpve-http-server-perl: 2.0-5
lvm2: 2.02.168-pve2
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-10
qemu-server: 5.0-12
pve-firmware: 2.0-2
libpve-common-perl: 5.0-16
libpve-guest-common-perl: 2.0-11
libpve-access-control: 5.0-5
libpve-storage-perl: 5.0-12
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.0-8
pve-qemu-kvm: 2.9.0-2
pve-container: 2.0-14
pve-firewall: 3.0-1
pve-ha-manager: 2.0-2
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.0.8-3
lxcfs: 2.0.7-pve2
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.6.5.9-pve16~bpo90
 

drz-support

Member
What are the names of 'bowie's' cluster brothers? :)

I had the same problem with Linux VMs and live migration. No problems with Windows VMs.

Maybe it's the same problem with the default display.

A workaround is to add the following line to the VM config:
Code:
vga: cirrus

The better solution is to shut down the machine, migrate it offline, and start it again. Then you don't have to add anything to the configs.
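The same workaround can also be applied from the CLI instead of editing the config file by hand. A sketch, assuming VM 103 from the log above (`qm set` writes the option into the VM config, `qm config` prints it back):

Code:
# pin the display type explicitly so source and target agree
qm set 103 --vga cirrus
# verify the setting
qm config 103 | grep vga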

Best
Andreas
 

grin

Active Member
drz-support said:
> What are the names of 'bowie's' cluster brothers? :)

Why, proudly Freddy and Elton! ;-)
[Someone joked along the lines of "they would fire me if I named the servers …" and I was like, fuck yeah, I can do that. ;-)]

drz-support said:
> I had the same problem with Linux VMs and live migration. No problems with Windows VMs.
>
> Maybe it's the same problem with the default display.
>
> A workaround is to add the following line to the VM config:
> Code:
> vga: cirrus

Okay, I admit I said your advice was obviously bullshit: why would the display(!!) setting matter when the memory is being piped over an ssh tunnel?! That lasted until I actually tried it (set the display from Default to Standard VGA in the GUI and restarted the VM; fortunately this is a lab test). And it works, for chrissake.

So thanks! :)

And proxmox folks, please fix that, it's horrible. :)
 


grin

Active Member
Apart from that, there have been no real problems so far upgrading 4.3 + Ceph Jewel to 5.0 + Luminous (12.1.0). I even left everything running on the server being upgraded in place; no problems observed, and the guests were moved away before the reboot.
 

t.lamprecht

Proxmox Staff Member
Staff member
grin said:
> Okay, I admit I said your advice was obviously bullshit: why would the display(!!) setting matter when the memory is being piped over an ssh tunnel?! That lasted until I actually tried it (set the display from Default to Standard VGA in the GUI and restarted the VM; fortunately this is a lab test). And it works, for chrissake.

Just for the record, you're right: how the memory gets piped over SSH is not changed by the display setting; that wasn't the problem here.
But we have to start the VM on the target node (in a frozen state) before we start copying over memory.
And since the default display type was changed on the PVE 5 target (for a lot of good reasons, security- and feature-wise, but with a missing check for 4.4 migrations), the VM (if not Windows) was started with a different display type on the target node.
Now you can imagine that this cannot work: the source sends over all the memory and state of a cirrus display device, but the target expects a different (std) one, which has a completely different layout.
It's like plugging a USB stick into a floppy drive: both can store things, but they are quite incompatible with each other (as a simple analogy).
We hope we can include the fix soon; sorry for any inconvenience caused in the meantime :)
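When a live migration aborts with a terse "migration status error: failed" like the log above, the QEMU monitor can sometimes give more detail than the task log. A sketch, assuming VM 103 from the log above (`qm monitor` opens the HMP monitor of a running VM; `info migrate` is a standard QEMU monitor command):

Code:
# on the source node, after a failed migration attempt
qm monitor 103
qm> info migrate
qm> quit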
 

t.lamprecht

Proxmox Staff Member
Staff member
This was addressed in qemu-server 5.0-13, which is available in the pvetest repository. As not many changes were made (only small bug fixes besides this one), it will probably be moved to the community repository quite soon.
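If you don't want to wait for the move to the community repository, the package can be pulled from pvetest. A sketch, assuming PVE 5 on Debian Stretch; the repository line is from memory, so double-check it against the Proxmox repository documentation, and disable pvetest again after the upgrade:

Code:
# enable the pvetest repository (temporary!)
echo "deb http://download.proxmox.com/debian/pve stretch pvetest" > /etc/apt/sources.list.d/pvetest.list
apt-get update && apt-get install qemu-server
# confirm the new version
pveversion -v | grep qemu-server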
 
