Proxmox 5.0 live migration fails

grin

This doesn't seem to be the already mentioned ssh problem (which is at 1:7.4p1-10 anyway):

Jul 11 18:34:38 copying disk images
Jul 11 18:34:38 starting VM 103 on remote node 'bowie'
Jul 11 18:34:40 start remote tunnel
Jul 11 18:34:40 starting online/live migration on unix:/run/qemu-server/103.migrate
Jul 11 18:34:40 migrate_set_speed: 8589934592
Jul 11 18:34:40 migrate_set_downtime: 0.1
Jul 11 18:34:40 set migration_caps
Jul 11 18:34:40 set cachesize: 214748364
Jul 11 18:34:40 start migrate command to unix:/run/qemu-server/103.migrate
Jul 11 18:34:42 migration status error: failed
Jul 11 18:34:42 ERROR: online migrate failure - aborting

Jul 11 18:34:42 aborting phase 2 - cleanup resources
Jul 11 18:34:42 migrate_cancel
Jul 11 18:34:44 ERROR: migration finished with problems (duration 00:00:06)
TASK ERROR: migration problems


proxmox-ve: 5.0-15 (running kernel: 4.10.15-1-pve)
pve-manager: 5.0-23 (running version: 5.0-23/af4267bf)
pve-kernel-4.10.15-1-pve: 4.10.15-15
libpve-http-server-perl: 2.0-5
lvm2: 2.02.168-pve2
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-10
qemu-server: 5.0-12
pve-firmware: 2.0-2
libpve-common-perl: 5.0-16
libpve-guest-common-perl: 2.0-11
libpve-access-control: 5.0-5
libpve-storage-perl: 5.0-12
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.0-8
pve-qemu-kvm: 2.9.0-2
pve-container: 2.0-14
pve-firewall: 3.0-1
pve-ha-manager: 2.0-2
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.0.8-3
lxcfs: 2.0.7-pve2
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.6.5.9-pve16~bpo90
 
What are the names of 'bowie's' cluster brothers?:)

I had the same problem with Linux VMs and live migration. No problems with Windows VMs.

Maybe it's the same problem with the default display.

A workaround is to add the following line to the VM config:
Code:
vga: cirrus

The better solution is to shut down the machine, migrate it offline, and start it again. Then you don't have to add anything to the configs.
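For anyone who wants to script the `vga: cirrus` workaround, here is a minimal sketch of the config edit in Python. It only works on the config text. The function name `set_vga` is made up for illustration, and it assumes the usual `key: value` layout with snapshot sections introduced by a `[name]` header. As far as I know, `qm set <vmid> --vga cirrus` does the same thing on a real node and is the safer route.

```python
import re


def set_vga(config_text: str, vga_type: str) -> str:
    """Return config_text with `vga: <vga_type>` set in the main section.

    Hypothetical helper: it assumes `key: value` lines and that snapshot
    sections (if any) start with a `[name]` header line.
    """
    lines = config_text.splitlines()
    # Only touch the main section, i.e. everything before the first "[...]" header.
    end = next((i for i, line in enumerate(lines) if line.startswith("[")), len(lines))
    new_line = f"vga: {vga_type}"

    for i in range(end):
        if re.match(r"vga\s*:", lines[i]):
            lines[i] = new_line  # replace an existing vga entry
            break
    else:
        # No vga entry yet: add one after the last non-blank line of the main section.
        insert_at = end
        while insert_at > 0 and lines[insert_at - 1].strip() == "":
            insert_at -= 1
        lines.insert(insert_at, new_line)

    return "\n".join(lines) + "\n"
```

The VM still has to be restarted afterwards so that it actually runs with the configured display type.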

Best
Andreas
 
What are the names of 'bowie's' cluster brothers?:)
Why, proudly Freddy and Elton! ;-)
[Someone joked about that along the lines of "they would fire me if I named the servers as…" and I was, like, fuck yeah I can do that. ;-)]

Okay, I said that your advice is obviously bullshit: why would the display(!!) setting matter when piping the memory over through an ssh tunnel!! Until I actually tried it (set the display from default to Standard VGA in the GUI and restarted the VM; this is a lab test, fortunately). And it works, for chrissake.

So thanks! :)

And proxmox folks, please fix that, it's horrible. :)
 
Apart from that, there have been no real problems so far upgrading 4.3 + Ceph Jewel to 5.0 + Luminous (12.1.0). I even left all the stuff running on the server while it was being upgraded in place; no problems observed, and the guests were moved away before the reboot.
 
Just for the record, you're right: how the memory gets piped over SSH is not changed by the display setting, and that wasn't the problem here.
But we have to start the VM on the target node (in a frozen state) before we start to copy over the memory.
And since the default display type was changed on the PVE 5 target (for a lot of good reasons, security and feature wise, but with a missing check for 4.4 migrations), the VM (if not Windows) was started with a different display type on the target node.
Now you can imagine that this cannot work: the source sends over all the memory and state of a cirrus display device, but the target expects a different (std) one, which has a completely different layout.
It's like trying to plug a USB stick into a floppy drive: both can store things, but they are quite incompatible with each other (as a simple analogy).
We hope that we can include the fix soon, sorry for any inconvenience caused in the meantime :)
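The mismatch described above can be shown with a small sketch. This is not the actual qemu-server code or the actual fix, only a toy model of the reasoning. The names `effective_vga` and `display_matches` are hypothetical, and the two defaults (`cirrus` on the source, `std` on the PVE 5 target for a non-Windows guest) are taken from the explanation above.

```python
# Toy model only: defaults as described in the post above, not read from any real API.
SOURCE_DEFAULT = "cirrus"  # display type the VM was started with on the source node
TARGET_DEFAULT = "std"     # display type a PVE 5 target picks for a non-Windows guest


def effective_vga(config: dict, node_default: str) -> str:
    """Display type a node would start the VM with: explicit setting, else its default."""
    return config.get("vga", node_default)


def display_matches(config: dict,
                    source_default: str = SOURCE_DEFAULT,
                    target_default: str = TARGET_DEFAULT) -> bool:
    """True if source and target would start the VM with the same display device."""
    return effective_vga(config, source_default) == effective_vga(config, target_default)


# No explicit vga entry: each node falls back to its own default, so they disagree.
print(display_matches({}))
# Explicit vga entry: both nodes use it, which is why the workaround helps.
print(display_matches({"vga": "cirrus"}))
```

With an explicit `vga` entry in the config both sides agree on the device, which matches what was observed in this thread.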
 
This was addressed in qemu-server 5.0-13 which is available in the pvetest repository and – as not many changes were made (only small bug fixes beside this) – it will be moved to the community repo probably quite soon.
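To check whether a node already has that package version, something like the following sketch could be run against the `pveversion -v` output (same format as the package list in the first post). The helper names are made up, and the comparison assumes a plain `X.Y-Z` version string. It is not a full Debian version comparison.

```python
import re


def package_version(pveversion_output: str, package: str):
    """Return (major, minor, revision) for `package`, or None if it is not listed.

    Assumes lines of the form `name: X.Y-Z`, as in the listing in the first post.
    """
    pattern = rf"^{re.escape(package)}:\s*(\d+)\.(\d+)-(\d+)"
    match = re.search(pattern, pveversion_output, re.MULTILINE)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


def has_display_fix(pveversion_output: str) -> bool:
    """True if qemu-server is at least 5.0-13, the version named in the post above."""
    version = package_version(pveversion_output, "qemu-server")
    return version is not None and version >= (5, 0, 13)


sample = "pve-cluster: 5.0-10\nqemu-server: 5.0-12\npve-firmware: 2.0-2\n"
print(package_version(sample, "qemu-server"))
print(has_display_fix(sample))
```

The check would need to pass on the nodes involved in the migration before retrying a live migration without the `vga` workaround.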
 
