Live migration with local storage gives an error

Kenneth_H

Hi
Earlier this week, I provisioned two new PVE 5.1 hosts with OVH, using ZFS for the local storage pools.
This works fine and VMs are running really fast.
I then wanted to try the new live migration with local storage that was introduced with PVE 5.0, but it did not seem to be working, although the same storage (local-zfs) exists on both nodes in my two-node cluster.
I have checked that I can move the local VM disks to other types of storage without shutting down the VM, but moving to another host does not seem to work.
Here is the error output when trying to live migrate between my two hosts:
Code:
2017-10-29 17:19:29 starting migration of VM 202 to node 'ns6735811' (172.16.0.1)
2017-10-29 17:19:29 found local disk 'local-zfs:vm-202-disk-1' (in current VM config)
2017-10-29 17:19:29 can't migrate local disk 'local-zfs:vm-202-disk-1': can't live migrate attached local disks without with-local-disks option
2017-10-29 17:19:29 ERROR: Failed to sync data - can't migrate VM - check log
2017-10-29 17:19:29 aborting phase 1 - cleanup resources
2017-10-29 17:19:29 ERROR: migration aborted (duration 00:00:00): Failed to sync data - can't migrate VM - check log
TASK ERROR: migration aborted

And here is the config file of one of the VMs that I cannot move:
Code:
agent: 1
bootdisk: scsi0
cores: 1
hotplug: disk,network,usb,memory,cpu
ide2: none,media=cdrom
memory: 2048
name: ns2.mydomain.com
net0: virtio=03:00:10:fb:a5:e6,bridge=vmbr0
numa: 1
onboot: 1
ostype: l26
scsi0: local-zfs:vm-202-disk-1,cache=writethrough,size=40G
scsihw: virtio-scsi-pci
smbios1: uuid=19630614-03a5-4998-ac00-1826c68416ba
sockets: 1
This specific VM is running CentOS 7, but I also tried with a Server 2016 VM, with the same result.
Are there any specific requirements for migration using local storage?

I also tried on our production on-premise 4-node cluster, which was upgraded from PVE 4.4 to 5.0 and then to 5.1, with the same result. The on-premise cluster does not use ZFS, but instead uses file-based (qcow2) storage.
 
Thank you for the reply. Strange that Proxmox has advertised this as possible, but it does not work from the GUI.
At least half of our staff do not know how to use the terminal/SSH/CLI.
Maybe this will work from the GUI in PVE 5.2.
I will try the above.
 
With 5.1 it is still not possible to do a live migration from the UI. Why is that?
It also seems buggy: if you cancel a migration task (started from the CLI), it does not clean up the LV (vm-XXX-disk-1) on the target node!
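Until that is fixed, the leftover volume can be removed by hand on the target node, roughly like this (sketch only; 'local-lvm' and the volume name are placeholders for your storage ID and the orphaned disk):
Code:
# On the TARGET node: list the volumes on the storage, then free the orphaned one.
pvesm list local-lvm
pvesm free local-lvm:vm-XXX-disk-1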
 
Hi,
you must do that on the CLI:
Code:
qm migrate 202 ns6735811 --online --with-local-disks
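If the storage ID on the target node is not the same as on the source, qm migrate should also accept a --targetstorage option (assuming your qemu-server version supports it), e.g.:
Code:
# 'other-storage' is only a placeholder for the storage ID on the target node.
qm migrate 202 ns6735811 --online --with-local-disks --targetstorage other-storage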
Udo
Good afternoon.
The command you gave is the one that performs the migration. The same thing happened when managing through the web UI.
The error appears exactly the same.


Because it's not yet stable.
Good afternoon.
What do you mean?
Is Proxmox 5.1 an unstable version, and 5.0 stable?
 
Hi,
also with more than one VM disk? I previously had problems much more often with VMs that had two disks to migrate than with single-disk VMs.
Udo

Yes, this has been fixed in February.
https://git.proxmox.com/?p=qemu-server.git;a=commit;h=87955688fda3f11440b7bc292e22409d22d8112f

(This is my patch ;) so I can tell it's working fine now. The problem was with the socat tunnel we used, with one connection per disk: the first connection could time out after 30s if the second disk was not finished.)
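For reference, socat's inactivity timeout behaves roughly like this (illustration only, not the exact tunnel command qemu-server builds; the port and socket path are made up):
Code:
# Illustration only: with -T 30, socat closes the connection once it has seen no
# traffic for 30 seconds - which is what hit the idle first disk connection while
# the second disk was still being copied. Port and socket path are made up.
socat -T 30 TCP-LISTEN:60000,reuseaddr,fork UNIX-CONNECT:/run/example-migration.sock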
 
In my experience these migrations sometimes fail if there is heavy disk IO while live migrating.
For those, I pause the VM before issuing the live migration. :) There is still an outage of the service obviously, but the VM never gets rebooted and most of the time the migration succeeds.
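Roughly, the sequence looks like this (sketch; VM ID and node name reused from the first post, adapt to your setup):
Code:
# Pause the guest so its disk IO stops, then live-migrate with local disks.
qm suspend 202
qm migrate 202 ns6735811 --online --with-local-disks
# If the VM is still paused after the migration, resume it on the target node:
qm resume 202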
 
Yes, this has been fixed in February.
https://git.proxmox.com/?p=qemu-server.git;a=commit;h=87955688fda3f11440b7bc292e22409d22d8112f

(This is my patch ;) so I can tell it's working fine now. The problem was with the socat tunnel we used, with one connection per disk: the first connection could time out after 30s if the second disk was not finished.)
Hi spirit,
sorry, it's not working reliably - I made one test and it fails:
Code:
boot: c
bootdisk: scsi0
cores: 2
hotplug: disk,network,usb
keyboard: de
numa: 1
memory: 8192
name: vm01-dev
net0: virtio=02:76:FC:6D:7F:08,bridge=vmbr0,tag=123
onboot: 1
ostype: l26
scsi0: local_vm_storage:623/vm-623-disk-1.qcow2,discard=on,size=25G
scsi1: local_vm_storage:623/vm-623-disk-2.qcow2,iothread=1,size=25G
scsihw: virtio-scsi-single
serial0: socket
smbios1: uuid=4a9fe3d2-df51-4163-b7bf-eb4657679379
sockets: 1
Log on the sending host:
Code:
Jun  3 20:34:35 pve08 qm[2928015]: <root@pam> starting task UPID:pve08:002CAD90:4B357DB7:5B1434BB:qmigrate:623:root@pam:
Jun  3 20:44:01 pve08 qm[2930469]: VM 623 qmp command failed - VM 623 qmp command 'change' failed - unable to connect to VM 623 qmp socket - timeout after 599 retries
Jun  3 20:44:54 pve08 qm[2928016]: VM 623 qmp command failed - interrupted by signal
Jun  3 20:45:43 pve08 qm[2928016]: VM 623 qmp command failed - VM 623 qmp command 'query-migrate' failed - interrupted by signal
Jun  3 20:45:46 pve08 qm[2928015]: <root@pam> end task UPID:pve08:002CAD90:4B357DB7:5B1434BB:qmigrate:623:root@pam: got unexpected control message:
Jun  3 20:46:39 pve08 lldpd[3820]: error while receiving frame on tap623i0 (retry: 0): Network is down
Jun  3 20:46:39 pve08 lldpd[3802]: 2018-06-03T20:46:39 [WARN/interfaces] error while receiving frame on tap623i0 (retry: 0): Network is down
Jun  3 20:46:39 pve08 qm[2931245]: VM 623 qmp command failed - VM 623 qmp command 'change' failed - unable to connect to VM 623 qmp socket - Connection refused
Jun  3 20:46:39 pve08 qm[2931203]: VM 623 qmp command failed - VM 623 qmp command 'change' failed - unable to connect to VM 623 qmp socket - Connection refused
On the target:
Code:
Jun  3 20:34:37 pve06 qm[1414930]: <root@pam> starting task UPID:pve06:00159717:01D7AE68:5B1434BD:qmstart:623:root@pam:
Jun  3 20:34:37 pve06 qm[1414935]: start VM 623: UPID:pve06:00159717:01D7AE68:5B1434BD:qmstart:623:root@pam:
Jun  3 20:34:37 pve06 systemd[1]: Started 623.scope.
Jun  3 20:34:37 pve06 systemd-udevd[1414959]: Could not generate persistent MAC address for tap623i0: No such file or directory
Jun  3 20:34:38 pve06 kernel: [309113.136860] device tap623i0 entered promiscuous mode
Jun  3 20:34:38 pve06 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap623i0
Jun  3 20:34:38 pve06 ovs-vsctl: ovs|00002|db_ctl_base|ERR|no port named tap623i0
Jun  3 20:34:38 pve06 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln623i0
Jun  3 20:34:38 pve06 ovs-vsctl: ovs|00002|db_ctl_base|ERR|no port named fwln623i0
Jun  3 20:34:38 pve06 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl add-port vmbr0 tap623i0 tag=123
Jun  3 20:34:38 pve06 qm[1414930]: <root@pam> end task UPID:pve06:00159717:01D7AE68:5B1434BD:qmstart:623:root@pam: OK
Jun  3 20:46:38 pve06 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln623i0
Jun  3 20:46:38 pve06 ovs-vsctl: ovs|00002|db_ctl_base|ERR|no port named fwln623i0
Jun  3 20:46:38 pve06 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap623i0
Jun  3 20:46:38 pve06 lldpd[3748]: error while receiving frame on tap623i0 (retry: 0): Network is down
Jun  3 20:46:38 pve06 lldpd[3451]: 2018-06-03T20:46:38 [WARN/interfaces] error while receiving frame on tap623i0 (retry: 0): Network is down
Jun  3 20:46:39 pve06 lldpd[3748]: unable to send packet on real device for tap623i0: No such device or address
Jun  3 20:46:39 pve06 lldpd[3451]: 2018-06-03T20:46:39 [WARN/lldp] unable to send packet on real device for tap623i0: No such device or address
Both hosts are up to date, but the source host hasn't been rebooted for 5 months (it should be rebooted tomorrow morning).
I used Ctrl-C to cancel, but had to kill the VM process with -9.
Code:
pveversion -v
proxmox-ve: 5.2-2 (running kernel: 4.13.13-3-pve)
pve-manager: 5.2-1 (running version: 5.2-1/0fcd7879)
pve-kernel-4.15: 5.2-1
pve-kernel-4.13: 5.1-44
pve-kernel-4.15.17-1-pve: 4.15.17-9
pve-kernel-4.13.16-2-pve: 4.13.16-48
pve-kernel-4.13.16-1-pve: 4.13.16-46
pve-kernel-4.13.13-3-pve: 4.13.13-34
ceph: 12.2.5-pve1
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-4
libpve-common-perl: 5.0-31
libpve-guest-common-perl: 2.0-16
libpve-http-server-perl: 2.0-8
libpve-storage-perl: 5.0-23
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.0-3
lxcfs: 3.0.0-1
novnc-pve: 0.6-4
openvswitch-switch: 2.7.0-2
proxmox-widget-toolkit: 1.0-18
pve-cluster: 5.0-27
pve-container: 2.0-23
pve-docs: 5.2-4
pve-firewall: 3.0-9
pve-firmware: 2.0-4
pve-ha-manager: 2.0-5
pve-i18n: 1.0-5
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.1-5
pve-xtermjs: 1.0-5
qemu-server: 5.0-26
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.8-pve1~bpo9
The disk data was transferred (on the target):
Code:
ls -lsa /data/images/623/
total 30436104
       4 drwxr-----  2 root root        4096 Jun  3 20:34 .
       4 drwxr-xr-x 27 root root        4096 Jun  3 20:34 ..
 6872944 -rw-r-----  1 root root 26847870976 Jun  3 20:40 vm-623-disk-1.qcow2
23563152 -rw-r-----  1 root root 26847870976 Jun  3 20:38 vm-623-disk-2.qcow2
Udo
 
The feature is stable (I have tested it with a lot of migrations). It only needs an option in the GUI.
Not really. For example, if you have more than one disk, the migration doesn't work. So it is an experimental feature.
 
