Live migration with local storage failing

e100

Is online live migration with local storage supposed to work, or are there still some known bugs to work out?

Command to migrate:
Code:
qm migrate 102 vm1 -online -with-local-disks -migration_type insecure

Results in error at end of migration:
Code:
drive-virtio0: transferred: 34361704448 bytes remaining: 0 bytes total: 34361704448 bytes progression: 100.00 % busy: 0 ready: 1
drive-ide0: transferred: 536870912 bytes remaining: 0 bytes total: 536870912 bytes progression: 100.00 % busy: 0 ready: 1
all mirroring jobs are ready
Apr 17 11:08:49 starting online/live migration on tcp:192.168.8.1:60000
Apr 17 11:08:49 migrate_set_speed: 8589934592
Apr 17 11:08:49 migrate_set_downtime: 0.1
Apr 17 11:08:49 set migration_caps
Apr 17 11:08:49 set cachesize: 167772160
Apr 17 11:08:49 start migrate command to tcp:192.168.8.1:60000
Apr 17 11:08:51 migration speed: 11.94 MB/s - downtime 119 ms
Apr 17 11:08:51 migration status: completed
drive-virtio0: transferred: 34361704448 bytes remaining: 0 bytes total: 34361704448 bytes progression: 100.00 % busy: 0 ready: 1
drive-ide0: transferred: 536870912 bytes remaining: 0 bytes total: 536870912 bytes progression: 100.00 % busy: 0 ready: 1
all mirroring jobs are ready
drive-virtio0: Completing block job...
drive-virtio0: Completed successfully.
drive-ide0: Completing block job...
drive-ide0: Completed successfully.
drive-virtio0 : finished
drive-ide0 : finished
Apr 17 11:09:12 ERROR: VM 102 qmp command 'cont' failed - Conflicts with use by a block device as 'root', which uses 'write' on #block159
Apr 17 11:09:12 ERROR: command '/usr/bin/ssh -o 'BatchMode=yes' root@192.168.8.1 qm resume 102 --skiplock --nocheck' failed: exit code 255
Apr 17 11:09:15 ERROR: migration finished with problems (duration 00:02:40)
migration problems

After the migration fails, I can press Resume in the GUI and the VM seems to keep on working.
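In case it's useful for others: the GUI Resume is presumably just the manual resume on the target node. A minimal sketch of the CLI fallback, mirroring the flags the migration task itself tried (per the log above):

Code:
# On the target node (192.168.8.1 in this log), resume the paused VM manually;
# --skiplock and --nocheck are the flags the migration task used over ssh
qm resume 102 --skiplock --nocheck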


pveversion -v
Code:
proxmox-ve: 5.0-6 (running kernel: 4.10.8-1-pve)
pve-manager: 5.0-6 (running version: 5.0-6/669657df)
pve-kernel-4.10.1-2-pve: 4.10.1-2
pve-kernel-4.10.5-1-pve: 4.10.5-5
pve-kernel-4.10.8-1-pve: 4.10.8-6
libpve-http-server-perl: 2.0-1
lvm2: 2.02.168-pve1
corosync: 2.4.2-pve2
libqb0: 1.0.1-1
pve-cluster: 5.0-4
qemu-server: 5.0-2
pve-firmware: 2.0-2
libpve-common-perl: 5.0-7
libpve-guest-common-perl: 2.0-1
libpve-access-control: 5.0-3
libpve-storage-perl: 5.0-3
pve-libspice-server1: 0.12.8-3
vncterm: 1.4-1
pve-docs: 5.0-1
pve-qemu-kvm: 2.9.0-1~rc3
pve-container: 2.0-6
pve-firewall: 3.0-1
pve-ha-manager: 2.0-1
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.0.7-500
lxcfs: 2.0.6-pve500
criu: 2.11.1-1~bpo90
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.6.5.9-pve16~bpo90
 
Config of VM 102:
Code:
boot: dc
bootdisk: ide0
cores: 2
ide0: local-zfs:vm-102-disk-1,format=raw,size=512M
ide2: none,media=cdrom
memory: 1600
name: name.com
net0: virtio=54:B7:B7:56:4C:6E,bridge=vmbr0,tag=30
numa: 0
onboot: 1
ostype: l26
smbios1: uuid=c4d8009b-7ca8-4126-9c76-e85354fab637
sockets: 1
virtio0: local-zfs:vm-102-disk-2,format=raw,size=32G
 
thanks! some proposed patches are available on pve-devel if you want to test them ;)
 
Just wondering if this should be working? I'm on the following packages:

Code:
proxmox-ve: 5.0-16 (running kernel: 4.10.17-1-pve)
pve-manager: 5.0-23 (running version: 5.0-23/af4267bf)
pve-kernel-4.10.15-1-pve: 4.10.15-15
pve-kernel-4.10.17-1-pve: 4.10.17-16
libpve-http-server-perl: 2.0-5
lvm2: 2.02.168-pve2
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-12
qemu-server: 5.0-14
pve-firmware: 2.0-2
libpve-common-perl: 5.0-16
libpve-guest-common-perl: 2.0-11
libpve-access-control: 5.0-5
libpve-storage-perl: 5.0-12
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.0-9
pve-qemu-kvm: 2.9.0-2
pve-container: 2.0-14
pve-firewall: 3.0-2
pve-ha-manager: 2.0-2
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.0.8-3
lxcfs: 2.0.7-pve2
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.6.5.9-pve16~bpo90

When I try it I get the following (I've cut out all the copy progress lines):

Code:
qm migrate 999 proxmox-02 -online -with-local-disks

2017-07-24 11:50:22 starting migration of VM 999 to node 'proxmox-02' (172.28.28.241)
2017-07-24 11:50:22 found local disk 'performance:vm-999-disk-1' (in current VM config)
2017-07-24 11:50:22 copying disk images
2017-07-24 11:50:22 starting VM 999 on remote node 'kt-proxmox-02'
2017-07-24 11:50:24 start remote tunnel
2017-07-24 11:50:24 starting storage migration
2017-07-24 11:50:24 virtio0: start migration to nbd:172.28.28.241:60000:exportname=drive-virtio0
drive mirror is starting for drive-virtio0
drive-virtio0: transferred: 0 bytes remaining: 42949672960 bytes total: 42949672960 bytes progression: 0.00 % busy: 1 ready: 0
drive-virtio0: transferred: 117440512 bytes remaining: 42832232448 bytes total: 42949672960 bytes progression: 0.27 % busy: 1 ready: 0
drive-virtio0: transferred: 234881024 bytes remaining: 42714791936 bytes total: 42949672960 bytes progression: 0.55 % busy: 1 ready: 0

Lots of lines as it copies fine...

drive-virtio0: transferred: 44024266752 bytes remaining: 0 bytes total: 44024266752 bytes progression: 100.00 % busy: 1 ready: 0
drive-virtio0: Cancelling block job
drive-virtio0: Done.
2017-07-24 12:06:34 ERROR: online migrate failure - mirroring error: drive-virtio0: mirroring has been cancelled
2017-07-24 12:06:34 aborting phase 2 - cleanup resources
2017-07-24 12:06:34 migrate_cancel
2017-07-24 12:06:43 ERROR: command '/usr/bin/ssh -o 'BatchMode=yes' -o 'HostKeyAlias=proxmox-02' root@172.28.28.241 pvesm free performance:vm-999-disk-1' failed: exit code 1
2017-07-24 12:06:46 ERROR: migration finished with problems (duration 00:16:24)
migration problems

Storage Replication, by the way, is working great on ZFS. I rebuilt both of my hosts since they were on LVM, which took a week or two, but it was definitely worth it :)

This would be the icing on the cake of course if it worked.
 
Any news about live migration of KVM guests with local disks?

I am able to live migrate a test KVM machine successfully. But if I try to live migrate my production KVM machines with multiple local disks, the migration is canceled in the middle of the transfer of the second disk:

Code:
drive-sata0: transferred: 8592228352 bytes remaining: 0 bytes total: 8592228352 bytes progression: 100.00 % busy: 0 ready: 1
drive-sata1: transferred: 17012097024 bytes remaining: 17347641344 bytes total: 34359738368 bytes progression: 49.51 % busy: 1 ready: 0
drive-sata0: transferred: 8592228352 bytes remaining: 0 bytes total: 8592228352 bytes progression: 100.00 % busy: 0 ready: 1
drive-sata1: transferred: 17127440384 bytes remaining: 17232297984 bytes total: 34359738368 bytes progression: 49.85 % busy: 1 ready: 0
drive-sata0: Cancelling block job
drive-sata1: Cancelling block job
drive-sata0: Done.
drive-sata1: Done.
2017-08-17 15:37:41 ERROR: online migrate failure - mirroring error: drive-sata0: mirroring has been cancelled
2017-08-17 15:37:41 aborting phase 2 - cleanup resources
2017-08-17 15:37:41 migrate_cancel
2017-08-17 15:37:47 ERROR: command '/usr/bin/ssh -o 'BatchMode=yes' -o 'HostKeyAlias=server001' root@128.127.69.199 pvesm free local-lvm:vm-115-disk-3' failed: exit code 5
2017-08-17 15:37:53 ERROR: command '/usr/bin/ssh -o 'BatchMode=yes' -o 'HostKeyAlias=server001' root@128.127.69.199 pvesm free local-lvm:vm-115-disk-4' failed: exit code 5
2017-08-17 15:37:54 ERROR: migration finished with problems (duration 00:04:28)
migration problems

I am using
pve-manager/5.0-30/5ab26bc (running kernel: 4.10.17-2-pve)
 
Is this reproducible? I assume this is from and to LVM-thin?

It looks like at some point the monitor socket is not reachable, which our code treats as "somebody has cancelled our block job" (hence the "2017-08-17 15:37:41 ERROR: online migrate failure - mirroring error: drive-sata0: mirroring has been cancelled") - the rest is just (attempts to) clean up.
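If it helps narrow this down: one way to see whether the monitor socket is still responsive, and what QEMU itself reports for the mirror jobs while the copy runs, is the HMP monitor. A diagnostic sketch (using VM 115 from the log above):

Code:
# On the source node, attach to the QEMU human monitor of the running VM:
qm monitor 115
# then type at the qm> prompt:
info block-jobs
info status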
 
Hello Fabian,
thank you for your prompt answer!

The migration reproducibly cancels when migrating the VMs which have 2 disks attached. The migration works without problems for the first 8 GB disk; the transfer of the second disk that follows is canceled at approximately 50%.

By contrast, migration reproducibly works with a newly installed VM that has a single 32 GB disk attached.

Code:
qm migrate 115 server001 --online --with-local-disks --migration_type insecure

Best regards!
Falco
 

Are you able to do a local "move disk" with these failing disks?
 
P.S.: Yes, it is LVM-thin.

What would be the command for doing the move, and to which location should I move it? For storage I have a local-lvm per node and one globally available NFS share.

Code:
qm move_disk 115 sata0 ...
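(Filling that in with a hypothetical storage ID purely for illustration: assuming the NFS storage were named nfs-shared, the full command would presumably look like this.)

Code:
# 'nfs-shared' is a placeholder storage ID, not an actual one from this setup
qm move_disk 115 sata0 nfs-shared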
 
