Live Migration fails with local disks?

sahostking

Renowned Member
root@pve-1:~# qm migrate 101 pve-6 --online --with-local-disks --migration_type insecure --migration_network 10.0.0.0/24
/dev/sdc: open failed: No medium found
/dev/sdd: open failed: No medium found
2019-11-25 08:55:28 use dedicated network address for sending migration traffic (10.0.0.136)
2019-11-25 08:55:29 starting migration of VM 101 to node 'pve-6' (10.0.0.136)
/dev/sdc: open failed: No medium found
/dev/sdd: open failed: No medium found
/dev/sdc: open failed: No medium found
/dev/sdd: open failed: No medium found
2019-11-25 08:55:29 found local disk 'local-lvm:vm-101-disk-0' (in current VM config)
2019-11-25 08:55:29 copying disk images
2019-11-25 08:55:29 starting VM 101 on remote node 'pve-6'
2019-11-25 08:55:30 VM 101 already running
2019-11-25 08:55:30 ERROR: online migrate failure - command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve-6' root@10.0.0.136 qm start 101 --skiplock --migratedfrom pve-1 --migration_type insecure --migration_network 10.0.0.0/24 --stateuri tcp --machine pc-i440fx-4.0 --targetstorage 1' failed: exit code 255
2019-11-25 08:55:30 aborting phase 2 - cleanup resources
2019-11-25 08:55:30 migrate_cancel
2019-11-25 08:55:31 ERROR: migration finished with problems (duration 00:00:03)
migration problems


Not sure what the above means, but it keeps failing.
 
Tested it again after moving the disks from local-lvm to the nfs-server storage, and now it works fine:

2019-11-25 09:14:43 use dedicated network address for sending migration traffic (10.0.0.136)
2019-11-25 09:14:43 starting migration of VM 101 to node 'pve-6' (10.0.0.136)
/dev/sdc: open failed: No medium found
/dev/sdd: open failed: No medium found
/dev/sdc: open failed: No medium found
/dev/sdd: open failed: No medium found
2019-11-25 09:14:44 copying disk images
2019-11-25 09:14:44 starting VM 101 on remote node 'pve-6'
2019-11-25 09:14:45 trying to acquire lock...
2019-11-25 09:14:45 OK
2019-11-25 09:14:46 start remote tunnel
2019-11-25 09:14:46 ssh tunnel ver 1
2019-11-25 09:14:46 starting online/live migration on unix:/run/qemu-server/101.migrate
2019-11-25 09:14:46 migrate_set_speed: 8589934592
2019-11-25 09:14:46 migrate_set_downtime: 0.1
2019-11-25 09:14:46 set migration_caps
2019-11-25 09:14:46 set cachesize: 67108864
2019-11-25 09:14:46 start migrate command to unix:/run/qemu-server/101.migrate
2019-11-25 09:14:48 migration speed: 256.00 MB/s - downtime 7 ms
2019-11-25 09:14:48 migration status: completed
2019-11-25 09:14:51 migration finished successfully (duration 00:00:08)
TASK OK
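
For reference, the disk move before this run was done from the CLI, roughly as follows (a sketch - the qm move_disk syntax is from memory, and scsi0 plus the storage names are just examples from my test lab):

# move the local disk of VM 101 to the NFS storage before migrating,
# deleting the source copy once the move succeeds
qm move_disk 101 scsi0 nfs-server --delete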


So I assume the problem is only with local disks on local-lvm?
 
root@pve-1:~# qm migrate 101 pve-6 --online --with-local-disks --migration_type insecure --migration_network 10.0.0.0/24
/dev/sdc: open failed: No medium found
/dev/sdd: open failed: No medium found
2019-11-25 08:55:28 use dedicated network address for sending migration traffic (10.0.0.136)
2019-11-25 08:55:29 starting migration of VM 101 to node 'pve-6' (10.0.0.136)
/dev/sdc: open failed: No medium found
/dev/sdd: open failed: No medium found
/dev/sdc: open failed: No medium found
/dev/sdd: open failed: No medium found
2019-11-25 08:55:29 found local disk 'local-lvm:vm-101-disk-0' (in current VM config)
2019-11-25 08:55:29 copying disk images
2019-11-25 08:55:29 starting VM 101 on remote node 'pve-6'
2019-11-25 08:55:30 VM 101 already running

The '/dev/sdX' errors (does 'pvesm status' also throw those?) should not matter, but you might want to fix those as well. The reason the migration failed is 'VM 101 already running'. Seems like the VM (or at least a VM with the same ID) was detected as running on the target node already. But it is strange that it works after you moved the local disk to a shared storage. Could you run 'pveversion -v' and share the config of the VM?
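
If those 'No medium found' messages come from LVM scanning empty card-reader or removable slots, a filter in /etc/lvm/lvm.conf usually silences them - a sketch, assuming /dev/sdc and /dev/sdd really are empty slots that are not part of any volume group:

# /etc/lvm/lvm.conf - reject the empty devices so LVM scans skip them
devices {
    global_filter = [ "r|/dev/sdc|", "r|/dev/sdd|", "a|.*|" ]
}

Running 'pvs' or 'pvesm status' afterwards should confirm the messages are gone.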
 
I'm not too concerned, as I'm just playing with it on test servers at the moment to get a good feel for migrations. I just want to practice and get the process down. I think moving the disks to nfs-server, migrating the VM, and then moving the disks back is going to work fine; it just takes a little longer and a few more steps per VM. Also, the 'VM already running' message was not because the VM actually existed on the target - there is nothing on that new server. The first few failed attempts left some orphaned disks in local-lvm, which I removed, and I had to reboot because they were stuck before I could retry the migration.
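
In case it helps anyone else, the cleanup of those leftovers was along these lines (a sketch; the exact volume names came from 'lvs' on my node, and the disk number here is only an example):

# list leftover thin volumes from the failed attempts
lvs pve
# clear a stale migration lock if one was left behind
qm unlock 101
# free an orphaned volume through the storage layer (example volume id)
pvesm free local-lvm:vm-101-disk-1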

Here you go:

proxmox-ve: 6.0-2 (running kernel: 5.0.15-1-pve)
pve-manager: 6.0-4 (running version: 6.0-4/2a719255)
pve-kernel-5.0: 6.0-5
pve-kernel-helper: 6.0-5
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 12.2.11+dfsg1-2.1
corosync: 3.0.2-pve2
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.10-pve1
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-2
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-2
libpve-storage-perl: 6.0-5
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-61
lxcfs: 3.0.3-pve60
novnc-pve: 1.0.0-60
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-5
pve-cluster: 6.0-4
pve-container: 3.0-3
pve-docs: 6.0-4
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-5
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-2
pve-qemu-kvm: 4.0.0-3
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-5
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.1-pve1
 
Does it also fail when you migrate back now (without moving the disk to nfs first) and does it also fail for other VMs? And if it fails does 'ls /var/run/qemu-server/' on the target node show anything?
 
It works perfectly if I leave the disk on nfs - I can then move it back to local-lvm after the migration on the new node. If I try migrating while a VM still has a disk on local-lvm, it fails completely.

This is what I see on the server I am currently migrating to:
root@pve-6:/etc/pve/qemu-server# ls /var/run/qemu-server/
100.pid 100.qga 100.qmp 100.vnc 112.pid 112.qmp 112.vnc
 
I think I may have spotted the issue. In our case, if we migrate or move VMs with VirtIO SCSI disks attached, we get errors like the ones below:


create full clone of drive scsi1 (nfs-server:114/vm-114-disk-1.raw)
WARNING: You have not turned on protection against thin pools running out of space.
WARNING: Set activation/thin_pool_autoextend_threshold below 100 to trigger automatic extension of thin pools before they get full.
Logical volume "vm-114-disk-5" created.
WARNING: Sum of all thin volume sizes (4.93 TiB) exceeds the size of thin pool pve/data and the size of whole volume group (3.27 TiB).
drive mirror is starting for drive-scsi1 with bandwidth limit: 256000 KB/s
drive-scsi1: Cancelling block job

If I do the same with an IDE disk attached, the transfer proceeds:

create full clone of drive ide3 (nfs-server:114/vm-114-disk-0.raw)
WARNING: You have not turned on protection against thin pools running out of space.
WARNING: Set activation/thin_pool_autoextend_threshold below 100 to trigger automatic extension of thin pools before they get full.
Logical volume "vm-114-disk-5" created.
WARNING: Sum of all thin volume sizes (4.18 TiB) exceeds the size of thin pool pve/data and the size of whole volume group (3.27 TiB).
drive mirror is starting for drive-ide3 with bandwidth limit: 256000 KB/s
drive-ide3: transferred: 33554432 bytes remaining: 34326183936 bytes total: 34359738368 bytes progression: 0.10 % busy: 0 ready: 0
drive-ide3: transferred: 268435456 bytes remaining: 34091302912 bytes total: 34359738368 bytes progression: 0.78 % busy: 0 ready: 0
drive-ide3: transferred: 536870912 bytes remaining: 33822867456 bytes total: 34359738368 bytes progression: 1.56 % busy: 0 ready: 0
drive-ide3: transferred: 704643072 bytes remaining: 33655095296 bytes total: 34359738368 bytes progression: 2.05 % busy: 0 ready: 0
drive-ide3: transferred: 771751936 bytes remaining: 33587986432 bytes total: 34359738368 bytes progression: 2.25 % busy: 0 ready: 0
drive-ide3: transferred: 939524096 bytes remaining: 33420214272 bytes total: 34359738368 bytes progression: 2.73 % busy: 0 ready: 0


Going to test it more and see.
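
Side note on the thin-pool warnings in both outputs above: as the message itself says, auto-extension is not enabled. If you want the pool to grow automatically instead of just warning, something like this in /etc/lvm/lvm.conf should do it (threshold and percentage are example values, and dmeventd must be running):

activation {
    # extend the thin pool by 20% once it is 80% full
    thin_pool_autoextend_threshold = 80
    thin_pool_autoextend_percent = 20
}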
 
