migration fails

bruceg

Member
Aug 29, 2018
I am trying to migrate a VM between nodes with the command shown below. It fails with the error log that follows. The two servers are identical in every way.


qm migrate 181 prox-dallas-1 --online --with-local-disks
2018-11-28 10:18:28 starting migration of VM 181 to node 'prox-dallas-1' (100.64.110.141)
2018-11-28 10:18:28 found local disk 'raid-lvm:vm-181-disk-0' (in current VM config)
2018-11-28 10:18:28 copying disk images
2018-11-28 10:18:28 starting VM 181 on remote node 'prox-dallas-1'
2018-11-28 10:18:30 start failed: command '/usr/bin/kvm -id 181 -name nfs -chardev 'socket,id=qmp,path=/var/run/qemu-server/181.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/181.pid -daemonize -smbios 'type=1,uuid=6b659764-9ab5-41b5-9169-8375350a1e1d' -smp '4,sockets=2,cores=2,maxcpus=4' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga std -vnc unix:/var/run/qemu-server/181.vnc,x509,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 4096 -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:1b3fecb1e033' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/dev/prox_vg/vm-181-disk-1,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap181i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=C2:9E:EB:02:60:01,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' -machine 'type=pc-i440fx-2.12' -incoming unix:/run/qemu-server/181.migrate -S' failed: exit code 1
2018-11-28 10:18:30 ERROR: online migrate failure - command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=prox-dallas-1' root@100.64.110.141 qm start 181 --skiplock --migratedfrom prox-plano-1 --migration_type secure --stateuri unix --machine pc-i440fx-2.12 --targetstorage 1' failed: exit code 255
2018-11-28 10:18:30 aborting phase 2 - cleanup resources
2018-11-28 10:18:30 migrate_cancel
2018-11-28 10:18:31 ERROR: migration finished with problems (duration 00:00:03)
migration problems
 
On the target node there should be a 'start' task; please provide the log of that.

Here is something interesting that I stumbled upon yesterday. It seems that if I initially migrate a VM while it is shut down, then all subsequent migrations can be run while the VM is running. This is true for all VMs on both of my clusters. Not sure why this is so.
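To make the sequence concrete, it looks roughly like this (VMID 181 and the node names are from my setup; the last two commands are run on whichever node the VM ends up on):

qm shutdown 181
qm migrate 181 prox-dallas-1                              # offline migration while the VM is shut down
qm start 181                                              # start it again on the target node
qm migrate 181 prox-plano-1 --online --with-local-disks  # later online migrations now succeed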
 
I'm having the same problem.

The start fails with exit code 1, followed by exit code 255 on the next line (exactly as above).

If I shut the machine down and migrate it offline, I can then migrate the VM online afterwards.
After some time, online migration stops working again.

Does anyone know why?
 
Please post the output of `pveversion -v`.
Additionally, please post the complete task log of the VM start on the target node.
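In case it is not clear where to find that: on the target node, something like the following should list the recent tasks and print the log of the failed start (the UPID is a placeholder for the one shown in the list):

pvenode task list
pvenode task log <UPID-of-the-qmstart-task>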
 
Thank you, Stoiko, for your quick response.



pveversion -v:

proxmox-ve: 5.2-2 (running kernel: 4.15.18-7-pve)
pve-manager: 5.2-10 (running version: 5.2-10/6f892b40)
pve-kernel-4.15: 5.2-10
pve-kernel-4.15.18-7-pve: 4.15.18-27
pve-kernel-4.15.17-1-pve: 4.15.17-9
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-40
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-30
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-3
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-20
pve-cluster: 5.0-30
pve-container: 2.0-29
pve-docs: 5.2-8
pve-firewall: 3.0-14
pve-firmware: 2.0-5
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-38
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.11-pve1~bpo1

All three nodes have exactly the same versions.


When I run `qm migrate 304 NODE2 --online` I get the following error log:


2019-03-14 14:17:17 starting migration of VM 304 to node 'NODE2' (xxx.xxx.xxx.xxx)
2019-03-14 14:17:17 copying disk images
2019-03-14 14:17:17 starting VM 304 on remote node 'NODE2'
2019-03-14 14:17:18 start failed: command '/usr/bin/kvm -id 304 -name VM_NAME -chardev 'socket,id=qmp,path=/var/run/qemu-server/304.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/304.pid -daemonize -smbios 'type=1,uuid=901b92ef-235a-4cf4-8b4c-8e62e6311d20' -smp '20,sockets=2,cores=10,maxcpus=20' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga vmware -vnc unix:/var/run/qemu-server/304.vnc,x509,password -no-hpet -cpu 'kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_reset,hv_vpindex,hv_runtime,hv_relaxed,enforce' -m 131072 -device 'intel-hda,id=sound5,bus=pci.0,addr=0x18' -device 'hda-micro,id=sound5-codec0,bus=sound5.0,cad=0' -device 'hda-duplex,id=sound5-codec1,bus=sound5.0,cad=1' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'vmgenid,guid=e36512b3-b9ca-4611-843d-ede4cdce2a75' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -chardev 'socket,path=/var/run/qemu-server/304.qga,server,nowait,id=qga0' -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:f0ff61b893a' -drive 'if=none,id=drive-ide0,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0' -drive 'if=none,id=drive-ide1,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.0,unit=1,drive=drive-ide1,id=ide1' -device 'ahci,id=ahci0,multifunction=on,bus=pci.0,addr=0x7' -drive 'file=/dev/LVM-SSD-VG/vm-304-disk-0,if=none,id=drive-sata3,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'ide-drive,bus=ahci0.3,drive=drive-sata3,id=sata3,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap304i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown' -device 'e1000,mac=9E:9F:ED:7B:6F:0D,netdev=net0,bus=pci.0,addr=0x12,id=net0' -netdev 'type=tap,id=net1,ifname=tap304i1,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown' -device 'e1000,mac=56:64:90:04:55:5B,netdev=net1,bus=pci.0,addr=0x13,id=net1' -netdev 'type=tap,id=net3,ifname=tap304i3,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown' -device 'e1000,mac=2E:A1:38:B8:67:AE,netdev=net3,bus=pci.0,addr=0x15,id=net3' -rtc 'driftfix=slew,base=localtime' -machine 'type=pc-i440fx-2.12' -global 'kvm-pit.lost_tick_policy=discard' -incoming unix:/run/qemu-server/304.migrate -S' failed: exit code 1
2019-03-14 14:17:18 ERROR: online migrate failure - command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=NODE2' root@xxx.xxx.xxx.xxx qm start 304 --skiplock --migratedfrom NODE1 --migration_type secure --stateuri unix --machine pc-i440fx-2.12' failed: exit code 255
2019-03-14 14:17:18 aborting phase 2 - cleanup resources
2019-03-14 14:17:18 migrate_cancel
2019-03-14 14:17:19 ERROR: migration finished with problems (duration 00:00:02)
migration problems
 
Hmm, just a guess: LVM-SSD-VG looks like an LVM storage. Is it shared via iSCSI (or FC, or SAS) between all nodes?
If the LVM is not shared, you need to say so in storage.cfg and migrate with the '--with-local-disks' switch.
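For reference, a non-shared LVM entry in /etc/pve/storage.cfg could look roughly like this (the storage ID 'lvm-ssd' is just a placeholder, the VG name is taken from your log):

lvm: lvm-ssd
        vgname LVM-SSD-VG
        content images
        shared 0

The online migration would then be started with the extra switch, e.g. 'qm migrate 304 NODE2 --online --with-local-disks'.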

Else:
* please try to reproduce the issue with the latest version (proxmox-ve is in version 5.3-1 ....)
* post the start-task log from the target node (NODE2)
* please also post your '/etc/pve/storage.cfg'
 
Thank you for your input!

I have tried all the things you mentioned.
Yes: LVM-SSD-VG is an iSCSI LVM that is shared between all nodes.

The last thing I tried was updating to the latest version (which I had tried to avoid, because our standard setup has no internet connection).
But the update solved everything, like magic ^^
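For anyone reading along, the in-place update that fixed it is just the standard PVE 5.x upgrade (assuming either the enterprise or the no-subscription repository is configured and reachable from the node):

apt update
apt dist-upgrade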
 
