live migration works with qm, but not with ha-manager (and GUI)

Liang Ma

Jan 25, 2019
Hi Everyone,

We have a two-node Proxmox VE (5.3-1) setup with GlusterFS volumes as shared storage. We ran into a problem live-migrating a test VM from one node to the other. If we live-migrate using the GUI or the command 'ha-manager migrate vm:103 mynode-2', it takes about 15 minutes and eventually fails with the error message below. The interesting thing is that if we do it with the command 'qm migrate 103 mynode-2 --online', the VM migrates live without any problem. So what is the difference between these two methods, and how can we make it work with the GUI or ha-manager?

Thank you.

Liang

The error message from the GUI and ha-manager live-migration:

2019-01-24 10:17:03 starting migration of VM 103 to node 'mynode-1' (10.10.202.126)
2019-01-24 10:17:03 copying disk images
2019-01-24 10:17:03 starting VM 103 on remote node 'fw2mdap'
2019-01-24 10:32:51 start failed: command '/usr/bin/kvm -id 103 -name v-dsmda-2 -chardev 'socket,id=qmp,path=/var/run/qemu-server/103.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/103.pid -daemonize -smbios 'type=1,uuid=26a79210-ab10-4b2d-bda7-1efe1c449189' -smp '1,sockets=1,cores=1,maxcpus=1' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vnc unix:/var/run/qemu-server/103.vnc,x509,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 512 -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'vmgenid,guid=f555fa7b-f0e0-4c42-9216-af21c5d82c49' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'VGA,id=vga,bus=pci.0,addr=0x2' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:993ff3b6f475' -drive 'if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=gluster://10.10.13.126/gv1/images/102/vm-102-disk-0.qcow2,if=none,id=drive-scsi0,format=qcow2,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap103i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=E2:09:BF:D3:15:C9,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' -machine 'type=pc-i440fx-2.12' -incoming unix:/run/qemu-server/103.migrate -S' failed: exit code 1
2019-01-24 10:32:51 ERROR: online migrate failure - command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=mynode-2' root@10.10.202.126 qm start 103 --skiplock --migratedfrom mynode-2 --migration_type secure --stateuri unix --machine pc-i440fx-2.12' failed: exit code 255
2019-01-24 10:32:51 aborting phase 2 - cleanup resources
2019-01-24 10:32:51 migrate_cancel
2019-01-24 10:32:53 ERROR: migration finished with problems (duration 00:15:50)
TASK ERROR: migration problems
 
Thank you danielb.

You mean removing the VM from the HA resources? I tried without adding the VM as an HA resource. The command 'ha-manager migrate vm:103 mynode-2' refuses to run, but the GUI doesn't complain. It only gives me the failure errors after trying for 15 minutes.

If having fewer than 3 nodes is the issue, is there a way to make cluster HA work with just two nodes?

Thanks.

Liang
 
HA makes no sense with fewer than 3 nodes. The cluster needs to be quorate, which requires a minimum of 3 nodes. So, unless you can add a 3rd node, you should remove the HA resource. Leaving the HA stuff aside, I don't know why a simple migration would work from the CLI with qm and not from the web GUI.
 
Interesting. I thought two nodes were the minimum requirement to form a redundant, fault-tolerant system. Yes, the command 'qm migrate 103 mynode-2 --online' does migrate our VM live seamlessly.

Thanks.

Liang
 
redundant != HA. To prevent split-brain, only the quorate part of the cluster (which means at least half + 1 of the nodes) can run VMs. The other nodes fence themselves automatically. With only two nodes, both would lose quorum at the same time, and neither could decide to operate. You can probably work around this by giving one of the nodes more votes, but that wouldn't be HA anyway. You could add a small server just to provide quorum (even if this 3rd node isn't able to run any VMs itself).
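
To make the "half + 1" rule concrete, here is a minimal Python sketch of the majority arithmetic. It is only an illustration, not Proxmox or corosync code: the function names and the vote lists are made up for the example, and it assumes quorum means holding a strict majority of all configured votes.

```python
def quorum_threshold(votes):
    """Smallest vote count that is a strict majority of all votes."""
    return sum(votes) // 2 + 1


def is_quorate(votes, reachable):
    """True if the nodes at the given indexes together hold a majority.

    votes     -- votes per node, e.g. [1, 1] for two equal nodes
    reachable -- indexes of the nodes that can still see each other
    """
    return sum(votes[i] for i in reachable) >= quorum_threshold(votes)


# Two equal nodes: an isolated node holds 1 of 2 votes, which is not a majority.
two_nodes = [1, 1]
print(is_quorate(two_nodes, [0]))       # False

# Three equal nodes: any two that still see each other hold 2 of 3 votes.
three_nodes = [1, 1, 1]
print(is_quorate(three_nodes, [0, 1]))  # True

# Two nodes with unequal votes: only the heavier node survives on its own,
# so losing that node still stops the whole cluster.
weighted = [2, 1]
print(is_quorate(weighted, [0]))        # True
print(is_quorate(weighted, [1]))        # False
```

The weighted case shows why extra votes are not real HA: the cluster keeps running only if the right node is the one that survives.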
 
I added a third node to the cluster, but live migration through the web GUI still fails with the same error.
 
On the target node, there should be a 'start' task for each migration attempt, and it should contain more output
so that you can see what the problem is.
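
If the target node's task history is long, one way to pick out the relevant start tasks is to filter the task IDs (UPIDs) by type and VM ID. The Python sketch below is only an illustration under stated assumptions: it assumes the colon-separated UPID layout 'UPID:node:pid:pstart:starttime:type:id:user:' with hexadecimal pid and start time, and that VM start tasks use the type 'qmstart'. The helper names and the sample UPIDs are invented for the example and are not taken from this cluster.

```python
from datetime import datetime, timezone


def parse_upid(upid):
    """Split a UPID string into its fields (layout is an assumption, see above)."""
    parts = upid.strip().split(":")
    # Expected: ['UPID', node, pid, pstart, starttime, type, id, user, '']
    if len(parts) != 9 or parts[0] != "UPID" or parts[8] != "":
        raise ValueError(f"unexpected UPID layout: {upid!r}")
    _, node, pid, pstart, starttime, task_type, task_id, user, _ = parts
    return {
        "node": node,
        "pid": int(pid, 16),              # hexadecimal in the UPID
        "pstart": int(pstart, 16),
        "started": datetime.fromtimestamp(int(starttime, 16), tz=timezone.utc),
        "type": task_type,
        "id": task_id,
        "user": user,
    }


def find_start_tasks(upids, vmid):
    """Return the parsed UPIDs that are VM start tasks for the given VM ID."""
    tasks = [parse_upid(u) for u in upids]
    return [t for t in tasks if t["type"] == "qmstart" and t["id"] == str(vmid)]


# Made-up sample UPIDs, for illustration only.
sample = [
    "UPID:mynode-2:00001A2B:0012D687:5C49D9CF:qmstart:103:root@pam:",
    "UPID:mynode-2:00001A40:0012D700:5C49DA10:vzdump:101:root@pam:",
]

for task in find_start_tasks(sample, 103):
    print(task["node"], task["type"], task["id"], task["started"].isoformat())
```

The output of the matching task is then the place to look for the actual reason the start with '-incoming' failed on the target.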
 
