live migration works with qm, but not with ha-manager (and GUI)

Liang Ma · Jan 25, 2019

Hi Everyone,

We have a two-node Proxmox VE (5.3-1) setup with GlusterFS volumes as shared storage. We ran into a problem live-migrating a test VM from one node to the other. If we live-migrate using the GUI or the command 'ha-manager migrate vm:103 mynode-2', it takes about 15 minutes and eventually fails with the error message below. Interestingly, the command 'qm migrate 103 mynode-2 --online' migrates the VM live without any problem. What is the difference between these two methods, and how can we make it work with the GUI or ha-manager?

Thank you.

Liang

The error message from the GUI and ha-manager live-migration:

2019-01-24 10:17:03 starting migration of VM 103 to node 'mynode-1' (10.10.202.126)
2019-01-24 10:17:03 copying disk images
2019-01-24 10:17:03 starting VM 103 on remote node 'fw2mdap'
2019-01-24 10:32:51 start failed: command '/usr/bin/kvm -id 103 -name v-dsmda-2 -chardev 'socket,id=qmp,path=/var/run/qemu-server/103.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/103.pid -daemonize -smbios 'type=1,uuid=26a79210-ab10-4b2d-bda7-1efe1c449189' -smp '1,sockets=1,cores=1,maxcpus=1' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vnc unix:/var/run/qemu-server/103.vnc,x509,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 512 -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'vmgenid,guid=f555fa7b-f0e0-4c42-9216-af21c5d82c49' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'VGA,id=vga,bus=pci.0,addr=0x2' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:993ff3b6f475' -drive 'if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=gluster://10.10.13.126/gv1/images/102/vm-102-disk-0.qcow2,if=none,id=drive-scsi0,format=qcow2,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap103i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=E2:09:BF

3:15:C9,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' -machine 'type=pc-i440fx-2.12' -incoming unix:/run/qemu-server/103.migrate -S' failed: exit code 1
2019-01-24 10:32:51 ERROR: online migrate failure - command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=mynode-2' root@10.10.202.126 qm start 103 --skiplock --migratedfrom mynode-2 --migration_type secure --stateuri unix --machine pc-i440fx-2.12' failed: exit code 255
2019-01-24 10:32:51 aborting phase 2 - cleanup resources
2019-01-24 10:32:51 migrate_cancel
2019-01-24 10:32:53 ERROR: migration finished with problems (duration 00:15:50)
TASK ERROR: migration problems

danielb · Jan 25, 2019

You need at least 3 nodes to have HA. Remove HA, and you should be able to migrate from the web GUI (manually only).

Liang Ma · Jan 25, 2019

Thank you danielb.

You mean removing the VM from the HA resources? I tried that, with the VM not configured as an HA resource. The command 'ha-manager migrate vm:103 mynode-2' then refuses to run, but the GUI doesn't complain; it only gives me the same failure after trying for 15 minutes.

If having fewer than 3 nodes is the issue, is there a way to make cluster HA work with just two nodes?

Thanks.

Liang

danielb · Jan 25, 2019

HA makes no sense with fewer than 3 nodes. The cluster needs to stay quorate when a node fails, which requires a minimum of 3 nodes. So, unless you can add a 3rd node, you should remove the HA resource. Leaving the HA stuff aside, I don't know why a simple migrate would work from the CLI with qm and not from the web GUI.

Liang Ma · Jan 25, 2019

Interesting. I thought two nodes were the minimum requirement to form a redundant, fault-tolerant system. Yes, the command 'qm migrate 103 mynode-2 --online' does migrate our VM live seamlessly.

Thanks.

Liang

danielb · Jan 25, 2019

Redundant != HA. To prevent split-brain, only the quorate part of the cluster (the part holding a majority of the votes, i.e. more than half) can run VMs. The other nodes fence themselves automatically. With only two nodes, both would lose quorum at the same time, and neither could decide to operate. You can probably work around this by giving one of the nodes more votes, but that wouldn't be HA anyway. You could add a small server just to provide quorum (even if this 3rd node isn't able to run any VMs itself).
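
To make the vote arithmetic above concrete, here is a minimal sketch in Python. It assumes a plain strict-majority rule (more than half of all configured votes); the function names and node names are made up for illustration and are not part of any Proxmox or corosync API.

```python
from typing import Dict


def quorum_threshold(total_votes: int) -> int:
    """Smallest vote count that is a strict majority of total_votes."""
    return total_votes // 2 + 1


def quorate_after_failure(votes: Dict[str, int], failed: str) -> bool:
    """True if the nodes left after `failed` drops out still hold a majority.

    `votes` maps a (hypothetical) node name to its number of votes.
    """
    total = sum(votes.values())
    remaining = total - votes[failed]
    return remaining >= quorum_threshold(total)


# Two nodes, one vote each: losing either node leaves 1 of 2 votes.
two_nodes = {"node-a": 1, "node-b": 1}

# Three nodes, one vote each: losing one node leaves 2 of 3 votes.
three_nodes = {"node-a": 1, "node-b": 1, "node-c": 1}

# Two nodes with unequal votes: only the heavier node survives alone.
weighted = {"node-a": 2, "node-b": 1}

if __name__ == "__main__":
    print(quorate_after_failure(two_nodes, "node-a"))
    print(quorate_after_failure(three_nodes, "node-a"))
    print(quorate_after_failure(weighted, "node-b"))
    print(quorate_after_failure(weighted, "node-a"))
```

Under this rule the weighted two-node case only tolerates the loss of the lighter node, which is why extra votes do not give real HA, while a small third node restores a majority after any single failure.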

Liang Ma · Jan 25, 2019

Good point.

Thank you danielb for the explanation and suggestion.

Liang

Liang Ma · Jan 30, 2019

I added a third node to the cluster, but live migration via the web GUI still fails with the same error.

dcsapak · Jan 31, 2019

On the target node, there has to be a 'start' task for each migration attempt, and it should contain more output
so that you can see what the problem is.
