KVM Live Migration hangs on fresh 3-node install of Proxmox VE 3.2

brad_mssw

Jun 13, 2014
We are just starting to evaluate Proxmox VE and are testing the various features it provides. The setup uses 3 Intel-based nodes with GlusterFS. Since we do not have an enterprise subscription (yet), we installed Debian 7.5 + GlusterFS 3.5 and then converted to the Proxmox kernel using the pve-no-subscription repo. It may be important to point out that we did NOT first install Proxmox VE 3.1 from the standard pve repo as the wiki indicates; instead we went straight to 3.2 via pve-no-subscription, which we assume is supported.
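For reference, the conversion roughly followed the "Install Proxmox VE on Debian Wheezy" wiki; something along these lines (repo line and package names as I remember them from the wiki, so double-check against it for your setup):

echo "deb http://download.proxmox.com/debian wheezy pve-no-subscription" > /etc/apt/sources.list.d/pve.list
wget -O- "http://download.proxmox.com/debian/key.asc" | apt-key add -
apt-get update && apt-get dist-upgrade
# the wiki has you install the matching pve-kernel package and reboot into it first, then:
apt-get install proxmox-ve-2.6.32 ntp ssh lvm2 postfix ksm-control-daemon vzprocps open-iscsi bootlogd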

So far we have only configured the pvecm cluster and joined the nodes. No fencing is enabled yet.
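In case it matters, the cluster was created with the standard commands, roughly like this (cluster name and IP are placeholders):

pvecm create testcluster        # on the first node
pvecm add 192.168.1.11          # on each additional node, pointing at the first node's IP
pvecm status                    # shows members and quorum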

We created a simple CentOS 6.5 minimal VM (1 GB RAM, 1 CPU (kvm64), 32 GB qcow2 disk on virtio, virtio network) and verified we can perform _offline_ migrations between the various hosts, to validate that the GlusterFS shared storage was indeed working.
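For the offline tests we just used qm migrate from the CLI (VM ID 100, same as below):

qm migrate 100 proxmox1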

Then we tried Live Migration from proxmox2 to proxmox1, and the last line we get is:
Jun 13 08:19:42 starting VM 100 on remote node 'proxmox1'
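(If it helps to reproduce: the CLI equivalent of that live migration should be something like qm migrate 100 proxmox1 --online.)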

On that proxmox1 system, I can see the kvm process running and the qm start process invoked by the migration:
root 111359 111357 0 08:19 ? 00:00:00 /usr/bin/perl /usr/sbin/qm start 100 --stateuri tcp --skiplock --migratedfrom proxmox2 --machine pc-i440fx-1.7
root 111376 1 0 08:19 ? 00:00:00 /usr/bin/kvm -id 100 -chardev socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait -mon chardev=qmp,mode=control -vnc unix:/var/run/qemu-server/100.vnc,x509,password -pidfile /var/run/qemu-server/100.pid -daemonize -name centostest -smp sockets=1,cores=1 -nodefaults -boot menu=on -vga cirrus -cpu Westmere,+x2apic -k en-us -m 1024 -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=tablet,bus=uhci.0,port=1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 -drive if=none,id=drive-ide2,media=cdrom,aio=native -device ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200 -drive file=gluster://localhost/vms/images/100/vm-100-disk-1.qcow2,if=none,id=drive-virtio0,format=qcow2,aio=native,cache=none -device virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100 -netdev type=tap,id=net0,ifname=tap100i0,script=/var/lib/qemu-server/pve-bridge -device virtio-net-pci,mac=DE:8E:BA:60:D7:B4,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300 -machine type=pc-i440fx-1.7 -incoming tcp:localhost:60000 -S

I've tried scanning the various log files in /var/log, but I'm not finding any errors on either server.

One last comment: on each of the hosts, under Services, RGManager shows as stopped. I'm not sure what that service does, but perhaps it is relevant.

Any help would be much appreciated, Thanks!
-Brad
 
Not seeing anything in there that would help.

That said, it appeared I was having multicast issues, which is why rgmanager wasn't running. I fixed those. However, that didn't fix live migration.
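For anyone following along: the way I checked multicast between the nodes was basically what the Multicast notes wiki describes, running omping on all three nodes at the same time and checking that every node gets multicast responses from the others (hostnames here are just our node names):

omping proxmox1 proxmox2 proxmox3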

One other item worth mentioning: whenever I start a VM, it gives me an error like
TASK ERROR: start failed: command '/usr/bin/kvm ...' failed: got timeout
but the VM actually IS running and operating properly.

So perhaps the two issues are related: the migration never sees the VM as having started on the remote host.
 
Brad,

As a newbie to Proxmox I'm afraid I cannot be of any more help to you.

But concerning the RGManager issue: are you talking about IP multicast here?! Good catch finding the problem.

I recall losing my IPTV signal once when connecting my provider router's TV port to a newly created VLAN meant for video (including TV) on my Netgear switch. After some time my IPTV set-top box received an IP address from my home router when sending a DHCP renewal request, expecting my provider's router to answer ... that's when my set-top box lost connectivity with my provider's multicast IPTV network. Of course, disabling DHCP on my home router's interface for this network only restored TV reception after a while ...

But sorry, this is a bit off topic. I hope a real Proxmox expert will be able to give you the answers you need.

Steijn
 
Turns out I had a few issues. First: don't install GlusterFS 3.5 from the gluster.org repos; there seems to be some sort of incompatibility with Proxmox. Next, I was still having multicast issues. I fixed the server bridge side by disabling multicast snooping (adding this to my /etc/network/interfaces: post-up echo 0 > /sys/devices/virtual/net/vmbr0/bridge/multicast_snooping), but I think my Cisco switch had IGMP snooping enabled by default too; although it seemed to work for a while and multicast pings worked as per http://pve.proxmox.com/wiki/Multicast_notes, nodes would just randomly drop out. Finally, I switched to transport="udpu", rebooted all nodes, and haven't had an issue since.
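In case it helps anyone else, the relevant pieces of config ended up looking roughly like this (addresses and the cluster name are placeholders, and you have to bump config_version and restart the cluster stack / reboot after editing cluster.conf):

/etc/network/interfaces (on each node):

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.10
        netmask 255.255.255.0
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0
        post-up echo 0 > /sys/devices/virtual/net/vmbr0/bridge/multicast_snooping

/etc/pve/cluster.conf (transport="udpu" added to the cman element):

<?xml version="1.0"?>
<cluster name="testcluster" config_version="2">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu"/>
  <clusternodes>
    <clusternode name="proxmox1" votes="1" nodeid="1"/>
    <clusternode name="proxmox2" votes="1" nodeid="2"/>
    <clusternode name="proxmox3" votes="1" nodeid="3"/>
  </clusternodes>
</cluster>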
 
