KVM Live Migration hangs on fresh 3-node install of Proxmox VE 3.2

brad_mssw

Jun 13, 2014
We are just starting to evaluate Proxmox VE and are testing the various features it provides. The setup uses 3 Intel-based nodes with GlusterFS. Since we do not have an enterprise subscription (yet), we installed Debian 7.5 + GlusterFS 3.5 and then converted to the Proxmox kernel using the pve-no-subscription repo. It may be important to point out that we did NOT first install Proxmox VE 3.1 from the standard pve repo as the wiki indicates; instead we went straight to 3.2 via pve-no-subscription, which we assume is supported.
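For reference, the conversion roughly followed the "Install Proxmox VE on Debian Wheezy" wiki; something along these lines (repo line and package names as I remember them from the wiki, so double-check against it for your setup):

echo "deb http://download.proxmox.com/debian wheezy pve-no-subscription" > /etc/apt/sources.list.d/pve.list
wget -O- "http://download.proxmox.com/debian/key.asc" | apt-key add -
apt-get update && apt-get dist-upgrade
# the wiki has you install the matching pve-kernel package and reboot into it first, then:
apt-get install proxmox-ve-2.6.32 ntp ssh lvm2 postfix ksm-control-daemon vzprocps open-iscsi bootlogd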

So far we have only configured the pvecm cluster and joined the nodes. No fencing is enabled yet.
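In case it matters, the cluster was created with the standard commands, roughly like this (cluster name and IP are placeholders):

pvecm create testcluster        # on the first node
pvecm add 192.168.1.11          # on each additional node, pointing at the first node's IP
pvecm status                    # shows members and quorum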

We created a simple CentOS 6.5 minimal VM (1 GB RAM, 1 CPU (kvm64), 32 GB qcow2 disk on virtio, virtio network) and verified we can perform _offline_ migrations between the various hosts, to validate that the GlusterFS shared storage was indeed working.
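For the offline tests we just used qm migrate from the CLI (VM ID 100, same as below):

qm migrate 100 proxmox1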

Then we tried Live Migration from proxmox2 to proxmox1, and the last line we get is:
Jun 13 08:19:42 starting VM 100 on remote node 'proxmox1'
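(If it helps to reproduce: the CLI equivalent of that live migration should be something like qm migrate 100 proxmox1 --online.)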

On that proxmox1 system, I can see the kvm process running and the qm start process invoked by the migration:
root 111359 111357 0 08:19 ? 00:00:00 /usr/bin/perl /usr/sbin/qm start 100 --stateuri tcp --skiplock --migratedfrom proxmox2 --machine pc-i440fx-1.7
root 111376 1 0 08:19 ? 00:00:00 /usr/bin/kvm -id 100 -chardev socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait -mon chardev=qmp,mode=control -vnc unix:/var/run/qemu-server/100.vnc,x509,password -pidfile /var/run/qemu-server/100.pid -daemonize -name centostest -smp sockets=1,cores=1 -nodefaults -boot menu=on -vga cirrus -cpu Westmere,+x2apic -k en-us -m 1024 -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=tablet,bus=uhci.0,port=1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 -drive if=none,id=drive-ide2,media=cdrom,aio=native -device ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200 -drive file=gluster://localhost/vms/images/100/vm-100-disk-1.qcow2,if=none,id=drive-virtio0,format=qcow2,aio=native,cache=none -device virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100 -netdev type=tap,id=net0,ifname=tap100i0,script=/var/lib/qemu-server/pve-bridge -device virtio-net-pci,mac=DE:8E:BA:60:D7:B4,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300 -machine type=pc-i440fx-1.7 -incoming tcp:localhost:60000 -S

I've tried scanning the various log files in /var/log, but I'm not finding any errors on either server.

One last comment: on each of the hosts, under Services, RGManager shows as stopped. I'm not sure what that service does, but perhaps it is relevant.

Any help would be much appreciated, Thanks!
-Brad
 
Not seeing anything in there that would help.

That said, it appeared I was having multicast issues, which is why rgmanager wasn't running. I fixed those. However, that didn't fix live migration.
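For anyone following along: the way I checked multicast between the nodes was basically what the Multicast notes wiki describes, running omping on all three nodes at the same time and checking that every node gets multicast responses from the others (hostnames here are just our node names):

omping proxmox1 proxmox2 proxmox3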

One other item worth mentioning: whenever I start a VM, it gives me an error like
TASK ERROR: start failed: command '/usr/bin/kvm ...' failed: got timeout
but the VM actually IS running and operating properly.

So perhaps the two issues are related: the migration never sees the VM as having started on the remote host.
 
Brad,

As a newbie to Proxmox I'm afraid I cannot be of any more help to you.

But concerning the RGManager issue: are you talking about IP multicast here?! Good catch finding the problem.

I recall losing my IPTV signal once when connecting my provider router's TV port to a newly created VLAN meant for video (including TV) on my Netgear switch. After some time my IPTV set-top box received an IP address from my home router when sending a DHCP renewal request, expecting my provider's router to answer ... that's when my set-top box lost connectivity with my provider's multicast IPTV network. Of course, disabling DHCP on my home router's interface for this network only restored TV reception after a while ...

But sorry, this is a bit off topic. I hope a real Proxmox expert will be able to give you the answers you need.

Steijn
 
Turns out I had a few issues. First: don't install GlusterFS 3.5 from the gluster.org repos; there seems to be some sort of incompatibility with Proxmox. Next, I was still having multicast issues. I fixed the server bridge side by disabling multicast snooping (adding this to my /etc/network/interfaces: post-up echo 0 > /sys/devices/virtual/net/vmbr0/bridge/multicast_snooping), but I think my Cisco switch had IGMP snooping enabled by default too; although it seemed to work for a while and multicast pings worked as per http://pve.proxmox.com/wiki/Multicast_notes, nodes would just randomly drop out. Finally, I switched to transport="udpu", rebooted all nodes, and haven't had an issue since.
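In case it helps anyone else, the relevant pieces of config ended up looking roughly like this (addresses and the cluster name are placeholders, and you have to bump config_version and restart the cluster stack / reboot after editing cluster.conf):

/etc/network/interfaces (on each node):

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.10
        netmask 255.255.255.0
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0
        post-up echo 0 > /sys/devices/virtual/net/vmbr0/bridge/multicast_snooping

/etc/pve/cluster.conf (transport="udpu" added to the cman element):

<?xml version="1.0"?>
<cluster name="testcluster" config_version="2">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu"/>
  <clusternodes>
    <clusternode name="proxmox1" votes="1" nodeid="1"/>
    <clusternode name="proxmox2" votes="1" nodeid="2"/>
    <clusternode name="proxmox3" votes="1" nodeid="3"/>
  </clusternodes>
</cluster>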
 
