proxmox4 - online migrate failure - unable to detect remote migration address

uwonlineict

New Member
Dec 26, 2015
Environment is as follows:
3-node cluster setup
DRBD9 devices on all 3 nodes
Installed and configured as described in https://pve.proxmox.com/wiki/Proxmox_VE_4.x_Cluster
and https://pve.proxmox.com/wiki/DRBD9

When migrating a VM from node 1 to node 2, there is no error. When migrating a VM from node 1 or node 2 to node 3, I get the following error:

Dec 26 20:31:40 starting migration of VM 202 to node 'pve30' (10.0.0.30)
Dec 26 20:31:40 copying disk images
Dec 26 20:31:40 starting VM 202 on remote node 'pve30'
Dec 26 20:31:41 ERROR: online migrate failure - unable to detect remote migration address
Dec 26 20:31:41 aborting phase 2 - cleanup resources
Dec 26 20:31:41 migrate_cancel
Dec 26 20:31:41 ERROR: migration finished with problems (duration 00:00:01)
TASK ERROR: migration problems
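
For context: "unable to detect remote migration address" means the source node could not parse a "migration listens on ..." line from the output of the qm start it runs on the target over SSH. A minimal way to look at that raw output by hand, assuming the qm start flags --stateuri and --migratedfrom that PVE 4.x migration uses internally (VM ID and target IP taken from the log above):

# Run on the source node: start the VM on the target the same way migration does
ssh root@10.0.0.30 qm start 202 --stateuri tcp --migratedfrom pve20
# A healthy target prints a line like:
#   migration listens on tcp:localhost:60000
# If that line is missing or formatted differently, the source node aborts
# with "unable to detect remote migration address".
# Afterwards, stop the half-started VM on the target again:
ssh root@10.0.0.30 qm stop 202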

The network configuration is nearly the same on all 3 nodes, with the same bridges:

Node 1: [screenshot of network configuration: hyp10.png]
Node 2: [screenshot of network configuration: hyp20.png]
Node 3: [screenshot of network configuration: hyp30.png]

Detailed log from node 3, the node we want to migrate to:


Dec 26 20:31:40 pve3 systemd[1]: Started User Manager for UID 0.
Dec 26 20:31:40 pve3 qm[18409]: start VM 202: UPID:pve3:000047E9:028E9F4F:567EEB1C:qmstart:202:root@pam:
Dec 26 20:31:40 pve3 qm[18408]: <root@pam> starting task UPID:pve3:000047E9:028E9F4F:567EEB1C:qmstart:202:root@pam:
Dec 26 20:31:40 pve3 systemd[1]: Starting /usr/bin/kvm -id 202 -chardev socket,id=qmp,path=/var/run/qemu-server/202.qmp,server,nowait -mon chardev=qmp,mode=control -vnc unix:/var/run/qemu-server/202.vnc,x509,password -pidfile /var/run/qemu-server/202.pid -daemonize -smbios type=1,uuid=8e8d773f-c967-418c-8823-6584b4f9021f -name website1.uwonlineict.lan1 -smp 1,sockets=1,cores=1,maxcpus=1 -nodefaults -boot menu=on,strict=on,reboot-timeout=1000 -vga cirrus -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 2048 -k en-us -device pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e -device pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=tablet,bus=uhci.0,port=1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 -iscsi initiator-name=iqn.1993-08.org.debian:01:fedfd5b2449 -drive file=/dev/drbd/by-res/vm-202-disk-1/0,if=none,id=drive-virtio0,cache=writethrough,format=raw,aio=threads,detect-zeroes=on -device virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100 -netdev type=tap,id=net0,ifname=tap202i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown -device e1000,mac=62:37:32:38:31:36,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300 -machine type=pc-i440fx-2.4 -incoming tcp:localhost:60000 -S.
Dec 26 20:31:40 pve3 systemd[1]: Started /usr/bin/kvm -id 202 -chardev socket,id=qmp,path=/var/run/qemu-server/202.qmp,server,nowait -mon chardev=qmp,mode=control -vnc unix:/var/run/qemu-server/202.vnc,x509,password -pidfile /var/run/qemu-server/202.pid -daemonize -smbios type=1,uuid=8e8d773f-c967-418c-8823-6584b4f9021f -name website1.uwonlineict.lan1 -smp 1,sockets=1,cores=1,maxcpus=1 -nodefaults -boot menu=on,strict=on,reboot-timeout=1000 -vga cirrus -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 2048 -k en-us -device pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e -device pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=tablet,bus=uhci.0,port=1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 -iscsi initiator-name=iqn.1993-08.org.debian:01:fedfd5b2449 -drive file=/dev/drbd/by-res/vm-202-disk-1/0,if=none,id=drive-virtio0,cache=writethrough,format=raw,aio=threads,detect-zeroes=on -device virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100 -netdev type=tap,id=net0,ifname=tap202i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown -device e1000,mac=62:37:32:38:31:36,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300 -machine type=pc-i440fx-2.4 -incoming tcp:localhost:60000 -S.
Dec 26 20:31:41 pve3 kernel: device tap202i0 entered promiscuous mode
Dec 26 20:31:41 pve3 kernel: vmbr4: port 1(tap202i0) entered forwarding state
Dec 26 20:31:41 pve3 kernel: vmbr4: port 1(tap202i0) entered forwarding state
Dec 26 20:31:41 pve3 kernel: drbd vm-202-disk-1: Preparing cluster-wide state change 3787056617 (2->-1 3/1)
Dec 26 20:31:41 pve3 kernel: drbd vm-202-disk-1: State change 3787056617: primary_nodes=6, weak_nodes=FFFFFFFFFFFFFFF8
Dec 26 20:31:41 pve3 kernel: drbd vm-202-disk-1: Committing cluster-wide state change 3787056617 (0ms)
Dec 26 20:31:41 pve3 kernel: drbd vm-202-disk-1: role( Secondary -> Primary )
Dec 26 20:31:41 pve3 qm[18408]: <root@pam> end task UPID:pve3:000047E9:028E9F4F:567EEB1C:qmstart:202:root@pam: OK
Dec 26 20:31:41 pve3 sshd[18400]: Received disconnect from 10.0.0.20: 11: disconnected by user
Dec 26 20:31:41 pve3 sshd[18400]: pam_unix(sshd:session): session closed for user root
Dec 26 20:31:41 pve3 systemd-logind[1943]: Removed session 184.
Dec 26 20:31:41 pve3 systemd[1]: Stopping User Manager for UID 0...
Dec 26 20:31:41 pve3 systemd[18405]: Stopping Default.
Dec 26 20:31:41 pve3 systemd[18405]: Stopped target Default.
Dec 26 20:31:41 pve3 systemd[18405]: Stopping Basic System.
Dec 26 20:31:41 pve3 systemd[18405]: Stopped target Basic System.
Dec 26 20:31:41 pve3 systemd[18405]: Stopping Paths.
Dec 26 20:31:41 pve3 systemd[18405]: Stopped target Paths.
Dec 26 20:31:41 pve3 systemd[18405]: Stopping Timers.
Dec 26 20:31:41 pve3 systemd[18405]: Stopped target Timers.
Dec 26 20:31:41 pve3 systemd[18405]: Stopping Sockets.
Dec 26 20:31:41 pve3 systemd[18405]: Stopped target Sockets.
Dec 26 20:31:41 pve3 systemd[18405]: Starting Shutdown.
Dec 26 20:31:41 pve3 systemd[18405]: Reached target Shutdown.
Dec 26 20:31:41 pve3 systemd[18405]: Starting Exit the Session...
Dec 26 20:31:41 pve3 systemd[18405]: Received SIGRTMIN+24 from PID 18435 (kill).
Dec 26 20:31:41 pve3 systemd[18406]: pam_unix(systemd-user:session): session closed for user root
Dec 26 20:31:41 pve3 systemd[1]: Stopped User Manager for UID 0.
Dec 26 20:31:41 pve3 systemd[1]: Stopping user-0.slice.
Dec 26 20:31:41 pve3 systemd[1]: Removed slice user-0.slice.
Dec 26 20:31:41 pve3 sshd[18439]: Accepted publickey for root from 10.0.0.20 port 56908 ssh2: RSA xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx
Dec 26 20:31:41 pve3 sshd[18439]: pam_unix(sshd:session): session opened for user root by (uid=0)
Dec 26 20:31:41 pve3 systemd[1]: Starting user-0.slice.
Dec 26 20:31:41 pve3 systemd[1]: Created slice user-0.slice.
Dec 26 20:31:41 pve3 systemd[1]: Starting User Manager for UID 0...
Dec 26 20:31:41 pve3 systemd[1]: Starting Session 185 of user root.
Dec 26 20:31:41 pve3 systemd-logind[1943]: New session 185 of user root.
Dec 26 20:31:41 pve3 systemd[1]: Started Session 185 of user root.
Dec 26 20:31:41 pve3 systemd[18443]: pam_unix(systemd-user:session): session opened for user root by (uid=0)
Dec 26 20:31:41 pve3 systemd[18443]: Starting Paths.
Dec 26 20:31:41 pve3 systemd[18443]: Reached target Paths.
Dec 26 20:31:41 pve3 systemd[18443]: Starting Timers.
Dec 26 20:31:41 pve3 systemd[18443]: Reached target Timers.
Dec 26 20:31:41 pve3 systemd[18443]: Starting Sockets.
Dec 26 20:31:41 pve3 systemd[18443]: Reached target Sockets.
Dec 26 20:31:41 pve3 systemd[18443]: Starting Basic System.
Dec 26 20:31:41 pve3 systemd[18443]: Reached target Basic System.
Dec 26 20:31:41 pve3 systemd[18443]: Starting Default.
Dec 26 20:31:41 pve3 systemd[18443]: Reached target Default.
Dec 26 20:31:41 pve3 systemd[18443]: Startup finished in 7ms.
Dec 26 20:31:41 pve3 systemd[1]: Started User Manager for UID 0.
Dec 26 20:31:41 pve3 qm[18446]: <root@pam> starting task UPID:pve3:00004810:028E9FAE:567EEB1D:qmstop:202:root@pam:
Dec 26 20:31:41 pve3 qm[18448]: stop VM 202: UPID:pve3:00004810:028E9FAE:567EEB1D:qmstop:202:root@pam:
Dec 26 20:31:41 pve3 kernel: drbd vm-202-disk-1: role( Primary -> Secondary )
Dec 26 20:31:41 pve3 qm[18446]: <root@pam> end task UPID:pve3:00004810:028E9FAE:567EEB1D:qmstop:202:root@pam: OK
Dec 26 20:31:41 pve3 sshd[18439]: Received disconnect from 10.0.0.20: 11: disconnected by user
Dec 26 20:31:41 pve3 sshd[18439]: pam_unix(sshd:session): session closed for user root
Dec 26 20:31:41 pve3 systemd-logind[1943]: Removed session 185.
Dec 26 20:31:41 pve3 systemd[1]: Stopping User Manager for UID 0...
Dec 26 20:31:41 pve3 systemd[18443]: Stopping Default.
Dec 26 20:31:41 pve3 systemd[18443]: Stopped target Default.
Dec 26 20:31:41 pve3 systemd[18443]: Stopping Basic System.
Dec 26 20:31:41 pve3 systemd[18443]: Stopped target Basic System.
Dec 26 20:31:41 pve3 systemd[18443]: Stopping Paths.
Dec 26 20:31:41 pve3 systemd[18443]: Stopped target Paths.
Dec 26 20:31:41 pve3 systemd[18443]: Stopping Timers.
Dec 26 20:31:41 pve3 systemd[18443]: Stopped target Timers.
Dec 26 20:31:41 pve3 systemd[18443]: Stopping Sockets.
Dec 26 20:31:41 pve3 systemd[18443]: Stopped target Sockets.
Dec 26 20:31:41 pve3 systemd[18443]: Starting Shutdown.
Dec 26 20:31:41 pve3 systemd[18443]: Reached target Shutdown.
Dec 26 20:31:41 pve3 systemd[18443]: Starting Exit the Session...
Dec 26 20:31:41 pve3 systemd[18443]: Received SIGRTMIN+24 from PID 18451 (kill).
Dec 26 20:31:41 pve3 systemd[18444]: pam_unix(systemd-user:session): session closed for user root
Dec 26 20:31:41 pve3 systemd[1]: Stopped User Manager for UID 0.
Dec 26 20:31:41 pve3 systemd[1]: Stopping user-0.slice.
Dec 26 20:31:41 pve3 systemd[1]: Removed slice user-0.slice.
Dec 26 20:31:41 pve3 kernel: vmbr4: port 1(tap202i0) entered disabled state

Any ideas?
 
I noticed that when I shut down the VM and then migrate it to node 3, it works as expected. But when the VM is online, I still cannot migrate it to node 3. How is it possible that I can migrate the VM to node 2 online, but not to node 3?
 
What is the output of

# cat /etc/pve/.members

root@pve30 ~ # cat /etc/pve/.members
{
"nodename": "pve30",
"version": 9,
"cluster": { "name": "cluster1", "version": 3, "nodes": 3, "quorate": 1 },
"nodelist": {
"pve10": { "id": 1, "online": 1, "ip": "10.0.0.10"},
"pve30": { "id": 3, "online": 1, "ip": "10.0.0.30"},
"pve20": { "id": 2, "online": 1, "ip": "10.0.0.20"}
}
}

root@pve20 ~ # cat /etc/pve/.members
{
"nodename": "pve20",
"version": 23,
"cluster": { "name": "cluster1", "version": 3, "nodes": 3, "quorate": 1 },
"nodelist": {
"pve10": { "id": 1, "online": 1, "ip": "10.0.0.10"},
"pve30": { "id": 3, "online": 1, "ip": "10.0.0.30"},
"pve20": { "id": 2, "online": 1, "ip": "10.0.0.20"}
}
}


root@pve10 ~ # cat /etc/pve/.members
{
"nodename": "pve10",
"version": 5,
"cluster": { "name": "cluster1", "version": 3, "nodes": 3, "quorate": 1 },
"nodelist": {
"pve10": { "id": 1, "online": 1, "ip": "10.0.0.10"},
"pve30": { "id": 3, "online": 1, "ip": "10.0.0.30"},
"pve20": { "id": 2, "online": 1, "ip": "10.0.0.20"}
}
}
 
Looks OK. And do you use the same versions on all nodes?
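
For example, a quick way to compare versions from one node (a sketch; node IPs taken from the .members output above):

# Show the first few pveversion -v lines (proxmox-ve, pve-manager, ...) per node
for n in 10.0.0.10 10.0.0.20 10.0.0.30; do
    echo "== $n =="
    ssh root@$n pveversion -v | head -n 6
done
# Any mismatch in pve-manager or qemu-server between nodes is a red flag.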

What a smart idea... sometimes it's good to look at problems together with other people. That was the issue: some time ago I updated the 3rd node but didn't update the other nodes. After updating all nodes again, everything works smoothly.

Thanks.
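
In case it helps anyone else: updating the nodes was nothing special, just the standard upgrade on each one (assuming the appropriate PVE 4.x APT repositories are configured):

# Run on each node, then reboot if a new kernel was installed:
apt-get update
apt-get dist-upgrade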
 
Are you using DRBD9 in a production environment? Is it stable? Have you ever faced any problems?
 
