Problems with online migration

Akegata
Sep 12, 2012
Whenever I try to do an online migration of a machine I get very weird behavior.
The VM is migrated to the node I specify, but it stops responding and I get the message "Error: migration problems" in the Tasks log.

It seems like the HA part of the migration works, but not the VM part. The popup that shows up when I start the migration job says:
Code:
Executing HA migrate for VM 100 to node virt-server-7
Trying to migrate pvevm:100 to virt-server-7...Success
TASK OK

Syslog has this to say:
Code:
Sep 12 15:30:11 virt-server-2 pvedaemon[26292]: <root@pam> starting task UPID:virt-server-2:0000678C:00034C33:50508E63:hamigrate:100:root@pam:
Sep 12 15:30:11 virt-server-2 rgmanager[4183]: Migrating pvevm:100 to virt-server-7
Sep 12 15:30:12 virt-server-2 pvevm: <root@pam> starting task UPID:virt-server-2:00006792:00034C72:50508E64:qmigrate:100:root@pam:
Sep 12 15:30:13 virt-server-2 rgmanager[26525]: [pvevm] Task still active, waiting
Sep 12 15:30:13 virt-server-2 pmxcfs[3860]: [status] notice: received log
Sep 12 15:30:14 virt-server-2 rgmanager[26545]: [pvevm] Task still active, waiting
Sep 12 15:30:14 virt-server-2 pmxcfs[3860]: [status] notice: received log
Sep 12 15:30:14 virt-server-2 pvedaemon[5068]: worker 5080 finished
Sep 12 15:30:14 virt-server-2 pvedaemon[5068]: starting 1 worker(s)
Sep 12 15:30:14 virt-server-2 pvedaemon[5068]: worker 26566 started
Sep 12 15:30:15 virt-server-2 rgmanager[26567]: [pvevm] Task still active, waiting
Sep 12 15:30:16 virt-server-2 rgmanager[26590]: [pvevm] Task still active, waiting
Sep 12 15:30:17 virt-server-2 multipathd: dm-4: add map (uevent)
Sep 12 15:30:17 virt-server-2 kernel: vmbr30: port 2(tap100i0) entering disabled state
Sep 12 15:30:17 virt-server-2 kernel: vmbr30: port 2(tap100i0) entering disabled state
Sep 12 15:30:17 virt-server-2 rgmanager[26623]: [pvevm] Task still active, waiting
Sep 12 15:30:19 virt-server-2 rgmanager[26663]: [pvevm] Task still active, waiting
Sep 12 15:30:19 virt-server-2 multipathd: dm-4: remove map (uevent)
Sep 12 15:30:19 virt-server-2 multipathd: dm-4: devmap not registered, can't remove
Sep 12 15:30:19 virt-server-2 multipathd: dm-4: remove map (uevent)
Sep 12 15:30:19 virt-server-2 multipathd: dm-4: devmap not registered, can't remove
Sep 12 15:30:20 virt-server-2 rgmanager[26785]: [pvevm] Task still active, waiting
Sep 12 15:30:20 virt-server-2 task UPID:virt-server-2:00006792:00034C72:50508E64:qmigrate:100:root@pam:: migration problems
Sep 12 15:30:20 virt-server-2 pvevm: <root@pam> end task UPID:virt-server-2:00006792:00034C72:50508E64:qmigrate:100:root@pam: migration problems
Sep 12 15:30:20 virt-server-2 rgmanager[4183]: Migration of pvevm:100 to virt-server-7 completed
Sep 12 15:30:20 virt-server-2 pvedaemon[26292]: <root@pam> end task UPID:virt-server-2:0000678C:00034C33:50508E63:hamigrate:100:root@pam: OK
Sep 12 15:30:24 virt-server-2 ntpd[3793]: Deleting interface #77 tap100i0, fe80::8c93:a4ff:feec:193a#123, interface stats: received=0, sent=0, dropped=0, active_time=300 secs

I have no idea how to troubleshoot this.
Online migration worked fine in this environment yesterday. The only thing I have done since then that I can think of is removing a VG and adding a new, bigger one.
I can't really see how that would cause this problem though, and "migration problems" doesn't tell me much about what's actually going on.

I'm running Proxmox VE 2.1.
 
Ah, I see. Double-clicking the event in the task log gave me some more info.
I got the same output when I removed HA from the machine and tried the migration again.
The relevant error was this:
Code:
channel 3: open failed: administratively prohibited: open failed

Finding that made me realize this is actually an SSH issue. The problem was that AllowTcpForwarding was set to "no" in sshd_config, which apparently prevents online migration from functioning, since the migration traffic is tunneled between the nodes over SSH.
I changed it to "yes" on all the nodes and restarted sshd, and now it works again.
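For anyone hitting the same thing, here is roughly what I ran on each node. This is a sketch assuming the standard OpenSSH config path /etc/ssh/sshd_config and the Debian-style init script that Proxmox VE 2.x uses; adjust if your setup differs:

```shell
# On every cluster node: allow SSH TCP forwarding, which qemu
# live migration tunnels its traffic through.
# Flip an explicit "AllowTcpForwarding no" to "yes" in place.
sed -i 's/^AllowTcpForwarding no/AllowTcpForwarding yes/' /etc/ssh/sshd_config

# Double-check the effective setting before restarting.
grep -i '^AllowTcpForwarding' /etc/ssh/sshd_config

# Restart sshd so the change takes effect
# (existing SSH sessions keep running).
/etc/init.d/ssh restart
```

Note that AllowTcpForwarding defaults to "yes" in OpenSSH, so this only bites if someone hardened the config, which is what had happened here.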