Hello everybody,
We have a two-node cluster running Proxmox 2.3-13 with DRBD. Failover works, but after the failed node is back online and unfenced, the second migration back to the original node fails:
Code:
task started by HA resource agent
May 07 13:53:21 starting migration of VM 103 to node 'vhost2' (10.0.0.102)
May 07 13:53:21 copying disk images
May 07 13:53:21 starting VM 103 on remote node 'vhost2'
May 07 13:53:23 starting migration tunnel
bind: Address already in use
channel_setup_fwd_listener: cannot listen to port: 60000
Could not request local forwarding.
May 07 13:53:24 starting online/live migration on port 60000
May 07 13:53:24 migrate_set_speed: 8589934592
May 07 13:53:24 migrate_set_downtime: 0.1
May 07 13:53:26 ERROR: online migrate failure - aborting
May 07 13:53:26 aborting phase 2 - cleanup resources
May 07 13:53:26 migrate_cancel
May 07 14:03:30 ERROR: migration finished with problems (duration 00:10:10)
TASK ERROR: migration problems
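While this is happening, a quick check like the one below on the node that sets up the ssh tunnels would show whether port 60000 is still occupied when the second migration starts. This is plain Python and nothing Proxmox-specific; the port number is taken from the task log above, the loopback address is only for illustration:
Code:
#!/usr/bin/env python
# Minimal check: try to bind the migration tunnel port from the log above.
# If another tunnel still holds it, bind() fails with EADDRINUSE, which
# matches the "Address already in use" error in the task log.
import socket

PORT = 60000  # tunnel port from the failed task log

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    s.bind(("127.0.0.1", PORT))
    print("port %d is currently free" % PORT)
except socket.error as err:
    print("port %d is still in use: %s" % (PORT, err))
finally:
    s.close()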
The first migration worked without problems. There seems to be a race condition: when two migrations run concurrently, the migration port is not incremented, so both tunnels try to use port 60000. Is there an option to delay the migration by a few seconds? Or is another workaround available? Any help is appreciated, because we really need to re-balance the VMs after a node comes back online.
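To make the question more concrete, something along these lines is what I mean by "delaying" the migration: wait until the tunnel port is free again before the next migration is started. This is only a sketch of the idea; the port number, timeout and poll interval are my own assumptions, and I don't know where such a wait would have to be hooked into the HA agent:
Code:
#!/usr/bin/env python
# Sketch of the workaround idea: poll until the migration tunnel port
# is free again, then give the go-ahead. The values below are assumptions,
# not anything Proxmox ships.
import socket
import time

PORT = 60000      # tunnel port from the failed task log
TIMEOUT = 120     # give up after two minutes
INTERVAL = 5      # re-check every few seconds

def port_is_free(port):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind(("127.0.0.1", port))
        return True
    except socket.error:
        return False
    finally:
        s.close()

deadline = time.time() + TIMEOUT
while not port_is_free(PORT):
    if time.time() > deadline:
        raise SystemExit("gave up waiting for port %d" % PORT)
    time.sleep(INTERVAL)

print("port %d is free, the migration could be started now" % PORT)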
Kind regards,
Chris
This is our cluster.conf:
Code:
<?xml version="1.0"?>
<cluster config_version="9" name="testcluster">
  <cman two_node="1" expected_votes="1" keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <fencedevices>
    <fencedevice agent="fence_ifmib" community="public" ipaddr="10.0.0.5" name="switch_a1" snmp_version="2c"/>
    <fencedevice agent="fence_ifmib" community="public" ipaddr="10.0.0.6" name="switch_a2" snmp_version="2c"/>
    <fencedevice agent="fence_ifmib" community="public" ipaddr="10.0.0.7" name="switch_b1" snmp_version="2c"/>
    <fencedevice agent="fence_ifmib" community="public" ipaddr="10.0.0.8" name="switch_b2" snmp_version="2c"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="vhost1" nodeid="1" votes="1">
      <fence>
        <method name="fence">
          <device action="off" name="switch_b1" port="35"/>
          <device action="off" name="switch_b2" port="38"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="vhost2" nodeid="2" votes="1">
      <fence>
        <method name="fence">
          <device action="off" name="switch_a1" port="37"/>
          <device action="off" name="switch_a2" port="42"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <failoverdomains>
      <failoverdomain name="domain1" nofailback="0">
        <failoverdomainnode name="vhost1" priority="1"/>
      </failoverdomain>
      <failoverdomain name="domain2" nofailback="0">
        <failoverdomainnode name="vhost2" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <pvevm autostart="1" vmid="102" domain="domain1"/>
    <pvevm autostart="1" vmid="103" domain="domain2"/>
    <pvevm autostart="1" vmid="106" domain="domain1"/>
    <pvevm autostart="1" vmid="107" domain="domain2"/>
  </rm>
</cluster>