Hello everybody,
We have a two-node cluster running Proxmox 2.3-13 with DRBD. Failover works, but after the failed node is back online and unfenced, the second migration back to the original node fails:
Code:
task started by HA resource agent
May 07 13:53:21 starting migration of VM 103 to node 'vhost2' (10.0.0.102)
May 07 13:53:21 copying disk images
May 07 13:53:21 starting VM 103 on remote node 'vhost2'
May 07 13:53:23 starting migration tunnel
bind: Address already in use
channel_setup_fwd_listener: cannot listen to port: 60000
Could not request local forwarding.
May 07 13:53:24 starting online/live migration on port 60000
May 07 13:53:24 migrate_set_speed: 8589934592
May 07 13:53:24 migrate_set_downtime: 0.1
May 07 13:53:26 ERROR: online migrate failure - aborting
May 07 13:53:26 aborting phase 2 - cleanup resources
May 07 13:53:26 migrate_cancel
May 07 14:03:30 ERROR: migration finished with problems (duration 00:10:10)
TASK ERROR: migration problems
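While this is happening, a quick check like the one below on the node that sets up the ssh tunnels would show whether port 60000 is still occupied when the second migration starts. This is plain Python and nothing Proxmox-specific; the port number is taken from the task log above, the loopback address is only for illustration:
Code:
#!/usr/bin/env python
# Minimal check: try to bind the migration tunnel port from the log above.
# If another tunnel still holds it, bind() fails with EADDRINUSE, which
# matches the "Address already in use" error in the task log.
import socket

PORT = 60000  # tunnel port from the failed task log

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    s.bind(("127.0.0.1", PORT))
    print("port %d is currently free" % PORT)
except socket.error as err:
    print("port %d is still in use: %s" % (PORT, err))
finally:
    s.close()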
The first migration worked without problems. There seems to be a race condition: when two migrations run concurrently, the migration port is not incremented, so both tunnels try to use port 60000. Is there an option to delay the migration by a few seconds? Or is another workaround available? Any help is appreciated, because we really need to re-balance the VMs after a node comes back online.
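To make the question more concrete, something along these lines is what I mean by "delaying" the migration: wait until the tunnel port is free again before the next migration is started. This is only a sketch of the idea; the port number, timeout and poll interval are my own assumptions, and I don't know where such a wait would have to be hooked into the HA agent:
Code:
#!/usr/bin/env python
# Sketch of the workaround idea: poll until the migration tunnel port
# is free again, then give the go-ahead. The values below are assumptions,
# not anything Proxmox ships.
import socket
import time

PORT = 60000      # tunnel port from the failed task log
TIMEOUT = 120     # give up after two minutes
INTERVAL = 5      # re-check every few seconds

def port_is_free(port):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind(("127.0.0.1", port))
        return True
    except socket.error:
        return False
    finally:
        s.close()

deadline = time.time() + TIMEOUT
while not port_is_free(PORT):
    if time.time() > deadline:
        raise SystemExit("gave up waiting for port %d" % PORT)
    time.sleep(INTERVAL)

print("port %d is free, the migration could be started now" % PORT)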
Kind regards,
Chris
This is our cluster.conf:
Code:
<?xml version="1.0"?>
<cluster config_version="9" name="testcluster">
  <cman two_node="1" expected_votes="1" keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <fencedevices>
    <fencedevice agent="fence_ifmib" community="public" ipaddr="10.0.0.5" name="switch_a1" snmp_version="2c"/>
    <fencedevice agent="fence_ifmib" community="public" ipaddr="10.0.0.6" name="switch_a2" snmp_version="2c"/>
    <fencedevice agent="fence_ifmib" community="public" ipaddr="10.0.0.7" name="switch_b1" snmp_version="2c"/>
    <fencedevice agent="fence_ifmib" community="public" ipaddr="10.0.0.8" name="switch_b2" snmp_version="2c"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="vhost1" nodeid="1" votes="1">
      <fence>
        <method name="fence">
          <device action="off" name="switch_b1" port="35"/>
          <device action="off" name="switch_b2" port="38"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="vhost2" nodeid="2" votes="1">
      <fence>
        <method name="fence">
          <device action="off" name="switch_a1" port="37"/>
          <device action="off" name="switch_a2" port="42"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <failoverdomains>
      <failoverdomain name="domain1" nofailback="0">
        <failoverdomainnode name="vhost1" priority="1"/>
      </failoverdomain>
      <failoverdomain name="domain2" nofailback="0">
        <failoverdomainnode name="vhost2" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <pvevm autostart="1" vmid="102" domain="domain1"/>
    <pvevm autostart="1" vmid="103" domain="domain2"/>
    <pvevm autostart="1" vmid="106" domain="domain1"/>
    <pvevm autostart="1" vmid="107" domain="domain2"/>
  </rm>
</cluster>