Failover domain failback doesn't work.

aTan

Mar 22, 2013
Hi. I've set up a failover domain for VMs. When node1 crashes, the VMs are relocated to node2. That part works fine (it already worked before I created the failover domain). But when node1 recovers, the VMs are not relocated back to it.


Code:
  <rm>
    <pvevm autostart="1" vmid="1000" domain="vmdomain"/>
    <pvevm autostart="1" vmid="1001" domain="vmdomain"/>
    <failoverdomains>
      <failoverdomain name="vmdomain" nofailback="0" ordered="1" restricted="0">
        <failoverdomainnode name="node1" priority="1"/>
        <failoverdomainnode name="node2" priority="2"/>
      </failoverdomain>
    </failoverdomains>
  </rm>
 
Please post the full cluster.conf.
 
Code:
<?xml version="1.0"?>
<cluster config_version="12" name="stcluster">
  <cman expected_votes="3" keyfile="/var/lib/pve-cluster/corosync.authkey" />
  <quorumd allow_kill="0" interval="3" label="proxmox_qdisk" tko="10">
    <heuristic interval="3" program="ping $GATEWAY -c1 -w1" score="1" tko="4"/>
    <heuristic interval="3" program="ip addr | grep vmbr0 | grep -q UP" score="2" tko="3"/>
  </quorumd>
  <totem token="54000"/>
  <clusternodes>
    <clusternode name="prox1" nodeid="1" votes="1">
      <fence>
        <method name="vmware">
          <device action="reboot" name="fence_vmware_201" port="prox1" ssl="on" uuid="5012a823-43cb-a546-8a53-d4fba3f0b7a9"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="prox2" nodeid="2" votes="1">
      <fence>
        <method name="vmware">
          <device action="reboot" name="fence_vmware_202" port="prox2" ssl="on" uuid="50128fae-59b2-85e3-4d16-ddc80b030aa9"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_vmware_soap" ipaddr="10.0.0.201" login="root" name="fence_vmware_201" passwd="xxx"/>
    <fencedevice agent="fence_vmware_soap" ipaddr="10.0.0.202" login="root" name="fence_vmware_202" passwd="xxx"/>
  </fencedevices>
  <rm>
    <pvevm autostart="1" vmid="1000" domain="stservices"/>
    <pvevm autostart="1" vmid="1001" domain="stservices"/>
    <failoverdomains>
      <failoverdomain name="stservices" nofailback="0" ordered="1" restricted="0">
        <failoverdomainnode name="prox1" priority="1"/>
        <failoverdomainnode name="prox2" priority="2"/>
      </failoverdomain>
    </failoverdomains>
  </rm>
</cluster>
 
My guess is that the heuristics give quorum while you are not quorate. I would use quorumd without heuristics (so that master_wins=1 gets activated).
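
For illustration, the suggestion above would amount to something like the following fragment. This is only a sketch: the attribute values are copied from the posted config, both heuristic children are dropped, and it relies on the claim in the reply that master_wins is then activated automatically.

```xml
<!-- Sketch only: quorumd from the posted cluster.conf with both heuristics removed. -->
<!-- Per the reply above, master_wins=1 should then be activated automatically. -->
<quorumd allow_kill="0" interval="3" label="proxmox_qdisk" tko="10"/>
```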
 
That was it. After removing the first heuristic rule, the cluster tries to migrate the VMs back. Thank you.
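
The resulting config was not posted, so the fragment below is a reconstruction rather than the poster's actual file. It assumes "the first heuristic rule" means the ping heuristic in the config posted earlier, and that config_version was raised from 12 to 13 (the exact number is a guess; it only has to be higher than the previous one for the change to be picked up).

```xml
<?xml version="1.0"?>
<!-- Assumed: config_version bumped from 12 to 13 so the change is distributed. -->
<cluster config_version="13" name="stcluster">
  <cman expected_votes="3" keyfile="/var/lib/pve-cluster/corosync.authkey" />
  <!-- The ping heuristic has been removed; only the vmbr0 link check remains. -->
  <quorumd allow_kill="0" interval="3" label="proxmox_qdisk" tko="10">
    <heuristic interval="3" program="ip addr | grep vmbr0 | grep -q UP" score="2" tko="3"/>
  </quorumd>
  <totem token="54000"/>
  <!-- clusternodes, fencedevices and rm sections unchanged from the posted config -->
</cluster>
```

The failover domain section itself (nofailback="0", ordered="1", prox1 with priority 1) stays as posted; the reported fix touched only the quorumd heuristics.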
 
Hello aTan,

If you have found the solution, could you please post the full cluster.conf again, so that I can also see the fix and what was wrong? Thanks a lot.
 
