No failback when using failover domains on a two-node cluster

acidrop

Jul 17, 2012
Hello

I'm experimenting with a two-node cluster with DRBD and LVM on top for the VMs.
I don't have a fencing device for the time being, so I'm using fence_manual and the fence_ack_manual command to simulate fencing. Everything works as expected. The only difficulty I have is with failover domains. I have 5 HA VMs: 3 running on node B and 2 running on node A. When I reset node B and run "fence_ack_manual nodeB" from node A, the 3 VMs located on node B are started on node A as expected. But when node B comes back up, these 3 VMs are not relocated back to node B as I would like. I'm posting my cluster.conf:

Code:
<?xml version="1.0"?>
<cluster config_version="22" name="pvecluster">
  <cman expected_votes="1" keyfile="/var/lib/pve-cluster/corosync.authkey" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_manual" name="human"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="proxmox" nodeid="1" votes="1">
      <fence>
        <method name="single">
          <device name="human" nodename="proxmox"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="proxmox2" nodeid="2" votes="1">
      <fence>
        <method name="single">
          <device name="human" nodename="proxmox2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <failoverdomains>
      <failoverdomain name="ordered" nofailback="0" ordered="0" restricted="1">
        <failoverdomainnode name="proxmox" priority="1"/>
        <failoverdomainnode name="proxmox2" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <pvevm domain="ordered" autostart="1" vmid="111" recovery="relocate"/>
    <pvevm domain="ordered" autostart="1" vmid="107" recovery="relocate"/>
    <pvevm domain="ordered" autostart="1" vmid="108" recovery="relocate"/>
    <pvevm domain="ordered" autostart="1" vmid="110" recovery="relocate"/>
    <pvevm domain="ordered" autostart="1" vmid="112" recovery="relocate"/>
  </rm>
</cluster>

thank you
 
Hello acidrop

But when node B comes back up, these 3 VMs are not relocated back to node B as I would like.

This is the usual behavior - how and where the machines move in an HA cluster cannot be controlled by the user/administrator - but it should be possible to migrate them back to nodeB (via the VM's context menu in the web GUI).
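
From the command line, the same migration can probably be done with rgmanager's clusvcadm; a minimal sketch, assuming the HA services are named pvevm:<vmid> as in the config above:

Code:
# hedged sketch: migrate the HA-managed VM 111 back to proxmox2,
# assuming rgmanager's clusvcadm and pvevm:<vmid> service names
clusvcadm -M pvevm:111 -m proxmox2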

Kind regards

Mr.Holmes
 
With one ordered, unrestricted failover domain per node (ordered="1", nofailback="0"), each VM prefers its home node and is relocated back there automatically once that node rejoins:

Code:
.
.
<failoverdomains>
  <failoverdomain name="ordered1" nofailback="0" ordered="1" restricted="0">
    <failoverdomainnode name="proxmox" priority="1"/>
  </failoverdomain>
  <failoverdomain name="ordered2" nofailback="0" ordered="1" restricted="0">
    <failoverdomainnode name="proxmox2" priority="1"/>
  </failoverdomain>
</failoverdomains>
<pvevm domain="ordered1" autostart="1" vmid="110" recovery="relocate"/>
<pvevm domain="ordered2" autostart="1" vmid="112" recovery="relocate"/>
.
.
.
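
If it helps: in Proxmox VE 2.x the usual route is to copy the config to /etc/pve/cluster.conf.new, edit it, increment config_version, and activate it from the HA tab in the web GUI. A sketch for validating the edited file first, assuming the stock cman tool ccs_config_validate is installed:

Code:
# hedged sketch: validate the edited config before activating it
# (assumes cman's ccs_config_validate and the PVE 2.x cluster.conf.new workflow)
ccs_config_validate -f /etc/pve/cluster.conf.new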
 
Hello

But when node B comes back up, these 3 VMs are not relocated back to node B as I would like.

I don't believe that automatic failback with failover domains is a good idea in your case.

This is because DRBD needs time to resync the disks, and if the automatic failback does its work before the disks are fully synchronized, you will get a great number of error messages and the virtual disks may even become unusable.

My strong suggestion is to never use automatic failback with DRBD, and whenever you want to do a live migration, first verify that the DRBD volumes are synchronized.
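
For example, a quick check before migrating, assuming DRBD 8.x (which exposes its state in /proc/drbd):

Code:
# minimal sketch for DRBD 8.x: inspect the resource state
cat /proc/drbd
# it is only safe to move the VM back when the resource shows
# cs:Connected and ds:UpToDate/UpToDate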

Best regards
Cesar