VE 3.2 3 node cluster problems with HA

Marius Matei

Jun 23, 2014
Hello guys,

I'm having some issues with a fresh cluster.

I've set up a 3-node cluster using Supermicro servers and iSCSI/LVM storage.
Fencing seems to be working fine.
KVM live migration works fine until I configure HA on the VMs. Something is wrong and I cannot locate the problem.
It's all chaotic: HA migration fails with code 250, VMs are automatically "started" by the resource manager but never come online, etc.
HA doesn't work AT ALL unless I add the VM to HA in the "stopped" state and let HA autostart it.

I think it's all tied to "service:ha_test_ip". I used the standard IPMI fencing config from the wiki.

Code:
<?xml version="1.0"?>
<cluster config_version="54" name="HM">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="10.100.0.101" lanplus="1" login="ADMIN" name="ipmi1" passwd="ADMIN" power_wait="5"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.100.0.102" lanplus="1" login="ADMIN" name="ipmi2" passwd="ADMIN" power_wait="5"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.100.0.103" lanplus="1" login="ADMIN" name="ipmi3" passwd="ADMIN" power_wait="5"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="pm01" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="pm02" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi2"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="pm03" nodeid="3" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi3"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <service autostart="1" exclusive="0" name="ha_test_ip" recovery="relocate">
      <ip address="8.8.8.8"/>
    </service>
  </rm>
</cluster>
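
In case it helps anyone double-check the fencing part of a config like this one, here is a minimal Python sketch that verifies every clusternode points at a fencedevice that is actually defined. It is not from the wiki; the function name check_fence_wiring and the trimmed two-node sample are purely illustrative, and it only checks the wiring inside the XML, not whether IPMI fencing really works.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative sample in the same shape as the config above
# (two nodes only; credentials and the rm section are left out).
SAMPLE_CONF = """
<cluster config_version="54" name="HM">
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="10.100.0.101" name="ipmi1"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.100.0.102" name="ipmi2"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="pm01" nodeid="1" votes="1">
      <fence><method name="1"><device name="ipmi1"/></method></fence>
    </clusternode>
    <clusternode name="pm02" nodeid="2" votes="1">
      <fence><method name="1"><device name="ipmi2"/></method></fence>
    </clusternode>
  </clusternodes>
</cluster>
"""


def check_fence_wiring(conf_xml):
    """Report nodes without a fence device and references to undefined devices."""
    root = ET.fromstring(conf_xml)
    # Names of all fence devices declared under <fencedevices>.
    defined = {dev.get("name") for dev in root.iter("fencedevice")}
    problems = {"undefined_devices": [], "unfenced_nodes": []}
    for node in root.iter("clusternode"):
        # Device names this node references in its fence methods.
        refs = [d.get("name") for d in node.findall("./fence/method/device")]
        if not refs:
            problems["unfenced_nodes"].append(node.get("name"))
        for ref in refs:
            if ref not in defined:
                problems["undefined_devices"].append((node.get("name"), ref))
    return problems


if __name__ == "__main__":
    print(check_fence_wiring(SAMPLE_CONF))
```

Two empty lists mean the node-to-device references are consistent. To check a real file, read its contents and pass the text to the same function.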

This is my clustat output:

Code:
Cluster Status for HM @ Mon Jun 23 20:40:32 2014
Member Status: Quorate


 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 pm01                                                                1 Online, Local, rgmanager
 pm02                                                                2 Online, rgmanager
 pm03                                                                3 Online, rgmanager


 Service Name                                                     Owner (Last)                                                     State
 ------- ----                                                     ----- ------                                                     -----
 service:ha_test_ip                                               (pm03)                                                           failed

It says failed.

I've tried "clusvcadm -R service:ha_test_ip" but it just says "Local machine trying to restart service:ha_test_ip...Failure" on each of the 3 nodes.
Nothing in the logs.

I'm out of ideas.
If anyone has any input I would very much appreciate it.

Best regards,
Marius