VE 3.2 3 node cluster problems with HA

Marius Matei

Jun 23, 2014
Hello guys,

I'm having some issues with a fresh cluster.

I've set up a 3-node cluster using Supermicro servers and iSCSI/LVM storage.
Fencing seems to be working fine.
KVM live migration works fine until I configure HA on the VMs. Something is wrong and I can't locate the problem.
It's all chaotic: HA migration fails with code 250, VMs are automatically "started" by the resource manager but never come online, etc.
HA doesn't work AT ALL unless I add the VM to HA in the "stopped" state and let HA autostart it.

I think it's all tied to "service:ha_test_ip". I used the standard IPMI fencing config from the wiki.

Code:
<?xml version="1.0"?>
<cluster config_version="54" name="HM">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="10.100.0.101" lanplus="1" login="ADMIN" name="ipmi1" passwd="ADMIN" power_wait="5"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.100.0.102" lanplus="1" login="ADMIN" name="ipmi2" passwd="ADMIN" power_wait="5"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.100.0.103" lanplus="1" login="ADMIN" name="ipmi3" passwd="ADMIN" power_wait="5"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="pm01" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="pm02" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi2"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="pm03" nodeid="3" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi3"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <service autostart="1" exclusive="0" name="ha_test_ip" recovery="relocate">
      <ip address="8.8.8.8"/>
    </service>
  </rm>
</cluster>
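
To rule out a simple typo in a config like this, a small consistency check could be run against it. The following is only a sketch in Python, not a Proxmox or rgmanager tool: the function names `check_fence_refs` and `list_services` are made up for illustration, and `SAMPLE` is a trimmed two-node copy of the config above with the login attributes left out.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical copy of the cluster.conf above (two nodes, no credentials).
SAMPLE = """
<cluster config_version="54" name="HM">
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="10.100.0.101" name="ipmi1"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.100.0.102" name="ipmi2"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="pm01" nodeid="1" votes="1">
      <fence><method name="1"><device name="ipmi1"/></method></fence>
    </clusternode>
    <clusternode name="pm02" nodeid="2" votes="1">
      <fence><method name="1"><device name="ipmi2"/></method></fence>
    </clusternode>
  </clusternodes>
  <rm>
    <service autostart="1" exclusive="0" name="ha_test_ip" recovery="relocate">
      <ip address="8.8.8.8"/>
    </service>
  </rm>
</cluster>
"""


def check_fence_refs(conf_xml):
    """Return {node name: [referenced fence device names that are not defined]}."""
    root = ET.fromstring(conf_xml)
    defined = {d.get("name") for d in root.findall("./fencedevices/fencedevice")}
    missing = {}
    for node in root.findall("./clusternodes/clusternode"):
        refs = [d.get("name") for d in node.findall("./fence/method/device")]
        missing[node.get("name")] = [r for r in refs if r not in defined]
    return missing


def list_services(conf_xml):
    """Return {service name: [addresses of its <ip> resources]} from the <rm> section."""
    root = ET.fromstring(conf_xml)
    return {
        svc.get("name"): [ip.get("address") for ip in svc.findall("ip")]
        for svc in root.findall("./rm/service")
    }


if __name__ == "__main__":
    print(check_fence_refs(SAMPLE))
    print(list_services(SAMPLE))
```

An empty list for every node means each fence reference resolves to a defined device. The service listing only shows what rgmanager is being asked to manage; it does not say whether a node can actually bring that address up.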

This is my clustat output:

Code:
Cluster Status for HM @ Mon Jun 23 20:40:32 2014
Member Status: Quorate


 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 pm01                                                                1 Online, Local, rgmanager
 pm02                                                                2 Online, rgmanager
 pm03                                                                3 Online, rgmanager


 Service Name                                                     Owner (Last)                                                     State
 ------- ----                                                     ----- ------                                                     -----
 service:ha_test_ip                                               (pm03)                                                           failed

It says failed.

I've tried "clusvcadm -R service:ha_test_ip" but it just says "Local machine trying to restart service:ha_test_ip...Failure" on any of the 3 nodes.
Nothing in the logs.

I'm out of ideas.
If anyone has any input I would very much appreciate it.

Best regards,
Marius
 
