HA Cluster IP no longer accessible after outage

cscracker · Jun 25, 2015

I have a 4-node HA cluster that has been working great. Last night a malfunctioning UPS caused a power issue that took my cluster network down, and all nodes lost quorum. After I restored the network, everything came back up fine, except that I can no longer access the cluster IP. Each node works, they all talk to each other, and none of them shows any visible errors. I can migrate machines and everything looks healthy, so I'm not sure what to check next. Just in case, I rebooted each node in the cluster one at a time, but the cluster IP is still not accessible. Where should I look from here?

Here's my cluster.conf (passwords changed):

Code:

root@pve1:~# cat /etc/pve/cluster.conf
<?xml version="1.0"?>
<cluster config_version="7" name="c6100-cluster-1">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="10.99.99.21" lanplus="1" login="root" name="ipmi1" passwd="asdf" power_wait="5"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.99.99.22" lanplus="1" login="root" name="ipmi2" passwd="asdf" power_wait="5"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.99.99.23" lanplus="1" login="root" name="ipmi3" passwd="asdf" power_wait="5"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.99.99.24" lanplus="1" login="root" name="ipmi4" passwd="asdf" power_wait="5"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="pve1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="pve2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi2"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="pve3" nodeid="3" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi3"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="pve4" nodeid="4" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi4"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <service autostart="1" exclusive="0" name="HAClusterIP" recovery="relocate">
      <ip address="192.168.1.25"/>
    </service>
    <pvevm autostart="1" vmid="105"/>
    <pvevm autostart="1" vmid="103"/>
  </rm>
</cluster>
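
As a quick sanity check on a config like the one above, here is a minimal sketch in Python (standard library only). It lists the IP addresses defined under `<rm>` services and confirms that every fence device referenced by a node is actually declared. The helper names `service_ips` and `unresolved_fence_devices` are hypothetical, and `SAMPLE_CONF` is a trimmed two-node copy of the file, not the full config. It only validates the file's structure; it says nothing about whether the service is currently started on any node.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cluster.conf above (two nodes only, for brevity).
SAMPLE_CONF = """<?xml version="1.0"?>
<cluster config_version="7" name="c6100-cluster-1">
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="10.99.99.21" name="ipmi1"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.99.99.22" name="ipmi2"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="pve1" nodeid="1" votes="1">
      <fence><method name="1"><device name="ipmi1"/></method></fence>
    </clusternode>
    <clusternode name="pve2" nodeid="2" votes="1">
      <fence><method name="1"><device name="ipmi2"/></method></fence>
    </clusternode>
  </clusternodes>
  <rm>
    <service autostart="1" exclusive="0" name="HAClusterIP" recovery="relocate">
      <ip address="192.168.1.25"/>
    </service>
    <pvevm autostart="1" vmid="105"/>
  </rm>
</cluster>
"""


def service_ips(root):
    """Return {service name: [ip addresses]} for every <service> under <rm>."""
    return {
        svc.get("name"): [ip.get("address") for ip in svc.findall("ip")]
        for svc in root.findall("./rm/service")
    }


def unresolved_fence_devices(root):
    """Return (node, device) pairs whose device has no matching <fencedevice>."""
    defined = {fd.get("name") for fd in root.findall("./fencedevices/fencedevice")}
    missing = []
    for node in root.findall("./clusternodes/clusternode"):
        for dev in node.findall("./fence/method/device"):
            if dev.get("name") not in defined:
                missing.append((node.get("name"), dev.get("name")))
    return missing


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE_CONF)
    print("Service IPs:", service_ips(root))
    print("Unresolved fence devices:", unresolved_fence_devices(root))
```

To run it against the real file on a node, replace `ET.fromstring(SAMPLE_CONF)` with `ET.parse("/etc/pve/cluster.conf").getroot()`.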

dietmar · Jun 25, 2015

Are the PVE services running (pveproxy, pvedaemon, pve-cluster)?

cscracker · Jun 26, 2015

Yes, all the services listed in the web interface are running on all servers, as well as pveproxy and pvedaemon.
