HA Issues

ejc317

Member
Oct 18, 2012
Hi,

We're having this issue:

VM100 (HA) on Node 1 - shuts off
VM100 (HA) on Node 2 - migrates

The migration freezes. After a reboot, VM100 shows up on Node 3 in the GUI of three of the nodes, but when you click it, it says there is no such VM; it's actually on Node 4, but only on Node 4 does it show up ...

This repeats over and over and seems to be an issue with HA.

Also, on Node 1 I see "dlm: no local IP address has been set" ...

___

So we debugged some more. It seems we have to restart cman manually every time, and the nodes randomly fail to start rgmanager (because cman isn't running right and so hasn't joined a fence domain), even though redhat-cluster-pve shows up as joined ... any ideas? Manually restarting cman is a pain.
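For anyone following along, this is roughly what we check on each node before resorting to the manual restart (all standard cman/rgmanager tools):

# is cman up and is the cluster quorate?
service cman status
cman_tool status
# has this node actually joined the fence domain?
fence_tool ls
# is rgmanager running, and what does it think about the HA services?
service rgmanager status
clustat
# the painful manual workaround:
service cman restart
service rgmanager restart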


Also, I have another issue: when a server dies (we cut its power), IPMI fencing fails, but on that IPMI failure the cluster just sits there - it doesn't move the VM to another node ...
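If I understand the design right, rgmanager will not relocate a VM until fencing has succeeded, because it cannot prove the dead node has released its resources. A rough sketch of how to test the fence path by hand, using the same root credentials and x.x.x.x placeholder as in the cluster.conf below:

# ask the BMC for its power state directly through the fence agent
fence_ipmilan -a x.x.x.x -l root -p root -o status
# or drive a full fence of a named node through the cluster stack
fence_node 002
# watch the fence daemon's view while doing so
fence_tool ls

If the BMC loses power together with the server, the status call above will hang or fail, and fenced will keep retrying - which would match the "just sits there" behaviour.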
 
Looks like a configuration issue, post your cluster.conf.
 
Cluster.conf is fine - it seems the issue is:

a) cman doesn't start automatically on each node correctly
b) rgmanager starts randomly, so HA doesn't work right and the node gets "lost" during migration until a full reboot (boot-time checks sketched below)
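To see whether the daemons are even wired up to start at boot, something like this on each node (a sketch, assuming a stock Debian-based Proxmox 2.x install, where runlevel 2 is the default):

# init scripts present and linked into the default runlevel?
ls -l /etc/init.d/cman /etc/init.d/rgmanager
ls /etc/rc2.d/ | grep -E 'cman|rgmanager'
# any startup errors from the cluster stack?
grep -iE 'cman|fenced|rgmanager' /var/log/syslog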

<?xml version="1.0"?>
<cluster config_version="9" name="PL-C-TT">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="x.x.x.x" lanplus="1" login="root" name="ipmi1" passwd="root" power_wait="5"/>
    <fencedevice agent="fence_ipmilan" ipaddr="x.x.x.x" lanplus="1" login="root" name="ipmi2" passwd="root" power_wait="5"/>
    <fencedevice agent="fence_ipmilan" ipaddr="x.x.x.x" lanplus="1" login="root" name="ipmi3" passwd="root" power_wait="5"/>
    <fencedevice agent="fence_ipmilan" ipaddr="x.x.x.x" lanplus="1" login="root" name="ipmi4" passwd="root" power_wait="5"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="001" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="002" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi2"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="003" nodeid="3" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi3"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="004" nodeid="4" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi4"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <pvevm autostart="1" vmid="100"/>
  </rm>
</cluster>
 
IMHO, it is impossible to debug such things with this limited information. You first need to run a well-defined test case, and then log in to all those nodes and carefully analyze all the logs.
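For example, something along these lines on every node while reproducing the failure exactly once (log paths are the usual ones on a Debian-based Proxmox 2.x node; /var/log/cluster may not exist on every setup):

# capture cluster state before the test
cman_tool status
cman_tool nodes
clustat
# then watch the logs live while the test runs
tail -f /var/log/syslog /var/log/cluster/*.log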
 
What servers are you using? On my Dells I had to remove the lanplus="1" from the IPMI agent; I also had similar symptoms to the ones you describe due to the non-working HA.
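In other words, each fencedevice entry from the config above would become (only the lanplus attribute removed - a sketch of the change, not a verified fix):

<fencedevice agent="fence_ipmilan" ipaddr="x.x.x.x" login="root" name="ipmi1" passwd="root" power_wait="5"/>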
 
Holy nuts - the load on the servers is at 24 ... each ... and it keeps power-cycling all my servers.

I manually tried the command for fence_ipmilan and regular ipmitool, with and without lanplus, and both work - what does lanplus do for me in Proxmox?
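For what it's worth, lanplus switches the agent from the IPMI v1.5 "lan" interface to the v2.0 RMCP+ interface (authenticated, encrypted sessions); some BMCs only answer on one of the two. A quick way to compare both paths, with the same placeholder address and credentials as above:

# fence agent, v1.5 then v2.0 (-P enables lanplus)
fence_ipmilan -a x.x.x.x -l root -p root -o status
fence_ipmilan -a x.x.x.x -l root -p root -o status -P
# same comparison with plain ipmitool
ipmitool -I lan -H x.x.x.x -U root -P root chassis power status
ipmitool -I lanplus -H x.x.x.x -U root -P root chassis power status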
 