HA Issues

ejc317

Member
Oct 18, 2012
Hi,

We're having this issue:

VM100 (HA) on Node 1 - shuts off
VM100 (HA) on Node 2 - migrates

The migration freezes. After a reboot, VM100 shows up on Node 3 in the GUI of three of the nodes, but when you click it, it says there is no such VM; it's actually on Node 4, but only on Node 4 does it show up ...

This repeats over and over and seems to be an issue with HA.

Also, on Node 1 I see "dlm: no local IP address has been set" ...

___

So we debugged some more. It seems we have to restart cman manually every time, and the nodes randomly fail to start rgmanager (because cman isn't running right and so hasn't joined a fence domain), even though redhat-cluster-pve shows up as joined ... any ideas? Manually restarting cman is a pain.
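For anyone following along, this is roughly what we check on each node before resorting to the manual restart (all standard cman/rgmanager tools):

# is cman up and is the cluster quorate?
service cman status
cman_tool status
# has this node actually joined the fence domain?
fence_tool ls
# is rgmanager running, and what does it think about the HA services?
service rgmanager status
clustat
# the painful manual workaround:
service cman restart
service rgmanager restart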


Also, I have another issue: when a server dies (we cut its power), IPMI fencing fails, but on that IPMI failure the cluster just sits there - it doesn't move the VM to another node ...
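If I understand the design right, rgmanager will not relocate a VM until fencing has succeeded, because it cannot prove the dead node has released its resources. A rough sketch of how to test the fence path by hand, using the same root credentials and x.x.x.x placeholder as in the cluster.conf below:

# ask the BMC for its power state directly through the fence agent
fence_ipmilan -a x.x.x.x -l root -p root -o status
# or drive a full fence of a named node through the cluster stack
fence_node 002
# watch the fence daemon's view while doing so
fence_tool ls

If the BMC loses power together with the server, the status call above will hang or fail, and fenced will keep retrying - which would match the "just sits there" behaviour.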
 
Looks like a configuration issue, post your cluster.conf.
 
Cluster.conf is fine - it seems the issue is:

a) cman doesn't start automatically on each node correctly
b) rgmanager starts randomly, so HA doesn't work right and the node gets "lost" during migration until a full reboot (boot-time checks sketched below)
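To see whether the daemons are even wired up to start at boot, something like this on each node (a sketch, assuming a stock Debian-based Proxmox 2.x install, where runlevel 2 is the default):

# init scripts present and linked into the default runlevel?
ls -l /etc/init.d/cman /etc/init.d/rgmanager
ls /etc/rc2.d/ | grep -E 'cman|rgmanager'
# any startup errors from the cluster stack?
grep -iE 'cman|fenced|rgmanager' /var/log/syslog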

<?xml version="1.0"?>
<cluster config_version="9" name="PL-C-TT">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="x.x.x.x" lanplus="1" login="root" name="ipmi1" passwd="root" power_wait="5"/>
    <fencedevice agent="fence_ipmilan" ipaddr="x.x.x.x" lanplus="1" login="root" name="ipmi2" passwd="root" power_wait="5"/>
    <fencedevice agent="fence_ipmilan" ipaddr="x.x.x.x" lanplus="1" login="root" name="ipmi3" passwd="root" power_wait="5"/>
    <fencedevice agent="fence_ipmilan" ipaddr="x.x.x.x" lanplus="1" login="root" name="ipmi4" passwd="root" power_wait="5"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="001" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="002" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi2"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="003" nodeid="3" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi3"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="004" nodeid="4" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi4"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <pvevm autostart="1" vmid="100"/>
  </rm>
</cluster>
 
IMHO, it is impossible to debug such things with this limited information. You first need to run a well-defined test case, and then log in to all those nodes and carefully analyze all the logs.
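For example, something along these lines on every node while reproducing the failure exactly once (log paths are the usual ones on a Debian-based Proxmox 2.x node; /var/log/cluster may not exist on every setup):

# capture cluster state before the test
cman_tool status
cman_tool nodes
clustat
# then watch the logs live while the test runs
tail -f /var/log/syslog /var/log/cluster/*.log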
 
What servers are you using? On my Dells I had to remove the lanplus="1" from the IPMI agent; I also had similar symptoms to the ones you describe due to the non-working HA.
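In other words, each fencedevice entry from the config above would become (only the lanplus attribute removed - a sketch of the change, not a verified fix):

<fencedevice agent="fence_ipmilan" ipaddr="x.x.x.x" login="root" name="ipmi1" passwd="root" power_wait="5"/>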
 
Holy nuts - the load on the servers is at 24 ... each ... and it keeps power-cycling all my servers.

I manually tried the command for fence_ipmilan and regular ipmitool, with and without lanplus, and both work - what does lanplus do for me in Proxmox?
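For what it's worth, lanplus switches the agent from the IPMI v1.5 "lan" interface to the v2.0 RMCP+ interface (authenticated, encrypted sessions); some BMCs only answer on one of the two. A quick way to compare both paths, with the same placeholder address and credentials as above:

# fence agent, v1.5 then v2.0 (-P enables lanplus)
fence_ipmilan -a x.x.x.x -l root -p root -o status
fence_ipmilan -a x.x.x.x -l root -p root -o status -P
# same comparison with plain ipmitool
ipmitool -I lan -H x.x.x.x -U root -P root chassis power status
ipmitool -I lanplus -H x.x.x.x -U root -P root chassis power status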
 