Fencing, HA with Dell M1000e: HA doesn't work when a node loses power

Feb 6, 2013
Hi,

I'm trying to set up my cluster with HA. I have 4 nodes. I changed cluster.conf and validated it. My cluster.conf: http://pastebin.com/BcymtFCx
If I run fence_drac5 directly: fence_drac5 -m server-1 -l root -p password -a 192.168.3.44 -o list -x ... it works correctly.
But if I execute "fence_node cetamox-01 -vv", I get this:
fence cetamox-01 dev 0.0 agent fence_drac5 result: error from agent
agent args: nodename=cetamox-01 agent=fence_drac5 ipaddr= 192.168.9.173 login=root module_name=server-1 passwd=password secure=1
fence cetamox-01 failed
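
For reference, fence_node builds the agent arguments shown above from cluster.conf, so the <clusternode> entry has to point at a matching <fencedevice>. A minimal sketch of that mapping, assuming a single M1000e CMC fronting the blades (device name, IP and password here are placeholders, not taken from the pastebin):

<clusternode name="cetamox-01" nodeid="1" votes="1">
  <fence>
    <method name="1">
      <!-- module_name selects the blade slot behind the CMC -->
      <device name="mycmc" module_name="server-1"/>
    </method>
  </fence>
</clusternode>
...
<fencedevices>
  <fencedevice agent="fence_drac5" name="mycmc" ipaddr="192.168.9.173" login="root" passwd="yourpassword" secure="1"/>
</fencedevices>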


When the cluster is running and I stop the rgmanager service on a node, it works correctly and the VMs are started on another node. But if I cut the power on a node with a VM, it doesn't work and the VM stays stopped.

I have 2 LVM volume groups as storage elements and they are shared correctly.

Do you have any idea about my problem?

Thanks.
 
..
But if I execute "fence_node cetamox-01 -vv", I get this:
fence cetamox-01 dev 0.0 agent fence_drac5 result: error from agent
agent args: nodename=cetamox-01 agent=fence_drac5 ipaddr= 192.168.9.173 login=root module_name=server-1 passwd=password secure=1
fence cetamox-01 failed


...

Fencing must work, otherwise HA will not work. Check your cluster.conf: as far as I can see, you configured the same IP address for all 4 fencing devices, which does not look right to me. Also, the password is not set, ...
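
For example, per-node fencing devices with distinct IP addresses and a password set could look roughly like this (all names, IPs and the password are placeholders):

<fencedevices>
  <fencedevice agent="fence_drac5" name="fence-node1" ipaddr="192.168.9.171" login="root" passwd="yourpassword" secure="1"/>
  <fencedevice agent="fence_drac5" name="fence-node2" ipaddr="192.168.9.172" login="root" passwd="yourpassword" secure="1"/>
  <fencedevice agent="fence_drac5" name="fence-node3" ipaddr="192.168.9.173" login="root" passwd="yourpassword" secure="1"/>
  <fencedevice agent="fence_drac5" name="fence-node4" ipaddr="192.168.9.174" login="root" passwd="yourpassword" secure="1"/>
</fencedevices>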
 
Hi Tom,

I set the individual blade IPs and passwords ... it doesn't work.
Then I set the iDRAC IP and DRAC password ... it doesn't work.

The firmware is iDRAC7. I don't think that matters.

But if I run

fence_drac5 -m Server-1 -l root -p password -a 192.168.9.173 -o list -x

it works correctly.

Do you have any idea?
 
Is it 'server-1' or 'Server-1'?


fence_drac5 -m Server-1 -l root -p password -a 192.168.9.173 -o list -x.... it works correctly.

And does reboot also work?

fence_drac5 -m Server-1 -l root -p password -a 192.168.9.173 -o reboot -x
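
If list and reboot both work from the command line, it can also be worth testing the remaining actions the agent uses during fencing, for example (same placeholder credentials as above; -o off really powers the blade down, so only try it on a test node):

fence_drac5 -m Server-1 -l root -p password -a 192.168.9.173 -o status -x
fence_drac5 -m Server-1 -l root -p password -a 192.168.9.173 -o off -x
fence_drac5 -m Server-1 -l root -p password -a 192.168.9.173 -o on -x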
 
I have a new configuration for fencing (with fence_ipmilan), and the VMs have HA activated.
When I run fence_node NODE, the node reboots and the VMs start on another node. But when I shut a node down by cutting the AC power directly, the VMs don't restart on another node.

Is this normal?
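
For reference, a fence_ipmilan setup in cluster.conf usually pairs one device per node's iDRAC with the node's fence method, roughly like this (all names and IPs are placeholders, not the actual configuration):

<clusternode name="cetamox-01" nodeid="1" votes="1">
  <fence>
    <method name="1">
      <device name="ipmi-node1"/>
    </method>
  </fence>
</clusternode>
...
<fencedevice agent="fence_ipmilan" name="ipmi-node1" ipaddr="192.168.9.171" login="root" passwd="yourpassword" lanplus="1"/>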
 
OK, I think the fencing is OK now. I needed to add option="monitor" to the fence_ipmilan arguments in cluster.conf, and now fencing works perfectly. But I have another problem with HA. When I shut one node down by cutting the A/C power, the other 3 nodes start the VMs from that node. But if I shut down another node, I only have a cluster with two nodes, a message appears saying "Quorum Dissolved", and I can't start those VMs.
How can I solve this?

I tried expected_votes="1" and/or two_node="1", but it doesn't work.
 
OK, I think the fencing is OK now. I needed to add option="monitor" to the fence_ipmilan arguments in cluster.conf, and now fencing works perfectly. But I have another problem with HA. When I shut one node down by cutting the A/C power, the other 3 nodes start the VMs from that node.

Yes, that's the idea of our HA implementation. As soon as you mark a VM as HA, rgmanager makes sure that the VM is always on.

If you do not want VMs to be managed that way, don't mark them as HA.

But if I shut down another node, I only have a cluster with two nodes, a message appears saying "Quorum Dissolved", and I can't start those VMs.
How can I solve this?

I tried expected_votes="1" and/or two_node="1", but it doesn't work.

You have 4 nodes in total, so the cluster is operational as long as 3 nodes are up. If you lose two nodes, only two are left, which is not enough for quorum, and therefore you lose quorum.
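
For reference, cman computes quorum as floor(expected_votes / 2) + 1, so with 4 nodes of one vote each:

expected_votes = 4  ->  quorum = floor(4/2) + 1 = 3

That is why the cluster survives the loss of one node but not two. Note that two_node="1" (which requires expected_votes="1") is only valid for a cluster of exactly two nodes, so it cannot help on a 4-node cluster.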