fenced does not operate

greyzlii

New Member
Mar 3, 2012
2
0
1
Hi Proxmox community!
First of all, please excuse my poor English.

I am trying to build a two-node Proxmox cluster with shared storage (on a SAN).

First, I modified the cman section of the cluster.conf.new file:

<cman expected_votes="1" keyfile="/var/lib/pve-cluster/corosync.authkey" two_node="1"/>

Then I defined the fencing devices:

<fencedevices>
  <fencedevice agent="fence_ipmilan" ipaddr="10.10.214.11" login="xxxxxxxx" name="ipmi_tsbnproxmox01" passwd="xxxxxx" power_wait="60"/>
  <fencedevice agent="fence_ipmilan" ipaddr="10.10.214.12" login="xxxxx" name="ipmi_tsbnproxmox02" passwd="xxxx" power_wait="60"/>
</fencedevices>

And I modified the node definitions like this:

<clusternodes>
  <clusternode name="tsbnproxmox01" nodeid="1" votes="1">
    <fence>
      <method name="ipmi">
        <device action="reboot" name="ipmi_tsbnproxmox01"/>
      </method>
    </fence>
  </clusternode>
  <clusternode name="tsbnproxmox02" nodeid="2" votes="1">
    <fence>
      <method name="ipmi">
        <device action="reboot" name="ipmi_tsbnproxmox02"/>
      </method>
    </fence>
  </clusternode>
</clusternodes>
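As a sanity check before activating an edited config, it can help to verify the file is at least well-formed XML, which catches copy/paste mistakes early. This is just a sketch; the path and the values in the sample file below are placeholders, not my real config:

```shell
# Hypothetical example: write a minimal cluster.conf to a test path and
# check that it parses as well-formed XML (using python3's stdlib parser).
cat > /tmp/cluster.conf.new <<'EOF'
<?xml version="1.0"?>
<cluster name="testcluster" config_version="2">
  <cman expected_votes="1" keyfile="/var/lib/pve-cluster/corosync.authkey" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="10.10.214.11" login="user" name="ipmi_tsbnproxmox01" passwd="secret" power_wait="60"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="tsbnproxmox01" nodeid="1" votes="1">
      <fence>
        <method name="ipmi">
          <device action="reboot" name="ipmi_tsbnproxmox01"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
</cluster>
EOF
python3 -c "import xml.etree.ElementTree as ET; ET.parse('/tmp/cluster.conf.new'); print('well-formed')"
```

Well-formedness is of course weaker than schema validity, but it rules out the most common editing errors.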

This configuration seems correct, since I can fence a node manually with the fence_node command:

fence_node -vv tsbnproxmox02
fence tsbnproxmox02 dev 0.0 agent fence_ipmilan result: success
agent args: action=reboot nodename=tsbnproxmox02 agent=fence_ipmilan ipaddr=10.10.214.12 login=xxxx passwd=xxxx power_wait=60
fence tsbnproxmox02 success
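One thing that may be worth checking (this is an assumption on my part, not a confirmed fix) is whether both nodes have actually joined the fence domain, since fenced only acts on behalf of domain members. The fence_tool utility that ships with cman can show this:

```shell
# List fence-domain members as seen by the local fenced daemon;
# both nodes should appear here on a healthy two-node cluster.
fence_tool ls

# Dump fenced's internal debug log for more detail.
fence_tool dump
```

These commands need a running cluster stack, so they only make sense on the live nodes.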

The server reboots properly.

My problem is that fenced does not fence anything automatically when a node fails.

To simulate a failure, I ran the following command on the second node:

echo c > /proc/sysrq-trigger

Don't try that at home; it crashes the kernel. :)

In the syslog on the surviving node, I see this:

Mar 3 17:47:57 tsbnproxmox01 corosync[724242]: [TOTEM ] A processor failed, forming new configuration.
Mar 3 17:47:59 tsbnproxmox01 corosync[724242]: [CLM ] CLM CONFIGURATION CHANGE
Mar 3 17:47:59 tsbnproxmox01 corosync[724242]: [CLM ] New Configuration:
Mar 3 17:47:59 tsbnproxmox01 corosync[724242]: [CLM ] #011r(0) ip(10.10.214.111)
Mar 3 17:47:59 tsbnproxmox01 corosync[724242]: [CLM ] Members Left:
Mar 3 17:47:59 tsbnproxmox01 corosync[724242]: [CLM ] #011r(0) ip(10.10.214.112)
Mar 3 17:47:59 tsbnproxmox01 corosync[724242]: [CLM ] Members Joined:
Mar 3 17:47:59 tsbnproxmox01 corosync[724242]: [QUORUM] Members[1]: 1
Mar 3 17:47:59 tsbnproxmox01 corosync[724242]: [CLM ] CLM CONFIGURATION CHANGE
Mar 3 17:47:59 tsbnproxmox01 corosync[724242]: [CLM ] New Configuration:
Mar 3 17:47:59 tsbnproxmox01 corosync[724242]: [CLM ] #011r(0) ip(10.10.214.111)
Mar 3 17:47:59 tsbnproxmox01 corosync[724242]: [CLM ] Members Left:
Mar 3 17:47:59 tsbnproxmox01 corosync[724242]: [CLM ] Members Joined:
Mar 3 17:47:59 tsbnproxmox01 corosync[724242]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Mar 3 17:47:59 tsbnproxmox01 corosync[724242]: [CPG ] chosen downlist: sender r(0) ip(10.10.214.111) ; members(old:2 left:1)
Mar 3 17:47:59 tsbnproxmox01 corosync[724242]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 3 17:47:59 tsbnproxmox01 pmxcfs[724636]: [dcdb] notice: members: 1/724636
Mar 3 17:47:59 tsbnproxmox01 pmxcfs[724636]: [dcdb] notice: members: 1/724636
Mar 3 17:47:59 tsbnproxmox01 kernel: dlm: closing connection to node 2

As you can see, fenced does nothing.
I tried starting fenced with the -D switch, and when I crash the second node I only get these log lines:

1330791209 cluster node 2 removed seq 416
1330791209 fenced:daemon conf 1 0 1 memb 1 join left 2
1330791209 fenced:daemon ring 1:416 1 memb 1

Does anyone see why fenced is not doing its job?

Regards,
Nicolas