Hi proxmox Community !
First of all, please excuse my poor English.
I am trying to build a two-node Proxmox cluster with shared storage (on a SAN).
First I modified the cman part of the cluster.conf.new file:
<cman expected_votes="1" keyfile="/var/lib/pve-cluster/corosync.authkey" two_node="1"/>
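To double-check that part, I use a small helper (just a sketch; the function name is mine). Two-node mode needs both two_node="1" and expected_votes="1" present together:

```shell
# check_two_node FILE: verify that cman is configured for two-node mode,
# i.e. both two_node="1" and expected_votes="1" appear in FILE.
check_two_node() {
    grep -q 'two_node="1"' "$1" && grep -q 'expected_votes="1"' "$1"
}

# Example: check_two_node /etc/pve/cluster.conf && echo "two-node mode OK"
```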
Then I defined the fencing devices:
<fencedevices>
<fencedevice agent="fence_ipmilan" ipaddr="10.10.214.11" login="xxxxxxxx" name="ipmi_tsbnproxmox01" passwd="xxxxxx" power_wait="60"/>
<fencedevice agent="fence_ipmilan" ipaddr="10.10.214.12" login="xxxxx" name="ipmi_tsbnproxmox02" passwd="xxxx" power_wait="60"/>
</fencedevices>
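Before trusting these entries I also checked each BMC by hand. This sketch only builds the fence_ipmilan status query so it can be copy-pasted and run manually (the -o status action just reads the power state, it never reboots anything; the helper name is mine):

```shell
# ipmi_status_cmd ADDR LOGIN PASSWD: print the fence_ipmilan command that
# queries the power state of one BMC. -o status is read-only: it reports
# on/off but never changes the power state.
ipmi_status_cmd() {
    printf 'fence_ipmilan -a %s -l %s -p %s -o status\n' "$1" "$2" "$3"
}

# Example: eval "$(ipmi_status_cmd 10.10.214.11 xxxxxxxx xxxxxx)"
```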
And I modified the nodes like this:
<clusternodes>
<clusternode name="tsbnproxmox01" nodeid="1" votes="1">
<fence>
<method name="ipmi">
<device action="reboot" name="ipmi_tsbnproxmox01"/>
</method>
</fence>
</clusternode>
<clusternode name="tsbnproxmox02" nodeid="2" votes="1">
<fence>
<method name="ipmi">
<device action="reboot" name="ipmi_tsbnproxmox02"/>
</method>
</fence>
</clusternode>
</clusternodes>
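One thing I am not sure about (an assumption on my side): as far as I understand, fenced only fences nodes that are members of the fence domain, so both nodes should show up in the `fence_tool ls` output. I check that with this helper; the "members" line format is assumed from cluster3's fenced, and the function name is mine:

```shell
# in_fence_domain NODEID FILE: check whether NODEID is listed on the
# "members" line of captured `fence_tool ls` output.
# Capture the output first with: fence_tool ls > /tmp/ftls
in_fence_domain() {
    awk -v n="$1" '
        $1 == "members" { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit found ? 0 : 1 }
    ' "$2"
}

# Example: in_fence_domain 2 /tmp/ftls && echo "node 2 is in the fence domain"
```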
This configuration seems correct, since I can fence a node with the fence_node command:
fence_node -vv tsbnproxmox02
fence tsbnproxmox02 dev 0.0 agent fence_ipmilan result: success
agent args: action=reboot nodename=tsbnproxmox02 agent=fence_ipmilan ipaddr=10.10.214.12 login=xxxx passwd=xxxx power_wait=60
fence tsbnproxmox02 success
The server is rebooted properly.
My problem is that fenced does not fence any node when a node fails.
I ran the following command on the second node:
echo c > /proc/sysrq-trigger
Don't do that at home: it freezes the kernel.
In the syslog file, I get this information:
Mar 3 17:47:57 tsbnproxmox01 corosync[724242]: [TOTEM ] A processor failed, forming new configuration.
Mar 3 17:47:59 tsbnproxmox01 corosync[724242]: [CLM ] CLM CONFIGURATION CHANGE
Mar 3 17:47:59 tsbnproxmox01 corosync[724242]: [CLM ] New Configuration:
Mar 3 17:47:59 tsbnproxmox01 corosync[724242]: [CLM ] #011r(0) ip(10.10.214.111)
Mar 3 17:47:59 tsbnproxmox01 corosync[724242]: [CLM ] Members Left:
Mar 3 17:47:59 tsbnproxmox01 corosync[724242]: [CLM ] #011r(0) ip(10.10.214.112)
Mar 3 17:47:59 tsbnproxmox01 corosync[724242]: [CLM ] Members Joined:
Mar 3 17:47:59 tsbnproxmox01 corosync[724242]: [QUORUM] Members[1]: 1
Mar 3 17:47:59 tsbnproxmox01 corosync[724242]: [CLM ] CLM CONFIGURATION CHANGE
Mar 3 17:47:59 tsbnproxmox01 corosync[724242]: [CLM ] New Configuration:
Mar 3 17:47:59 tsbnproxmox01 corosync[724242]: [CLM ] #011r(0) ip(10.10.214.111)
Mar 3 17:47:59 tsbnproxmox01 corosync[724242]: [CLM ] Members Left:
Mar 3 17:47:59 tsbnproxmox01 corosync[724242]: [CLM ] Members Joined:
Mar 3 17:47:59 tsbnproxmox01 corosync[724242]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Mar 3 17:47:59 tsbnproxmox01 corosync[724242]: [CPG ] chosen downlist: sender r(0) ip(10.10.214.111) ; members(old:2 left:1)
Mar 3 17:47:59 tsbnproxmox01 corosync[724242]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 3 17:47:59 tsbnproxmox01 pmxcfs[724636]: [dcdb] notice: members: 1/724636
Mar 3 17:47:59 tsbnproxmox01 pmxcfs[724636]: [dcdb] notice: members: 1/724636
Mar 3 17:47:59 tsbnproxmox01 kernel: dlm: closing connection to node 2
As you can see, fenced does not react at all.
I tried starting fenced with the -D switch, and when I crash the second node I only get these logs:
1330791209 cluster node 2 removed seq 416
1330791209 fenced:daemon conf 1 0 1 memb 1 join left 2
1330791209 fenced:daemon ring 1:416 1 memb 1
Does anyone see why fenced is not doing its job?
Regards,
Nicolas