Fencing problem. fence_ipmilan: Failed

Good morning to all.

I'm configuring a cluster with two nodes and a quorum disk, and my fencing is not working.

I'm using fence_ipmilan but I have problems. On each node I configured ipmitool and tested it, and it works:

Code:
root@node1:~# ipmitool -I lanplus -H 192.168.150.43 -U user -P pass -v chassis power status
Chassis Power is on.

root@node2:~# ipmitool -I lanplus -H 192.168.150.33 -U user -P pass -v chassis power status
Chassis Power is on

If I launch fence_node to test fencing, I get an error on each node:

Code:
root@node1:~# fence_node node1 -vv
fence node1 dev 0.0 agent fence_ipmilan result: error from agent
agent args: action=reboot nodename=node1 agent=fence_ipmilan ipaddr=192.168.150.33 login=user passwd=pass power_wait=60
fence node1 failed

root@node2:~# fence_node node2 -vv
fence node2 dev 0.0 agent fence_ipmilan result: error from agent
agent args: action=reboot nodename=node2 agent=fence_ipmilan ipaddr=192.168.150.43 login=user passwd=user power_wait=60
fence node2 failed
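
As far as I know, fence agents also accept the same key=value arguments on stdin that fenced passes them, so the failing call can be replayed by hand with a harmless status action (the values below just mirror the agent args above; adjust as needed):

Code:
# replay the agent call manually, but with a non-destructive "status" action
root@node1:~# printf 'action=status\nipaddr=192.168.150.33\nlogin=user\npasswd=pass\n' | fence_ipmilan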

This is my cluster.conf:

Code:
<?xml version="1.0"?>
<cluster config_version="16" name="clustergestion">
  <cman expected_votes="3" keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <quorumd allow_kill="0" interval="1" label="proxmox1_qdisk" tko="10" votes="1"/>
  <totem token="54000"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" name="fenceGestion1" ipaddr="192.168.150.33" login="user" passwd="pass" power_wait="60"/>
    <fencedevice agent="fence_ipmilan" name="fenceGestion2" ipaddr="192.168.150.43" login="user" passwd="pass" power_wait="60"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="node1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device action="reboot" name="fenceGestion1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device action="reboot" name="fenceGestion2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <pvevm autostart="1" vmid="100"/>
  </rm>
</cluster>
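
I can also check the config syntax itself; assuming the cman tools on PVE 3.x include ccs_config_validate, a quick validation looks like this:

Code:
# should print "Configuration validates" if the XML matches the cluster schema
root@node1:~# ccs_config_validate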

A direct fence_ipmilan test fails too:

Code:
root@node2:~# fence_ipmilan -l user -p pass -a 192.168.150.33 -o reboot -vv
INFO:root:Delay 0 second(s) before logging in to the fence device
INFO:root:Executing: /usr/bin/ipmitool -I lan -H 192.168.150.33 -U user -P pass -C 0 -p 623 -L ADMINISTRATOR chassis power status


DEBUG:root:1  Get Session Challenge command failed
Error: Unable to establish LAN session
Unable to get Chassis Power Status




ERROR:root:Failed: Unable to obtain correct plug status or plug is not available

Maybe there is some problem with -C 0 -p 623, but I don't know what these values are:

Code:
root@node2:~# /usr/bin/ipmitool -I lanplus -H 192.168.150.33 -U user -P user -L ADMINISTRATOR chassis power status
Chassis Power is on
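
From what I can tell, -p 623 is just the standard IPMI UDP port and -C selects the RMCP+ cipher suite (it only matters with the lanplus interface); ipmitool picks its own default suite when -C is omitted. A test with an explicit cipher suite would be (suite 3 is only a guess; other IDs may be needed):

Code:
root@node2:~# /usr/bin/ipmitool -I lanplus -H 192.168.150.33 -U user -P user -C 3 -p 623 -L ADMINISTRATOR chassis power status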


My cluster status:

Code:
root@node2:~# clustat
Cluster Status for clustergestion @ Tue Oct 14 11:15:22 2014
Member Status: Quorate


 Member Name                                                ID   Status
 ------ ----                                                      ----        ------
 node1                                                       1 Online, Local, rgmanager
 node2                                                       2 Online, rgmanager
 /dev/block/8:33                                            0 Online, Quorum Disk


 Service Name                                      Owner (Last)                                      State
 ------- ----                                      ----- ------                                      -----
 pvevm:100                                         node2                                          started

Proxmox version on node1 and node2:

Code:
root@node2:~# pveversion --verbose
proxmox-ve-2.6.32: 3.2-136 (running kernel: 2.6.32-32-pve)
pve-manager: 3.3-1 (running version: 3.3-1/a06c9f73)
pve-kernel-2.6.32-32-pve: 2.6.32-136
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-1
pve-cluster: 3.0-15
qemu-server: 3.1-34
pve-firmware: 1.1-3
libpve-common-perl: 3.0-19
libpve-access-control: 3.0-15
libpve-storage-perl: 3.0-23
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.1-5
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1


Can somebody help me?

Best regards.
 
It seems to work when you use '-I lanplus' on the command line. The corresponding property in cluster.conf is:

lanplus="1"

Or use the -P flag on the CLI:

# fence_ipmilan -P ...

see

# man fence_ipmilan
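
For a first test it is safer to query the power status instead of rebooting, for example (address and credentials taken from your post):

# fence_ipmilan -P -a 192.168.150.33 -l user -p pass -o status -v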
 
I changed my cluster.conf to:

Code:
<?xml version="1.0"?>
<cluster config_version="17" name="clustergestion">
  <cman expected_votes="3" keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <quorumd allow_kill="0" interval="1" label="proxmox1_qdisk" tko="10" votes="1"/>
  <totem token="54000"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" [B]lanplus="1"[/B] name="fenceGestion1" ipaddr="192.168.150.33" login="user" passwd="pass" power_wait="60"/>
    <fencedevice agent="fence_ipmilan" [B]lanplus="1"[/B] name="fenceGestion2" ipaddr="192.168.150.43" login="user" passwd="pass" power_wait="60"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="node1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device action="reboot" name="fenceGestion1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device action="reboot" name="fenceGestion2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <pvevm autostart="1" vmid="100"/>
  </rm>
</cluster>

If I launch fence_node, it still does not work:
Code:
root@node1:~# fence_node node2 -vv
fence node2 dev 0.0 agent fence_ipmilan result: error from agent
agent args: action=reboot nodename=node2 agent=fence_ipmilan lanplus=1 ipaddr=192.168.150.43 login=user passwd=pass power_wait=60
fence node2 failed

Another failed test with fence_ipmilan:

Code:
root@node1:~# fence_ipmilan -P -l user -p pass -a 192.168.150.43 -o reboot -vv
INFO:root:Delay 0 second(s) before logging in to the fence device
INFO:root:Executing: /usr/bin/ipmitool -I lanplus -H 192.168.150.43 -U user -P pass -C 0 -p 623 -L ADMINISTRATOR chassis power status


DEBUG:root:1  Error in open session response message : no matching cipher suite


Error: Unable to establish IPMI v2 / RMCP+ session
Unable to get Chassis Power Status
ERROR:root:Failed: Unable to obtain correct plug status or plug is not available

Code:
root@node1:~# ipmitool -I lanplus -H 192.168.150.43 -U user -P user -C 0 -p 623 -L ADMINISTRATOR chassis power status
Error in open session response message : no matching cipher suite


Error: Unable to establish IPMI v2 / RMCP+ session
Unable to get Chassis Power Status

The same test without -C 0 -p 623 works:

Code:
root@node1:~# ipmitool -I lanplus -H 192.168.150.43 -U user -P user -L ADMINISTRATOR chassis power status  
Chassis Power is on
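
I suppose I can also ask the BMC which RMCP+ cipher suites it actually has enabled (the LAN channel number, 1 below, is a guess and may differ):

Code:
root@node1:~# ipmitool -I lanplus -H 192.168.150.43 -U user -P user lan print 1 | grep -i cipher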

Both nodes are joined to the fence domain:

Code:
root@node1:~# fence_tool ls
fence domain
member count  2
victim count  0
victim now    0
master nodeid 2
wait state    none
members       1 2

Any ideas?

Thanks in advance!
 
I added cipher="1" to cluster.conf and now fence_node node1 -vv works:

Code:
<?xml version="1.0"?>
<cluster config_version="17" name="clustergestion">
  <cman expected_votes="3" keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <quorumd allow_kill="0" interval="1" label="proxmox1_qdisk" tko="10" votes="1"/>
  <totem token="54000"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" lanplus="1" [B]cipher="1"[/B] name="fenceGestion1" ipaddr="192.168.150.33" login="user" passwd="pass" power_wait="60"/>
    <fencedevice agent="fence_ipmilan" lanplus="1" [B]cipher="1"[/B] name="fenceGestion2" ipaddr="192.168.150.43" login="user" passwd="pass" power_wait="60"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="node1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device action="off" name="fenceGestion1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device action="off" name="fenceGestion2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <pvevm autostart="1" vmid="100"/>
  </rm>
</cluster>

Code:
root@node2:~# fence_node node1 -vv
fence node1 dev 0.0 agent fence_ipmilan result: success
agent args: action=off nodename=node1 agent=fence_ipmilan lanplus=1 cipher=1 ipaddr=192.168.150.33 login=user passwd=user power_wait=60
fence node1 success

When I launch fence_node node1, node1 is powered off and the VM is moved to node2; that's OK.

But now my problem is: when I unplug the network cable or the power cord from node1 (where VM 100 is running), the VM is not migrated to node2 until I plug the cables back in.

In syslog I have some fencing errors:

Code:
Oct 14 13:49:40 node1 qdiskd[2696]: Assuming master role
Oct 14 13:49:41 node1 qdiskd[2696]: Writing eviction notice for node 2
Oct 14 13:49:42 node1 qdiskd[2696]: Node 2 evicted
Oct 14 13:50:14 node1 corosync[2638]:   [TOTEM ] A processor failed, forming new configuration.
Oct 14 13:50:16 node1 corosync[2638]:   [CLM   ] CLM CONFIGURATION CHANGE
Oct 14 13:50:16 node1 corosync[2638]:   [CLM   ] New Configuration:
Oct 14 13:50:16 node1 corosync[2638]:   [CLM   ] #011r(0) ip(192.168.150.34)
Oct 14 13:50:16 node1 corosync[2638]:   [CLM   ] Members Left:
Oct 14 13:50:16 node1 corosync[2638]:   [CLM   ] #011r(0) ip(192.168.150.44)
Oct 14 13:50:16 node1 corosync[2638]:   [CLM   ] Members Joined:
Oct 14 13:50:16 node1 corosync[2638]:   [QUORUM] Members[1]: 1
Oct 14 13:50:16 node1 corosync[2638]:   [CLM   ] CLM CONFIGURATION CHANGE
Oct 14 13:50:16 node1 corosync[2638]:   [CLM   ] New Configuration:
Oct 14 13:50:16 node1 corosync[2638]:   [CLM   ] #011r(0) ip(192.168.150.34)
Oct 14 13:50:16 node1 corosync[2638]:   [CLM   ] Members Left:
Oct 14 13:50:16 node1 corosync[2638]:   [CLM   ] Members Joined:
Oct 14 13:50:16 node1 corosync[2638]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 14 13:50:16 node1 rgmanager[3139]: State change: node2 DOWN
Oct 14 13:50:16 node1 corosync[2638]:   [CPG   ] chosen downlist: sender r(0) ip(192.168.150.34) ; members(old:2 left:1)
Oct 14 13:50:16 node1 pmxcfs[2452]: [dcdb] notice: members: 1/2452
Oct 14 13:50:16 node1 pmxcfs[2452]: [dcdb] notice: members: 1/2452
Oct 14 13:50:16 node1 kernel: dlm: closing connection to node 2
Oct 14 13:50:16 node1 corosync[2638]:   [MAIN  ] Completed service synchronization, ready to provide service.
Oct 14 13:50:16 node1 fenced[2854]: fencing node node2
Oct 14 13:50:16 node1 fence_ipmilan: Parse error: Ignoring unknown option 'nodename=node2
Oct 14 13:50:36 node1 fence_ipmilan: Connection timed out
Oct 14 13:50:36 node1 fenced[2854]: fence node2 dev 0.0 agent fence_ipmilan result: error from agent
Oct 14 13:50:36 node1 fenced[2854]: fence node2 failed
Oct 14 13:50:39 node1 fenced[2854]: fencing node node2
Oct 14 13:50:40 node1 fence_ipmilan: Parse error: Ignoring unknown option 'nodename=node2
Oct 14 13:51:00 node1 fence_ipmilan: Connection timed out
Oct 14 13:51:00 node1 fenced[2854]: fence node2 dev 0.0 agent fence_ipmilan result: error from agent
Oct 14 13:51:00 node1 fenced[2854]: fence node2 failed

Is it to be expected that, when a node loses its network or power connection, the VM migrates to the live node?
 
Ask yourself: How should fencing work if you unplug the fencing device?

==> this will never work, by design.
 
Thanks for the answer, Dietmar. I'm just learning this on my own.

I read about this kind of test at https://pve.proxmox.com/wiki/Two-Node_High_Availability_Cluster#Testing and I thought the VM would auto-migrate after an uncontrolled failure.

So my question is: is it possible to auto-migrate the VM to node2 when node1 crashes, without any manual action?




Hope you can help me.
Best regards.
 
So my question is: if the node where a VM is running has a failure, is it possible to auto-migrate the VM to another live node without a manual action?

Sure, you just need a suitable fencing device (for example, a power-based fencing device).
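
For example, with a switched PDU the relevant cluster.conf entries could look roughly like this (agent fence_apc; the address, credentials and outlet number below are placeholders):

Code:
<fencedevices>
  <fencedevice agent="fence_apc" name="pdu1" ipaddr="192.168.150.50" login="apc" passwd="apc"/>
</fencedevices>
...
<clusternode name="node1" nodeid="1" votes="1">
  <fence>
    <method name="1">
      <device name="pdu1" port="1" action="off"/>
    </method>
  </fence>
</clusternode>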
 
Is fencing with iDRAC on Dell servers able to auto-migrate the VM to another live node?