Fencing problem. fence_ipmilan: Failed

Good morning to all.

I'm configuring a cluster with two nodes and a quorum disk, and my fencing is not working.

I'm using fence_ipmilan but I'm having problems. On each node I configured ipmitool and tested it, and it works:

Code:
root@node1:~# ipmitool -I lanplus -H 192.168.150.43 -U user -P pass -v chassis power status
Chassis Power is on.

root@node2:~# ipmitool -I lanplus -H 192.168.150.33 -U user -P pass -v chassis power status
Chassis Power is on

If I launch fence_node to test the fencing, I get an error on each node:

Code:
root@node1:~# fence_node node1 -vv
fence node1 dev 0.0 agent fence_ipmilan result: error from agent
agent args: action=reboot nodename=node1 agent=fence_ipmilan ipaddr=192.168.150.33 login=user passwd=pass power_wait=60
fence node1 failed

root@node2:~# fence_node node2 -vv
fence node2 dev 0.0 agent fence_ipmilan result: error from agent
agent args: action=reboot nodename=node2 agent=fence_ipmilan ipaddr=192.168.150.43 login=user passwd=pass power_wait=60
fence node2 failed

This is my cluster.conf:

Code:
<?xml version="1.0"?>
<cluster config_version="16" name="clustergestion">
  <cman expected_votes="3" keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <quorumd allow_kill="0" interval="1" label="proxmox1_qdisk" tko="10" votes="1"/>
  <totem token="54000"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" name="fenceGestion1" ipaddr="192.168.150.33" login="user" passwd="pass" power_wait="60"/>
    <fencedevice agent="fence_ipmilan" name="fenceGestion2" ipaddr="192.168.150.43" login="user" passwd="pass" power_wait="60"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="node1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device action="reboot" name="fenceGestion1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device action="reboot" name="fenceGestion2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <pvevm autostart="1" vmid="100"/>
  </rm>
</cluster>

A direct fence_ipmilan test fails too:

Code:
root@node2:~# fence_ipmilan -l user -p pass -a 192.168.150.33 -o reboot -vv
INFO:root:Delay 0 second(s) before logging in to the fence device
INFO:root:Executing: /usr/bin/ipmitool -I lan -H 192.168.150.33 -U user -P pass -C 0 -p 623 -L ADMINISTRATOR chassis power status

DEBUG:root:1  Get Session Challenge command failed
Error: Unable to establish LAN session
Unable to get Chassis Power Status

ERROR:root:Failed: Unable to obtain correct plug status or plug is not available

Maybe there is some problem with -C 0 -p 623, but I don't know what these values are:

Code:
root@node2:~# /usr/bin/ipmitool -I lanplus -H 192.168.150.33 -U user -P user -L ADMINISTRATOR chassis power status
Chassis Power is on


My cluster status:

Code:
root@node2:~# clustat
Cluster Status for clustergestion @ Tue Oct 14 11:15:22 2014
Member Status: Quorate


 Member Name                             ID   Status
 ------ ----                             ---- ------
 node1                                       1 Online, Local, rgmanager
 node2                                       2 Online, rgmanager
 /dev/block/8:33                             0 Online, Quorum Disk


 Service Name                                      Owner (Last)                                      State
 ------- ----                                      ----- ------                                      -----
 pvevm:100                                         node2                                          started

Proxmox version on node1 and node2:

Code:
root@node2:~# pveversion --verbose
proxmox-ve-2.6.32: 3.2-136 (running kernel: 2.6.32-32-pve)
pve-manager: 3.3-1 (running version: 3.3-1/a06c9f73)
pve-kernel-2.6.32-32-pve: 2.6.32-136
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-1
pve-cluster: 3.0-15
qemu-server: 3.1-34
pve-firmware: 1.1-3
libpve-common-perl: 3.0-19
libpve-access-control: 3.0-15
libpve-storage-perl: 3.0-23
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.1-5
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1


Can somebody help me?

Best regards.
 
It seems it works if you use '-I lanplus' on the command line. The corresponding property in cluster.conf is:

lanplus="1"

Or use the CLI with the -P flag:

# fence_ipmilan -P ...

see

# man fence_ipmilan
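
For example, a quick status check from the CLI with the -P flag could look like this (just a sketch; adjust the address and credentials to your setup):

Code:
# fence_ipmilan -P -a 192.168.150.33 -l user -p pass -o status -v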
 
I changed my cluster.conf to:

Code:
<?xml version="1.0"?>
<cluster config_version="17" name="clustergestion">
  <cman expected_votes="3" keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <quorumd allow_kill="0" interval="1" label="proxmox1_qdisk" tko="10" votes="1"/>
  <totem token="54000"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" [B]lanplus="1"[/B] name="fenceGestion1" ipaddr="192.168.150.33" login="user" passwd="pass" power_wait="60"/>
    <fencedevice agent="fence_ipmilan" [B]lanplus="1"[/B] name="fenceGestion2" ipaddr="192.168.150.43" login="user" passwd="pass" power_wait="60"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="node1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device action="reboot" name="fenceGestion1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device action="reboot" name="fenceGestion2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <pvevm autostart="1" vmid="100"/>
  </rm>
</cluster>

If I launch fence_node, it's still not working:
Code:
root@node1:~# fence_node node2 -vv
fence node2 dev 0.0 agent fence_ipmilan result: error from agent
agent args: action=reboot nodename=node2 agent=fence_ipmilan lanplus=1 ipaddr=192.168.150.43 login=user passwd=pass power_wait=60
fence node2 failed

Another failed test with fence_ipmilan:

Code:
root@node1:~# fence_ipmilan -P -l user -p pass -a 192.168.150.43 -o reboot -vv
INFO:root:Delay 0 second(s) before logging in to the fence device
INFO:root:Executing: /usr/bin/ipmitool -I lanplus -H 192.168.150.43 -U user -P pass -C 0 -p 623 -L ADMINISTRATOR chassis power status


DEBUG:root:1  Error in open session response message : no matching cipher suite


Error: Unable to establish IPMI v2 / RMCP+ session
Unable to get Chassis Power Status
ERROR:root:Failed: Unable to obtain correct plug status or plug is not available

Code:
root@node1:~# ipmitool -I lanplus -H 192.168.150.43 -U user -P user -C 0 -p 623 -L ADMINISTRATOR chassis power status
Error in open session response message : no matching cipher suite


Error: Unable to establish IPMI v2 / RMCP+ session
Unable to get Chassis Power Status

The same test without -C 0 -p 623 works:

Code:
root@node1:~# ipmitool -I lanplus -H 192.168.150.43 -U user -P user -L ADMINISTRATOR chassis power status  
Chassis Power is on
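
To see which RMCP+ cipher suites the BMC actually advertises, something like the following can be used (a sketch, assuming LAN channel 1; the "RMCP+ Cipher Suites" line in the output lists the supported values):

Code:
root@node1:~# ipmitool -I lanplus -H 192.168.150.43 -U user -P pass lan print 1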

Both nodes are joined to the fence domain:

Code:
root@node1:~# fence_tool ls
fence domain
member count  2
victim count  0
victim now    0
master nodeid 2
wait state    none
members       1 2

Some ideas?

Thanks in advance!
 
I added cipher="1" to cluster.conf and fence_node node1 -vv is working:

Code:
<?xml version="1.0"?>
<cluster config_version="17" name="clustergestion">
  <cman expected_votes="3" keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <quorumd allow_kill="0" interval="1" label="proxmox1_qdisk" tko="10" votes="1"/>
  <totem token="54000"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" lanplus="1" [B]cipher="1"[/B] name="fenceGestion1" ipaddr="192.168.150.33" login="user" passwd="pass" power_wait="60"/>
    <fencedevice agent="fence_ipmilan" lanplus="1" [B]cipher="1"[/B] name="fenceGestion2" ipaddr="192.168.150.43" login="user" passwd="pass" power_wait="60"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="node1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device action="off" name="fenceGestion1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device action="off" name="fenceGestion2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <pvevm autostart="1" vmid="100"/>
  </rm>
</cluster>

Code:
root@node2:~# fence_node node1 -vv
fence node1 dev 0.0 agent fence_ipmilan result: success
agent args: action=off nodename=node1 agent=fence_ipmilan lanplus=1 cipher=1 ipaddr=192.168.150.33 login=user passwd=pass power_wait=60
fence node1 success

When I launch fence_node node1, node1 is powered off and the VM is moved to node2; that's OK.
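
For reference, the same check can also be run directly from the CLI, selecting the cipher suite with -C (a sketch; adjust address and credentials):

Code:
root@node2:~# fence_ipmilan -P -C 1 -a 192.168.150.33 -l user -p pass -o status -v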

But now my problem is that when I unplug the network cable or the power cord from node1 (where VM 100 is running), the VM is not migrated to node2 until I plug the cables back in.

In syslog I have some fencing errors:

Code:
Oct 14 13:49:40 node1 qdiskd[2696]: Assuming master role
Oct 14 13:49:41 node1 qdiskd[2696]: Writing eviction notice for node 2
Oct 14 13:49:42 node1 qdiskd[2696]: Node 2 evicted
Oct 14 13:50:14 node1 corosync[2638]:   [TOTEM ] A processor failed, forming new configuration.
Oct 14 13:50:16 node1 corosync[2638]:   [CLM   ] CLM CONFIGURATION CHANGE
Oct 14 13:50:16 node1 corosync[2638]:   [CLM   ] New Configuration:
Oct 14 13:50:16 node1 corosync[2638]:   [CLM   ] #011r(0) ip(192.168.150.34)
Oct 14 13:50:16 node1 corosync[2638]:   [CLM   ] Members Left:
Oct 14 13:50:16 node1 corosync[2638]:   [CLM   ] #011r(0) ip(192.168.150.44)
Oct 14 13:50:16 node1 corosync[2638]:   [CLM   ] Members Joined:
Oct 14 13:50:16 node1 corosync[2638]:   [QUORUM] Members[1]: 1
Oct 14 13:50:16 node1 corosync[2638]:   [CLM   ] CLM CONFIGURATION CHANGE
Oct 14 13:50:16 node1 corosync[2638]:   [CLM   ] New Configuration:
Oct 14 13:50:16 node1 corosync[2638]:   [CLM   ] #011r(0) ip(192.168.150.34)
Oct 14 13:50:16 node1 corosync[2638]:   [CLM   ] Members Left:
Oct 14 13:50:16 node1 corosync[2638]:   [CLM   ] Members Joined:
Oct 14 13:50:16 node1 corosync[2638]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 14 13:50:16 node1 rgmanager[3139]: State change: node2 DOWN
Oct 14 13:50:16 node1 corosync[2638]:   [CPG   ] chosen downlist: sender r(0) ip(192.168.150.34) ; members(old:2 left:1)
Oct 14 13:50:16 node1 pmxcfs[2452]: [dcdb] notice: members: 1/2452
Oct 14 13:50:16 node1 pmxcfs[2452]: [dcdb] notice: members: 1/2452
Oct 14 13:50:16 node1 kernel: dlm: closing connection to node 2
Oct 14 13:50:16 node1 corosync[2638]:   [MAIN  ] Completed service synchronization, ready to provide service.
Oct 14 13:50:16 node1 fenced[2854]: fencing node node2
Oct 14 13:50:16 node1 fence_ipmilan: Parse error: Ignoring unknown option 'nodename=node2'
Oct 14 13:50:36 node1 fence_ipmilan: Connection timed out
Oct 14 13:50:36 node1 fenced[2854]: fence node2 dev 0.0 agent fence_ipmilan result: error from agent
Oct 14 13:50:36 node1 fenced[2854]: fence node2 failed
Oct 14 13:50:39 node1 fenced[2854]: fencing node node2
Oct 14 13:50:40 node1 fence_ipmilan: Parse error: Ignoring unknown option 'nodename=node2'
Oct 14 13:51:00 node1 fence_ipmilan: Connection timed out
Oct 14 13:51:00 node1 fenced[2854]: fence node2 dev 0.0 agent fence_ipmilan result: error from agent
Oct 14 13:51:00 node1 fenced[2854]: fence node2 failed
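
fenced passes the values shown in the "agent args:" lines to the agent on stdin as key=value pairs, so the call it makes can be reproduced by hand (a sketch; the stray 'nodename' option is simply ignored by the agent, as the parse warning above shows):

Code:
root@node1:~# fence_ipmilan <<EOF
action=status
lanplus=1
cipher=1
ipaddr=192.168.150.43
login=user
passwd=pass
EOF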

Is it supposed to work like this: when a node loses its network or power connection, the VM is migrated to the live node?
 
Ask yourself: How should fencing work if you unplug the fencing device?

==> this will never work, by design.
 
Thanks for the answer, Dietmar. I'm just learning this by myself.

I read about this kind of test at https://pve.proxmox.com/wiki/Two-Node_High_Availability_Cluster#Testing and I thought the VM would auto-migrate after an uncontrolled failure.

So my question is: is it possible to auto-migrate a VM to node2 when node1 crashes, without a manual action?




Hope you can help me.
Best regards.
 
So my question is whether it's possible, when the node a VM is running on fails, to auto-migrate the VM to another live node without a manual action.

Sure, you just need a suitable fencing device (for example, a power-based fencing device).
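
For illustration (not from this thread), a power-based device such as an APC PDU would be declared in cluster.conf roughly like this; the agent name, device name, IP, credentials and outlet number are placeholders for the sketch:

Code:
<fencedevice agent="fence_apc" name="pdu1" ipaddr="192.168.150.50" login="apc" passwd="apc"/>

<clusternode name="node1" nodeid="1" votes="1">
  <fence>
    <method name="1">
      <device name="pdu1" port="1" action="off"/>
    </method>
  </fence>
</clusternode>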
 
Is fencing with iDRAC on Dell servers able to auto-migrate the VM to another live node?
 
