HA cluster fencing problem

Captain · Feb 20, 2016

Hello!

I have a problem with my HA cluster with DRBD. I have two nodes (Intel servers with Intel BMC).
When I run:
fence_node node1
the node is reset and the VM restarts on node2. That works as expected.

But if I unplug the network cable from node1:

Code:

fence_tool ls

fence domain
member count  2
victim count  1
victim now    1
master nodeid 1
wait state    fencing
members      1 2

And nothing happens. The VM stays offline, and on node1 I see:

Code:

INFO: task rgmanager: 19170 blocked for more than 120 seconds.
Tainted: P                  ------------------- 2.6.32-39-pve #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

When I try to reboot node1, the reboot process hangs at "Stopping Cluster Service Manager".
To reboot this node I have to reset the server.

My configurations:

Code:

<?xml version="1.0"?>
<cluster config_version="13" name="cluster0">
  <cman expected_votes="1" keyfile="/var/lib/pve-cluster/corosync.authkey" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="192.168.200.11" login="root" name="ipmi1" passwd="mypass"/>
    <fencedevice agent="fence_ipmilan" ipaddr="192.168.200.21" login="root" name="ipmi2" passwd="mypass"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="wcat1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="wcat2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <pvevm autostart="1" vmid="101"/>
  </rm>
</cluster>

Code:

proxmox-ve-2.6.32: 3.4-156 (running kernel: 2.6.32-39-pve)
pve-manager: 3.4-11 (running version: 3.4-11/6502936f)
pve-kernel-2.6.32-39-pve: 2.6.32-157
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-3
pve-cluster: 3.0-19
qemu-server: 3.4-6
pve-firmware: 1.1-5
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-34
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.2-21
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1

How can I solve this problem?
Thank you.

Captain · Feb 21, 2016

Can someone help with this question?
Maybe I am doing something wrong with IPMI, because it is connected over the LAN.

Can someone explain how fencing works if one node loses power or its network connection?

Big thanks.

sdinet · Feb 24, 2016

If one node's network fails, the other nodes will notice and send a 'kill' command to the faulty node's IPMI interface. If the node loses power entirely, the HA software will wait forever for the fencing action to complete, because the IPMI has no power and will never respond. A better way to do fencing is to use the power supply to kill the power to your faulty node.

sdinet · Mar 1, 2016

Where did you find that XML file? On my Proxmox installation, /etc/pve/cluster.conf does not exist.

Captain · Mar 1, 2016

Hello!
Thank you for the answer.
The XML is used for the HA configuration.
It is not a .xml file; the file is cluster.conf, but it uses XML inside.

OK, one more question about what you said: "A better way to do fencing is to use the power supply to kill the power to your faulty node."
Can you explain how this can be done?

adamb · Mar 2, 2016

Captain said:
Hello!
Thank you for the answer.
The XML is used for the HA configuration.
It is not a .xml file; the file is cluster.conf, but it uses XML inside.

OK, one more question about what you said: "A better way to do fencing is to use the power supply to kill the power to your faulty node."
Can you explain how this can be done?

The easiest way I know of is to use a switched PDU. That way the power outlets can be turned on/off over the network. I know APC has a few models of switched PDUs that work well.
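
As a rough sketch of how that could look in the cluster.conf posted above: a switched PDU is added as a third fence device and referenced as a second fence method on each node, so it is tried only if the IPMI method fails. The device name apc1, its IP address, its login and password, and the outlet numbers 1 and 2 are all made up for illustration, and fence_apc is assumed to be the right agent for the PDU model in use (some models need a different agent). Treat it as a starting point, not a tested configuration.

```xml
<?xml version="1.0"?>
<!-- Sketch only. config_version must be higher than the currently active one. -->
<cluster config_version="14" name="cluster0">
  <cman expected_votes="1" keyfile="/var/lib/pve-cluster/corosync.authkey" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="192.168.200.11" login="root" name="ipmi1" passwd="mypass"/>
    <fencedevice agent="fence_ipmilan" ipaddr="192.168.200.21" login="root" name="ipmi2" passwd="mypass"/>
    <!-- Hypothetical switched PDU; address and credentials are placeholders -->
    <fencedevice agent="fence_apc" ipaddr="192.168.200.50" login="apc" name="apc1" passwd="apcpass"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="wcat1" nodeid="1" votes="1">
      <fence>
        <!-- Tried first: IPMI, as in the original configuration -->
        <method name="1">
          <device name="ipmi1"/>
        </method>
        <!-- Tried only if method 1 fails: switch PDU outlet 1 (assumed outlet for wcat1) -->
        <method name="2">
          <device name="apc1" port="1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="wcat2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi2"/>
        </method>
        <!-- Assumed outlet for wcat2 -->
        <method name="2">
          <device name="apc1" port="2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <pvevm autostart="1" vmid="101"/>
  </rm>
</cluster>
```

The point of the second method is that the PDU keeps answering even when the node and its BMC have no power, so the fencing action can complete instead of waiting on an IPMI that will never respond.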

sdinet · Mar 2, 2016

adamb said:
The easiest way I know of is to use a switched PDU. That way the power outlets can be turned on/off over the network. I know APC has a few models of switched PDUs that work well.

Yes, this.
