HA cluster fencing problem

Captain

Feb 20, 2016
Hello!

I have a problem with my HA cluster with DRBD. I have two nodes (Intel servers with Intel BMC).
When I run:
fence_node node1
node1 is reset and the VM restarts on node2, which works as expected.

But if I unplug the network cable from node1:
Code:
fence_tool ls

fence domain
member count  2
victim count  1
victim now    1
master nodeid 1
wait state    fencing
members      1 2

And nothing happens. The VM stays offline, and on node1 I see:
Code:
INFO: task rgmanager: 19170 blocked for more than 120 seconds.
Tainted: P                  ------------------- 2.6.32-39-pve #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

When I try to reboot node1, the shutdown hangs at "Stopping Cluster Service Manager".
To reboot this node I have to reset the server.

My configuration:
Code:
<?xml version="1.0"?>
<cluster config_version="13" name="cluster0">
  <cman expected_votes="1" keyfile="/var/lib/pve-cluster/corosync.authkey" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="192.168.200.11" login="root" name="ipmi1" passwd="mypass"/>
    <fencedevice agent="fence_ipmilan" ipaddr="192.168.200.21" login="root" name="ipmi2" passwd="mypass"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="wcat1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="wcat2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <pvevm autostart="1" vmid="101"/>
  </rm>
</cluster>

Code:
proxmox-ve-2.6.32: 3.4-156 (running kernel: 2.6.32-39-pve)
pve-manager: 3.4-11 (running version: 3.4-11/6502936f)
pve-kernel-2.6.32-39-pve: 2.6.32-157
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-3
pve-cluster: 3.0-19
qemu-server: 3.4-6
pve-firmware: 1.1-5
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-34
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.2-21
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1

How can I solve this problem?
Thank you.
 
Can someone help with this question?
Maybe I am doing something wrong with IPMI, because it is connected over the LAN.

Can someone explain how fencing works if one node loses power or its network connection?

Many thanks.
 
If one node's network fails, the other node will notice and send a 'kill' command to the faulty node through its IPMI interface. If the node loses power entirely, its IPMI loses power too and never responds, so the HA software waits forever for the fencing action to complete. A better way to do fencing is to cut the power to the faulty node at its power supply.
 
Hello!
Thank you for the answer.
XML works for the HA configuration.
The file is not an .xml file; it is cluster.conf, but it uses XML inside.

OK, one more question about your statement "A better way to do fencing is to use the power supply to kill the power to your faulty node."
Can you explain how this can be done?
 
The easiest way I know of is to use a switched PDU, whose power outlets can be turned on and off over the network. I know APC has a few models of switched PDUs that work well.
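
As a rough sketch only, a switched PDU could be added to the cluster.conf posted above as a second fence method that is tried if the IPMI method fails. Everything specific to the PDU here is an assumption and not from this thread: the device name "pdu1", the address 192.168.200.30, the login and password, and outlet number 1. It also assumes the fence_apc agent from the fence-agents package matches your PDU model. Only the changed parts are shown, and config_version has to be raised when the file is edited.

```xml
<?xml version="1.0"?>
<!-- Sketch only: elements not shown (cman, wcat2, rm) stay as in the original file -->
<cluster config_version="14" name="cluster0">
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="192.168.200.11" login="root" name="ipmi1" passwd="mypass"/>
    <!-- Hypothetical switched PDU; address and credentials are placeholders -->
    <fencedevice agent="fence_apc" ipaddr="192.168.200.30" login="apc" name="pdu1" passwd="pdupass"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="wcat1" nodeid="1" votes="1">
      <fence>
        <!-- Method 1: IPMI, as before -->
        <method name="1">
          <device name="ipmi1"/>
        </method>
        <!-- Method 2: tried only if method 1 fails; port is the PDU outlet feeding wcat1 (assumed to be 1) -->
        <method name="2">
          <device name="pdu1" port="1"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
</cluster>
```

wcat2 would need the same kind of second method with its own outlet number. If a server has two power supplies, both outlets have to be switched off for the fence to be effective.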
 
