two node cluster fence error

gogo3k

New Member
Dec 27, 2014
Hi, I've just installed Proxmox 3.3 on two servers and I want to build an HA cluster. I've read many websites and watched many video tutorials on YT, and I have a massive problem with fencing. My servers don't have dedicated fence devices (I use Tyan GT25). I created the cluster and it works fine, I see both nodes in the GUI, but when I try to validate cluster.conf I get many errors.
Code:
root@prox1:~# ccs_config_validate -l /etc/pve/cluster.conf.new
Relax-NG validity error : Extra element fencedevices in interleave
tempfile:4: element fencedevices: Relax-NG validity error : Element cluster failed to validate content
tempfile:19: element device: validity error : IDREF attribute name references an unknown ID "fenceB"
Configuration fails to validate

This is my cluster.conf.new:

Code:
<?xml version="1.0"?>
<cluster name="test" config_version="5">

  <cman keyfile="/var/lib/pve-cluster/corosync.authkey" two_node="1" expected_votes="1"/>


  <fencedevices>
  <fencedevice agent="fence_ilo" ipaddr="192.168.1.121" login="root" password="pass" name="fenceA"/>
  <fencedevice agent="fence_ilo" ipaddr="192.168.1.122" login="root" password="pass" name="fenceB"/>
  </fencedevices>

  

  <clusternodes>
  <clusternode name="prox1" votes="1" nodeid="1">
  
  <fence>
  <method name="1">
  <device name="fenceA" action="reboot"/>
  </method>
  </fence>
  
  </clusternode>

  <clusternode name="prox2" votes="1" nodeid="2">
 
  <fence>
  <method name="1">
  <device name="fenceB" action="reboot"/>
  </method>
  </fence>
  </clusternode>

</clusternodes>
</cluster>
My pveversion:
Code:
proxmox-ve-2.6.32: 3.2-136 (running kernel: 2.6.32-32-pve)
pve-manager: 3.3-1 (running version: 3.3-1/a06c9f73)
pve-kernel-2.6.32-32-pve: 2.6.32-136
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-1
pve-cluster: 3.0-15
qemu-server: 3.1-34
pve-firmware: 1.1-3
libpve-common-perl: 3.0-19
libpve-access-control: 3.0-15
libpve-storage-perl: 3.0-23
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.1-5
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1
I must have made a stupid mistake, but I don't see it. Sorry for my English, I'm still learning it.
 
I suggest you not use HA on a two-node cluster. It is better to use manual mode in such a setup.
 
Yes, I have the nodes in the hosts file.
@acidrop what do you mean? I need some VMs to run all the time with minimal downtime and without my intervention. The VMs must migrate automatically for me.
 
Then use a 3-node cluster! Even if the third node is just a "desktop" machine with no VMs running (probably even a Raspberry Pi could be used! but the work required makes a real node less expensive).
With 2 nodes, you risk that both get fenced or both remain active, depending on the failure. And that means downtime (nodes mutually fenced) or even data loss (the same VM active on both nodes).
 
In this environment there is no room for another machine, and all the workstations are in use. So I must make 2-node HA workable.
 
So there's no 100% reliable solution. See http://en.wikipedia.org/wiki/Two_Generals'_Problem . If you're really lucky, things work as expected. If you're not so lucky, the nodes cross-fence and go down at the same time. If you're unlucky, both nodes activate the same VM and you suffer data corruption (that's what you get by setting expected_votes="1"...)!

Try looking at a "quorum disk", or try installing a pseudo third node on a Raspberry Pi. Or, at the very least, use redundant communication between the nodes (e.g. via a serial port) to minimize the chance of the data corruption scenario.
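
If you go the quorum disk route, the rough shape (just a sketch; the device path and label below are placeholders for your storage, and you would also drop two_node="1" from your cman line) is to initialize a small shared LUN with mkqdisk:

Code:
# initialize a small shared LUN as the quorum disk (path is an example)
mkqdisk -c /dev/sdb1 -l proxmox_qdisk

and then declare it inside <cluster> in cluster.conf:

Code:
<quorumd votes="1" interval="1" tko="10" label="proxmox_qdisk"/>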
 
The nodes are connected to the switch via LACP, and for storage I have Ceph on the nodes, so there is some redundancy already. Can someone answer the problem from my first post?
 
Hello gogo3k

The nodes are connected to the switch via LACP, and for storage I have Ceph on the nodes, so there is some redundancy already. Can someone answer the problem from my first post?

The error in your config file is that you used

Code:
password="pass"

but the correct attribute name is

Code:
passwd="pass"

But, as already mentioned in the previous postings:

* a 2-node HA cluster is useless in that form - you need at least a quorum disk, or better, a small third node

* you wrote

My servers don't have fence devices

But HA is not possible without fencing!

Two possibilities:

- the simple one: leave the 2-node cluster as it is, without HA, and simplify your cluster.conf as follows:

Code:
<?xml version="1.0"?>
<cluster name="test" config_version="5">

  <cman keyfile="/var/lib/pve-cluster/corosync.authkey"/>

  <clusternodes>
    <clusternode name="prox1" votes="1" nodeid="1"/>
    <clusternode name="prox2" votes="1" nodeid="2"/>
  </clusternodes>

</cluster>
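
After editing, validate the candidate file again and only then activate it (a sketch of the usual PVE 3.x workflow; remember to increase config_version on every change):

Code:
# validate the candidate config before activating it
ccs_config_validate -l /etc/pve/cluster.conf.new
# then activate the new version from the web GUI (Datacenter -> HA -> Activate)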

- the professional one: establish a third node and install a fence device; your cluster.conf will then look approximately like this:

Code:
<?xml version="1.0"?>
<cluster config_version="19" name="ims-cluster">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <fencedevices>
    <fencedevice agent="fence_ilo" ipaddr="192.168.1.10" login="user" name="fence" passwd="xxxx" power_wait="15"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="prox1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="fence" port="1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="prox2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="fence" port="2"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="prox3" nodeid="3" votes="1">
      <fence>
        <method name="1">
          <device name="fence" port="3"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
<rm>
  <failoverdomains>
    <failoverdomain name="fodom" ordered="0" restricted="1">
      <failoverdomainnode name="prox1"/>
      <failoverdomainnode name="prox2"/>
    </failoverdomain>
  </failoverdomains>

  <pvevm autostart="1" vmid="100" domain="fodom"/>
  <pvevm autostart="1" vmid="101" domain="fodom"/>
  <pvevm autostart="1" vmid="102" domain="fodom"/>
  <pvevm autostart="1" vmid="103" domain="fodom"/>
</rm>
</cluster>
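
Once the new config is active on all nodes, you can verify membership and fencing state with the standard tools (output differs per setup):

Code:
pvecm status     # corosync membership and quorum
clustat          # rgmanager view of nodes and pvevm services
fence_tool ls    # fence domain membership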


Success!

Mr.Holmes
 
