Two-Node HA w/ DRBD cluster.conf error.

Asgard

Feb 17, 2012
OK, I have set up two servers with Proxmox 2.0-38, and I am having problems getting HA set up correctly.
Here are the wiki pages I followed to get these two servers working so far:

http://pve.proxmox.com/wiki/DRBD
http://pve.proxmox.com/wiki/Fencing

The problem I am facing now is that my cluster.conf file is not correct. One of your wiki pages (http://pve.proxmox.com/wiki/Two-Node_High_Availability_Cluster) says to add this line:
<cman two_node="1" expected_votes="1"> </cman>
I added it and this error pops up. I haven't worked with code enough to know which tag I need:
mismatched tag at line 31, column 2, byte 1061 at /usr/lib/perl5/XML/Parser.pm line 187 (500)

Here is my conf file, with the names and passwords removed:

<?xml version="1.0"?>
<cluster config_version="19" name="<cluster_name>">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey">
  <cman two_node="1" expected_votes="1"> </cman>
  <fencedevices>
    <fencedevice agent="fence_ilo" hostname="<Full_Domain_NameA>" login="Administrator" name="<iloA>" passwd="<password>"/>
    <fencedevice agent="fence_ilo" hostname="<Full_Domain_NameB>" login="Administrator" name="<iloB>" passwd="<password>"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="<NodeA>" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="<iloA>" action="reboot"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="<NodeB>" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="<iloB>" action="reboot"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <service autostart="1" exclusive="0" name="TestIP" recovery="relocate">
      <ip address="192.168.7.180"/>
    </service>
  </rm>
</cluster>

Sorry, for some reason I can't get the spacing to come out right in this post, so here is a .txt file with the config in it.


I know I am probably missing a space or something small like that. Before I added that line, I was getting "unknown error (500)" whenever I activated any changes in the web UI.
 

Attachment: cluster.conf.txt
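I'm not sure whether what you have will work from an HA point of view (I haven't tried that under Proxmox yet), but you have some fundamental XML errors. Your first <cman ...> tag is opened but never closed, so the parser treats everything after it as nested inside <cman>; when it reaches </cluster> while <cman> is still open, it reports a mismatched tag.

Try changing the two <cman> lines above to just ONE cman line, as follows...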

Code:
<cman keyfile="/var/lib/pve-cluster/corosync.authkey" two_node="1" expected_votes="1"></cman>

And see if that fixes your problem. There may be other errors, but it certainly won't work with those cman lines as they are.
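Put together, the top level of the file would look roughly like the sketch below. It only illustrates the nesting: the single <cman> element is closed straight away (the self-closing form `<cman .../>` is equivalent in XML to `<cman ...></cman>`), so the other sections become siblings of it rather than children. The section contents are elided as comments, so this skeleton is not a complete, valid cluster.conf on its own.

```xml
<?xml version="1.0"?>
<cluster config_version="19" name="<cluster_name>">
  <!-- One cman element carrying all three attributes, closed immediately.
       Everything below is a sibling of cman, not nested inside it. -->
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey" two_node="1" expected_votes="1"/>
  <fencedevices>
    <!-- fencedevice entries unchanged -->
  </fencedevices>
  <clusternodes>
    <!-- clusternode entries unchanged -->
  </clusternodes>
  <rm>
    <!-- service definitions unchanged -->
  </rm>
</cluster>
```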
 
Yeah, I don't work with code like this much, if you can't tell. Yes, that fixed one of the problems, but I am still getting this error:

config validation failed: unknown error (500)
 
I have been troubleshooting this problem for a while, and now I am getting this error:
Code:
root@node1:~# fence_ilo -l <ilo_username> -p <ilo_password> -a <ilo_ip> -o status -v -z
Unable to connect/login to fencing device
Can anyone tell me how I could fix this? HA is the only thing I can't seem to get to work on these servers.
 
Hi,

I just started today with a DRBD-based HA setup under Proxmox 2.1. I also followed the two URLs you posted earlier in the thread and ran into a similar problem with the XML format, which I also worked out.

But to get to your question about fence_ilo returning "Unable to connect/login to fencing device":

I get the same result from fence_ilo with two iLO3-based servers. Another fence tool, fence_ilo_mp, is also supplied; with it I can use the SSH login (-x) to get "STATUS: ON" when querying older iLO versions (e.g. iLO 1). Running the same command against iLO3 doesn't work.

I've found that the wiki docs aren't as complete as they should be: they give you a good start, but there are bits missing that you have to work out and troubleshoot yourself.

I'm still struggling with the "config validation failed: unknown error (500)" error, which is how I found this thread; I just can't find any more detail or debug output explaining why the config validation fails.

It's hard to say whether the fence_ilo or fence_ilo_mp tools work with iLO3 at all.

I'm currently wondering whether I can use IPMI or another fencing method for the ProLiant G7 servers if iLO3 is a no-go, but I think for that I need the HP ProLiant Support Pack, which isn't part of the Proxmox 2.1 ISO?
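If IPMI over LAN is enabled on the iLO3 boards, a fence device definition using the fence_ipmilan agent might look like the sketch below. This is an untested assumption, not something confirmed in this thread: the attribute names (ipaddr, login, passwd, lanplus) follow the usual fence agent options, and every name, address and credential here is a hypothetical placeholder. As far as I know fence_ipmilan talks to the management board over the network, so it should not depend on the HP Support Pack being installed on the host, but check that against your own setup.

```xml
<fencedevices>
  <!-- Sketch only: one IPMI fence device per node. ipmiA/ipmiB, the
       addresses and the credentials are hypothetical placeholders.
       lanplus="1" asks the agent to use IPMI v2.0 (lanplus). -->
  <fencedevice agent="fence_ipmilan" name="<ipmiA>" ipaddr="<iLO_IP_A>" login="<ipmi_user>" passwd="<password>" lanplus="1"/>
  <fencedevice agent="fence_ipmilan" name="<ipmiB>" ipaddr="<iLO_IP_B>" login="<ipmi_user>" passwd="<password>" lanplus="1"/>
</fencedevices>
```

Each clusternode's `<device name="..."/>` entry would then have to point at the matching IPMI device name instead of the iLO one.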
 
Well, I don't know what to tell you about the "config validation failed: unknown error (500)" other than that something somewhere in your config is wrong. The two servers I was building are now production servers, and we are not using HA at all. One of the other options I gave my higher-ups was to use fence_human; I will make another post about that.

As for the HP ProLiant Support Pack, this might save you some time. I don't know if you have the same servers I do, but here are the steps I used to install it.

1. Get the software. It is in the software share as an ISO called HP_ProLiant_Value_Add_Software-8.70-10-6.iso, or you can download it here: http://h20000.www2.hp.com/bizsuppor...sId=4132832&swLang=8&taskId=135&swEnvOID=4123 (it is the second part of the Software - Systems Management section).

2. Mount the ISO. If you have the ISO mounted through iLO, you can do this with:
mount -t iso9660 /dev/scd1 /mnt

3. Change into the package directory:
cd /mnt/pool/non-free/

4. Then run
dpkg --install hpsmh_6.0.0-97_amd64.deb cpqacuxe_8.70-9.0.7-8_amd64.deb hpacucli_8.70-8.0.2-2_amd64.deb hp-health_8.7.0.1.2-5_amd64.deb hponcfg_3.1.1.0.2-2_amd64.deb hp-snmp-agents_8.7.0.1.7-9_amd64.deb hp-smh-templates_8.7.0.1.4-4_all.deb

5. Next, set up a config file:
/sbin/hpsnmpconfig

6. Now install all of the missing dependencies:
apt-get -f install

7. If the web interface (IP_OF_COMPUTER:2381) is not working, run this command to restart the SNMP agents:
service hp-snmp-agents restart
Now go back to the web page and it should work.
NOTE: when you install these programs, make sure you read and do everything they tell you to. For example, one of them will ask whether you have a config file or want to load default values. You do not want the defaults: hit N and go through the questions. If you don't, SNMP will not be configured correctly unless you have the right config file.
 
Hi,

Thank you for your reply. Using the ccs_config_validate tool, I was able to figure out that an extra attribute in the fencedevices section was the problem. After a process of elimination, it turned out that the "hostname" attribute was not accepted. Removing it made it work; the line now reads:

<fencedevice agent="fence_ilo" login="Administrator" name="<iloA>" passwd="<password>"/>
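One caveat: with hostname removed, the fence device no longer tells fence_ilo which iLO to contact. Assuming the schema in this Proxmox version accepts `ipaddr` for the device address (the option name the fence agents use for it; this is an assumption, so confirm it with ccs_config_validate), the entries could keep their targets like this. The names and passwords are the same placeholders as in the original config.

```xml
<fencedevices>
  <!-- Assumption: ipaddr is the attribute the cluster schema accepts for the
       iLO address; a resolvable host name or an IP address goes here. -->
  <fencedevice agent="fence_ilo" ipaddr="<Full_Domain_NameA>" login="Administrator" name="<iloA>" passwd="<password>"/>
  <fencedevice agent="fence_ilo" ipaddr="<Full_Domain_NameB>" login="Administrator" name="<iloB>" passwd="<password>"/>
</fencedevices>
```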

Regarding the HP PSP packages for Debian, I will follow your instructions; many thanks for taking the time to type them out.


I did read about "fence_human" in another forum post; maybe that was yours? I tried that setup, and a live migration of a KVM VM from one Proxmox node to the other completed in a matter of seconds.

When I changed back to iLO fencing devices, live migrations of KVM VMs between the Proxmox nodes took about 5 minutes, because the VM was actually copied over the DRBD network between the nodes; I have no idea why. Proxmox didn't seem to recognise the DRBD storage (the drbdvg volume group I created) and seemed to copy from the DRBD storage to the same DRBD storage, which made no sense.

I have checked the setup again, going through the DRBD and two-node HA wiki pages once more, and nothing I can see is at fault, so I have no idea why Proxmox doesn't recognise that the same drbdvg volume group already holds the VM.

Searching the forums so far hasn't turned up people discussing any similar problems.

I'm still working on it, so hopefully I can find out why Proxmox does that. Thanks again for your assistance.
 
