Relax-NG validity error -> cluster.conf (Fencing/HA)

mabe

New Member
Jun 9, 2014
Hello,

I want to configure an HA Proxmox cluster, but I get an error when I check the new configuration. I don't understand why, or how to solve it. Do you have any ideas, please?

Error:

pmox1:/etc/cluster# ccs_config_validate -v -f /etc/pve/cluster.conf
Creating temporary file: /tmp/tmp.mUVTuMRhlM
Config interface set to:
Configuration stored in temporary file
Updating relaxng schema
Validating..
Relax-NG validity error : Extra element fencedevices in interleave
tempfile:6: element fencedevices: Relax-NG validity error : Element cluster failed to validate content
tempfile:23: element device: validity error : IDREF attribute name references an unknown ID "bi"
tempfile:30: element device: validity error : IDREF attribute name references an unknown ID "hirru"

Configuration fails to validate
Validation completed
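
For reference, the procedure I am trying to follow is, as far as I understand the Proxmox VE 3.x docs, roughly the following (just a sketch; the file names are the standard ones and activation is normally done from the GUI):

Code:
# work on a copy of the running config
cp /etc/pve/cluster.conf /etc/pve/cluster.conf.new
# edit cluster.conf.new and increment config_version, then validate the copy:
ccs_config_validate -v -f /etc/pve/cluster.conf.new
# if it validates, activate the new version from the web GUI (Datacenter -> HA -> Activate)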

After "service pve-cluster stop" :
Starting pve cluster filesystem : pve-cluster.
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... Relax-NG validity error : Extra element fencedevices in interleave
tempfile:4: element fencedevices: Relax-NG validity error : Element cluster failed to validate content
tempfile:20: element device: validity error : IDREF attribute name references an unknown ID "bi"
tempfile:27: element device: validity error : IDREF attribute name references an unknown ID "hirru"

Configuration fails to validate
[ OK ]
Waiting for quorum... [ OK ]
Starting fenced... [ OK ]
Starting dlm_controld... [ OK ]
Tuning DLM kernel config... [ OK ]
Unfencing self... [ OK ]
Joining fence domain... [ OK ]

Version:
pmox1:~# pveversion
pve-manager/3.2-4/e24a91c1 (running kernel: 2.6.32-29-pve)

Nodes:
pmox1:~# ccs_tool lsnode -v

Cluster name: cluster-fm-ha-1, config_version: 30

Nodename    Votes  Nodeid  Fencetype
ns6412076   1      1       bat
  Fence properties: action=off
ns6407164   1      2       bi
  Fence properties: action=off
ns6407163   1      3       hirru
  Fence properties: action=off


Status:
pmox1:~# pvecm status
Version: 6.2.0
Config Version: 21
Cluster Name: cluster-fm-ha-1
Cluster Id: 31167
Cluster Member: Yes
Cluster Generation: 504
Membership state: Cluster-Member
Nodes: 3
Expected votes: 3
Total votes: 3
Node votes: 1
Quorum: 2
Active subsystems: 5
Flags:
Ports Bound: 0
Node name: ns6407164
Node ID: 2
Multicast addresses: 255.255.255.255
Node addresses: 172.16.0.101


My /etc/pve/cluster.conf:
Code:
<?xml version="1.0"?>
<cluster name="cluster-fm-ha-1" config_version="30">

  <cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu"></cman>

  <fencedevices>
         <fencedevice agent="fence_ovh" name="bat" email="ddd@ddd.com"  ipaddr="nsxxx76" login="sa57499-ovh" passwd="xxx" power_wait="5"/>
         <fencedevice agent="fence_ovh" name="bi" email="ddd@ddd.com"  ipaddr="nsxxx64" login="sa57499-ovh" passwd="xxx" power_wait="5"/>
         <fencedevice agent="fence_ovh" name="hirru" email="ddd@ddd.com"  ipaddr="nsxxx63" login="sa57499-ovh" passwd="xxx" power_wait="5"/>
  </fencedevices>

  <clusternodes>
        <clusternode name="nsxxx76" votes="1" nodeid="1">
                <fence>
                        <method name="1">
                                <device name="bat" action="off"/>
                        </method>
                </fence>
        </clusternode>
        <clusternode name="nsxxx64" votes="1" nodeid="2">
                <fence>
                        <method name="1">
                                <device name="bi" action="off"/>
                        </method>
                </fence>
        </clusternode>
        <clusternode name=nsxxx63" votes="1" nodeid="3">
                <fence>
                        <method name="1">
                                <device name="hirru" action="off"/>
                        </method>
                </fence>
        </clusternode>
  </clusternodes>
</cluster>
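
As far as I can see, the device names referenced inside each <clusternode> match the <fencedevice name="..."> entries exactly (the validator treats them as ID/IDREF pairs, which is what the "references an unknown ID" errors above are about). A quick way to compare them (just an illustration, not part of the config):

Code:
# list the fence device names that are defined...
grep -o 'fencedevice [^>]*' /etc/pve/cluster.conf | grep -o 'name="[^"]*"'
# ...and the names referenced by the cluster nodes
grep -o '<device name="[^"]*"' /etc/pve/cluster.conf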

fence_ovh script (http://forum.proxmox.com/threads/11066-Proxmox-HA-Cluster-at-OVH-Fencing?p=75152#post75152):
Code:
pmox1:~# cat /usr/sbin/fence_ovh
#!/usr/bin/python
# assembled by Dennis Busch, secofor GmbH,
# Germany
# This work is licensed under a
# Creative Commons Attribution-ShareAlike 3.0 Unported License.
 
# Manual call parameters example
#
# login=ab12345-ovh
# passwd=MYSECRET
# email=admin@myadmin
# ipaddr=ns12345
# action=off
 
# where ipaddr is your server's OVH name
 
import sys, re, atexit, pexpect  # atexit is used by atexit.register() below
sys.path.append("/usr/share/fence")
from fencing import *

from suds.client import Client
from suds.xsd.doctor import ImportDoctor, Import
import time
from datetime import datetime
 
OVH_RESCUE_PRO_NETBOOT_ID='28'
OVH_HARD_DISK_NETBOOT_ID='1'
STATUS_HARD_DISK_SLEEP=240 # Wait 4 minutes to SO to boot
STATUS_RESCUE_PRO_SLEEP=150 # Wait 2 minutes 30 seconds to Rescue-Pro to run
OVH_FENCE_DEBUG=False # True or False for debug
 
def netboot_reboot(nodeovh,login,passwd,email,mode):
    imp = Import('http://schemas.xmlsoap.org/soap/encoding/')
    url='https://www.ovh.com/soapi/soapi-re-1.59.wsdl'
    imp.filter.add('http://soapi.ovh.com/manager')
    d = ImportDoctor(imp)
    soap = Client(url, doctor=d)
    session = soap.service.login(login, passwd, 'es', 0)
 
    #dedicatedNetbootModifyById changes the mode of the next reboot
    result = soap.service.dedicatedNetbootModifyById(session, nodeovh, mode, '', email)
 
    #dedicatedHardRebootDo initiates a hard reboot on the given node
    soap.service.dedicatedHardRebootDo(session, nodeovh, 'Fencing initiated by cluster', '', 'es')
 
    soap.service.logout(session)
 
def reboot_status(nodeovh,login,passwd):
    imp = Import('http://schemas.xmlsoap.org/soap/encoding/')
    url='https://www.ovh.com/soapi/soapi-re-1.59.wsdl'
    imp.filter.add('http://soapi.ovh.com/manager')
    d = ImportDoctor(imp)
    soap = Client(url, doctor=d)
    session = soap.service.login(login, passwd, 'es', 0)
 
    result = soap.service.dedicatedHardRebootStatus(session, nodeovh)
    tmpstart = datetime.strptime(result.start,'%Y-%m-%d %H:%M:%S')
    tmpend = datetime.strptime(result.end,'%Y-%m-%d %H:%M:%S')
    result.start = tmpstart
    result.end = tmpend
 
    soap.service.logout(session)
    return result
 
# redirect stderr to a log file
save_stderr = sys.stderr
errlog = open("/var/log/fence_ovh_error.log","a")
sys.stderr = errlog
 
global all_opt
 
device_opt = [  "email", "ipaddr", "action" , "login" , "passwd" , "nodename" ]
 
ovh_fence_opt = {
        "email" : {
                "getopt" : "Z:",
                "longopt" : "email",
                "help" : "-Z, --email=<email>          email for reboot message: admin@domain.com",
                "required" : "1",
                "shortdesc" : "Reboot email",
                "default" : "",
                "order" : 1 },
}
 
all_opt.update(ovh_fence_opt)
all_opt["ipaddr"]["shortdesc"] = "OVH node name"
 
atexit.register(atexit_handler)
options=check_input(device_opt,process_input(device_opt))
# Not sure if I need this old notation
## Support for -n [switch]:[plug] notation that was used before
if ((options.has_key("-n")) and (-1 != options["-n"].find(":"))):
    (switch, plug) = options["-n"].split(":", 1)
    if ((switch.isdigit()) and (plug.isdigit())):
        options["-s"] = switch
        options["-n"] = plug
 
if (not (options.has_key("-s"))):
    options["-s"]="1"
 
docs = { }
docs["shortdesc"] = "Fence agent for OVH"
docs["longdesc"] = "fence_ovh is an Power Fencing agent \
which can be used within OVH datecentre. \
Poweroff is simulated with a reboot into rescue-pro \
mode. \
 /usr/local/etc/ovhsecret example: \
 \
 [OVH] \
 Login = ab12345-ovh \
 Passwd = MYSECRET \
"
docs["vendorurl"] = "http://www.ovh.net"
show_docs(options, docs)
 
 
# I use my own logfile for debugging purposes
if OVH_FENCE_DEBUG:
    logfile = open("/var/log/fence_ovh.log", "a")
    logfile.write(time.strftime("\n%d.%m.%Y %H:%M:%S \n"))
    logfile.write("Parameter:\t")
    for val in sys.argv:
        logfile.write(val + " ")
    logfile.write("\n")
 
print options
 
action=options['--action']
email=options['--email']
login=options['--username']
passwd=options['--password']
nodeovh=options['--ip']
if nodeovh[-8:] != '.ovh.net':
    nodeovh += '.ovh.net'
    
# Save datetime just before changing netboot
before_netboot_reboot = datetime.now()
 
if action == 'off':
    netboot_reboot(nodeovh,login,passwd,email,OVH_RESCUE_PRO_NETBOOT_ID) #Reboot in Rescue-pro
elif action == 'on':
    netboot_reboot(nodeovh,login,passwd,email,OVH_HARD_DISK_NETBOOT_ID) #Reboot from HD
elif action == 'reboot':
    netboot_reboot(nodeovh,login,passwd,email,OVH_HARD_DISK_NETBOOT_ID) #Reboot from HD
else:
    if OVH_FENCE_DEBUG:
        logfile.write("nothing to do\n")
        logfile.close()
    errlog.close()
    sys.exit()
 
if action == 'off':
    time.sleep(STATUS_RESCUE_PRO_SLEEP) #Reboot in vKVM
elif action == 'on':
    time.sleep(STATUS_HARD_DISK_SLEEP) #Reboot from HD
elif action == 'reboot':
    time.sleep(STATUS_HARD_DISK_SLEEP) #Reboot from HD
else:
    if OVH_FENCE_DEBUG:
        logfile.write("No sense! Check script please!\n")
        logfile.close()
    errlog.close()
    sys.exit()
 
after_netboot_reboot = datetime.now()
 
# Verification of success
 
reboot_start_end=reboot_status(nodeovh,login,passwd)
if OVH_FENCE_DEBUG:
    logfile.write("reboot_start_end.start: " +reboot_start_end.start.strftime('%Y-%m-%d %H:%M:%S')+"\n")
    logfile.write("before_netboot_reboot: " +before_netboot_reboot.strftime('%Y-%m-%d %H:%M:%S')+"\n")
    logfile.write("reboot_start_end.end: " +reboot_start_end.end.strftime('%Y-%m-%d %H:%M:%S')+"\n")
    logfile.write("after_netboot_reboot: " +after_netboot_reboot.strftime('%Y-%m-%d %H:%M:%S')+"\n")
 
if ((reboot_start_end.start > before_netboot_reboot) and (reboot_start_end.end < after_netboot_reboot)):
    if OVH_FENCE_DEBUG:
        logfile.write("Netboot reboot went OK.\n")
else:
    if OVH_FENCE_DEBUG:
        logfile.write("ERROR: Netboot reboot wasn't OK.\n")
        logfile.close()
    errlog.close()
    sys.exit(1)
 
 
if OVH_FENCE_DEBUG:
    logfile.close()
errlog.close()
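
In case it is useful, the agent can also be exercised by hand with the key=value parameters from the comment block at the top of the script (placeholder values below; as far as I know the fencing library reads these pairs from stdin when no command-line arguments are given):

Code:
# manual test of the fence agent with the example parameters from its header comment
printf 'login=ab12345-ovh\npasswd=MYSECRET\nemail=admin@myadmin\nipaddr=ns12345\naction=off\n' \
    | /usr/sbin/fence_ovh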

From the web interface:
In the pveproxy web interface on pmox1, on the HA tab, when I add an "HA Managed VM/CT" and apply the change, I get the error: "config validation failed: unknown error (500)".



Thanks all. Regards, Maxime.
 
Hello,

Additional information, in case you need it:


pveversion:
proxmox-ve-2.6.32: 3.2-126 (running kernel: 2.6.32-29-pve)
pve-manager: 3.2-4 (running version: 3.2-4/e24a91c1)
pve-kernel-2.6.32-29-pve: 2.6.32-126
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.5-1
pve-cluster: 3.0-12
qemu-server: 3.1-16
pve-firmware: 1.1-3
libpve-common-perl: 3.0-18
libpve-access-control: 3.0-11
libpve-storage-perl: 3.0-19
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-6
vzctl: 4.0-1pve5
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.7-8
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.2-1

cman_tool status:
Version: 6.2.0
Config Version: 30
Cluster Name: cluster-fm-ha-1
Cluster Id: 31167
Cluster Member: Yes
Cluster Generation: 524
Membership state: Cluster-Member
Nodes: 3
Expected votes: 3
Total votes: 3
Node votes: 1
Quorum: 2
Active subsystems: 5
Flags:
Ports Bound: 0
Node name: ns6407164
Node ID: 2
Multicast addresses: 255.255.255.255
Node addresses: 172.16.0.102

Regards,

Maxime.