Relax-NG validity error -> cluster.conf (Fencing/HA)

mabe

New Member
Jun 9, 2014
Hello,

I want to configure an HA Proxmox cluster. I get an error when I check the new configuration, and I don't understand why or how to solve it. Do you have any ideas, please?

Error:

pmox1:/etc/cluster# ccs_config_validate -v -f /etc/pve/cluster.conf
Creating temporary file: /tmp/tmp.mUVTuMRhlM
Config interface set to:
Configuration stored in temporary file
Updating relaxng schema
Validating..
Relax-NG validity error : Extra element fencedevices in interleave
tempfile:6: element fencedevices: Relax-NG validity error : Element cluster failed to validate content
tempfile:23: element device: validity error : IDREF attribute name references an unknown ID "bi"
tempfile:30: element device: validity error : IDREF attribute name references an unknown ID "hirru"

Configuration fails to validate
Validation completed

After "service pve-cluster stop" :
Starting pve cluster filesystem : pve-cluster.
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... Relax-NG validity error : Extra element fencedevices in interleave
tempfile:4: element fencedevices: Relax-NG validity error : Element cluster failed to validate content
tempfile:20: element device: validity error : IDREF attribute name references an unknown ID "bi"
tempfile:27: element device: validity error : IDREF attribute name references an unknown ID "hirru"

Configuration fails to validate
[ OK ]
Waiting for quorum... [ OK ]
Starting fenced... [ OK ]
Starting dlm_controld... [ OK ]
Tuning DLM kernel config... [ OK ]
Unfencing self... [ OK ]
Joining fence domain... [ OK ]

Version:
pmox1:~# pveversion
pve-manager/3.2-4/e24a91c1 (running kernel: 2.6.32-29-pve)

Nodes:
pmox1:~# ccs_tool lsnode -v

Cluster name: cluster-fm-ha-1, config_version: 30

Nodename    Votes  Nodeid  Fencetype
ns6412076   1      1       bat
  Fence properties: action=off
ns6407164   1      2       bi
  Fence properties: action=off
ns6407163   1      3       hirru
  Fence properties: action=off


Status:
pmox1:~# pvecm status
Version: 6.2.0
Config Version: 21
Cluster Name: cluster-fm-ha-1
Cluster Id: 31167
Cluster Member: Yes
Cluster Generation: 504
Membership state: Cluster-Member
Nodes: 3
Expected votes: 3
Total votes: 3
Node votes: 1
Quorum: 2
Active subsystems: 5
Flags:
Ports Bound: 0
Node name: ns6407164
Node ID: 2
Multicast addresses: 255.255.255.255
Node addresses: 172.16.0.101


My /etc/pve/cluster.conf:
Code:
<?xml version="1.0"?>
<cluster name="cluster-fm-ha-1" config_version="30">

  <cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu"></cman>

  <fencedevices>
         <fencedevice agent="fence_ovh" name="bat" email="ddd@ddd.com"  ipaddr="nsxxx76" login="sa57499-ovh" passwd="xxx" power_wait="5"/>
         <fencedevice agent="fence_ovh" name="bi" email="ddd@ddd.com"  ipaddr="nsxxx64" login="sa57499-ovh" passwd="xxx" power_wait="5"/>
         <fencedevice agent="fence_ovh" name="hirru" email="ddd@ddd.com"  ipaddr="nsxxx63" login="sa57499-ovh" passwd="xxx" power_wait="5"/>
  </fencedevices>

  <clusternodes>
        <clusternode name="nsxxx76" votes="1" nodeid="1">
                <fence>
                        <method name="1">
                                <device name="bat" action="off"/>
                        </method>
                </fence>
        </clusternode>
        <clusternode name="nsxxx64" votes="1" nodeid="2">
                <fence>
                        <method name="1">
                                <device name="bi" action="off"/>
                        </method>
                </fence>
        </clusternode>
        <clusternode name="nsxxx63" votes="1" nodeid="3">
                <fence>
                        <method name="1">
                                <device name="hirru" action="off"/>
                        </method>
                </fence>
        </clusternode>
  </clusternodes>
</cluster>
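
A minimal sketch (not part of the original post) of how this config could be checked directly against the Relax-NG schema with lxml, to see the full validation error log outside of ccs_config_validate. The schema path is an assumption; point it at wherever your cluster packages install or generate cluster.rng.
Code:
# Hedged sketch: validate cluster.conf against the cluster Relax-NG schema with lxml.
# SCHEMA is an assumed path -- adjust it to the cluster.rng your packages ship/generate.
from lxml import etree

SCHEMA = "/usr/share/cluster/cluster.rng"   # assumption, not confirmed by the post
CONFIG = "/etc/pve/cluster.conf"

relaxng = etree.RelaxNG(etree.parse(SCHEMA))
config = etree.parse(CONFIG)

if relaxng.validate(config):
    print("cluster.conf validates against " + SCHEMA)
else:
    # Print every reported schema violation
    for error in relaxng.error_log:
        print(error)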

fence_ovh script (http://forum.proxmox.com/threads/11066-Proxmox-HA-Cluster-at-OVH-Fencing?p=75152#post75152):
Code:
pmox1:~# cat /usr/sbin/fence_ovh
#!/usr/bin/python
# assembled by Dennis Busch, secofor GmbH,
# Germany
# This work is licensed under a
# Creative Commons Attribution-ShareAlike 3.0 Unported License.
 
# Manual call parameters example
#
# login=ab12345-ovh
# passwd=MYSECRET
# email=admin@myadmin
# ipaddr=ns12345
# action=off
 
# where ipaddr is your server's OVH name
 
import sys, re, pexpect
import atexit  # needed for atexit.register() below
sys.path.append("/usr/share/fence")
from fencing import *

from suds.client import Client
from suds.xsd.doctor import ImportDoctor, Import
import time
from datetime import datetime
 
OVH_RESCUE_PRO_NETBOOT_ID='28'
OVH_HARD_DISK_NETBOOT_ID='1'
STATUS_HARD_DISK_SLEEP=240 # Wait 4 minutes for the OS to boot
STATUS_RESCUE_PRO_SLEEP=150 # Wait 2 minutes 30 seconds for Rescue-Pro to boot
OVH_FENCE_DEBUG=False # True or False for debug
 
def netboot_reboot(nodeovh,login,passwd,email,mode):
    imp = Import('http://schemas.xmlsoap.org/soap/encoding/')
    url='https://www.ovh.com/soapi/soapi-re-1.59.wsdl'
    imp.filter.add('http://soapi.ovh.com/manager')
    d = ImportDoctor(imp)
    soap = Client(url, doctor=d)
    session = soap.service.login(login, passwd, 'es', 0)
 
    #dedicatedNetbootModifyById changes the mode of the next reboot
    result = soap.service.dedicatedNetbootModifyById(session, nodeovh, mode, '', email)
 
    #dedicatedHardRebootDo initiates a hard reboot on the given node
    soap.service.dedicatedHardRebootDo(session, nodeovh, 'Fencing initiated by cluster', '', 'es')
 
    soap.service.logout(session)
 
def reboot_status(nodeovh,login,passwd):
    imp = Import('http://schemas.xmlsoap.org/soap/encoding/')
    url='https://www.ovh.com/soapi/soapi-re-1.59.wsdl'
    imp.filter.add('http://soapi.ovh.com/manager')
    d = ImportDoctor(imp)
    soap = Client(url, doctor=d)
    session = soap.service.login(login, passwd, 'es', 0)
 
    result = soap.service.dedicatedHardRebootStatus(session, nodeovh)
    tmpstart = datetime.strptime(result.start,'%Y-%m-%d %H:%M:%S')
    tmpend = datetime.strptime(result.end,'%Y-%m-%d %H:%M:%S')
    result.start = tmpstart
    result.end = tmpend
 
    soap.service.logout(session)
    return result
 
# redirect stderr to a log file
save_stderr = sys.stderr
errlog = open("/var/log/fence_ovh_error.log","a")
sys.stderr = errlog
 
global all_opt
 
device_opt = [  "email", "ipaddr", "action" , "login" , "passwd" , "nodename" ]
 
ovh_fence_opt = {
        "email" : {
                "getopt" : "Z:",
                "longopt" : "email",
                "help" : "-Z, --email=<email>          email for reboot message: admin@domain.com",
                "required" : "1",
                "shortdesc" : "Reboot email",
                "default" : "",
                "order" : 1 },
}
 
all_opt.update(ovh_fence_opt)
all_opt["ipaddr"]["shortdesc"] = "OVH node name"
 
atexit.register(atexit_handler)
options=check_input(device_opt,process_input(device_opt))
# Not sure if I need this old notation
## Support for -n [switch]:[plug] notation that was used before
if ((options.has_key("-n")) and (-1 != options["-n"].find(":"))):
    (switch, plug) = options["-n"].split(":", 1)
    if ((switch.isdigit()) and (plug.isdigit())):
        options["-s"] = switch
        options["-n"] = plug
 
if (not (options.has_key("-s"))):
    options["-s"]="1"
 
docs = { }
docs["shortdesc"] = "Fence agent for OVH"
docs["longdesc"] = "fence_ovh is an Power Fencing agent \
which can be used within OVH datecentre. \
Poweroff is simulated with a reboot into rescue-pro \
mode. \
 /usr/local/etc/ovhsecret example: \
 \
 [OVH] \
 Login = ab12345-ovh \
 Passwd = MYSECRET \
"
docs["vendorurl"] = "http://www.ovh.net"
show_docs(options, docs)
 
 
# I use my own log file for debugging purposes
if OVH_FENCE_DEBUG:
    logfile=open("/var/log/fence_ovh.log", "a");
    logfile.write(time.strftime("\n%d.%m.%Y %H:%M:%S \n"))
    logfile.write("Parameter:\t")
    for val in sys.argv:
    logfile.write(val + " ")
    logfile.write("\n")
 
print options
 
action=options['--action']
email=options['--email']
login=options['--username']
passwd=options['--password']
nodeovh=options['--ip']
if nodeovh[-8:] != '.ovh.net':
    nodeovh += '.ovh.net'
    
# Save datetime just before changing netboot
before_netboot_reboot = datetime.now()
 
if action == 'off':
    netboot_reboot(nodeovh,login,passwd,email,OVH_RESCUE_PRO_NETBOOT_ID) #Reboot in Rescue-pro
elif action == 'on':
    netboot_reboot(nodeovh,login,passwd,email,OVH_HARD_DISK_NETBOOT_ID) #Reboot from HD
elif action == 'reboot':
    netboot_reboot(nodeovh,login,passwd,email,OVH_HARD_DISK_NETBOOT_ID) #Reboot from HD
else:
    if OVH_FENCE_DEBUG:
        logfile.write("nothing to do\n")
        logfile.close()
    errlog.close()
    sys.exit()
 
if action == 'off':
    time.sleep(STATUS_RESCUE_PRO_SLEEP) #Reboot in vKVM
elif action == 'on':
    time.sleep(STATUS_HARD_DISK_SLEEP) #Reboot from HD
elif action == 'reboot':
    time.sleep(STATUS_HARD_DISK_SLEEP) #Reboot from HD
else:
    if OVH_FENCE_DEBUG:
        logfile.write("No sense! Check script please!\n")
        logfile.close()
    errlog.close()
    sys.exit()
 
after_netboot_reboot = datetime.now()
 
# Verification of success
 
reboot_start_end=reboot_status(nodeovh,login,passwd)
if OVH_FENCE_DEBUG:
    logfile.write("reboot_start_end.start: " +reboot_start_end.start.strftime('%Y-%m-%d %H:%M:%S')+"\n")
    logfile.write("before_netboot_reboot: " +before_netboot_reboot.strftime('%Y-%m-%d %H:%M:%S')+"\n")
    logfile.write("reboot_start_end.end: " +reboot_start_end.end.strftime('%Y-%m-%d %H:%M:%S')+"\n")
    logfile.write("after_netboot_reboot: " +after_netboot_reboot.strftime('%Y-%m-%d %H:%M:%S')+"\n")
 
if ((reboot_start_end.start > before_netboot_reboot) and (reboot_start_end.end < after_netboot_reboot)):
    if OVH_FENCE_DEBUG:
    logfile.write("Netboot reboot went OK.\n")
else:
    if OVH_FENCE_DEBUG:
        logfile.write("ERROR: Netboot reboot wasn't OK.\n")
        logfile.close()
    errlog.close()
    sys.exit(1)
 
 
if OVH_FENCE_DEBUG:
    logfile.close()
errlog.close()
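
As a side note, here is a minimal sketch (not part of the agent) for exercising only the OVH SOAP calls the script relies on, outside of fenced, e.g. to confirm that the login and the reboot-status query work with your credentials. The login, password, and node name below are placeholders.
Code:
# Hedged sketch: exercise only the suds/SOAP calls used by fence_ovh above.
# The credentials and node name are placeholders -- replace with your own values.
from suds.client import Client
from suds.xsd.doctor import ImportDoctor, Import

imp = Import('http://schemas.xmlsoap.org/soap/encoding/')
imp.filter.add('http://soapi.ovh.com/manager')
soap = Client('https://www.ovh.com/soapi/soapi-re-1.59.wsdl', doctor=ImportDoctor(imp))

session = soap.service.login('ab12345-ovh', 'MYSECRET', 'es', 0)              # placeholder credentials
status = soap.service.dedicatedHardRebootStatus(session, 'ns12345.ovh.net')   # placeholder node
print(status)   # the agent reads .start and .end from this result
soap.service.logout(session)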

From the web interface:
In the pveproxy web interface on pmox1, on the HA tab, when I add an "HA Managed VM/CT" and apply, I get the error: "config validation failed: unknown error (500)".



Thanks all. Regards, Maxime.
 
Hello,

Additional information, in case you need it:


pveversion:
proxmox-ve-2.6.32: 3.2-126 (running kernel: 2.6.32-29-pve)
pve-manager: 3.2-4 (running version: 3.2-4/e24a91c1)
pve-kernel-2.6.32-29-pve: 2.6.32-126
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.5-1
pve-cluster: 3.0-12
qemu-server: 3.1-16
pve-firmware: 1.1-3
libpve-common-perl: 3.0-18
libpve-access-control: 3.0-11
libpve-storage-perl: 3.0-19
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-6
vzctl: 4.0-1pve5
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.7-8
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.2-1

cman_tool status:
Version: 6.2.0
Config Version: 30
Cluster Name: cluster-fm-ha-1
Cluster Id: 31167
Cluster Member: Yes
Cluster Generation: 524
Membership state: Cluster-Member
Nodes: 3
Expected votes: 3
Total votes: 3
Node votes: 1
Quorum: 2
Active subsystems: 5
Flags:
Ports Bound: 0
Node name: ns6407164
Node ID: 2
Multicast addresses: 255.255.255.255
Node addresses: 172.16.0.102

Regards,

Maxime.
 
