Hello,
I want to configure an HA Proxmox cluster. I get an error when I validate the new configuration, and I don't understand why it happens or how to solve it. Do you have any ideas, please?
Error:
pmox1:/etc/cluster# ccs_config_validate -v -f /etc/pve/cluster.conf
Creating temporary file: /tmp/tmp.mUVTuMRhlM
Config interface set to:
Configuration stored in temporary file
Updating relaxng schema
Validating..
Relax-NG validity error : Extra element fencedevices in interleave
tempfile:6: element fencedevices: Relax-NG validity error : Element cluster failed to validate content
tempfile:23: element device: validity error : IDREF attribute name references an unknown ID "bi"
tempfile:30: element device: validity error : IDREF attribute name references an unknown ID "hirru"
Configuration fails to validate
Validation completed
After "service pve-cluster stop" and starting the cluster services again:
Starting pve cluster filesystem : pve-cluster.
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... Relax-NG validity error : Extra element fencedevices in interleave
tempfile:4: element fencedevices: Relax-NG validity error : Element cluster failed to validate content
tempfile:20: element device: validity error : IDREF attribute name references an unknown ID "bi"
tempfile:27: element device: validity error : IDREF attribute name references an unknown ID "hirru"
Configuration fails to validate
[ OK ]
Waiting for quorum... [ OK ]
Starting fenced... [ OK ]
Starting dlm_controld... [ OK ]
Tuning DLM kernel config... [ OK ]
Unfencing self... [ OK ]
Joining fence domain... [ OK ]
Version:
pmox1:~# pveversion
pve-manager/3.2-4/e24a91c1 (running kernel: 2.6.32-29-pve)
Nodes:
pmox1:~# ccs_tool lsnode -v
Cluster name: cluster-fm-ha-1, config_version: 30
Nodename                        Votes Nodeid Fencetype
ns6412076                          1    1    bat
  Fence properties: action=off
ns6407164                          1    2    bi
  Fence properties: action=off
ns6407163                          1    3    hirru
  Fence properties: action=off
Status:
pmox1:~# pvecm status
Version: 6.2.0
Config Version: 21
Cluster Name: cluster-fm-ha-1
Cluster Id: 31167
Cluster Member: Yes
Cluster Generation: 504
Membership state: Cluster-Member
Nodes: 3
Expected votes: 3
Total votes: 3
Node votes: 1
Quorum: 2
Active subsystems: 5
Flags:
Ports Bound: 0
Node name: ns6407164
Node ID: 2
Multicast addresses: 255.255.255.255
Node addresses: 172.16.0.101
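Side note on the status output above: with three nodes at one vote each, the reported "Quorum: 2" is the usual simple-majority rule. A tiny sketch of the arithmetic (my own illustration, not cman code):

```python
def quorum(expected_votes):
    # Simple majority: strictly more than half of the expected votes.
    return expected_votes // 2 + 1

print(quorum(3))  # -> 2, matching "Expected votes: 3" / "Quorum: 2" above
```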
My /etc/pve/cluster.conf:
Code:
<?xml version="1.0"?>
<cluster name="cluster-fm-ha-1" config_version="30">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu"></cman>
  <fencedevices>
    <fencedevice agent="fence_ovh" name="bat" email="ddd@ddd.com" ipaddr="nsxxx76" login="sa57499-ovh" passwd="xxx" power_wait="5"/>
    <fencedevice agent="fence_ovh" name="bi" email="ddd@ddd.com" ipaddr="nsxxx64" login="sa57499-ovh" passwd="xxx" power_wait="5"/>
    <fencedevice agent="fence_ovh" name="hirru" email="ddd@ddd.com" ipaddr="nsxxx63" login="sa57499-ovh" passwd="xxx" power_wait="5"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="nsxxx76" votes="1" nodeid="1">
      <fence>
        <method name="1">
          <device name="bat" action="off"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="nsxxx64" votes="1" nodeid="2">
      <fence>
        <method name="1">
          <device name="bi" action="off"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="nsxxx63" votes="1" nodeid="3">
      <fence>
        <method name="1">
          <device name="hirru" action="off"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
</cluster>
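Not from the original post: a minimal stdlib sketch that reproduces, outside of ccs_config_validate, the cross-check behind the "IDREF attribute name references an unknown ID" messages, i.e. that every `<device name="...">` matches a declared `<fencedevice name="...">` (the helper name `check_fence_refs` is my own):

```python
import xml.etree.ElementTree as ET

def check_fence_refs(path):
    # Raises ET.ParseError if the file is not even well-formed XML.
    root = ET.parse(path).getroot()
    declared = set(fd.get("name") for fd in root.iter("fencedevice"))
    used = set(d.get("name") for d in root.iter("device"))
    # Any name used by a <device> but never declared as a <fencedevice>
    # would produce an "unknown ID" IDREF error during validation.
    return used - declared
```

In the config above every device name does have a matching fencedevice, which may suggest the IDREF messages are fallout from the `fencedevices` element itself being rejected ("Extra element fencedevices in interleave") rather than genuinely missing IDs.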
The fence_ovh script (http://forum.proxmox.com/threads/11066-Proxmox-HA-Cluster-at-OVH-Fencing?p=75152#post75152):
Code:
pmox1:~# cat /usr/sbin/fence_ovh
#!/usr/bin/python
# assembled by Dennis Busch, secofor GmbH, Germany
# This work is licensed under a
# Creative Commons Attribution-ShareAlike 3.0 Unported License.
#
# Manual call parameters example:
#   login=ab12345-ovh
#   passwd=MYSECRET
#   email=admin@myadmin
#   ipaddr=ns12345
#   action=off
# where ipaddr is your server's OVH name

import sys, re, pexpect
sys.path.append("/usr/share/fence")
from fencing import *
import atexit
import time
from datetime import datetime
from suds.client import Client
from suds.xsd.doctor import ImportDoctor, Import

OVH_RESCUE_PRO_NETBOOT_ID = '28'
OVH_HARD_DISK_NETBOOT_ID = '1'
STATUS_HARD_DISK_SLEEP = 240   # wait 4 minutes for the OS to boot
STATUS_RESCUE_PRO_SLEEP = 150  # wait 2 minutes 30 seconds for rescue-pro to run
OVH_FENCE_DEBUG = False        # True or False for debug

def netboot_reboot(nodeovh, login, passwd, email, mode):
    imp = Import('http://schemas.xmlsoap.org/soap/encoding/')
    url = 'https://www.ovh.com/soapi/soapi-re-1.59.wsdl'
    imp.filter.add('http://soapi.ovh.com/manager')
    d = ImportDoctor(imp)
    soap = Client(url, doctor=d)
    session = soap.service.login(login, passwd, 'es', 0)
    # dedicatedNetbootModifyById changes the mode of the next reboot
    result = soap.service.dedicatedNetbootModifyById(session, nodeovh, mode, '', email)
    # dedicatedHardRebootDo initiates a hard reboot on the given node
    soap.service.dedicatedHardRebootDo(session, nodeovh, 'Fencing initiated by cluster', '', 'es')
    soap.service.logout(session)

def reboot_status(nodeovh, login, passwd):
    imp = Import('http://schemas.xmlsoap.org/soap/encoding/')
    url = 'https://www.ovh.com/soapi/soapi-re-1.59.wsdl'
    imp.filter.add('http://soapi.ovh.com/manager')
    d = ImportDoctor(imp)
    soap = Client(url, doctor=d)
    session = soap.service.login(login, passwd, 'es', 0)
    result = soap.service.dedicatedHardRebootStatus(session, nodeovh)
    tmpstart = datetime.strptime(result.start, '%Y-%m-%d %H:%M:%S')
    tmpend = datetime.strptime(result.end, '%Y-%m-%d %H:%M:%S')
    result.start = tmpstart
    result.end = tmpend
    soap.service.logout(session)
    return result

# redirect stderr to a log file
save_stderr = sys.stderr
errlog = open("/var/log/fence_ovh_error.log", "a")
sys.stderr = errlog

global all_opt
device_opt = ["email", "ipaddr", "action", "login", "passwd", "nodename"]
ovh_fence_opt = {
    "email": {
        "getopt": "Z:",
        "longopt": "email",
        "help": "-Z, --email=<email> email for reboot message: admin@domain.com",
        "required": "1",
        "shortdesc": "Reboot email",
        "default": "",
        "order": 1},
}
all_opt.update(ovh_fence_opt)
all_opt["ipaddr"]["shortdesc"] = "OVH node name"
atexit.register(atexit_handler)
options = check_input(device_opt, process_input(device_opt))

# Not sure if I need this old notation
## Support for -n [switch]:[plug] notation that was used before
if options.has_key("-n") and (-1 != options["-n"].find(":")):
    (switch, plug) = options["-n"].split(":", 1)
    if switch.isdigit() and plug.isdigit():
        options["-s"] = switch
        options["-n"] = plug
if not options.has_key("-s"):
    options["-s"] = "1"

docs = {}
docs["shortdesc"] = "Fence agent for OVH"
docs["longdesc"] = "fence_ovh is a power fencing agent \
which can be used within the OVH datacentre. \
Poweroff is simulated with a reboot into rescue-pro \
mode. \
/usr/local/etc/ovhsecret example: \
\
[OVH] \
Login = ab12345-ovh \
Passwd = MYSECRET \
"
docs["vendorurl"] = "http://www.ovh.net"
show_docs(options, docs)

# I use my own logfile for debugging purposes
if OVH_FENCE_DEBUG:
    logfile = open("/var/log/fence_ovh.log", "a")
    logfile.write(time.strftime("\n%d.%m.%Y %H:%M:%S \n"))
    logfile.write("Parameter:\t")
    for val in sys.argv:
        logfile.write(val + " ")
    logfile.write("\n")

print options
action = options['--action']
email = options['--email']
login = options['--username']
passwd = options['--password']
nodeovh = options['--ip']
if nodeovh[-8:] != '.ovh.net':
    nodeovh += '.ovh.net'

# Save datetime just before changing netboot
before_netboot_reboot = datetime.now()

if action == 'off':
    netboot_reboot(nodeovh, login, passwd, email, OVH_RESCUE_PRO_NETBOOT_ID)  # reboot into rescue-pro
elif action == 'on':
    netboot_reboot(nodeovh, login, passwd, email, OVH_HARD_DISK_NETBOOT_ID)   # reboot from HD
elif action == 'reboot':
    netboot_reboot(nodeovh, login, passwd, email, OVH_HARD_DISK_NETBOOT_ID)   # reboot from HD
else:
    if OVH_FENCE_DEBUG:
        logfile.write("nothing to do\n")
        logfile.close()
    errlog.close()
    sys.exit()

if action == 'off':
    time.sleep(STATUS_RESCUE_PRO_SLEEP)  # reboot into vKVM
elif action == 'on':
    time.sleep(STATUS_HARD_DISK_SLEEP)   # reboot from HD
elif action == 'reboot':
    time.sleep(STATUS_HARD_DISK_SLEEP)   # reboot from HD
else:
    if OVH_FENCE_DEBUG:
        logfile.write("No sense! Check script please!\n")
        logfile.close()
    errlog.close()
    sys.exit()

after_netboot_reboot = datetime.now()

# Verification of success
reboot_start_end = reboot_status(nodeovh, login, passwd)
if OVH_FENCE_DEBUG:
    logfile.write("reboot_start_end.start: " + reboot_start_end.start.strftime('%Y-%m-%d %H:%M:%S') + "\n")
    logfile.write("before_netboot_reboot: " + before_netboot_reboot.strftime('%Y-%m-%d %H:%M:%S') + "\n")
    logfile.write("reboot_start_end.end: " + reboot_start_end.end.strftime('%Y-%m-%d %H:%M:%S') + "\n")
    logfile.write("after_netboot_reboot: " + after_netboot_reboot.strftime('%Y-%m-%d %H:%M:%S') + "\n")

if (reboot_start_end.start > before_netboot_reboot) and (reboot_start_end.end < after_netboot_reboot):
    if OVH_FENCE_DEBUG:
        logfile.write("Netboot reboot went OK.\n")
else:
    if OVH_FENCE_DEBUG:
        logfile.write("ERROR: Netboot reboot wasn't OK.\n")
        logfile.close()
    errlog.close()
    sys.exit(1)

if OVH_FENCE_DEBUG:
    logfile.close()
errlog.close()
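The verification step at the end of the script boils down to a time-window comparison: the reboot that OVH reports must have started after the fence request and finished before the status poll. A standalone sketch of just that logic (hypothetical timestamps, function name my own):

```python
from datetime import datetime, timedelta

def reboot_confirmed(reboot_start, reboot_end, requested_at, checked_at):
    # The reboot only counts as "our" fencing action if it began after we
    # asked for it and had already completed by the time we polled the status.
    return requested_at < reboot_start and reboot_end < checked_at

t0 = datetime(2014, 6, 1, 12, 0, 0)
print(reboot_confirmed(t0 + timedelta(seconds=10),    # reboot started
                       t0 + timedelta(seconds=90),    # reboot finished
                       t0,                            # fence requested
                       t0 + timedelta(seconds=240)))  # status polled -> True
```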
From the web interface:
In the pveproxy web interface on pmox1, on the HA tab, when I add an "HA Managed VM/CT" and apply the change, I get the error: "config validation failed: unknown error (500)".
Thanks all. Regards, Maxime.