Proxmox HA Cluster at OVH - Fencing

stacktrace

Sep 5, 2012
I'm trying to set up a Proxmox HA cluster on OVH root servers. One must-have requirement is a working fence device. As I could not find another way to implement this with OVH, I wrote my own quick-and-dirty fence agent in Python. It works so far, but it isn't really good in any way. In fact, it's quite bad ;) Besides the ugly Python code, what is really problematic for production use: there is not yet any verification of success.

I'm posting it here hoping you can help improve it. If you need the OVH SOAP API reference, you can find it at http://www.ovh.com/soapi/en/

Code:
#!/usr/bin/python
#This is a fence agent for use at OVH
#As there are no other fence devices available, we must use OVH's SOAP API
#Quick-and-dirty assembled by Dennis Busch, secofor GmbH, Germany
#Thanks to Elbrunz's blog for the config parsing code: http://elbrunz.wordpress.com
#This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
 
import sys
from SOAPpy import WSDL
import time
 
def action_do(nodename,login,passwd,email,mode):
    soap = WSDL.Proxy('https://www.ovh.com/soapi/soapi-re-1.47.wsdl')
    session = soap.login(login, passwd, 'de', 0)
 
    #dedicatedNetbootModifyById changes the mode of the next reboot
    result = soap.dedicatedNetbootModifyById(session, nodename, mode, '')
 
    #dedicatedHardRebootDo initiates a hard reboot on the given node
    soap.dedicatedHardRebootDo(session, nodename, 'Fencing initiated by cluster', '', 'de')
 
    soap.logout(session)
 
#redirect stderr to a file
save_stderr = sys.stderr
errlog = open("/var/log/fence_ovh_error.log","a")
sys.stderr = errlog
 
#I use my own logfile for debugging purposes
logfile = open("/var/log/fence_ovh.log", "a")
logfile.write(time.strftime("%d.%m.%Y %H:%M:%S \t"))
logfile.write("Parameter:\t")
for val in sys.argv:
    logfile.write(val + " ")
logfile.write("\n")
logfile.write("Optionen\t")
 
#fenced hands over the attributes via stdin
#thanks to Elbrunz for the following parser lines
COMMENT_CHAR = '#'
OPTION_CHAR =  '='
options = {}
for line in sys.stdin.readlines():
    logfile.write(line)
    # First, remove comments:
    if COMMENT_CHAR in line:
       # split on comment char, keep only the part before
       line, comment = line.split(COMMENT_CHAR, 1)
    # Second, find lines with an option=value:
    if OPTION_CHAR in line:
        # split on option char:
        option, value = line.split(OPTION_CHAR, 1)
        # strip spaces:
        option = option.strip()
        value = value.strip()
        # store in dictionary:
        options[option] = value
for val in options:
    logfile.write(val + "\n")
logfile.write("\n")
 
if 'action' in options:
    action=options['action']
else:
    logfile.write("nothing to do")
    sys.exit()
if 'login' in options:
    login=options['login']
if 'passwd' in options:
    passwd=options['passwd']
if 'email' in options:
    email=options['email']
if 'nodename' in options:
    nodename=options['nodename']
    if nodename[-8:] != '.ovh.net':
        nodename += '.ovh.net'
 
if action == 'off':
    action_do(nodename,login,passwd,email,'29') #Reboot in vKVM
elif action == 'on':
    action_do(nodename,login,passwd,email,'1') #Reboot from HD
elif action == 'reboot':
    action_do(nodename,login,passwd,email, '1') #Reboot from HD
else:
    logfile.write("nothing to do")
    sys.exit()
 
errlog.close()
logfile.close()
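The stdin parser borrowed from Elbrunz's blog can also be pulled out into a small function, which makes it easy to test without running the whole agent. A minimal sketch (not part of the agent itself; the function name is mine):

```python
COMMENT_CHAR = '#'
OPTION_CHAR = '='

def parse_fence_options(lines):
    # Parse the "option=value" lines that fenced writes to stdin,
    # dropping anything after a comment character.
    options = {}
    for line in lines:
        if COMMENT_CHAR in line:
            line, _comment = line.split(COMMENT_CHAR, 1)
        if OPTION_CHAR in line:
            option, value = line.split(OPTION_CHAR, 1)
            options[option.strip()] = value.strip()
    return options

print(parse_fence_options(["action=off", "nodename=ns12345 # OVH name"]))
# {'action': 'off', 'nodename': 'ns12345'}
```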
Dennis Busch
 
Hi,

After much research, I came to the same conclusion: write my own script.
Have you put this solution into production at OVH?
If so, what are your recommendations? And could you share your research on this fence_ovh?

Have a good day.

Guillaume
 

First you need to make sure the python-soappy package is installed.

So you save:
Code:
#!/usr/bin/python
# Copyright 2013 Adrian Gibanel Lopez (bTactic)
# Adrian Gibanel improved this script
# at 2013 to add verification of success
# and to output metadata

# Based on:
# This is a fence agent for use at OVH
# As there are no other fence devices available,
# we must use OVH's SOAP API
# Quick-and-dirty assembled by Dennis Busch, secofor GmbH,
# Germany
# This work is licensed under a
# Creative Commons Attribution-ShareAlike 3.0 Unported License.

# Manual call parameters example
#
# login=ab12345-ovh
# passwd=MYSECRET
# email=admin@myadmin
# ipaddr=ns12345
# action=off

# # where ipaddr is your server's OVH name

import sys, re, pexpect
sys.path.append("/usr/share/fence")
from fencing import *

import atexit
from SOAPpy import WSDL
import time
from datetime import datetime

OVH_RESCUE_PRO_NETBOOT_ID='28'
OVH_HARD_DISK_NETBOOT_ID='1'
STATUS_HARD_DISK_SLEEP=240 # Wait 4 minutes for the OS to boot
STATUS_RESCUE_PRO_SLEEP=150 # Wait 2 minutes 30 seconds for Rescue-Pro to run
OVH_FENCE_DEBUG=False # True or False for debug

def netboot_reboot(nodeovh,login,passwd,email,mode):
    soap = WSDL.Proxy('https://www.ovh.com/soapi/soapi-re-1.59.wsdl')
    session = soap.login(login, passwd, 'es', 0)
 
    #dedicatedNetbootModifyById changes the mode of the next reboot
    result = soap.dedicatedNetbootModifyById(session, nodeovh, mode, '', email)
 
    #dedicatedHardRebootDo initiates a hard reboot on the given node
    soap.dedicatedHardRebootDo(session, nodeovh, 'Fencing initiated by cluster', '', 'es')
 
    soap.logout(session)

def reboot_status(nodeovh,login,passwd):
    soap = WSDL.Proxy('https://www.ovh.com/soapi/soapi-re-1.59.wsdl')
    session = soap.login(login, passwd, 'es', 0)
 
    result = soap.dedicatedHardRebootStatus(session, nodeovh)
    tmpstart = datetime.strptime(result.start,'%Y-%m-%d %H:%M:%S')
    tmpend = datetime.strptime(result.end,'%Y-%m-%d %H:%M:%S')
    result.start = tmpstart
    result.end = tmpend

    soap.logout(session)
    return result

#redirect stderr to a file
save_stderr = sys.stderr
errlog = open("/var/log/fence_ovh_error.log","a")
sys.stderr = errlog

global all_opt

device_opt = [  "email", "ipaddr", "action" , "login" , "passwd"]

ovh_fence_opt = {
        "email" : {
                "getopt" : "Z:",
                "longopt" : "email",
                "help" : "-Z, --email=<email>          email for reboot message: admin@domain.com",
                "required" : "1",
                "shortdesc" : "Reboot email",
                "default" : "",
                "order" : 1 },
}

all_opt.update(ovh_fence_opt)
all_opt["ipaddr"]["shortdesc"] = "OVH node name"

atexit.register(atexit_handler)
options=check_input(device_opt,process_input(device_opt))
# Not sure if I need this old notation
## Support for -n [switch]:[plug] notation that was used before
if ((options.has_key("-n")) and (-1 != options["-n"].find(":"))):
    (switch, plug) = options["-n"].split(":", 1)
    if ((switch.isdigit()) and (plug.isdigit())):
        options["-s"] = switch
        options["-n"] = plug

if (not (options.has_key("-s"))):
    options["-s"]="1"

docs = { }
docs["shortdesc"] = "Fence agent for OVH"
docs["longdesc"] = "fence_ovh is a power fencing agent \
which can be used within an OVH datacentre. \
Poweroff is simulated with a reboot into rescue-pro \
mode. \
 /usr/local/etc/ovhsecret example: \
 \
 [OVH] \
 Login = ab12345-ovh \
 Passwd = MYSECRET \
"
docs["vendorurl"] = "http://www.ovh.net"
show_docs(options, docs)


#I use my own logfile for debugging purposes
if OVH_FENCE_DEBUG:
    logfile=open("/var/log/fence_ovh.log", "a");
    logfile.write(time.strftime("\n%d.%m.%Y %H:%M:%S \n"))
    logfile.write("Parameter:\t")
    for val in sys.argv:
        logfile.write(val + " ")
    logfile.write("\n")

action=options['-o']
email=options['-Z']
login=options['-l']
passwd=options['-p']
nodeovh=options['-a']
if nodeovh[-8:] != '.ovh.net':
    nodeovh += '.ovh.net'
    
# Save datetime just before changing netboot
before_netboot_reboot = datetime.now()

if action == 'off':
    netboot_reboot(nodeovh,login,passwd,email,OVH_RESCUE_PRO_NETBOOT_ID) #Reboot in Rescue-pro
elif action == 'on':
    netboot_reboot(nodeovh,login,passwd,email,OVH_HARD_DISK_NETBOOT_ID) #Reboot from HD
elif action == 'reboot':
    netboot_reboot(nodeovh,login,passwd,email,OVH_HARD_DISK_NETBOOT_ID) #Reboot from HD
else:
    if OVH_FENCE_DEBUG:
        logfile.write("nothing to do\n")
        logfile.close()
    errlog.close()
    sys.exit()

if action == 'off':
    time.sleep(STATUS_RESCUE_PRO_SLEEP) #Reboot in vKVM
elif action == 'on':
    time.sleep(STATUS_HARD_DISK_SLEEP) #Reboot from HD
elif action == 'reboot':
    time.sleep(STATUS_HARD_DISK_SLEEP) #Reboot from HD
else:
    if OVH_FENCE_DEBUG:
        logfile.write("No sense! Check script please!\n")
        logfile.close()
    errlog.close()
    
    sys.exit()

after_netboot_reboot = datetime.now()

# Verification of success

reboot_start_end=reboot_status(nodeovh,login,passwd)
if OVH_FENCE_DEBUG:
    logfile.write("reboot_start_end.start: " +reboot_start_end.start.strftime('%Y-%m-%d %H:%M:%S')+"\n")
    logfile.write("before_netboot_reboot: " +before_netboot_reboot.strftime('%Y-%m-%d %H:%M:%S')+"\n")
    logfile.write("reboot_start_end.end: " +reboot_start_end.end.strftime('%Y-%m-%d %H:%M:%S')+"\n")
    logfile.write("after_netboot_reboot: " +after_netboot_reboot.strftime('%Y-%m-%d %H:%M:%S')+"\n")

if ((reboot_start_end.start > before_netboot_reboot) and (reboot_start_end.end < after_netboot_reboot)):
    if OVH_FENCE_DEBUG:
        logfile.write("Netboot reboot went OK.\n")
else:
    if OVH_FENCE_DEBUG:
        logfile.write("ERROR: Netboot reboot wasn't OK.\n")
        logfile.close()
    errlog.close()
    sys.exit(1)


if OVH_FENCE_DEBUG:
    logfile.close()
errlog.close()
as:
Code:
/usr/sbin/fence_ovh
.
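The verification at the end of the script hinges on a timestamp-window check: the hard reboot reported by dedicatedHardRebootStatus must have started after we requested it and finished before our post-sleep check. That comparison can be exercised in isolation; a minimal sketch with illustrative timestamps (the function name and values are mine, not part of the agent):

```python
from datetime import datetime, timedelta

def reboot_verified(reboot_start, reboot_end, before_call, after_wait):
    # The fence action counts as successful only if OVH reports that
    # the hard reboot started after our request and ended before our
    # post-sleep status check.
    return before_call < reboot_start and reboot_end < after_wait

# Illustrative timestamps: the reboot starts 30 s after our call
# and finishes well before the 4-minute post-sleep check.
before_call = datetime(2013, 5, 1, 12, 0, 0)
after_wait = before_call + timedelta(seconds=240)
start = before_call + timedelta(seconds=30)
end = before_call + timedelta(seconds=180)
print(reboot_verified(start, end, before_call, after_wait))  # True
```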

Then you run:
Code:
ccs_update_schema
so that you can validate it as suggested with:
Code:
ccs_config_validate -v -f /etc/pve/cluster.conf.new
.

Here's a cluster.conf.new example:
Code:
<?xml version="1.0"?>
<cluster name="ha-008-010" config_version="3">

<cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu" two_node="1" expected_votes="1">
</cman>

<fencedevices>
        <fencedevice agent="fence_ovh" name="fence008" email="admin@domain.com" ipaddr="ns123456" login="ab12345-ovh" passwd="MYSECRET" />
        <fencedevice agent="fence_ovh" name="fence010" email="admin@domain.com" ipaddr="ns789012" login="ab12345-ovh" passwd="MYSECRET" />
</fencedevices>

<clusternodes>
<clusternode name="server008" nodeid="1" votes="1">
  <fence>
    <method name="1">
      <device name="fence008" action="off"/>
    </method>
  </fence>
</clusternode>
<clusternode name="server010" nodeid="2" votes="1">
  <fence>
    <method name="1">
      <device name="fence010" action="off"/>
    </method>
  </fence>
</clusternode>
</clusternodes>


</cluster>
.

Any feedback is welcome. I'm sure the script can be improved a lot.

Adrian Gibanel
bTactic
 
I've rewritten the script so that it works in Proxmox 3 and uses the python-suds library instead of python-soappy. Please read the former post for instructions on how to use it.

Update fence_ovh script:
Code:
#!/usr/bin/python
# Copyright 2013 Adrian Gibanel Lopez (bTactic)
# Adrian Gibanel improved this script
# at 2013 to add verification of success
# and to output metadata

# Based on:
# This is a fence agent for use at OVH
# As there are no other fence devices available,
# we must use OVH's SOAP API
# Quick-and-dirty assembled by Dennis Busch, secofor GmbH,
# Germany
# This work is licensed under a
# Creative Commons Attribution-ShareAlike 3.0 Unported License.

# Manual call parameters example
#
# login=ab12345-ovh
# passwd=MYSECRET
# email=admin@myadmin
# ipaddr=ns12345
# action=off

# # where ipaddr is your server's OVH name

import sys, re, pexpect
sys.path.append("/usr/share/fence")
from fencing import *

import atexit
from suds.client import Client
from suds.xsd.doctor import ImportDoctor, Import
import time
from datetime import datetime

OVH_RESCUE_PRO_NETBOOT_ID='28'
OVH_HARD_DISK_NETBOOT_ID='1'
STATUS_HARD_DISK_SLEEP=240 # Wait 4 minutes for the OS to boot
STATUS_RESCUE_PRO_SLEEP=150 # Wait 2 minutes 30 seconds for Rescue-Pro to run
OVH_FENCE_DEBUG=False # True or False for debug

def netboot_reboot(nodeovh,login,passwd,email,mode):
    imp = Import('http://schemas.xmlsoap.org/soap/encoding/')
    url='https://www.ovh.com/soapi/soapi-re-1.59.wsdl'
    imp.filter.add('http://soapi.ovh.com/manager')
    d = ImportDoctor(imp)
    soap = Client(url, doctor=d)
    session = soap.service.login(login, passwd, 'es', 0)
 
    #dedicatedNetbootModifyById changes the mode of the next reboot
    result = soap.service.dedicatedNetbootModifyById(session, nodeovh, mode, '', email)
 
    #dedicatedHardRebootDo initiates a hard reboot on the given node
    soap.service.dedicatedHardRebootDo(session, nodeovh, 'Fencing initiated by cluster', '', 'es')
 
    soap.service.logout(session)

def reboot_status(nodeovh,login,passwd):
    imp = Import('http://schemas.xmlsoap.org/soap/encoding/')
    url='https://www.ovh.com/soapi/soapi-re-1.59.wsdl'
    imp.filter.add('http://soapi.ovh.com/manager')
    d = ImportDoctor(imp)
    soap = Client(url, doctor=d)
    session = soap.service.login(login, passwd, 'es', 0)
 
    result = soap.service.dedicatedHardRebootStatus(session, nodeovh)
    tmpstart = datetime.strptime(result.start,'%Y-%m-%d %H:%M:%S')
    tmpend = datetime.strptime(result.end,'%Y-%m-%d %H:%M:%S')
    result.start = tmpstart
    result.end = tmpend

    soap.service.logout(session)
    return result

#redirect stderr to a file
save_stderr = sys.stderr
errlog = open("/var/log/fence_ovh_error.log","a")
sys.stderr = errlog

global all_opt

device_opt = [  "email", "ipaddr", "action" , "login" , "passwd" , "nodename" ]

ovh_fence_opt = {
        "email" : {
                "getopt" : "Z:",
                "longopt" : "email",
                "help" : "-Z, --email=<email>          email for reboot message: admin@domain.com",
                "required" : "1",
                "shortdesc" : "Reboot email",
                "default" : "",
                "order" : 1 },
}

all_opt.update(ovh_fence_opt)
all_opt["ipaddr"]["shortdesc"] = "OVH node name"

atexit.register(atexit_handler)
options=check_input(device_opt,process_input(device_opt))
# Not sure if I need this old notation
## Support for -n [switch]:[plug] notation that was used before
if ((options.has_key("-n")) and (-1 != options["-n"].find(":"))):
    (switch, plug) = options["-n"].split(":", 1)
    if ((switch.isdigit()) and (plug.isdigit())):
        options["-s"] = switch
        options["-n"] = plug

if (not (options.has_key("-s"))):
    options["-s"]="1"

docs = { }
docs["shortdesc"] = "Fence agent for OVH"
docs["longdesc"] = "fence_ovh is a power fencing agent \
which can be used within an OVH datacentre. \
Poweroff is simulated with a reboot into rescue-pro \
mode. \
 /usr/local/etc/ovhsecret example: \
 \
 [OVH] \
 Login = ab12345-ovh \
 Passwd = MYSECRET \
"
docs["vendorurl"] = "http://www.ovh.net"
show_docs(options, docs)


#I use my own logfile for debugging purposes
if OVH_FENCE_DEBUG:
    logfile=open("/var/log/fence_ovh.log", "a");
    logfile.write(time.strftime("\n%d.%m.%Y %H:%M:%S \n"))
    logfile.write("Parameter:\t")
    for val in sys.argv:
        logfile.write(val + " ")
    logfile.write("\n")

#print options # debug only; would pollute the agent's output if left enabled

action=options['--action']
email=options['--email']
login=options['--username']
passwd=options['--password']
nodeovh=options['--ip']
if nodeovh[-8:] != '.ovh.net':
    nodeovh += '.ovh.net'
    
# Save datetime just before changing netboot
before_netboot_reboot = datetime.now()

if action == 'off':
    netboot_reboot(nodeovh,login,passwd,email,OVH_RESCUE_PRO_NETBOOT_ID) #Reboot in Rescue-pro
elif action == 'on':
    netboot_reboot(nodeovh,login,passwd,email,OVH_HARD_DISK_NETBOOT_ID) #Reboot from HD
elif action == 'reboot':
    netboot_reboot(nodeovh,login,passwd,email,OVH_HARD_DISK_NETBOOT_ID) #Reboot from HD
else:
    if OVH_FENCE_DEBUG:
        logfile.write("nothing to do\n")
        logfile.close()
    errlog.close()
    sys.exit()

if action == 'off':
    time.sleep(STATUS_RESCUE_PRO_SLEEP) #Reboot in vKVM
elif action == 'on':
    time.sleep(STATUS_HARD_DISK_SLEEP) #Reboot from HD
elif action == 'reboot':
    time.sleep(STATUS_HARD_DISK_SLEEP) #Reboot from HD
else:
    if OVH_FENCE_DEBUG:
        logfile.write("No sense! Check script please!\n")
        logfile.close()
    errlog.close()
    
    sys.exit()

after_netboot_reboot = datetime.now()

# Verification of success

reboot_start_end=reboot_status(nodeovh,login,passwd)
if OVH_FENCE_DEBUG:
    logfile.write("reboot_start_end.start: " +reboot_start_end.start.strftime('%Y-%m-%d %H:%M:%S')+"\n")
    logfile.write("before_netboot_reboot: " +before_netboot_reboot.strftime('%Y-%m-%d %H:%M:%S')+"\n")
    logfile.write("reboot_start_end.end: " +reboot_start_end.end.strftime('%Y-%m-%d %H:%M:%S')+"\n")
    logfile.write("after_netboot_reboot: " +after_netboot_reboot.strftime('%Y-%m-%d %H:%M:%S')+"\n")

if ((reboot_start_end.start > before_netboot_reboot) and (reboot_start_end.end < after_netboot_reboot)):
    if OVH_FENCE_DEBUG:
        logfile.write("Netboot reboot went OK.\n")
else:
    if OVH_FENCE_DEBUG:
        logfile.write("ERROR: Netboot reboot wasn't OK.\n")
        logfile.close()
    errlog.close()
    sys.exit(1)


if OVH_FENCE_DEBUG:
    logfile.close()
errlog.close()

Cluster.conf example:
Code:
<?xml version="1.0"?>
<cluster name="ha-008-010" config_version="3">

<cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu" two_node="1" expected_votes="1">
</cman>

<fencedevices>
        <fencedevice agent="fence_ovh" name="fence008" email="myadmin@domain.com" ipaddr="ns1234" login="ab12345-ovh" passwd="MYSECRET" />
        <fencedevice agent="fence_ovh" name="fence010" email="myadmin@domain.com" ipaddr="ns5678" login="ab12345-ovh" passwd="MYSECRET" />
</fencedevices>

<clusternodes>
<clusternode name="nodeA.your.domain" nodeid="1" votes="1">
  <fence>
    <method name="1">
      <device name="fence008" action="off"/>
    </method>
  </fence>
</clusternode>
<clusternode name="nodeB.your.domain" nodeid="2" votes="1">
  <fence>
    <method name="1">
      <device name="fence010" action="off"/>
    </method>
  </fence>
</clusternode>
</clusternodes>
</cluster>

As always any feedback is welcomed.
 
Hello Adrian,

thank you very much for sharing your script. I'm testing it for fencing at OVH with Proxmox 3 in a 3-node cluster and I have a problem: when I add the email attribute to the fencedevice tag, the cluster.conf doesn't validate with "ccs_config_validate -v -f /etc/pve/cluster.conf.new" and it returns something like:

Code:
root@front1:~# ccs_config_validate -v -f /etc/pve/cluster.conf.new
Creating temporary file: /tmp/tmp.xqJmYol5zU
Config interface set to:
Configuration stored in temporary file
Updating relaxng schema
Validating..
Relax-NG validity error : Extra element fencedevices in interleave
tempfile:6: element fencedevices: Relax-NG validity error : Element cluster failed to validate content

If I remove email attribute it validates fine.

This is my cluster.conf:

Code:
<?xml version="1.0"?>
<cluster name="clusterfront" config_version="14">

  <cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu"></cman>

  <fencedevices>
        <fencedevice agent="fence_ovh" name="fence008" email="mail@dominio.com" ipaddr="serverovh1" login="usuario-manager" passwd="pwdmanager" />
        <fencedevice agent="fence_ovh" name="fence010" email="mail@dominio.com" ipaddr="serverovh2" login="usuario-manager" passwd="pwdmanager" />
        <fencedevice agent="fence_ovh" name="fence012" email="mail@dominio.com" ipaddr="serverovh3" login="usuario-manager" passwd="pwdmanager" />
  </fencedevices>

  <clusternodes>
        <clusternode name="front1" votes="1" nodeid="1">
                <fence>
                        <method name="1">
                                <device name="fence008" action="off"/>
                        </method>
                </fence>
        </clusternode>
        <clusternode name="front3" votes="1" nodeid="2">
                <fence>
                        <method name="1">
                                <device name="fence010" action="off"/>
                        </method>
                </fence>
        </clusternode>
        <clusternode name="front2" votes="1" nodeid="3">
                <fence>
                        <method name="1">
                                <device name="fence012" action="off"/>
                        </method>
                </fence>
        </clusternode>
  </clusternodes>

</cluster>

How can I test that fencing (fence_ovh) is working through the OVH API?

Thanks in advance.


Best regards, txetxu.
 
At first glance I don't see anything wrong. Can you run:
Code:
ccs_update_schema
or even:
Code:
ccs_update_schema --force
and see if it validates after that?
 
I'm searching too: how can I test that fencing (fence_ovh) is working through the OVH API?

Regards,
Maxime.
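Since fence agents take their options as "option=value" lines on stdin (see the "Manual call parameters example" comment in the script above), one way to exercise fence_ovh outside the cluster is to build that stdin payload and pipe it into the installed agent. A sketch with placeholder credentials and node name; the install path is the one from the earlier posts:

```python
def build_fence_stdin(options):
    # fenced hands each option to the agent as an "option=value" line
    return "".join("%s=%s\n" % (key, value) for key, value in options)

# Placeholder credentials and node name; substitute your own.
payload = build_fence_stdin([
    ("login", "ab12345-ovh"),
    ("passwd", "MYSECRET"),
    ("email", "admin@domain.com"),
    ("ipaddr", "ns12345"),
    ("action", "off"),
])

# On a node where the agent is installed, feed it the payload and
# check the exit status (0 means the fence action succeeded):
# import subprocess
# proc = subprocess.Popen(["/usr/sbin/fence_ovh"], stdin=subprocess.PIPE)
# proc.communicate(payload.encode())
# print(proc.returncode)
```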
 
Any luck getting this working on Proxmox 5? Does anyone have experience with OVH Proxmox fencing?
 
