"Connection timed out" when fencing node, but it does actually shut down (iDRAC)

jampy

Member
Jun 26, 2015
I'm trying to use iDRAC 8 as a fencing device.

Status query works fine, however `fence_node node2` aborts after a few seconds with "agent fence_drac5 result: status error". However, the node (gracefully) shuts down nonetheless.

I tried running fence_drac5 manually, and I can see that the connection times out right after issuing the second "powerstatus":

Code:
    # time fence_drac5 --ip=xxxxxxxxxx -l fencing_user -p xxxxxxxxx -c "admin1->" -x -v -v -v -o off
    INFO:root:Delay 0 second(s) before logging in to the fence       device
    INFO:root:Running command: /usr/bin/ssh fencing_user@xxxxxxxxxx -p 22 -o PubkeyAuthentication=no
    DEBUG:root:Received: fencing_user@xxxxxxxxxxxx's password:
    DEBUG:root:Sent: xxxxxxxxxxxxxx
    DEBUG:root:Sent:
    
    DEBUG:root:Received:
    /admin1->
    DEBUG:root:Sent: racadm serveraction powerstatus
    
    DEBUG:root:Received:  racadm serveraction powerstatus
    Server power status: ON
    /admin1->
    DEBUG:root:Sent: racadm serveraction powerdown
    
    DEBUG:root:Received:
    /admin1->
    DEBUG:root:Sent: racadm serveraction powerstatus
    
    ERROR:root:Connection timed out
    
    
    
    real    0m7.624s
    user    0m0.028s
    sys     0m0.007s

I get similar results when powering the node up (it starts, but I get a "Connection timed out" error).

What can I do to solve the problem?
 
Oh, I forgot:

Code:
# fence_drac5 -V
4.0.10 (built Thu Dec 4 12:16:32 CET 2014)
Copyright (C) Red Hat, Inc. 2004-2010 All rights reserved.
 
Nobody can help me?

I think the problem is that the SSH commands produce multiple command prompts (perhaps fence_drac5 sends two RETURN characters?). Apparently a "/admin1->" response is already waiting when the `racadm serveraction powerdown` command is issued, and the parser gets confused. It would probably help to discard any pending input before sending a new command.

Can anyone help me with this issue please?
 
Ok, I'm replying to myself once again. Hopefully this will help someone else..

Since I could not get fence_drac5 to work with iDRAC 8, I wrote my own fencing script, which I'm posting here.

Code:
#!/bin/bash

# Written by Udo Giacomozzi <udo.giacomozzi@indunet.it>
# Public domain.
VERSION=1.0

    

show_help()
{
  cat <<EOF
Usage:
        fence_idrac8 [options]
Options:
   -a, --ip=[ip]                  IP address or hostname of fencing device
   -l, --username=[name]          Login name
   -p, --password=[password]      Login password or passphrase
   -c, --command-prompt=[prompt]  command prompt string (default "admin1->")
   -x, --ssh                      Use ssh connection
   -u, --ipport=[port]            TCP/UDP port to use (default 22)
   --ssh-options=[options]        SSH options to use
   -o, --action=[action]          Action: status, reboot, off or on (required)
   -v, --verbose                  Verbose mode
   -V, --version                  Output version information and exit
   -h, --help                     Display this help and exit
EOF

  exit 1
  
}


show_version()
{
  echo "$VERSION"
}


gen_expect_commands()
{
  cat <<EOF
# action $action
log_user $be_verbose  
spawn ssh $ssh_options -p $ip_port -o StrictHostKeyChecking=no $username@$ip_address
expect "assword:"
send "$password\r"
expect {
  "denied"     "exit 9"
  "$prompt"    { }
}

proc verify_off {} {
  set timeout 5
  send "racadm serveraction powerstatus\r"
  expect {
    "power status: OFF"   { expect "$prompt" ; return }
    timeout               "exit 3"
    "$prompt"             "exit 5"
  }
}                 

proc verify_on {} {
  set timeout 5
  send "racadm serveraction powerstatus\r"
  expect {
    "power status: ON"    { expect "$prompt" ; return }
    timeout               "exit 3"
    "$prompt"             "exit 5"
  }
}                 

EOF

  if [ "$action" = "status" ]; then
    cat <<EOF
      send "racadm serveraction powerstatus\r"
      expect {
        "power status: ON" { send_user "Status: ON" ; return 0 }
        "power status: OFF" { send_user "Status: OFF" ; return 0 }
      }
EOF
  fi

  if [ "$action" = "reboot" ]; then
    echo 'send "racadm serveraction hardreset\r"'
    echo 'expect "operation successful"'
  fi

  if [ "$action" = "off" ]; then
    cat <<EOF
      send "racadm serveraction powerdown\r"
      
      set timeout 20
      # a bare prompt without a known message means the response was not recognized
      expect {
        "operation successful"    { expect "$prompt" ; verify_off }
        "is already powered OFF"  { expect "$prompt" ; verify_off }
        timeout                   "exit 3"
        "$prompt"                 "exit 5"
      }
      
      
      # do it twice, because otherwise (for some reason) the server may boot up
      # again after a few seconds.. 
      send "racadm serveraction powerdown\r"
      
      set timeout 20
      # a bare prompt without a known message means the response was not recognized
      expect {
        "operation successful"    { expect "$prompt" ; verify_off }
        "is already powered OFF"  { expect "$prompt" ; verify_off }
        timeout                   "exit 3"
        "$prompt"                 "exit 5"
      }
EOF
  fi

  if [ "$action" = "on" ]; then
    cat <<EOF
      send "racadm serveraction powerup\r"
      
      set timeout 20
      # a bare prompt without a known message means the response was not recognized
      expect {
        "operation successful"    { expect "$prompt" ; verify_on }
        "is already powered ON"   { expect "$prompt" ; verify_on }
        timeout                   "exit 3"
        "$prompt"                 "exit 5"
      }
EOF
  fi

}

#### Parse arguments ###########################################################

# based on http://stackoverflow.com/a/14203146/688869

ip_port=22
ssh_options=""
be_verbose=0
prompt="/admin1->"

if [[ $# -eq 0 ]]; then

  action=off

  while read -r line ; do
    # split at the first "=" so that values may themselves contain "="
    key="${line%%=*}"
    value="${line#*=}"
    
    case $key in
      cmd_prompt)
        prompt="$value"
        ;;
      
      ipaddr)
        ip_address=$value
        ;;
        
      ipport)
        ip_port=$value
        ;;
        
      login)
        username="$value"
        ;;
        
      passwd)
        password="$value"
        ;;
        
      secure)
        use_ssh=1
        ;;
    esac
  done      

else 

  while [[ $# -gt 0 ]]
  do
    key="$1"
    
    case $key in
  
      -h|--help)
        show_help
        ;;
        
        
      -a)
        ip_address="$2"
        shift
        ;;
        
      --ip=*)
        ip_address="${key#*=}"
        ;;
        
  
      -u)
        ip_port="$2"
        shift
        ;;
        
      --ipport=*)
        ip_port="${key#*=}"
        ;;
        
  
      -l)
        username="$2"
        shift
        ;;
        
      --username=*)
        username="${key#*=}"
        ;;
        
  
      -p)
        password="$2"
        shift
        ;;
        
      --password=*)
        password="${key#*=}"
        ;;
        
  
      -c)
        prompt="$2"
        shift
        ;;
        
      --command-prompt=*)
        prompt="${key#*=}"
        ;;
  
  
      -o)
        action="$2"
        shift
        ;;
        
      --action=*)
        action="${key#*=}"
        ;;
        
  
      --ssh-options=*)
        ssh_options="${key#*=}"
        ;;
        
  
      -x|--ssh)
        use_ssh=1
        ;;
        
        
      -v|--verbose)
        be_verbose=1
        ;;
        
        
      -V|--version)
        show_version
        exit 0
        ;;
        
      *)
        echo "Error: unknown option $key"    
        show_help
        ;;    
    esac
    
    shift # past argument or value
  done

fi

[ -z "$ip_address" ] && echo "ERROR: Missing IP address" && show_help
[ -z "$username" ] && echo "ERROR: Missing username" && show_help
[ -z "$password" ] && echo "ERROR: Missing password" && show_help
[ -z "$action" ] && echo "ERROR: Missing action" && show_help
[ ! "$use_ssh" = "1" ] && echo "ERROR: -x option is mandatory" && show_help


#### Check prerequisites #######################################################

if ! command -v expect >/dev/null 2>&1; then
  echo "ERROR: missing 'expect' tool"
  exit 2
fi


#### Run `expect' ##############################################################

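# The generated script contains the password, so let mktemp create it
# (private file with an unpredictable name) instead of using /tmp/...-$$.
cmd_fn=$(mktemp /tmp/fence_idrac8-XXXXXX) || exit 2
gen_expect_commands >"$cmd_fn"
if [ $be_verbose -eq 1 ]; then
  echo "===EXPECT SCRIPT======================================================="
  cat "$cmd_fn"
  echo "======================================================================="
fi  
expect -f "$cmd_fn"
ec=$?
rm -f "$cmd_fn"
exit $ec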

This should be saved as /usr/sbin/fence_idrac8 (on each node!) with the same permissions as the other fence agents.
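
For reference, installing it could look roughly like the sketch below. The 0755 mode is an assumption on my part; check what `ls -l /usr/sbin/fence_drac5` shows on your nodes and use the same. `install_agent` is just a helper name made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: copy a fence agent script into place.
# Mode 0755 is an assumption; compare with: ls -l /usr/sbin/fence_drac5
install_agent() {
  src=$1                      # path to the saved script
  dest_dir=${2:-/usr/sbin}    # target directory, defaults to /usr/sbin
  install -m 0755 "$src" "$dest_dir/fence_idrac8"
}

# Usage (as root, on each node):
#   install_agent ./fence_idrac8
```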

It may be necessary to run "ccs_update_schema" once after installing; I'm not sure.

It works as a replacement for fence_drac5 and works like a charm for me.
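
When called without arguments, the script reads its options as key=value lines from standard input (which is how the cluster hands options to fence agents). Below is a small sketch of that kind of parsing, splitting each line at the first "=" so that a password containing "=" survives. `kv_key` and `kv_value` are names made up for the example, and all values are placeholders.

```shell
#!/bin/bash
# Hypothetical helpers: split a "key=value" line at the FIRST "=".
kv_key()   { printf '%s\n' "${1%%=*}"; }   # everything before the first "="
kv_value() { printf '%s\n' "${1#*=}"; }    # everything after the first "="

# Demo with placeholder options, one per line, as a fence agent gets them on stdin.
printf 'ipaddr=192.0.2.10\nlogin=fencing_user\npasswd=s3=cret\nsecure=1\n' |
while IFS= read -r line; do
  echo "$(kv_key "$line") -> $(kv_value "$line")"
done
```

Keep in mind that in stdin mode the script as posted always performs "off", so piping real options into /usr/sbin/fence_idrac8 will power the node down.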
 
I had the same problem but got it solved with fence_drac5. You pointed me in the right direction :)

Your command (I used the same):
Code:
fence_drac5 --ip=xxxxxxxxxx -l fencing_user -p xxxxxxxxx -c "admin1->" -x -v -v -v -o off
matches the documentation at https://pve.proxmox.com/wiki/Fencing#Example_.2Fetc.2Fpve.2Fcluster.conf.new_with_iDRAC
But the command prompt given there is incorrect. You can see it in your own output:
Code:
DEBUG:root:Received:
    /admin1->
As you can see, there's a / before admin1->. You (and I) didn't include that / in the command prompt.

In your own fencing script you specified it correctly:
Code:
prompt="/admin1->"

This command works fine:
Code:
fence_drac5 --ip=xxxxxxxxxx -l fencing_user -p xxxxxxxxx -c "/admin1->" -x -v -v -v -o off
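
If you are unsure which prompt your iDRAC prints, you can pull it out of the verbose output instead of guessing. A rough sketch follows; `detect_prompt` is a name I made up, and it simply assumes that the prompt is the last line ending in "->", as in the log above.

```shell
#!/bin/bash
# Hypothetical helper: read a verbose fence agent log on stdin and print
# the last line that ends in "->" (assumed to be the device prompt).
detect_prompt() {
  sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' |  # trim indentation
    grep -e '->$' |                                     # keep prompt-looking lines
    tail -n 1                                           # last one wins
}

# Demo on a fragment of the log from the first post:
printf 'DEBUG:root:Received:\n    /admin1->\nDEBUG:root:Sent: racadm serveraction powerstatus\n' |
  detect_prompt

# With a real device (2>&1 in case the debug log goes to stderr):
#   fence_drac5 --ip=xxxxxxxxxx -l fencing_user -p xxxxxxxxx -x -v -v -v -o status 2>&1 | detect_prompt
```

Whatever this prints is what should go into -c on the command line or cmd_prompt in cluster.conf.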

In cluster.conf it looks like:
Code:
<fencedevice agent="fence_drac5" cmd_prompt="/admin1->" ipaddr="xxxxxxxxxx" login="fencing_user" name="node01-drac" passwd="xxxxxxxxx" secure="1" login_timeout="10"/>
I also specified a login_timeout because our DRAC doesn't always respond within the default of 5 seconds. This makes it more reliable.
 
