"Connection timed out" when fencing node, but it does actually shut down (iDRAC)

jampy

Member
Jun 26, 2015
I'm trying to use iDRAC 8 as a fencing device.

Status queries work fine, but `fence_node node2` aborts after a few seconds with "agent fence_drac5 result: status error". The node shuts down (gracefully) nonetheless.

I tried starting fence_drac5 manually, and I can see that the connection times out right after the second "powerstatus" command is issued:

Code:
    # time fence_drac5 --ip=xxxxxxxxxx -l fencing_user -p xxxxxxxxx -c "admin1->" -x -v -v -v -o off
    INFO:root:Delay 0 second(s) before logging in to the fence device
    INFO:root:Running command: /usr/bin/ssh fencing_user@xxxxxxxxxx -p 22 -o PubkeyAuthentication=no
    DEBUG:root:Received: fencing_user@xxxxxxxxxxxx's password:
    DEBUG:root:Sent: xxxxxxxxxxxxxx
    DEBUG:root:Sent:
    
    DEBUG:root:Received:
    /admin1->
    DEBUG:root:Sent: racadm serveraction powerstatus
    
    DEBUG:root:Received:  racadm serveraction powerstatus
    Server power status: ON
    /admin1->
    DEBUG:root:Sent: racadm serveraction powerdown
    
    DEBUG:root:Received:
    /admin1->
    DEBUG:root:Sent: racadm serveraction powerstatus
    
    ERROR:root:Connection timed out
    
    
    
    real    0m7.624s
    user    0m0.028s
    sys     0m0.007s

I get similar results when powering the node up (it starts, but I get a "Connection timed out" error).

What can I do to solve the problem?
 
Oh, I forgot:

Code:
# fence_drac5 -V
4.0.10 (built Thu Dec 4 12:16:32 CET 2014)
Copyright (C) Red Hat, Inc. 2004-2010 All rights reserved.
 
Nobody can help me?

I think the problem is that the SSH session produces multiple command prompts (perhaps fence_drac5 sends two RETURN characters?). Apparently a "/admin1->" response is already waiting when the `racadm serveraction powerdown` command is issued, and the parser gets confused. It would probably help to discard any pending input before sending a new command.

Can anyone help me with this issue please?
 
OK, I'm replying to myself once again. Hopefully this will help someone else.

Since I could not get fence_drac5 to work with iDRAC 8, I wrote my own fencing script, which I'm posting here.

Code:
#!/bin/bash

# Written by Udo Giacomozzi <udo.giacomozzi@indunet.it>
# Public domain.
VERSION=1.0

    

show_help()
{
  cat <<EOF
Usage:
        fence_idrac8 [options]
Options:
   -a, --ip=[ip]                  IP address or hostname of fencing device
   -l, --username=[name]          Login name
   -p, --password=[password]      Login password or passphrase
   -c, --command-prompt=[prompt]  command prompt string (default "admin1->")
   -x, --ssh                      Use ssh connection
   -u, --ipport=[port]            TCP/UDP port to use (default 22)
   --ssh-options=[options]        SSH options to use
   -o, --action=[action]          Action: status, reboot, off or on (required)
   -v, --verbose                  Verbose mode
   -V, --version                  Output version information and exit
   -h, --help                     Display this help and exit
EOF

  exit 1
  
}


show_version()
{
  echo "$VERSION"
}


gen_expect_commands()
{
  cat <<EOF
# action $action
log_user $be_verbose  
spawn ssh $ssh_options -p $ip_port -o StrictHostKeyChecking=no $username@$ip_address
expect "assword:"
send "$password\r"
expect {
  "denied"     "exit 9"
  "$prompt"    { }
}

proc verify_off {} {
  set timeout 5
  send "racadm serveraction powerstatus\r"
  expect {
    "power status: OFF"   { expect "$prompt" ; return }
    timeout               "exit 3"
    "$prompt"             "exit 5"
  }
}                 

proc verify_on {} {
  set timeout 5
  send "racadm serveraction powerstatus\r"
  expect {
    "power status: ON"    { expect "$prompt" ; return }
    timeout               "exit 3"
    "$prompt"             "exit 5"
  }
}                 

EOF

  if [ "$action" = "status" ]; then
    cat <<EOF
      send "racadm serveraction powerstatus\r"
      expect {
        "power status: ON"  { send_user "Status: ON\n" ; exit 0 }
        "power status: OFF" { send_user "Status: OFF\n" ; exit 0 }
        timeout             "exit 3"
      }
EOF
  fi

  if [ "$action" = "reboot" ]; then
    echo 'send "racadm serveraction hardreset\r"'
    echo 'expect "operation successful"'
  fi

  if [ "$action" = "off" ]; then
    cat <<EOF
      send "racadm serveraction powerdown\r"
      
      # a bare prompt without one of the known messages is an unrecognized response
      set timeout 20    
      expect {
        "operation successful"    { expect "$prompt" ; verify_off }
        "is already powered OFF"  { expect "$prompt" ; verify_off }
        timeout                   "exit 3"
        "$prompt"                 "exit 5"
      }
      
      
      # Power down a second time: otherwise (for some reason) the server may
      # boot up again after a few seconds. A bare prompt without one of the
      # known messages is an unrecognized response.
      send "racadm serveraction powerdown\r"
      
      set timeout 20    
      expect {
        "operation successful"    { expect "$prompt" ; verify_off }
        "is already powered OFF"  { expect "$prompt" ; verify_off }
        timeout                   "exit 3"
        "$prompt"                 "exit 5"
      }
EOF
  fi

  if [ "$action" = "on" ]; then
    cat <<EOF
      send "racadm serveraction powerup\r"
      
      # a bare prompt without one of the known messages is an unrecognized response
      set timeout 20    
      expect {
        "operation successful"    { expect "$prompt" ; verify_on }
        "is already powered ON"   { expect "$prompt" ; verify_on }
        timeout                   "exit 3"
        "$prompt"                 "exit 5"
      }
EOF
  fi

}

#### Parse arguments ###########################################################

# based on http://stackoverflow.com/a/14203146/688869

ip_port=22
ssh_options=""
be_verbose=0
prompt="/admin1->"

if [[ $# -eq 0 ]]; then

  action=off

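  # read key=value lines from stdin; split at the first "=" so that values
  # (e.g. passwords) may themselves contain "="
  while IFS= read -r line ; do
    key="${line%%=*}"
    value="${line#*=}"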
    
    case $key in
      cmd_prompt)
        prompt="$value"
        ;;
      
      ipaddr)
        ip_address=$value
        ;;
        
      ipport)
        ip_port=$value
        ;;
        
      login)
        username="$value"
        ;;
        
      passwd)
        password="$value"
        ;;
        
      secure)
        use_ssh=1
        ;;

      action)
        action="$value"
        ;;
    esac
  done      

else 

  while [[ $# -gt 0 ]]
  do
    key="$1"
    
    case $key in
  
      -h|--help)
        show_help
        ;;
        
        
      -a)
        ip_address="$2"
        shift
        ;;
        
      --ip=*)
        ip_address="${key#*=}"
        ;;
        
  
      -u)
        ip_port="$2"
        shift
        ;;
        
      --ipport=*)
        ip_port="${key#*=}"
        ;;
        
  
      -l)
        username="$2"
        shift
        ;;
        
      --username=*)
        username="${key#*=}"
        ;;
        
  
      -p)
        password="$2"
        shift
        ;;
        
      --password=*)
        password="${key#*=}"
        ;;
        
  
      -c)
        prompt="$2"
        shift
        ;;
        
      --command-prompt=*)
        prompt="${key#*=}"
        ;;
  
  
      -o)
        action="$2"
        shift
        ;;
        
      --action=*)
        action="${key#*=}"
        ;;
        
  
      --ssh-options=*)
        ssh_options="${key#*=}"
        ;;
        
  
      -x|--ssh)
        use_ssh=1
        ;;
        
        
      -v|--verbose)
        be_verbose=1
        ;;
        
        
      -V|--version)
        show_version
        exit 0
        ;;
        
      *)
        echo "Error: unknown option $key"    
        show_help
        ;;    
    esac
    
    shift # past argument or value
  done

fi

[ -z "$ip_address" ] && echo "ERROR: Missing IP address" && show_help
[ -z "$username" ] && echo "ERROR: Missing username" && show_help
[ -z "$password" ] && echo "ERROR: Missing password" && show_help
[ -z "$action" ] && echo "ERROR: Missing action" && show_help
[ ! "$use_ssh" = "1" ] && echo "ERROR: -x option is mandatory" && show_help


#### Check prerequisites #######################################################

if ! command -v expect >/dev/null 2>&1; then
  echo "ERROR: missing 'expect' tool"
  exit 2
fi


#### Run `expect' ##############################################################

# the generated script contains the password, so let mktemp create the file
# (unpredictable name, mode 0600)
cmd_fn=$(mktemp /tmp/fence_idrac8-XXXXXX) || exit 2
gen_expect_commands >"$cmd_fn"
if [ "$be_verbose" -eq 1 ]; then
  echo "===EXPECT SCRIPT======================================================="
  cat "$cmd_fn"
  echo "======================================================================="
fi  
expect -f "$cmd_fn"
ec=$?
rm -f "$cmd_fn"
exit $ec

Save this as /usr/sbin/fence_idrac8 (on each node!) with the same owner and permissions as the other fence agents.
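A minimal sketch of that install step, assuming GNU coreutils and that an already installed agent can serve as the permissions reference. `install_agent` is a made-up helper name, and the /usr/sbin/fence_drac5 path in the usage comment is an assumption; check where the agents live on your nodes.

```shell
#!/bin/bash

# Hypothetical helper: copy a fence agent into place and give it the same
# mode as an existing agent. --reference is a GNU chmod option.
install_agent() {
  local src="$1" dest="$2" reference="$3"

  cp "$src" "$dest" || return 1
  chmod --reference="$reference" "$dest"
}

# Usage (as root, on each node; paths are assumptions):
#   install_agent ./fence_idrac8 /usr/sbin/fence_idrac8 /usr/sbin/fence_drac5
```

This only copies the mode bits. If the other agents are not owned by root, `chown --reference` can be used the same way.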

It may be necessary to run "ccs_update_schema" once after installing; I'm not sure.

It is a drop-in replacement for fence_drac5 and works like a charm for me.
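When the script is called without arguments it reads key=value lines from stdin, which is how the cluster passes options to fence agents. Here is a rough sketch for testing that mode by hand. `agent_stdin` is a made-up helper, and the address, user and password in the usage comment are placeholders.

```shell
#!/bin/bash

# Hypothetical helper: print the key=value lines that the script's stdin
# mode understands (ipaddr, login, passwd, cmd_prompt, secure).
agent_stdin() {
  local ip="$1" user="$2" pass="$3" prompt="${4:-/admin1->}"

  printf 'ipaddr=%s\n' "$ip"
  printf 'login=%s\n' "$user"
  printf 'passwd=%s\n' "$pass"
  printf 'cmd_prompt=%s\n' "$prompt"
  printf 'secure=1\n'
}

# Usage (placeholders; this WILL power off the node, because stdin mode
# defaults to the "off" action):
#   agent_stdin 192.0.2.10 fencing_user 'secret' | /usr/sbin/fence_idrac8
```

Only run the piped command against a node you can afford to power off.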
 
I had the same problem but got it solved with fence_drac5. You pointed me in the right direction :-)

Your command (I used the same):
Code:
fence_drac5 --ip=xxxxxxxxxx -l fencing_user -p xxxxxxxxx -c "admin1->" -x -v -v -v -o off
follows the documentation at https://pve.proxmox.com/wiki/Fencing#Example_.2Fetc.2Fpve.2Fcluster.conf.new_with_iDRAC
But the command prompt given there is incorrect. You can see it in your own output:
Code:
DEBUG:root:Received:
    /admin1->
As you can see, there's a / before admin1->. You (and I) didn't include that / in the command prompt.

In your own fencing script you specified it correctly:
Code:
prompt="/admin1->"

This command works fine:
Code:
fence_drac5 --ip=xxxxxxxxxx -l fencing_user -p xxxxxxxxx -c "/admin1->" -x -v -v -v -o off

In cluster.conf it looks like:
Code:
<fencedevice agent="fence_drac5" cmd_prompt="/admin1->" ipaddr="xxxxxxxxxx" login="fencing_user" name="node01-drac" passwd="xxxxxxxxx" secure="1" login_timeout="10"/>
I also specified a login_timeout because our DRAC doesn't always respond within the default of 5 seconds. This makes fencing more reliable.
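To check which prompt your own iDRAC prints, you can filter the verbose output for lines that look like a prompt. This is only a sketch: `find_prompts` is a made-up helper, and it assumes the prompt ends in "->" as it does in the output above.

```shell
#!/bin/bash

# Hypothetical helper: read a verbose fence agent log on stdin and print
# each distinct line that ends in "->", with surrounding whitespace removed.
find_prompts() {
  sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
    | grep -E -- '->$' \
    | sort -u
}

# Usage (status is harmless; the log may go to stderr, hence 2>&1):
#   fence_drac5 --ip=xxxxxxxxxx -l fencing_user -p xxxxxxxxx -c "/admin1->" \
#     -x -v -v -v -o status 2>&1 | find_prompts
```

Whatever it prints is the string to use for -c on the command line and for cmd_prompt in cluster.conf.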