Unable to Fence using APC PDU

riptide_wave

Member
Mar 21, 2013
Minnesota, USA
Hello, I am currently trying to get fencing to work with my 3-node cluster, but I am unable to get it working. From what I can tell, fence_apc is unable to contact the PDU, yet I can ping and SSH into the PDU from each node without a problem (aside from a 20-second wait for the password prompt). I have an APC Rack PDU, model APC7930, and I am using 2 HP DL360 G5 servers and 1 custom 2U server. All run the latest version of Proxmox with all of the latest updates.

I have tried the test command, and here is what it gives me:

Code:
root@srv-1-02:~# fence_apc -x -l proxmox -p XXXX -a 10.1.7.3 -o status -n 1 -v
Unable to connect/login to fencing device

pveversion -v
Code:
pve-manager: 2.3-13 (pve-manager/2.3/7946f1f1)
running kernel: 2.6.32-19-pve
proxmox-ve-2.6.32: 2.3-95
pve-kernel-2.6.32-19-pve: 2.6.32-95
pve-kernel-2.6.32-18-pve: 2.6.32-88
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-4
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-36
qemu-server: 2.3-20
pve-firmware: 1.0-21
libpve-common-perl: 1.0-49
libpve-access-control: 1.0-26
libpve-storage-perl: 2.3-7
vncterm: 1.0-4
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.4-10
ksm-control-daemon: 1.1-1

/etc/pve/cluster.conf
Code:
<?xml version="1.0"?>
<cluster name="Cluster-1" config_version="8">

  <cman keyfile="/var/lib/pve-cluster/corosync.authkey">
  </cman>

  <fencedevices>
    <fencedevice agent="fence_apc" ipaddr="10.1.7.3" login="proxmox" name="pdu-1-01" passwd="XXXX" power_wait="10"/>
  </fencedevices>

  <clusternodes>

    <clusternode name="srv-1-02" votes="1" nodeid="1">
      <fence>
        <method name="power">
          <device name="pdu-1-01" port="1" secure="on"/>
        </method>
      </fence>
    </clusternode>

    <clusternode name="srv-1-03" votes="1" nodeid="2">
      <fence>
        <method name="power">
          <device name="pdu-1-01" port="2" secure="on"/>
          <device name="pdu-1-01" port="3" secure="on"/>
        </method>
      </fence>
    </clusternode>

    <clusternode name="srv-1-04" votes="1" nodeid="3">
      <fence>
        <method name="power">
          <device name="pdu-1-01" port="4" secure="on"/>
          <device name="pdu-1-01" port="5" secure="on"/>
        </method>
      </fence>
    </clusternode>

  </clusternodes>

  <rm>
    <service autostart="1" exclusive="0" name="TestIP" recovery="relocate">
      <ip address="10.1.8.1"/>
    </service>
  </rm>

</cluster>

Any ideas? Thanks
 
APC: Make sure that you enable "Outlet Access" and SSH, and, most importantly, make sure the physical servers' power supplies are connected to the right outlets.
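Once the login issue is sorted out, one way to double-check the outlet-to-server mapping is to query the status of every port referenced in cluster.conf. A sketch using the same credentials and ports as in the config above:
Code:
# Query each outlet used in cluster.conf (ports 1-5) so you can confirm
# which server is actually plugged into which outlet.
for port in 1 2 3 4 5; do
        echo "=== Outlet $port ==="
        fence_apc -x -l proxmox -p XXXX -a 10.1.7.3 -o status -n $port
done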
 
Maybe you need to set a longer "login_timeout" (the default is 5 seconds).
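If a longer timeout helps from the command line, it can also be carried into cluster.conf. A sketch, assuming your fence-agents version accepts login_timeout as a fencedevice attribute (check fence_apc -h or its man page to confirm):
Code:
  <fencedevices>
    <fencedevice agent="fence_apc" ipaddr="10.1.7.3" login="proxmox" name="pdu-1-01" passwd="XXXX" power_wait="10" login_timeout="60"/>
  </fencedevices>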

Tested again with the following command, still no luck.
Code:
root@srv-1-02:~# fence_apc -x -l proxmox -p XXXX -a 10.1.7.3 -o status -n 1 -v --login-timeout 60
Unable to connect/login to fencing device

APC: Make sure that you enable "Outlet Access" and SSH, and, most importantly, make sure the physical servers' power supplies are connected to the right outlets.
SSH is enabled, as I am able to SSH into the device without an issue. As for Outlet Access, there does not seem to be an option for this. However, when I SSH into the PDU I am able to switch outlets on and off using the account I created. Attached are screenshots of what I have set up in the APC management GUI.
 

Attachments

  • 1.PNG (36.9 KB)
  • 2.PNG (35.3 KB)
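For debugging, it may also help to call fence_apc the same way fenced does, i.e. by feeding it key=value options on stdin. A sketch using the attribute names from the cluster.conf above (verify the accepted option names with fence_apc -h on your version):
Code:
# Invoke fence_apc with options on stdin, as fenced would.
fence_apc <<EOF
action=status
ipaddr=10.1.7.3
login=proxmox
passwd=XXXX
port=1
secure=on
verbose=on
EOF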
Thanks again,

And a quick question: I have all of my devices on a UPS. In case of a power loss where the battery level becomes critical, how would I go about shutting down all servers with HA enabled? Do I have to stop rgmanager on each node before issuing a shutdown command, or is that unneeded? I don't want it to try to move all of the VMs onto a single node just for the shutdown.
 
To shut down an HA-managed VM you need to send the 'stop' command. Sending 'shutdown' will only cause the VM to be started on another node.

You could execute a script like below on every node:
Code:
#!/bin/sh
# Use 'stop' (not 'shutdown') so rgmanager does not restart the guests on another node.

echo "Stopping all running CT's"
CT=$(pvectl list | awk '{if ($3 == "running") print $1}')
if [ -n "$CT" ]; then
        for ct in $CT; do
                pvectl stop $ct
        done
fi

echo "Stopping all running VM's"
VM=$(qm list | awk '{if ($3 == "running") print $1}')
if [ -n "$VM" ]; then
        for vm in $VM; do
                qm stop $vm
        done
fi
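
To tie this back to the UPS question: below is a sketch of a wrapper your UPS monitoring daemon (apcupsd, NUT, or similar) could call when the battery becomes critical. It stops the HA-managed guests first and then powers the node off. The script paths and names are hypothetical; hook it into whatever low-battery event your monitoring tool exposes.
Code:
#!/bin/sh
# /usr/local/bin/ups-critical-shutdown.sh  (hypothetical path/name)
# Intended to be triggered by the UPS monitoring daemon on critical battery.

# Stop all HA-managed guests with 'stop' so rgmanager does not try to
# relocate them to another node (the script above, saved under a
# hypothetical name).
/usr/local/bin/stop-all-guests.sh

# Power off this node once the guests are stopped.
shutdown -h now "UPS battery critical - shutting down"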
 

Thanks, exactly what I was looking for!
 
