Unable to Fence using APC PDU

riptide_wave

Member
Mar 21, 2013
Minnesota, USA
Hello, I am currently trying to get fencing to work with my 3-node cluster, but I am unable to get it working. From what I can tell, fence_apc is unable to contact the PDU, yet I can ping and SSH into the PDU from each node without a problem (aside from a 20-second wait for the password prompt). I have an APC Rack PDU, model APC7930, and I am using 2 HP DL360 G5 servers and 1 custom 2U server. All run the latest version of Proxmox with all of the latest updates.

I have tried the test command, and here is what it gives me:

Code:
root@srv-1-02:~# fence_apc -x -l proxmox -p XXXX -a 10.1.7.3 -o status -n 1 -v
Unable to connect/login to fencing device

pveversion -v
Code:
pve-manager: 2.3-13 (pve-manager/2.3/7946f1f1)
running kernel: 2.6.32-19-pve
proxmox-ve-2.6.32: 2.3-95
pve-kernel-2.6.32-19-pve: 2.6.32-95
pve-kernel-2.6.32-18-pve: 2.6.32-88
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-4
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-36
qemu-server: 2.3-20
pve-firmware: 1.0-21
libpve-common-perl: 1.0-49
libpve-access-control: 1.0-26
libpve-storage-perl: 2.3-7
vncterm: 1.0-4
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.4-10
ksm-control-daemon: 1.1-1

/etc/pve/cluster.conf
Code:
<?xml version="1.0"?>
<cluster name="Cluster-1" config_version="8">

  <cman keyfile="/var/lib/pve-cluster/corosync.authkey">
  </cman>

  <fencedevices>
    <fencedevice agent="fence_apc" ipaddr="10.1.7.3" login="proxmox" name="pdu-1-01" passwd="XXXX" power_wait="10"/>
  </fencedevices>

  <clusternodes>

    <clusternode name="srv-1-02" votes="1" nodeid="1">
      <fence>
        <method name="power">
          <device name="pdu-1-01" port="1" secure="on"/>
        </method>
      </fence>
    </clusternode>

    <clusternode name="srv-1-03" votes="1" nodeid="2">
      <fence>
        <method name="power">
          <device name="pdu-1-01" port="2" secure="on"/>
          <device name="pdu-1-01" port="3" secure="on"/>
        </method>
      </fence>
    </clusternode>

    <clusternode name="srv-1-04" votes="1" nodeid="3">
      <fence>
        <method name="power">
          <device name="pdu-1-01" port="4" secure="on"/>
          <device name="pdu-1-01" port="5" secure="on"/>
        </method>
      </fence>
    </clusternode>

  </clusternodes>

  <rm>
    <service autostart="1" exclusive="0" name="TestIP" recovery="relocate">
      <ip address="10.1.8.1"/>
    </service>
  </rm>

</cluster>

Any ideas? Thanks
 
APC: Make sure that you enable "Outlet Access" and SSH, and, most importantly, make sure the physical servers' power supplies are connected to the right outlets.
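Once the login issue is sorted out, one way to double-check the outlet-to-server mapping is to query the status of every port referenced in cluster.conf. A sketch using the same credentials and ports as in the config above:
Code:
# Query each outlet used in cluster.conf (ports 1-5) so you can confirm
# which server is actually plugged into which outlet.
for port in 1 2 3 4 5; do
        echo "=== Outlet $port ==="
        fence_apc -x -l proxmox -p XXXX -a 10.1.7.3 -o status -n $port
done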
 
Maybe you need to set a longer "login_timeout" (the default is 5 seconds).
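If a longer timeout helps from the command line, it can also be carried into cluster.conf. A sketch, assuming your fence-agents version accepts login_timeout as a fencedevice attribute (check fence_apc -h or its man page to confirm):
Code:
  <fencedevices>
    <fencedevice agent="fence_apc" ipaddr="10.1.7.3" login="proxmox" name="pdu-1-01" passwd="XXXX" power_wait="10" login_timeout="60"/>
  </fencedevices>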

Tested again with the following command, still no luck.
Code:
root@srv-1-02:~# fence_apc -x -l proxmox -p XXXX -a 10.1.7.3 -o status -n 1 -v --login-timeout 60
Unable to connect/login to fencing device

APC: Make sure that you enable "Outlet Access" and SSH, and, most importantly, make sure the physical servers' power supplies are connected to the right outlets.
SSH is enabled, as I am able to SSH into the device without an issue. As for Outlet Access, there does not seem to be an option for this. However, when I SSH into the PDU I am able to switch outlets on and off using the account I created. Attached are screenshots of what I have set up in the APC management GUI.
 

Attachments

  • 1.PNG (36.9 KB)
  • 2.PNG (35.3 KB)
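For debugging, it may also help to call fence_apc the same way fenced does, i.e. by feeding it key=value options on stdin. A sketch using the attribute names from the cluster.conf above (verify the accepted option names with fence_apc -h on your version):
Code:
# Invoke fence_apc with options on stdin, as fenced would.
fence_apc <<EOF
action=status
ipaddr=10.1.7.3
login=proxmox
passwd=XXXX
port=1
secure=on
verbose=on
EOF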
Thanks again,

And a quick question: I have all of my devices on a UPS. In case of a power loss where the battery level becomes critical, how would I go about shutting down all servers with HA enabled? Do I have to stop rgmanager on each node before issuing a shutdown command, or is that unneeded? I don't want it to try to move all of the VMs onto a single node just for the shutdown.
 
To shut down an HA-managed VM you need to send the 'stop' command. Sending 'shutdown' will only cause the VM to be started on another node.

You could execute a script like below on every node:
Code:
#!/bin/sh
# Use 'stop' (not 'shutdown') so rgmanager does not restart the guests on another node.

echo "Stopping all running CT's"
CT=$(pvectl list | awk '{if ($3 == "running") print $1}')
if [ -n "$CT" ]; then
        for ct in $CT; do
                pvectl stop $ct
        done
fi

echo "Stopping all running VM's"
VM=$(qm list | awk '{if ($3 == "running") print $1}')
if [ -n "$VM" ]; then
        for vm in $VM; do
                qm stop $vm
        done
fi
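
To tie this back to the UPS question: below is a sketch of a wrapper your UPS monitoring daemon (apcupsd, NUT, or similar) could call when the battery becomes critical. It stops the HA-managed guests first and then powers the node off. The script paths and names are hypothetical; hook it into whatever low-battery event your monitoring tool exposes.
Code:
#!/bin/sh
# /usr/local/bin/ups-critical-shutdown.sh  (hypothetical path/name)
# Intended to be triggered by the UPS monitoring daemon on critical battery.

# Stop all HA-managed guests with 'stop' so rgmanager does not try to
# relocate them to another node (the script above, saved under a
# hypothetical name).
/usr/local/bin/stop-all-guests.sh

# Power off this node once the guests are stopped.
shutdown -h now "UPS battery critical - shutting down"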
 

Thanks, exactly what I was looking for!
 
