Fencing HA test Scenario

Ninjix

Guest
Can someone give me a quick method for testing whether my Dell iDRAC IPMI fencing works? I've followed the HA Wiki and YouTube tutorial. I've also confirmed that fence_ipmilan works from the command line. What I want to do now is create a test scenario that causes the cluster to STONITH the node hosting the HA-enabled VM.
 
:) Thanks, Dietmar.

I disabled its port on the switch but the HA test didn't work. The cluster saw the node drop away but nothing else happened. The HA KVM did not restart on another node. The fencing also did not shut off the test node.

Which log file should I be watching for HA errors and debug information? Is it the cman service that executes the fence_node command, and if so, from which node?
 
The iDRACs are out-of-band, with their own network connection to an isolated VLAN. The cluster is running on a 10G switch. I can do the PDU power kill with our APC switches, but that will take a little more time to authorize and set up.
 
We are using APC PDU.
I tested fencing by pulling the two bonded connections from one server.
After a few seconds that server gets power cycled.

Your out of band iDRAC should work the same.

You do have at least three nodes in the cluster?
With only two there is no quorum, and thus no fencing.
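
To illustrate the quorum point, here is a minimal sketch of the majority arithmetic. It assumes one vote per node, no quorum disk, and no special two-node configuration; the function names are made up for this example and are not part of any cluster tool.

```python
def quorum_threshold(total_votes: int) -> int:
    """Votes needed for a strict majority of the cluster."""
    return total_votes // 2 + 1


def is_quorate(total_votes: int, votes_present: int) -> bool:
    """True if the surviving partition still holds a majority."""
    return votes_present >= quorum_threshold(total_votes)


if __name__ == "__main__":
    # Assumption: each node carries exactly one vote and one node fails.
    for nodes in (2, 3):
        survivors = nodes - 1
        print(
            f"{nodes} nodes: need {quorum_threshold(nodes)} votes, "
            f"{survivors} left, quorate={is_quorate(nodes, survivors)}"
        )
```

Under these assumptions, a two-node cluster that loses one node has 1 of the 2 votes it needs, so the survivor is not quorate. A three-node cluster that loses one node still has 2 of 2.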

What is the output of clustat?
 
Thanks for your help, e100.

I got the fencing working correctly by installing the acpid service on the nodes. I'll update the Wiki with my steps.

Dietmar, any particular reason acpid is not part of the bare-metal installer?
 
It looks like the fence_ipmilan call to the Dell DRAC is sending a "reset system" (warm boot) command. Once I installed the acpid daemon, the nodes noticed the shutdown event and gracefully synced just before reboot. This suits me better than having the APC PDU do the STONITH. This way I can keep the PDU power cycle a manual process and I don't have to bother with special network access to the colo datacenter's PDU.
 
It looks like the fence_ipmilan call to the Dell DRAC is sending a "reset system" (warm boot) command. Once I installed the acpid daemon

No, that is not how you want it to work.

Quote from here: http://pve.proxmox.com/wiki/Fencing
If you use integrated fence devices, you must configure ACPI (Advanced Configuration and Power Interface) to ensure immediate and complete fencing - here are the different options:

  • make sure that you did not installed acpid (remove with: aptitude remove acpid)
  • disable ACPI soft-off in the bios
  • disable via acpi=off to the kernel boot command line
In any case, you need to make sure that the node turns off immediately when fenced. If you have delays here, the HA resources cannot be moved.
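
A rough way to check two of the options above on a node could look like the sketch below. It is only an illustration: the helper names are invented here, looking for an acpid binary on PATH is a crude stand-in for asking the package manager, and the BIOS soft-off setting cannot be checked this way at all.

```python
import shutil
from pathlib import Path


def acpi_off_in_cmdline(cmdline: str) -> bool:
    """True if the kernel command line contains the acpi=off parameter."""
    return "acpi=off" in cmdline.split()


def acpid_binary_found() -> bool:
    """True if an acpid executable is on PATH (assumption: this roughly
    indicates the package is installed; run as root so sbin is on PATH)."""
    return shutil.which("acpid") is not None


def read_kernel_cmdline(path: str = "/proc/cmdline") -> str:
    """Return the running kernel's command line, or '' if unreadable."""
    try:
        return Path(path).read_text()
    except OSError:
        return ""


if __name__ == "__main__":
    print("acpid binary found:", acpid_binary_found())
    print("acpi=off on kernel cmdline:", acpi_off_in_cmdline(read_kernel_cmdline()))
```

If the first line prints True and the second prints False, the node may still react to the fence request with a slow, graceful shutdown, unless ACPI soft-off is disabled in the BIOS.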
 
Thanks, e100. I didn't connect those instructions with what was happening. Now it makes sense. My first tests were causing too much delay in the shutdown action. I ran "apt-get purge acpid" on my nodes. The fencing and guest HA still work as advertised.
 
