Questions about setting up HA.

WhiteStarEOF

Mar 6, 2012
Hey everyone. I've been browsing the docs for setting up a clustered environment with HA and I had a few questions. First off, I think it's absolutely awesome how easy it is to set up a cluster. It took longer to create a Windows VM to test live migration than it did to create the cluster.

The next step is HA, and that's where I get a little hazy. The docs say that I need a fencing device, and the example given was a $600 APC PDU. Are these devices actually required, or are people able to roll their own software-only solution?

If they are required, how do I shop for PDUs and UPSs with this feature? Fencing doesn't seem to be a searchable feature on Newegg or refurbups.

If a completely software option is viable, does anyone have any examples of how to do it that they can point me to?
 
Yes, you need fencing. Imagine your HA VM, which uses shared storage, running on two nodes at the same time: the VM's filesystem would be corrupted beyond recognition.

I prefer a PDU for fencing. You can find them used on eBay from time to time. The really old APC MasterSwitch PDU works, and I've posted the info in the wiki. I'm not sure this old device is very secure, so I would only use it on a protected network, which your cluster should be on anyway.

Lastly, there are a number of ways to do fencing. If your servers support IPMI, you could use that. You could also go the poor man's route and use a fence agent that disables the dead node's Ethernet ports on your managed switch. While these solutions may work, I really suggest using a PDU.

If you do:
Code:
ls /usr/sbin/fence_*
You can see a list of all the pre-existing fence agents.
Then do, for example:
Code:
man fence_apc_snmp
to see how the agent works.

Maybe you can find something that will work with what you have. If all else fails, you can also write your own fence agent.
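
If it helps, here is a small sketch for checking whether an agent exists for a given vendor before you buy anything. It assumes the agents live in /usr/sbin as in the ls command above; list_fence_agents and has_fence_agent are made-up helper names, not part of any package.
Code:
```shell
#!/bin/sh
# Print the names of the fence agents found in a directory
# (default: /usr/sbin), without the path or the "fence_" prefix.
# Hypothetical helper, for illustration only.
list_fence_agents() {
    dir="${1:-/usr/sbin}"
    for agent in "$dir"/fence_*; do
        # Skip the unexpanded glob when no agents are installed.
        [ -e "$agent" ] || continue
        name="${agent##*/}"
        printf '%s\n' "${name#fence_}"
    done
}

# Succeed (exit 0) if any agent name contains the given keyword.
# Hypothetical helper, for illustration only.
has_fence_agent() {
    keyword="$1"
    for name in $(list_fence_agents "${2:-/usr/sbin}"); do
        case "$name" in
            *"$keyword"*) return 0 ;;
        esac
    done
    return 1
}
```
For example, has_fence_agent apc && echo "found an APC agent" only tells you that an agent with "apc" in its name is installed. It does not tell you whether it supports your exact model, so still read the man page.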
 
Huh. I guess I didn't know that fencing was supposed to manage the virtual machines inside of Proxmox. I thought it existed to detect whether a server in the cluster went down, and then notify the remaining servers that one had gone down so that they could pick up the VMs that server was running.

I'll see if I can pick up one of these devices, follow the docs, and see if that clears things up. Or at the very least, gives me some more intelligent questions to ask. :)
 
Based on your response, maybe I did not explain this well. Let me try again.

Fencing does not manage the VMs. When the cluster detects that a physical node has failed, fencing is used to ensure that the node is really dead. This is sometimes called STONITH (Shoot The Other Node In The Head).
By fencing (killing) the failed node, the cluster can be certain that it is indeed dead.
Once the node is fenced, it is safe to start the VMs that were on it on some other node.

Without fencing, it is possible that the "dead" node is not actually dead. Imagine, for example, that the cluster software crashed on that node: the node is still running, but the cluster "thinks" it is dead.
The VMs are still running on the "dead" node, so if you started the same VMs on some other node, they would run twice against the same storage and corrupt it.
With fencing, the cluster kills the "dead" node, so it knows the node is indeed dead and that it is safe to start the VMs somewhere else.
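
As a rough sketch of that ordering (fence_node and recover_node are made-up names, and the fence result is simulated; a real cluster does this internally through its configured fence agent):
Code:
```shell
#!/bin/sh
# Stand-in for a fence agent call: succeeds (exit 0) only when the
# node was confirmed powered off. The outcome is passed in here
# purely for illustration ("ok" or "failed").
fence_node() {
    node="$1"
    outcome="$2"
    [ "$outcome" = "ok" ]
}

# Decide what to do with the VMs of a node the cluster believes is dead.
# The VMs are only restarted elsewhere after fencing has succeeded.
recover_node() {
    node="$1"
    outcome="$2"
    if fence_node "$node" "$outcome"; then
        echo "$node fenced: safe to start its VMs on another node"
    else
        echo "$node not fenced: VMs stay down to avoid running twice"
        return 1
    fi
}
```
The point is the order: the fence step comes first, and the VMs are started elsewhere only if it succeeds. If fencing fails, leaving the VMs down is the safe choice, because the "dead" node may still be running them.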
 
Ah, I get it. Machine 1 could possibly be dead; we could either waste time determining this for certain, or just kill it and get the VMs running elsewhere right away. Since we care about uptime, killing it at the first sign of unresponsiveness is the way to go. Cool, that makes sense.

That does leave me wondering about shopping for these devices, though. I noticed that the command you mentioned was fence_apc_snmp. Is any PDU that's capable of SNMP reporting likely to do what we need?
 
The fence agents are specific to particular makes/models.

Before purchasing I suggest searching to see if someone else has used that model as a fence device before.

I remember helping someone else on this forum who had some obscure PDU. He had to edit the fence agent to make it work. If you lack programming skills, make sure whatever you purchase is known to work without modification.
 
