setup HA - needed steps

thp

New Member
Nov 12, 2013
Hi,
I have 4 Proxmox servers; the cluster is created and works fine. Now I want to activate HA. What would be the right order of the steps? I would like to avoid any downtime and, if possible, any reboot.

I found the following page about this:
https://pve.proxmox.com/wiki/Fencing

* on every host activate fencing
- in /etc/default/redhat-cluster-pve set FENCE_JOIN="yes"
- /etc/init.d/cman restart
- fence_tool join
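
A minimal sketch of the first step above, in case it helps to script it per host. The helper name enable_fence_join is hypothetical (it is not a Proxmox tool), and the sketch assumes GNU sed and that the setting lives on a single line that may be commented out. It only edits the file you pass in; restarting cman stays a manual step.

```shell
# Hypothetical helper: make sure FENCE_JOIN="yes" is set in a defaults file.
enable_fence_join() {
    file="$1"
    if grep -q '^#\{0,1\}[[:space:]]*FENCE_JOIN=' "$file"; then
        # Rewrite an existing (possibly commented-out) FENCE_JOIN line.
        sed -i 's/^#\{0,1\}[[:space:]]*FENCE_JOIN=.*/FENCE_JOIN="yes"/' "$file"
    else
        # No such line yet: append one.
        printf 'FENCE_JOIN="yes"\n' >> "$file"
    fi
}

# Usage on a host (commands taken from the steps above):
#   enable_fence_join /etc/default/redhat-cluster-pve
#   /etc/init.d/cman restart
```

Trying it on a scratch copy of the file first keeps the real defaults file untouched until the result looks right.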

* add fence devices
- cp /etc/pve/cluster.conf /etc/pve/cluster.conf.new
- we use IPMI; I add the IPMI device for every host as described in https://pve.proxmox.com/wiki/Fencing#List_of_supported_fence_devices
- check the new config: ccs_config_validate -v -f /etc/pve/cluster.conf.new
- if all is fine, activate the new config via the browser interface
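
One detail the list above does not mention: as far as I understand the wiki, the config_version attribute in cluster.conf.new has to be increased before the new config is accepted. Treat that as an assumption and check it against the wiki. A small sketch for bumping it, where bump_config_version is a hypothetical helper name and GNU sed is assumed:

```shell
# Hypothetical helper: increase config_version="N" by one in the given file
# and print the new number. Returns non-zero if no config_version is found.
bump_config_version() {
    conf="$1"
    # Pick the first config_version value in the file.
    current=$(sed -n 's/.*config_version="\([0-9][0-9]*\)".*/\1/p' "$conf" | head -n 1)
    [ -n "$current" ] || return 1
    next=$((current + 1))
    sed -i "s/config_version=\"$current\"/config_version=\"$next\"/" "$conf"
    printf '%s\n' "$next"
}

# Usage, following the steps above:
#   cp /etc/pve/cluster.conf /etc/pve/cluster.conf.new
#   bump_config_version /etc/pve/cluster.conf.new
#   ccs_config_validate -v -f /etc/pve/cluster.conf.new
```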

* rgmanager
- /etc/init.d/rgmanager start
- update-rc.d rgmanager defaults

Would this be the right order? I tested it on our Proxmox playground environment yesterday and it worked fine, but for a production system I would like to ask here to avoid any needless mistakes :)

Best regards,
thomas.
 
IMHO, HA requires intense testing before you move into production.

I have had good experiences with Proxmox HA; it works really well. Only when there were network problems did it fail "unrepairably" (I did not find any way to repair it); reinstalling all systems was the only solution.

When we started with Proxmox about a year ago, I activated HA immediately. My current setup was done without any HA and has been running for 2 months without any problems. Our network and storage setup has no SPOF; from that side all is fine and tested.

Now I want to activate HA again and I want to be sure that these steps are in the right order.

I read the Proxmox wiki and would like to know whether the steps from that page are in the right order. We have had a basic subscription for these four systems since last month; if it is really needed, I will ask my question that way :))
 
Some remarks, in case someone needs this information one day:

- These steps work without any downtime.
- fence_tool join was not needed
- after /etc/init.d/cman restart I had (at least in my case) to restart the pve-cluster service
- rgmanager will not start unless HA is set up for at least one VM
- in my Proxmox playground (only 3 servers, same Proxmox setup but without subscription)
Code:
<fencedevice agent="fence_ipmilan" ipaddr="1.1.1.1" lanplus="1" login="user" name="ipmi1" passwd="password" power_wait="10"/>
works fine, but in the production cluster.conf I had to remove lanplus="1" (I found the solution here: http://forum.proxmox.com/threads/19928-Fence-Issues ).
Perhaps someone from Proxmox can add this to the wiki.

error message:
root@node4:~# fence_node node1 -vv
fence node1 dev 0.0 agent fence_ipmilan result: error from agent
agent args: nodename=node1 agent=fence_ipmilan ipaddr=1.1.1.1 lanplus=1 login=user passwd=password power_wait=10
fence node1 failed

logfile:
Oct 22 15:50:48 node4 fence_ipmilan: Parse error: Ignoring unknown option 'nodename=node1
Oct 22 15:50:48 node4 fence_ipmilan: Failed: Unable to obtain correct plug status or plug is not available
Oct 22 15:50:48 node4 fence_node[529621]: fence node1 failed
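
If several fence devices are affected, the lanplus workaround described above can be applied in one pass. This is only a sketch: strip_lanplus is a hypothetical helper name, GNU sed is assumed, and it expects the attribute to be written exactly as lanplus="1" on the fencedevice line.

```shell
# Hypothetical helper: drop lanplus="1" from every <fencedevice .../> line.
strip_lanplus() {
    conf="$1"
    # Only touch lines that contain a fencedevice element.
    sed -i '/<fencedevice /s/ lanplus="1"//' "$conf"
}

# Usage on the staged copy, then validate again as in the first post:
#   strip_lanplus /etc/pve/cluster.conf.new
#   ccs_config_validate -v -f /etc/pve/cluster.conf.new
```

Note that fence_node, used above to test the agent, performs a real fence action on the target node, so it is better tried on a node that may go down.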
 
I think this is a poor setup because it relies on the server that is to be fenced both recognizing this and actually fencing itself. In a correct HA setup, fencing should be performed by a third party such as a PDU or the switch.