HA relocation algorithm

cwells

I set up HA on a four-node cluster. On node1 I created three CTs and configured them for HA, then stopped rgmanager on node1 to test relocation. All three CTs were migrated to node2. This seems suboptimal. Is there any way to have them relocated round-robin (first CT to node2, second to node3, and so on)? I'm imagining a scenario where a node fails and all of its CTs are migrated to a single node, overloading it even though there are plenty of resources available on the other nodes.

Ideally it would use SNMP or something similar to prefer the least-loaded nodes, but even a simple round-robin relocation would be better than dumping everything onto a single node.

I found an old project, LBVM, that does actual load balancing for rgmanager, but it appears quite outdated and I don't know how it would integrate with PVE in any case.

If I'm willing to hack a bit, where would I start? I took a look at the scripts under /usr/share/cluster, but I couldn't figure out how they fit together. The pvevm script doesn't appear to be involved in the preferred-node decision. Would this mean changing rgmanager itself, or can it be done with rgmanager event scripting?
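
For reference, rgmanager apparently has a central event-processing mode where placement decisions are driven by an S-Lang event script rather than the built-in logic; the stock script seems to ship as /usr/share/cluster/default_event_script.sl. I haven't tried it and I don't know whether PVE tolerates it, but the cluster.conf hook would presumably look something like this (everything other than my pvevm entries is from the rgmanager docs as I understand them, so treat it as a sketch):

Code:
  <rm central_processing="1">
    <!-- with central_processing enabled, relocation is supposedly decided by an
         S-Lang event script (default: /usr/share/cluster/default_event_script.sl),
         which is where round-robin or load-aware placement could be hacked in -->
    <pvevm autostart="1" vmid="100"/>
    <pvevm autostart="1" vmid="101"/>
    <pvevm autostart="1" vmid="102"/>
  </rm>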
 
So it appears that manually adding the CTs to a failover domain changes the behavior:

Code:
  <rm>
    <pvevm autostart="1" vmid="100" domain="cluster1-default"/>
    <pvevm autostart="1" vmid="101" domain="cluster1-default"/>
    <pvevm autostart="1" vmid="102" domain="cluster1-default"/>
    <failoverdomains>
      <failoverdomain name="cluster1-default" ordered="0" restricted="0" nofailback="1" recovery="relocate">
        <failoverdomainnode name="node1" priority="1"/>
        <failoverdomainnode name="node2" priority="1"/>
        <failoverdomainnode name="node3" priority="1"/>
        <failoverdomainnode name="node4" priority="1"/>
      </failoverdomain>
    </failoverdomains>
  </rm>

With this configuration, stopping rgmanager on node1 results in two of the CTs ending up on node4 and one on node2 (I'm assuming simple random selection). That is somewhat better than before. Still, from what I've read, unordered + unrestricted is supposed to be the default behavior anyway, so it's not clear why explicitly creating a failover domain is needed. Anyone care to clarify?
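
If the placement really is just "pick one node for everything", one static way to approximate the round-robin I'm after (ignoring actual load) would presumably be to give each CT its own ordered, unrestricted domain with node1 first and the fallback order rotated. Something like this (domain names made up, untested, mirroring the attributes from my stanza above):

Code:
  <rm>
    <pvevm autostart="1" vmid="100" domain="fallback-node2"/>
    <pvevm autostart="1" vmid="101" domain="fallback-node3"/>
    <pvevm autostart="1" vmid="102" domain="fallback-node4"/>
    <failoverdomains>
      <!-- all CTs prefer node1; on failure each one falls back to a different node -->
      <failoverdomain name="fallback-node2" ordered="1" restricted="0" nofailback="1" recovery="relocate">
        <failoverdomainnode name="node1" priority="1"/>
        <failoverdomainnode name="node2" priority="2"/>
        <failoverdomainnode name="node3" priority="3"/>
        <failoverdomainnode name="node4" priority="4"/>
      </failoverdomain>
      <failoverdomain name="fallback-node3" ordered="1" restricted="0" nofailback="1" recovery="relocate">
        <failoverdomainnode name="node1" priority="1"/>
        <failoverdomainnode name="node3" priority="2"/>
        <failoverdomainnode name="node4" priority="3"/>
        <failoverdomainnode name="node2" priority="4"/>
      </failoverdomain>
      <failoverdomain name="fallback-node4" ordered="1" restricted="0" nofailback="1" recovery="relocate">
        <failoverdomainnode name="node1" priority="1"/>
        <failoverdomainnode name="node4" priority="2"/>
        <failoverdomainnode name="node2" priority="3"/>
        <failoverdomainnode name="node3" priority="4"/>
      </failoverdomain>
    </failoverdomains>
  </rm>

It still does nothing about actual load, and it would have to be maintained by hand as CTs are added, but at least the spread on a node failure would be deterministic.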
 
It doesn't help in any case. Once I added the failoverdomains stanza above, I can no longer add any new CTs with HA: the UI gives "unknown error 500" when I try to commit. Without the failover domains, it works.
 
see 'man rgmanager' (section failover domains)

I've read it until my eyes bleed. It answers none of my questions:

1) What is the behavior when there is no failover domain at all (i.e. the default)? I would assume it's the same as unordered + unrestricted, but the man page doesn't say.

2) What is the behavior of unordered + unrestricted with regard to node preference during relocation? The man page doesn't suggest anything, and other versions online suggest that services are relocated to random nodes. That doesn't appear to be the actual behavior (or else the random number generator is very bad): instead, all services migrate en masse to one unfortunate node.

In any case, adding a failover domain breaks the PVE GUI (I can't create new HA CTs), so the question is academic.
 
In my case it is working ok. This is my relevant section in cluster.conf:
<rm>
  <service autostart="1" domain="openVZ1" exclusive="0" max_restarts="1" name="sharedOpenVZ1" recovery="relocate">
    <script file="/etc/init.d/openvzMount" name="openvzMount"/>
  </service>
  <failoverdomains>
    <failoverdomain name="openVZ1" nofailback="0" ordered="1" restricted="1">
      <failoverdomainnode name="khnum" priority="1"/>
      <failoverdomainnode name="bastet" priority="9"/>
    </failoverdomain>
    <failoverdomain name="sharedKVM" nofailback="1" ordered="1" restricted="1">
      <failoverdomainnode name="heket" priority="1"/>
      <failoverdomainnode name="bastet" priority="2"/>
    </failoverdomain>
  </failoverdomains>
  <pvevm autostart="1" domain="sharedKVM" vmid="102"/>
</rm>
And openvzMount:
#! /bin/sh
# /etc/init.d/openvzMount
#
# Mounts/unmounts the shared OpenVZ volume and points the sharedOpenVZ
# storage definition at whichever node currently has it mounted.

# Some things that run always
touch /var/lock/openvzMount

# Carry out specific functions when asked to by the system
case "$1" in
  start)
    echo "Starting openvz Storage"
    mount -o _netdev,nobh,barrier=0 /dev/mapper/proxmox_shared-OpenVZStorage /mnt/OpenVZStorage
    myname=`hostname`
    pvesm set sharedOpenVZ -nodes $myname
    ;;
  stop)
    echo "Stopping openvz Storage"
    #pvesm set sharedOpenVZ -nodes ""
    umount -f /dev/mapper/proxmox_shared-OpenVZStorage
    ;;
  status)
    # mountpoint exits 0 when the path is a mount point
    if mountpoint /mnt/OpenVZStorage/ ; then
      exit 0
    else
      exit 1
    fi
    ;;
  *)
    echo "Incorrect Parameters"
    exit 1
    ;;
esac
exit 0


This migrates the shared partition for OpenVZ from one machine to another.
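
If you also want containers that live on that storage to move together with the mount, I believe you can simply bind them to the same domain, using the same pvevm syntax as above (vmid 103 here is only an example):

Code:
  <pvevm autostart="1" domain="openVZ1" vmid="103"/>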
 

Thanks for the reply. I'm not seeing any important differences between your failover specification and mine (aside from the fact that I only have one domain). I assume you are able to create a new CT and set it to HA from the web UI? That's what breaks for me. I suppose it doesn't matter much, since once you set up a failover domain the config has to be managed manually anyway.

My biggest issue is that the load-distribution algorithm for relocated CTs appears to be very poor. I'll ask on the Red Hat cluster list, since this seems more related to rgmanager itself than to PVE.
 
