HA relocation algorithm

cwells

I set up HA on a four-node cluster. On node1 I created three CTs and configured them for HA, then stopped rgmanager on node1 to test relocation. All three CTs were migrated to node2. This seems suboptimal. Is there any way to have them relocated round-robin (first CT to node2, second to node3, and so on)? I'm imagining a scenario where a node fails and all of its CTs are migrated to a single node, overloading it even though there are plenty of resources available on the other nodes.

Ideally it would use SNMP or something similar to prefer the least-loaded nodes, but even a simple round-robin relocation would be better than dumping everything onto a single node.

I found an old project, LBVM, that does actual load balancing for rgmanager, but it appears quite outdated and I don't know how it would integrate with PVE in any case.

If I'm willing to hack a bit, where would I start? I took a look at the scripts under /usr/share/cluster, but I couldn't figure out how they fit together. The pvevm script doesn't appear to be involved in the preferred-node decision. Would this mean changing rgmanager itself, or can it be done with rgmanager event scripting?
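
For reference, rgmanager apparently has a central event-processing mode where placement decisions are driven by an S-Lang event script rather than the built-in logic; the stock script seems to ship as /usr/share/cluster/default_event_script.sl. I haven't tried it and I don't know whether PVE tolerates it, but the cluster.conf hook would presumably look something like this (everything other than my pvevm entries is from the rgmanager docs as I understand them, so treat it as a sketch):

Code:
  <rm central_processing="1">
    <!-- with central_processing enabled, relocation is supposedly decided by an
         S-Lang event script (default: /usr/share/cluster/default_event_script.sl),
         which is where round-robin or load-aware placement could be hacked in -->
    <pvevm autostart="1" vmid="100"/>
    <pvevm autostart="1" vmid="101"/>
    <pvevm autostart="1" vmid="102"/>
  </rm>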
 
So it appears that manually adding the CTs to a failover domain changes the behavior:

Code:
  <rm>
    <pvevm autostart="1" vmid="100" domain="cluster1-default"/>
    <pvevm autostart="1" vmid="101" domain="cluster1-default"/>
    <pvevm autostart="1" vmid="102" domain="cluster1-default"/>
    <failoverdomains>
      <failoverdomain name="cluster1-default" ordered="0" restricted="0" nofailback="1" recovery="relocate">
        <failoverdomainnode name="node1" priority="1"/>
        <failoverdomainnode name="node2" priority="1"/>
        <failoverdomainnode name="node3" priority="1"/>
        <failoverdomainnode name="node4" priority="1"/>
      </failoverdomain>
    </failoverdomains>
  </rm>

With this configuration, stopping rgmanager on node1 results in two of the CTs ending up on node4 and one on node2 (I'm assuming simple random selection). That is somewhat better than before. Still, from what I've read, unordered + unrestricted is supposed to be the default behavior anyway, so it's not clear why explicitly creating a failover domain is needed. Anyone care to clarify?
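
If the placement really is just "pick one node for everything", one static way to approximate the round-robin I'm after (ignoring actual load) would presumably be to give each CT its own ordered, unrestricted domain with node1 first and the fallback order rotated. Something like this (domain names made up, untested, mirroring the attributes from my stanza above):

Code:
  <rm>
    <pvevm autostart="1" vmid="100" domain="fallback-node2"/>
    <pvevm autostart="1" vmid="101" domain="fallback-node3"/>
    <pvevm autostart="1" vmid="102" domain="fallback-node4"/>
    <failoverdomains>
      <!-- all CTs prefer node1; on failure each one falls back to a different node -->
      <failoverdomain name="fallback-node2" ordered="1" restricted="0" nofailback="1" recovery="relocate">
        <failoverdomainnode name="node1" priority="1"/>
        <failoverdomainnode name="node2" priority="2"/>
        <failoverdomainnode name="node3" priority="3"/>
        <failoverdomainnode name="node4" priority="4"/>
      </failoverdomain>
      <failoverdomain name="fallback-node3" ordered="1" restricted="0" nofailback="1" recovery="relocate">
        <failoverdomainnode name="node1" priority="1"/>
        <failoverdomainnode name="node3" priority="2"/>
        <failoverdomainnode name="node4" priority="3"/>
        <failoverdomainnode name="node2" priority="4"/>
      </failoverdomain>
      <failoverdomain name="fallback-node4" ordered="1" restricted="0" nofailback="1" recovery="relocate">
        <failoverdomainnode name="node1" priority="1"/>
        <failoverdomainnode name="node4" priority="2"/>
        <failoverdomainnode name="node2" priority="3"/>
        <failoverdomainnode name="node3" priority="4"/>
      </failoverdomain>
    </failoverdomains>
  </rm>

It still does nothing about actual load, and it would have to be maintained by hand as CTs are added, but at least the spread on a node failure would be deterministic.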
 
It doesn't help in any case. Once I added the failoverdomains stanza above, I can no longer add any new CTs with HA: the UI gives "unknown error 500" when I try to commit. Without the failover domains, it works.
 
see 'man rgmanager' (section failover domains)

I've read it until my eyes bleed. It answers none of my questions:

1) What is the behavior when there is no failover domain at all (i.e. the default)? I would assume it's the same as unordered + unrestricted, but the man page doesn't say.

2) What is the behavior of unordered + unrestricted with regard to node preference during relocation? The man page doesn't suggest anything, and other versions online suggest that services are relocated to random nodes. That doesn't appear to be the actual behavior (or else the random number generator is very bad): instead, all services migrate en masse to one unfortunate node.

In any case, adding a failover domain breaks the PVE GUI (I can't create new HA CTs), so the question is academic.
 
In my case it is working ok. This is my relevant section in cluster.conf:
<rm>
  <service autostart="1" domain="openVZ1" exclusive="0" max_restarts="1" name="sharedOpenVZ1" recovery="relocate">
    <script file="/etc/init.d/openvzMount" name="openvzMount"/>
  </service>
  <failoverdomains>
    <failoverdomain name="openVZ1" nofailback="0" ordered="1" restricted="1">
      <failoverdomainnode name="khnum" priority="1"/>
      <failoverdomainnode name="bastet" priority="9"/>
    </failoverdomain>
    <failoverdomain name="sharedKVM" nofailback="1" ordered="1" restricted="1">
      <failoverdomainnode name="heket" priority="1"/>
      <failoverdomainnode name="bastet" priority="2"/>
    </failoverdomain>
  </failoverdomains>
  <pvevm autostart="1" domain="sharedKVM" vmid="102"/>
</rm>
And openvzMount:
#! /bin/sh
# /etc/init.d/openvzMount
#
# Mounts/unmounts the shared OpenVZ volume and points the sharedOpenVZ
# storage definition at whichever node currently has it mounted.

# Some things that run always
touch /var/lock/openvzMount

# Carry out specific functions when asked to by the system
case "$1" in
  start)
    echo "Starting openvz Storage"
    mount -o _netdev,nobh,barrier=0 /dev/mapper/proxmox_shared-OpenVZStorage /mnt/OpenVZStorage
    myname=`hostname`
    pvesm set sharedOpenVZ -nodes $myname
    ;;
  stop)
    echo "Stopping openvz Storage"
    #pvesm set sharedOpenVZ -nodes ""
    umount -f /dev/mapper/proxmox_shared-OpenVZStorage
    ;;
  status)
    # mountpoint exits 0 when the path is a mount point
    if mountpoint /mnt/OpenVZStorage/ ; then
      exit 0
    else
      exit 1
    fi
    ;;
  *)
    echo "Incorrect Parameters"
    exit 1
    ;;
esac
exit 0


This migrates the shared partition for OpenVZ from one machine to another.
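
If you also want containers that live on that storage to move together with the mount, I believe you can simply bind them to the same domain, using the same pvevm syntax as above (vmid 103 here is only an example):

Code:
  <pvevm autostart="1" domain="openVZ1" vmid="103"/>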
 

Thanks for the reply. I'm not seeing any important differences between your failover specification and mine (aside from the fact that I only have one domain). I assume you are able to create a new CT and set it to HA from the web UI? That's what breaks for me. I suppose it doesn't matter much, since once you set up a failover domain the config has to be managed manually anyway.

My biggest issue is that the load-distribution algorithm for relocated CTs appears to be very poor. I'll ask on the Red Hat cluster list, since this seems more related to rgmanager itself than to PVE.
 
