[HA] Cluster Resource Scheduling - filling in the missing pieces (from the docs)

esi_y

Nov 29, 2023
I have read the docs on HA top to bottom and back to top:
https://pve.proxmox.com/wiki/High_Availability

I can't get my head around the resource scheduling, because:

"The cluster resource scheduler (CRS) mode controls how HA selects nodes for the recovery of a service as well as for migrations that are triggered by a shutdown policy."

Then for both modes it goes on to say:

"Non-HA-managed services are currently not counted."

There are some further details on the (tech preview) Static-Load Scheduler in terms of CPU and RAM weighting, but even then the only points at which CRS acts are recovery, config change and service start (from a stopped state).

So do I understand this correctly that:
1. there are no config options whatsoever to have running services auto-migrated in order to avoid uneven load (say, above a set threshold) on an HA group of nodes?
2. a service would only be migrated in case e.g. the OOM killer takes it down, but not before?
3. after a node-down recovery, none of the running services will be migrated to that node unless they fail on the other nodes in the same priority group?
4. the Basic Scheduler is good for nothing but a set of homogeneous services on resource-equivalent nodes?
5. none of this is reliable if there are also non-HA services on any of the nodes used for HA services?

And finally:

6. Can I prevent non-HA services from being started on an HA group of nodes? Or shouldn't the docs then advise a none-or-all approach to HA for the services on a select group of nodes?

Obviously I do not really want to be confirmed on (all of) the above, but if I am wrong, could you advise how to address the said issue?
 
Hello

I think you got it mostly right, however, it sounds like you misunderstand what HA is for. HA currently only migrates a service to another node if it detects that the service's node is no longer operational (usually because it cannot be reached). In that case, HA looks for the best node to migrate the service to, but other than that it does not do any load scheduling.

There should be no problem running HA and non-HA VMs and containers on the same node. In case of failure, the HA ones will be migrated and the non-HA ones will not. Also, while the resources that the HA services need have to be available on all HA nodes, the nodes do not have to be resource-equivalent.

6. Can I prevent non-HA services from being started on an HA group of nodes? Or shouldn't the docs then advise a none-or-all approach to HA for the services on a select group of nodes?
I am not sure what you mean by that. Non-HA services will not be migrated or moved automatically to another node. If you don't want them to start on a specific node, just don't migrate them there or start them there (and don't set the 'start on boot' option).
 

Thanks for quick reply!

I think you got it mostly right, however, it sounds like you misunderstand what HA is for.

I see where you are coming from, but I originally named my thread CRS, then realised there are no dedicated docs on that and it's all shoved under HA, so I followed that logic. CRS should not be HA-specific; in fact I would like to see it expanded to non-HA scenarios, which I understand is currently not possible.

There should be no problem running HA and non-HA VMs and containers on the same node. In case of failure, the HA ones will be migrated and the non-HA ones will not.

I understand they will run alongside each other, but since e.g. the scheduler does not take into account which non-HA guests are already running on the node, it's not really deterministic once there are non-HA guests there. *
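
To make the concern concrete, here is a minimal Python sketch of a count-based choice in the spirit of the Basic mode as the docs describe it (only HA-managed services are counted). It is not the actual Proxmox code; the function name, the tie-break by node name and the guest dictionaries are assumptions made purely for illustration.

```python
from collections import Counter

def pick_node_basic(nodes, guests):
    """Pick the node with the fewest HA-managed guests.

    Hypothetical helper: non-HA guests are skipped on purpose, mirroring
    the documented "Non-HA-managed services are currently not counted".
    Ties are broken by node name (an assumption of this sketch).
    """
    ha_counts = Counter(g["node"] for g in guests if g["ha"])
    return min(nodes, key=lambda n: (ha_counts[n], n))

nodes = ["pve1", "pve2", "pve3"]
guests = [
    {"id": 100, "node": "pve1", "ha": True},
    {"id": 101, "node": "pve2", "ha": True},
    {"id": 102, "node": "pve2", "ha": True},
    # pve3 carries three non-HA guests that the count never sees
    {"id": 200, "node": "pve3", "ha": False},
    {"id": 201, "node": "pve3", "ha": False},
    {"id": 202, "node": "pve3", "ha": False},
]

chosen = pick_node_basic(nodes, guests)
print(chosen)  # pve3, although it already runs the most guests overall
```

In this made-up cluster the node with the most guests overall still looks the emptiest to a count of HA services, which is the situation described above.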

Also, while the resources that the HA services need have to be available on all HA nodes, the nodes do not have to be resource-equivalent.

But with the Basic Scheduler, this will get me into problems sooner or later should one of the nodes be substantially underbudgeted, correct?

I am not sure what you mean by that. Non-HA services will not be migrated or moved automatically to another node. If you don't want them to start on a specific node, just don't migrate them there or start them there (and don't set the 'start on boot' option).

I mostly meant, given the above*, can I set a node to not allow non-HA guests to even launch there? I want to have some control over the load distribution, and only the HA services are subject to a scheduler.
 
Hi,
I see where you are coming from, but I originally named my thread CRS, then realised there are no dedicated docs on that and it's all shoved under HA, so I followed that logic. CRS should not be HA-specific; in fact I would like to see it expanded to non-HA scenarios, which I understand is currently not possible.

I understand they will run alongside each other, but since e.g. the scheduler does not take into account which non-HA guests are already running on the node, it's not really deterministic once there are non-HA guests there. *
This is planned: https://pve.proxmox.com/wiki/Roadmap#Roadmap

  • Cluster Resource Scheduling Improvements
    Short/Mid-Term:
    • Re-balance service on fresh start up (request-stop to request-start configuration change) released with Proxmox VE 7.4
    • Account for non-HA virtual guests
    Mid/Long-Term:
    • Add Dynamic-Load scheduling mode
    • Add option to schedule non-HA virtual guests too
 
Is there any plan to extend the documentation so that the new entries for CRS in 8.1.4 are explained?
 
Oh, I see what happened. I thought "Default(basic)" and "Basic(resource count)" were two different options...

Sorry, still learning Proxmox. This is one of those interface nuances that I will get familiar with over time as I use it. When I see default notations in other apps, it's usually in parentheses to the right of the option. Just a UI thing.
 
I have a more technical question regarding resource scheduling. Does the non-basic mode ignore the ZFS ARC when it comes to memory weight? It seems the cache should be ignored completely, as it can be evicted.
 
I have a more technical question regarding resource scheduling. Does the non-basic mode ignore the ZFS ARC when it comes to memory weight? It seems the cache should be ignored completely, as it can be evicted.
Static mode currently takes the total memory of a node and subtracts the memory of the running HA guests on that node as a crude heuristic to score the node. So the ZFS ARC does not play a role.
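
As a rough illustration of that heuristic only, the following Python sketch scores nodes by total memory minus the configured memory of their running HA guests. It is not the real scheduler code; the function name, the node data and the GiB units are assumptions for the example, and the CPU side of the scoring is left out entirely.

```python
def static_memory_headroom(total_mem_gib, ha_guest_mem_gib):
    """Total node memory minus the memory of running HA guests.

    Hypothetical helper: actual host usage (including the ZFS ARC) is
    not an input at all, so it cannot influence the result.
    """
    return total_mem_gib - sum(ha_guest_mem_gib)

# Made-up cluster: total memory and running HA guest sizes, in GiB
nodes = {
    "pve1": {"total": 64, "ha_guests": [16, 16]},
    "pve2": {"total": 128, "ha_guests": [32, 32]},
    "pve3": {"total": 64, "ha_guests": [8]},
}

headroom = {
    name: static_memory_headroom(info["total"], info["ha_guests"])
    for name, info in nodes.items()
}
best = max(headroom, key=headroom.get)
print(headroom, best)
```

Because the only inputs are static numbers, a node whose memory is mostly filled by ARC gets the same score as an otherwise idle one.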

For dynamic resource scheduling, it is planned to use the much more accurate PSI (pressure stall information) instead. But it's necessary to rework how the information is sent to nodes first, because with Corosync/knet, messages are broadcast to all nodes, so collecting stats for n nodes would take n*(n-1) messages.
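
The n*(n-1) figure grows quadratically, which a few lines of Python make visible. This is only the arithmetic from the paragraph above, assuming each node broadcasts one stats message per round that every other node receives; the function name is hypothetical.

```python
def broadcast_deliveries(n):
    """Messages delivered per round when each of n nodes broadcasts
    its stats once to the other n - 1 nodes."""
    return n * (n - 1)

for n in (3, 8, 16, 32):
    print(n, broadcast_deliveries(n))
```

Under this assumption, a 3-node cluster sees 6 deliveries per round and a 32-node cluster sees 992.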
 
