[HA] Cluster Resource Scheduling - filling in the missing pieces (from the docs)

esi_y · Jan 4, 2024

I have read the docs on HA top to bottom and back to top:
https://pve.proxmox.com/wiki/High_Availability

I can't get my head around the resource scheduling, because:

"The cluster resource scheduler (CRS) mode controls how HA selects nodes for the recovery of a service as well as for migrations that are triggered by a shutdown policy."

Then for both modes it goes on to say:

"Non-HA-managed services are currently not counted."

There are some further details on the (tech preview) Static-Load Scheduler in terms of CPU and RAM weighting, but even then the only points at which CRS acts are recovery, config change and service start (from a stopped state).

So do I understand correctly that:
1. there are no config options whatsoever to have running services auto-migrated in order to avoid uneven load (say, above a set threshold) on an HA group of nodes?
2. a service would only be migrated if e.g. the OOM killer takes it down, but not before?
3. after a node-down recovery, none of the running services will be migrated back to that node unless they fail on the other nodes in the same priority group?
4. the Basic Scheduler is good for nothing but a set of homogeneous services on resource-equivalent nodes?
5. none of this is reliable if there are also non-HA services on any of the nodes used for HA services?

And finally:

6. Can I prevent non-HA services from being started on an HA group of nodes? Or shouldn't the docs then advise a none-or-all approach, where either every service or no service on a select group of nodes is HA-managed?

Obviously I do not really want to be confirmed on (all of) the above, but if I am wrong, could you advise how to address the said issue?

Philipp Hufnagl · Jan 4, 2024

Hello

I think you got it mostly right, however, it sounds like you misunderstand what HA is for. HA currently is only for migrating a service to another node if it detects that the service's current node is no longer operational (usually because it cannot be reached). In that case, HA looks for the best node to migrate the service to, but other than that it does not do any load scheduling.

There should be no problem running HA and non-HA VMs and containers on the same node. In case of failure, the HA ones will be migrated and the non-HA ones will not. Also, while the resources the HA services need have to be available on all HA nodes, the nodes do not have to be resource-equivalent.

tempacc346235 said:
6. Can I prevent non-HA services from being started on an HA group of nodes? Or shouldn't the docs then advise a none-or-all approach, where either every service or no service on a select group of nodes is HA-managed?

I am not sure what you mean by that. Non-HA services will not be migrated or moved automatically to another node. If you don't want them to start on a specific node, just don't migrate them there, don't start them there, and don't set the 'start on boot' option for them.

esi_y · Jan 4, 2024

Philipp Hufnagl said:
Hello

Thanks for quick reply!

Philipp Hufnagl said:
I think you got it mostly right, however, it sounds like you misunderstand what HA is for.

I see where you are coming from, but I originally named my thread CRS, then realised there are no dedicated docs on that and it's all shoved under HA, so I followed that logic. CRS should not be HA-specific; in fact, I would like to see it expanded to non-HA scenarios, which I understand is currently not possible.

Philipp Hufnagl said:
There should be no problem running HA and non-HA VMs and containers on the same node. In case of failure, the HA ones will be migrated and the non-HA ones will not.

I understand they will run alongside, but since e.g. the scheduler does not take into account what non-HA is already running on the node, it's not really deterministic once there are non-HA guests there. *

Philipp Hufnagl said:
Also, while resources the HA services need have to be available on all HA nodes, the nodes do not have to be resource-equivalent.

But with Basic Scheduler, this will get me into problems sooner or later should one of the nodes be substantially underbudgeted, correct?
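To make the two concerns above concrete, here is a minimal sketch. It assumes the basic mode simply prefers the node with the fewest active HA services (as the "resource count" label suggests) and ignores both node size and non-HA guests. The `Node` class, the `pick_by_ha_count` helper, and the node and guest names are hypothetical illustrations, not Proxmox code.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    total_mem_gib: int  # physical RAM; NOT consulted by the count-based choice
    ha_guests: list = field(default_factory=list)      # HA-managed, counted
    non_ha_guests: list = field(default_factory=list)  # non-HA, not counted


def pick_by_ha_count(nodes):
    """Assumed basic-mode behaviour: the node with the fewest HA services wins.

    Ties are broken by node name so the result is deterministic.
    """
    return min(nodes, key=lambda n: (len(n.ha_guests), n.name))


nodes = [
    Node("big", total_mem_gib=256, ha_guests=["vm:100", "vm:101"]),
    Node("small", total_mem_gib=32, ha_guests=["vm:102"],
         non_ha_guests=["vm:200", "vm:201", "vm:202"]),
]

target = pick_by_ha_count(nodes)
print(target.name)  # small
```

Under these assumptions the smaller node is chosen even though it has less RAM and already hosts three non-HA guests, because only the HA service count enters the decision.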

Philipp Hufnagl said:
I am not sure what you mean by that. Non-HA services will not be migrated or moved automatically to another node. If you don't want them to start on a specific node, just don't migrate them there, don't start them there, and don't set the 'start on boot' option for them.

I mostly meant, given the above*, can I set a node not to allow non-HA guests to even launch there as I want to have some control over the load distribution and only the HA services are subject to some scheduler.

fiona · Jan 5, 2024

Hi,

tempacc346235 said:
I see where you are coming from, but I originally named my thread CRS, then realised there are no dedicated docs on that and it's all shoved under HA, so I followed that logic. CRS should not be HA-specific; in fact, I would like to see it expanded to non-HA scenarios, which I understand is currently not possible.

I understand they will run alongside, but since e.g. the scheduler does not take into account what non-HA is already running on the node, it's not really deterministic once there are non-HA guests there. *

this is planned: https://pve.proxmox.com/wiki/Roadmap#Roadmap

Cluster Resource Scheduling Improvements

Short/Mid-Term:

~~Re-balance service on fresh start up (request-stop to request-start configuration change)~~ released with Proxmox VE 7.4

Account for non-HA virtual guests

Mid/Long-Term:

Add Dynamic-Load scheduling mode

Add option to schedule non-HA virtual guests too

sunsmasher · Mar 3, 2024

is there any plan to add to the documentation so that the new entries for CRS in 8.1.4 are explained?

fiona · Mar 4, 2024

Hi,

sunsmasher said:
is there any plan to add to the documentation so that the new entries for CRS in 8.1.4 are explained?

sorry, I'm not sure what you mean. The currently available options are already described in the documentation: https://pve.proxmox.com/pve-docs/chapter-ha-manager.html#ha_manager_crs

sunsmasher · Mar 5, 2024

Oh, I see what happened. I thought "Default(basic)" and "Basic(resource count)" were two different options...

Sorry, still learning Proxmox. This is one of those interface nuances that I will get familiar with over time. When I see default notations in other apps, they're usually in parentheses to the right of the option. Just a UI thing.

PhantexTech · Apr 12, 2024

I have a more technical question regarding resource scheduling. Does the non-basic mode ignore the ZFS ARC when it comes to memory weight? It seems like the cache should be ignored completely, as it can be evicted.

fiona · Apr 15, 2024

PhantexTech said:
I have a more technical question regarding resource scheduling. Does the non-basic mode ignore the ZFS ARC when it comes to memory weight? It seems like the cache should be ignored completely, as it can be evicted.

Static mode currently considers the total memory of a node and subtracts the memory of running HA guests on that node as a crude heuristic to score the node. So the ZFS arc cache does not play a role.
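As a rough illustration of that heuristic, the sketch below computes "total node memory minus the memory of running HA guests" and prefers the node with the most headroom. It assumes "memory of running HA guests" means their configured memory and covers only the memory side; the function name, node names and numbers are hypothetical, and this is not the actual scheduler code.

```python
def static_memory_headroom(total_mem_gib, running_ha_guest_mem_gib):
    """Crude score: node RAM minus the configured RAM of running HA guests.

    Host-side usage such as the ZFS ARC or non-HA guests never enters
    the calculation, which is why the ARC plays no role.
    """
    return total_mem_gib - sum(running_ha_guest_mem_gib)


# (total RAM in GiB, configured RAM of each running HA guest in GiB)
nodes = {
    "node1": (64, [16, 16, 8]),  # 64 - 40 = 24
    "node2": (128, [64, 32]),    # 128 - 96 = 32
    "node3": (32, [4]),          # 32 - 4 = 28
}

headroom = {
    name: static_memory_headroom(total, guests)
    for name, (total, guests) in nodes.items()
}
best = max(headroom, key=headroom.get)
print(headroom)  # {'node1': 24, 'node2': 32, 'node3': 28}
print(best)      # node2
```

Because only static numbers are used, the result is the same no matter how much memory the host's cache currently occupies.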

For dynamic resource scheduling, it is planned to use the much more accurate PSI (pressure stall information) instead. But it's necessary to rework how the information is sent to nodes first, because with Corosync/knet, messages are broadcast to all nodes, so to collect stats for n nodes, you'd have n*(n-1) messages.
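The n*(n-1) figure grows quadratically with cluster size. A small worked example follows; the function name and the cluster sizes are picked purely for illustration.

```python
def broadcast_messages(n_nodes: int) -> int:
    """Deliveries per stats round if every node broadcasts to all others.

    Each of the n nodes sends its stats, and each broadcast is received
    by the remaining n - 1 nodes, giving n * (n - 1) deliveries.
    """
    return n_nodes * (n_nodes - 1)


for n in (3, 8, 16, 32):
    print(n, broadcast_messages(n))
# 3 6
# 8 56
# 16 240
# 32 992
```

Doubling the node count roughly quadruples the traffic per collection round.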
