Questions about the dynamic CRS

Cookiefamily

Renowned Member
Jan 29, 2020
Hello,

I noticed that a dynamic mode was introduced to the CRS (yay! Been waiting for this for so long and it really comes in handy now with us planning our migration away from VMware, thank you so much Team!!!).
I enabled it on all my testing environments and it seemed to work pretty well.

There are two modes the CRS can run in, TOPSIS and "Brute Force" with the latter one being the default. What are the differences in practice between the two modes? Are there scenarios where you should choose one over the other?

In my small test clusters it distributed the load really well, but they are under almost no CPU or memory load, so I couldn't yet try the scenarios where ProxLB from credativ fails in our production environment.
The issue there was VMs with big "imbalances": a lot of RAM and little CPU, or vice versa.
What metrics does the CRS take into account? Both memory and CPU? How does it weigh between those (or does it do any weighting at all)?
 
Hi!

Thanks for the feedback!

The issue was VMs with big "imbalances" of a lot of RAM and little CPU and vice versa.
What metrics does the CRS take into account? Both memory and CPU? How does it weigh between those (or does it do any weighting at all)?
The load balancer takes both memory and CPU into account. As for the weighting, see the next paragraphs.

There are two modes the CRS can run in, TOPSIS and "Brute Force" with the latter one being the default. What are the differences in practice between the two modes? Are there scenarios where you should choose one over the other?
The load balancer can score the balancing migrations by either one of these methods.

The brute-force method (as in 'greedily find the best balancing migration') currently weighs average CPU load and memory usage equally. The weighting might change in the future, but equal weights are a well-balanced starting point, as both resources (CPU and memory) can cause pressure and therefore degrade resource utilization over time.
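As a rough sketch of what such a greedy pass could look like (illustrative Python only, not the actual pve-ha-manager code; all names and the exact scoring are assumptions):

```python
# Toy 'brute force' balancing pass: try every (guest, target) pair and pick
# the migration after which the remaining cluster imbalance is smallest.
# CPU and memory usage (as fractions of capacity) are weighted equally.

def node_load(node):
    return 0.5 * node["cpu"] + 0.5 * node["mem"]

def imbalance(nodes):
    loads = [node_load(n) for n in nodes.values()]
    return max(loads) - min(loads)

def best_migration(nodes, guests):
    """Greedily pick the (guest, target) pair that minimizes remaining imbalance."""
    best_score, best_move = imbalance(nodes), None
    for name, g in guests.items():
        for target in nodes:
            if target == g["node"]:
                continue
            # Simulate the migration on a copy of the node usage table.
            trial = {n: dict(v) for n, v in nodes.items()}
            trial[g["node"]]["cpu"] -= g["cpu"]
            trial[g["node"]]["mem"] -= g["mem"]
            trial[target]["cpu"] += g["cpu"]
            trial[target]["mem"] += g["mem"]
            score = imbalance(trial)
            if score < best_score:
                best_score, best_move = score, (name, target)
    return best_move

nodes = {"pve1": {"cpu": 0.8, "mem": 0.7}, "pve2": {"cpu": 0.2, "mem": 0.1}}
guests = {"vm100": {"node": "pve1", "cpu": 0.3, "mem": 0.3}}
```

On the toy data above, `best_migration(nodes, guests)` picks moving `vm100` to `pve2`, which balances both nodes exactly.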

The TOPSIS method weighs memory as more important than CPU load: a 5:1 ratio for average CPU/memory usage and a 10:5 ratio for CPU/memory peaks, to reflect that memory is a truly limited resource, while high CPU pressure 'only' degrades processing time. This is also the method already used for scoring nodes when starting new HA resources (if rebalance-on-start is enabled).

The TOPSIS method might be helpful for more memory-bound workloads. However, since an equal balance between both resources works well for many applications, and CPU pressure is often the more common problem, the brute-force method is the current default.
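As an illustration of how a TOPSIS-style scoring could rank candidate target nodes, here is a toy Python sketch (not the actual implementation; the weights only mirror the ratios above, interpreted as memory being weighted over CPU, and all names are made up):

```python
import math

# Toy TOPSIS ranking of candidate target nodes. All four criteria are usage
# fractions where lower is better; the weights reflect memory being more
# important than CPU (an interpretation of the ratios discussed above).
WEIGHTS = {"cpu_avg": 1, "mem_avg": 5, "cpu_peak": 5, "mem_peak": 10}

def topsis_rank(candidates):
    """candidates: {node: {criterion: usage fraction}}; best node first."""
    crits = list(WEIGHTS)
    # Vector-normalize each criterion column, then apply the weights.
    norm = {c: math.sqrt(sum(v[c] ** 2 for v in candidates.values())) or 1
            for c in crits}
    scored = {n: {c: WEIGHTS[c] * v[c] / norm[c] for c in crits}
              for n, v in candidates.items()}
    # All criteria are costs, so the ideal point is the per-criterion minimum.
    best = {c: min(s[c] for s in scored.values()) for c in crits}
    worst = {c: max(s[c] for s in scored.values()) for c in crits}

    def closeness(s):
        d_best = math.sqrt(sum((s[c] - best[c]) ** 2 for c in crits))
        d_worst = math.sqrt(sum((s[c] - worst[c]) ** 2 for c in crits))
        return d_worst / ((d_best + d_worst) or 1)

    return sorted(candidates, key=lambda n: closeness(scored[n]), reverse=True)

cands = {
    "a": {"cpu_avg": 0.2, "mem_avg": 0.8, "cpu_peak": 0.3, "mem_peak": 0.9},
    "b": {"cpu_avg": 0.6, "mem_avg": 0.3, "cpu_peak": 0.7, "mem_peak": 0.4},
}
```

With these weights, `topsis_rank(cands)` puts node `b` first: its lower memory usage outweighs its higher CPU usage.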

Hope this helps!

PS: There is a patch series in review, which overhauls the CRS section itself and adds documentation for the new load balancing system here [0]. External feedback on these patches is also very welcome, in case things could be made clearer or certain things should be elaborated on more!

[0] https://lore.proxmox.com/pve-devel/20260415091635.162224-20-d.kral@proxmox.com/
 
Using the GUI to set the HA scheduling to "dynamic load" and checking "Automatically rebalance HA resources" leads to this error:

Code:
crs: invalid format - format error
crs.ha: value 'dynamic' does not have a value in the enumeration 'basic, static'
crs.ha-auto-rebalance: property is not defined in schema and the schema does not allow additional properties

This is on a PVE 9.1.9 (enterprise repo) server. `pve-ha-manager` is installed as version 5.1.3 though? The patch is for 5.2.0+
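For reference (judging from the schema in the error message), pve-ha-manager 5.1.3 seems to only accept the older values in /etc/pve/datacenter.cfg, e.g.:

```
# old schema: 'ha' only allows 'basic' or 'static',
# and there is no ha-auto-rebalance property yet
crs: ha=static,ha-rebalance-on-start=1
```

so the new `dynamic` mode presumably can't be configured until 5.2.0 lands.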
 
Also: the docs seem to be missing the difference between the dynamic and the static load scheduler, and the differences between the "brute force" and "TOPSIS" methods. Google AI gave me an answer, but I don't know if it's correct.

There are more patches in the patch series that explain these. Sorry!
 
Thanks for the insights! Have you thought about an exclude option for containers, as these get automatically moved, which causes downtime?
 
@dakralex Thank you very much for the answer! That clears things up a lot.

As for the modes, I think we will just need to try the modes and see what happens. TOPSIS sounds best for our production clusters as they are usually memory limited.

One thing I would have as feedback: it is way more "expensive" to do live migrations of VMs with vGPU resources, as a migration halts them for extended periods of time (8 GB takes ~6 s for us, 24 GB ~20 s, etc.). So ideally those would be moved last, as long as there are better options for shuffling VMs around.
For now I guess I can create an affinity group to "pin" them to one host with higher priority, so they don't get rebalanced except in the case of host failures.
 
This is on a PVE 9.1.9 (enterprise repo) server. `pve-ha-manager` is installed as version 5.1.3 though? The patch is for 5.2.0+
Yes, some recent security fixes for the pve-manager package forced us to ship it to all repositories earlier; that package already includes the load balancer options in the web interface, while we're still waiting to move pve-ha-manager 5.2.0 and pve-cluster 9.1.2 to the enterprise repositories as well.

See this post [0] for a little more information.

[0] https://forum.proxmox.com/threads/183143/#post-850798
 
Thanks for the insights! Have you thought about an exclude option for containers, as these get automatically moved, which causes downtime?
Yes, good idea! We already thought about this during development, but were focusing on the core feature first. Feel free to create a Bugzilla entry [0] for this in the meantime. As this is relatively trivial to implement and, as you said, moving containers causes downtime while they are restarted on the target host, this should be included relatively fast.

[0] https://bugzilla.proxmox.com/enter_bug.cgi?product=pve&component=HA
 
As for the modes, I think we will just need to try the modes and see what happens. TOPSIS sounds best for our production clusters as they are usually memory limited.

One thing I would have as feedback: it is way more "expensive" to do live migrations of VMs with vGPU resources, as a migration halts them for extended periods of time (8 GB takes ~6 s for us, 24 GB ~20 s, etc.). So ideally those would be moved last, as long as there are better options for shuffling VMs around.
For now I guess I can create an affinity group to "pin" them to one host with higher priority, so they don't get rebalanced except in the case of host failures.
Thanks for the feedback!

We also thought about including more terms in the "cost function" of a migration, though we mainly focused on the core feature first.

The current load balancing implementation focuses on reducing the imbalance between the nodes as much as possible. To minimize the total number of migrations needed to rebalance the HA resources within the cluster, the larger the imbalance, the more 'expensive' (in terms of memory, etc., and therefore migration time) the chosen balancing migrations usually are. Without knowing the rest of the cluster, in these situations it might be the best option to move these 'heavy' HA resources first. It could become expensive, though, if this turns into a transient state where these HA resources are moved quite often. Does this occur for you?
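As a toy numeric illustration of that trade-off (numbers invented): with a 40 GB memory imbalance between two nodes, one 'heavy' 20 GB migration can close the gap completely, while several small 8 GB migrations are needed and still leave a gap:

```python
# Two nodes with 60 GB and 20 GB of memory in use: a 40 GB imbalance.
node_a, node_b = 60, 20

# One heavy 20 GB migration closes the gap in a single step:
assert (node_a - 20) - (node_b + 20) == 0

# Balancing with 8 GB guests instead takes more migrations; each move
# shrinks the gap by 16 GB, and we stop before a move would overshoot.
moves = 0
while node_a - node_b > 2 * 8:
    node_a -= 8
    node_b += 8
    moves += 1
```

Here two 8 GB migrations still leave an 8 GB gap that a single 20 GB migration would have eliminated.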
 
Yes, good idea! We already thought about this during development, but were focusing on the core feature first. Feel free to create a Bugzilla entry [0] for this in the meantime. As this is relatively trivial to implement and, as you said, moving containers causes downtime while they are restarted on the target host, this should be included relatively fast.

[0] https://bugzilla.proxmox.com/enter_bug.cgi?product=pve&component=HA
Thanks! Done: https://bugzilla.proxmox.com/show_bug.cgi?id=7557
 
Hi Daniel, thanks for the reply!
Though it might be expensive if this becomes a transient state, where these HA resources are moved quite often. Does this occur for you?
not right now, no. I was just thinking ahead to the future :D
Right now it is only active on two small 3-node test clusters, one of which has some NVIDIA GPUs with NVAIE (yes, it works the same as "normal" vGPUs; we didn't experience any issues apart from the usual NVIDIA licensing hell you run into everywhere). Those clusters and the load on them are pretty static, so it only migrates when we create some new VMs.
Over the next year we will migrate ~900 VMs and ~50 hosts from VMware to Proxmox VE. The dynamic CRS will definitely be helpful in the bigger clusters and in clusters where K8s automatically autoscales workers. I hope it is stable enough once we get to the larger migrations; currently we are still just testing, planning, and adapting code.

But especially in bigger clusters, imbalance might happen more often. There are currently some ways to tune the dynamic CRS, such as the minimum imbalance improvement and the threshold, so it doesn't go too crazy, and I read on one of the mailing lists that there are plans for "better" statistics to filter out short peaks etc., which should also help.
Migrations are always "costly" in terms of performance. For normal VM migrations it isn't too big of a deal as long as it doesn't happen too often; freezes are sub-1 s. For vGPUs the freeze while copying the VFIO memory is just a lot longer, so you might actually notice it.
I think the suggestion @jsterr made would also help for now: we could just not actively migrate the VMs that have GPUs attached.

One more question about the calculations: I know the dynamic CRS only migrates resources managed by the HA Manager. But how does it calculate the host resource usage? Does it only take the HA-managed resources into account, or the "general" host CPU/memory usage, which also includes non-HA VMs and other processes like Ceph?

Thank you!
 