Clarification related to HA Maintenance Mode/Affinity Rules

Libero_AT

Apr 20, 2026
We currently have a three-node cluster running Proxmox 9.x on each node, with shared storage (over iSCSI). We are evaluating the behavior of HA-managed resources with node/resource affinity rules when maintenance mode is enabled.

We have failback enabled on every VM registered in the HA Manager. This ensures that whenever a higher-priority node becomes active, the VM migrates (back) to it.
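
For context, the relevant entries in /etc/pve/ha/resources.cfg look roughly like this (a sketch; the failback property name is how we understand the PVE 9 HA resource options and may differ in other versions):

```
vm: 100
	state started
	failback 1
```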

Our aim is to understand the underlying logic of the HA Manager in the following scenarios.

Scenario 1:
We have two VMs running on Node#3. We have added the two VMs to a "Keep together" resource affinity rule.
I now place Node #3 in maintenance mode. As expected, the VMs migrate together to the node with the fewest HA-managed running VMs (the CRS mode is basic); let's say this was Node #2.
However, when maintenance mode was disabled on Node #3, the VMs did not migrate back and instead stayed on Node #2.
Isn't this against the premise of maintenance mode, which states that the VMs return to the node they were originally running on?
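
Our mental model of the basic CRS placement for a keep-together group, as a toy Python sketch (not the actual pve-ha-manager logic; node names and counts are made up):

```python
# Toy model of "basic" CRS placement for a keep-together group:
# the whole group goes to the candidate node with the fewest
# HA-managed running services. Not the real pve-ha-manager code.

def pick_node(service_counts, exclude=()):
    """Return the candidate node with the fewest HA-managed services,
    breaking ties by node name."""
    candidates = {n: c for n, c in service_counts.items() if n not in exclude}
    return min(candidates, key=lambda n: (candidates[n], n))

# Node #3 enters maintenance: both VMs of the keep-together rule
# move as a unit to the least-loaded remaining node.
counts = {"node1": 2, "node2": 1, "node3": 2}
target = pick_node(counts, exclude=("node3",))
print(target)  # node2
```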

Scenario 2:
Again, consider the same VMs running on Node #3.
The two VMs are added to a Node affinity rule (Node #2 and Node #3, equal priority, and Node#1 not included), but it is not strict.
The two VMs are also added to a "Keep separate" resource affinity rule. At this stage, both the VMs initially migrate, one each to Nodes #1 and #2, but since Node #3 has higher priority, one of the VMs migrates back from Node #1 to #3.
I now place Node #3 in maintenance mode. One would expect the VM running on Node#3 to migrate to Node#1, as per the premise of the maintenance mode.
Instead, the VM continues running on Node#3 (which is under maintenance mode). There are no existing HA-managed VMs on Node#1.
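
For completeness, the two rules from this scenario look roughly like this in /etc/pve/ha/rules.cfg (a sketch; the section and property names are approximated from the PVE 9 HA rules documentation and may differ in your version):

```
node-affinity: prefer-node2-node3
	resources vm:100,vm:101
	nodes node2,node3
	strict 0

resource-affinity: keep-separate
	resources vm:100,vm:101
	affinity negative
```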

Could you explain the behavior in the above two cases?
 
Welcome to the Proxmox forum, Libero_AT!

For scenario 1, it seems the current HA stack gives higher priority to keeping the resource affinity rule satisfied than to migrating the resources back to the node leaving maintenance mode.

For scenario 2, I assume the negative resource affinity rule (keep separate) was added slightly before the node affinity rule. That made the VMs migrate to node1 and node2 first, and only after the HA Manager read the node affinity rule did it migrate one of these VMs to node3 to comply with it. At least when I add both rules at the same time, there is only one migration, of one VM to node2.

Nonetheless, for scenario 2 it again seems like the HA Manager gives the affinity rules higher priority than moving the HA resource off a maintenance node.

I will look into both!

It would be great to have a Bugzilla entry for this so it can be tracked independently; one can be created here [0].

[0] https://bugzilla.proxmox.com/enter_bug.cgi?product=pve&component=HA
 
Hi Daniel, thank you for replying!

For scenario 2, I assume the negative resource affinity rule (keep separate) was added slightly before the node affinity rule, which made the VMs migrate to node1 and node2 first; only after the HA Manager read the node affinity rule did it migrate one of these VMs to node3 to comply with it.
You are right; we added the node affinity rule after the resource affinity rule in Scenario 2. My bad.

We would like to be able to concretely predict the migration of HA resources when a migration trigger is applied.
Hence, we are looking into whether there is a simplified way of reasoning about the interplay between the different parts of the HA stack.

For instance, is it possible to definitively rank the following in an order of precedence?
  • Resource affinity rules/Strict node affinity rules
  • Non-strict node affinity rules
  • Maintenance mode/Failback
  • Cluster resource scheduler (rebalancing resources across nodes based on number of HA-managed active resources)

Thank you for your time.
 
We would like to be able to concretely predict the migration of HA resources when a migration trigger is applied.
Hence, we are looking into whether there is a simplified way of reasoning about the interplay between the different parts of the HA stack.

For instance, is it possible to definitively rank the following in an order of precedence?
In general, there shouldn't really be a precedence, as all of those conditions should hold at the same time.

The rule verification system does reject many types of affinity rules that cannot be guaranteed to be resolvable at runtime, see [0].
There are still valid cases which can cause undesirable behavior for users, such as restricting an HA resource to a single node (it cannot be recovered in case of failure or moved away from a maintenance node), or negative resource affinity rules with as many HA resources as cluster nodes (a single failed or maintenance node makes the rule impossible to enforce). In these cases, admins should think carefully about the consequences such HA rules might have when nodes fail.
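
To illustrate the two degenerate cases, a toy check might look like this (illustrative only; this mirrors the idea, not the actual rule verification code):

```python
# Illustrative sanity checks for the two degenerate rule shapes
# mentioned above; not part of pve-ha-manager.

def risky_node_affinity(rule_nodes):
    """A node affinity rule pinning a resource to a single node means
    it cannot be recovered elsewhere or moved off a maintenance node."""
    return len(set(rule_nodes)) == 1

def risky_negative_affinity(num_resources, num_cluster_nodes):
    """A keep-separate rule with as many resources as cluster nodes
    becomes impossible to enforce as soon as one node is down."""
    return num_resources >= num_cluster_nodes

print(risky_node_affinity(["node3"]))    # True
print(risky_negative_affinity(3, 3))     # True
print(risky_negative_affinity(2, 3))     # False
```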

Nonetheless, the scenarios you described look like bugs to me; I'm preparing a patch series to fix them and will post it here as soon as that's done.

Hope that clears things up! If you have any questions left, feel free to ask.

[0] https://pve.proxmox.com/pve-docs/chapter-ha-manager.html#ha_manager_rule_conflicts
 
Hi Daniel, thanks for replying!

So, to conclude, the basic premise of maintenance mode, i.e.,
  • If maintenance mode is activated, any HA-registered VM must migrate out of it
  • If maintenance mode is disabled, the VMs that had migrated out of it, should migrate back to it*
will always be respected, irrespective of any affinity rules or other factors, as long as there are no conflicts?
*This is contingent on failback being set to true, right? With failback set to false, we observe that the VMs do not migrate back.

In short, would you say the maintenance mode behavior is a hard constraint (like a strict affinity rule), as opposed to a soft constraint (like a non-strict node affinity rule, or the CRS)?
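
To make the question concrete, this is the toy model we have in mind, where hard constraints filter candidate nodes and soft constraints merely score them (not the actual scheduler; names and numbers are made up):

```python
# Toy model of hard vs. soft constraints in node selection:
# hard constraints remove candidates, soft constraints only rank them.

def select_node(nodes, hard, soft):
    """hard: predicates a node must satisfy; soft: score functions
    (lower is better). Returns the best surviving candidate."""
    survivors = [n for n in nodes if all(h(n) for h in hard)]
    if not survivors:
        return None  # hard constraints are unsatisfiable
    return min(survivors, key=lambda n: (sum(s(n) for s in soft), n))

nodes = ["node1", "node2", "node3"]
in_maintenance = {"node3"}
ha_count = {"node1": 0, "node2": 1, "node3": 2}

# Hard: never place on a maintenance node. Soft: basic CRS load count.
best = select_node(
    nodes,
    hard=[lambda n: n not in in_maintenance],
    soft=[lambda n: ha_count[n]],
)
print(best)  # node1
```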