Clarification related to HA Maintenance Mode/Affinity Rules

Libero_AT

New Member
Apr 20, 2026
We have a three-node cluster, each node running Proxmox VE 9.x, with shared storage over iSCSI. We are currently evaluating the behavior of HA-managed resources under node/resource affinity rules when maintenance mode is enabled.

We have failback enabled on every VM registered in the HA Manager. This should ensure that whenever a higher-priority node becomes available again, the VM migrates (back) to it.
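For reference, our HA resource entries look roughly like this (syntax as we understand it from the PVE 9 docs, where `failback` became a per-resource option; VM IDs are placeholders):

```
# /etc/pve/ha/resources.cfg (illustrative)
vm: 100
        state started
        failback 1
```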

We would like to understand the underlying logic of the HA Manager in the following scenarios.

Scenario 1:
We have two VMs running on Node#3. We have added the two VMs to a "Keep together" resource affinity rule.
I now place Node #3 in maintenance mode. As expected, the VMs migrate together to the node with the fewest HA-managed running VMs (the CRS mode is `basic`); let's say this is Node #2.
However, when maintenance mode was disabled on Node#3, the VMs did not migrate back, but instead stayed on Node#2.
Isn't this against the premise of maintenance mode, which states that the VMs should return to the node they were originally running on?
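For completeness, the rule looks roughly like this (syntax per our reading of the PVE 9 HA rules documentation; rule name and VM IDs are placeholders):

```
# /etc/pve/ha/rules.cfg (illustrative)
resource-affinity: keep-together
        resources vm:100,vm:101
        affinity positive
```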

Scenario 2:
Again, consider the same VMs running on Node #3.
The two VMs are added to a Node affinity rule (Node #2 and Node #3, equal priority, and Node#1 not included), but it is not strict.
The two VMs are also added to a "Keep separate" resource affinity rule. At this stage, both the VMs initially migrate, one each to Nodes #1 and #2, but since Node #3 has higher priority, one of the VMs migrates back from Node #1 to #3.
I now place Node #3 in maintenance mode. One would expect the VM running on Node#3 to migrate to Node#1, as per the premise of the maintenance mode.
Instead, the VM continues running on Node#3 (which is under maintenance mode). There are no existing HA-managed VMs on Node#1.
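The two rules for this scenario, roughly as we configured them (again, syntax per our reading of the PVE 9 docs; rule names and VM IDs are placeholders):

```
# /etc/pve/ha/rules.cfg (illustrative)
node-affinity: prefer-node2-node3
        resources vm:100,vm:101
        nodes node2,node3
        strict 0

resource-affinity: keep-separate
        resources vm:100,vm:101
        affinity negative
```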

Could you explain the behavior in the above two cases?
 
Welcome to the Proxmox forum, Libero_AT!

For scenario 1, it seems that the current HA stack gives higher priority to keeping the resource affinity rule satisfied than to migrating the resources back to the node that left maintenance mode.

For scenario 2, I assume the negative resource affinity rule (keep separate) was added shortly before the node affinity rule. That made the VMs migrate to node1 and node2 first; only when the HA Manager read the node affinity rule did it migrate one of the VMs to node3 to comply with that rule. At least when I add both rules at the same time, there is only one migration, of one VM to node2.

Nonetheless, for scenario 2 it again seems that the HA Manager weighs the affinity rules more strongly than the fact that the HA resource is on a node in maintenance mode.

I will look into both!

It would be great to have a Bugzilla entry for this so it can be tracked independently; one can be created here [0].

[0] https://bugzilla.proxmox.com/enter_bug.cgi?product=pve&component=HA
 
Hi Daniel, thank you for replying!

For scenario 2, I assume the negative resource affinity rule (keep separate) was added shortly before the node affinity rule. That made the VMs migrate to node1 and node2 first; only when the HA Manager read the node affinity rule did it migrate one of the VMs to node3 to comply with that rule.
You are right; we added the node affinity rule after the resource affinity rule in Scenario 2. My bad.

We would like to be able to concretely predict the migration of HA resources when a migration trigger is applied.
Hence, we are looking into whether there is a simplified way of reasoning about the interplay between the different parts of the HA stack.

For instance, is it possible to definitively rank the following in an order of precedence?
  • Resource affinity rules/Strict node affinity rules
  • Non-strict node affinity rules
  • Maintenance mode/Failback
  • Cluster resource scheduler (rebalancing resources across nodes based on number of HA-managed active resources)
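To make the question concrete, here is a toy sketch of the precedence we would have expected. This is purely illustrative; `pick_node` and all its parameter names are our invention, not Proxmox code, and (per the scenarios above) the actual HA Manager evidently does not rank things exactly this way:

```python
# Toy model of the precedence we *expected* for HA placement decisions.
# Not the Proxmox HA Manager algorithm; all names here are made up.

def pick_node(nodes, strict_allowed=None, preferred=None,
              maintenance=frozenset(), load=None):
    """Pick a target node for one HA resource.

    nodes:          all cluster nodes
    strict_allowed: hard constraints (strict node affinity / resource affinity)
    preferred:      soft constraint (non-strict node affinity)
    maintenance:    nodes currently in maintenance mode
    load:           node -> number of HA-managed resources (basic CRS tiebreak)
    """
    load = load or {}
    # 1. Hard constraints always filter the candidate set.
    candidates = [n for n in nodes
                  if strict_allowed is None or n in strict_allowed]
    # 2. Maintenance mode removes nodes, unless nothing else remains.
    up = [n for n in candidates if n not in maintenance]
    if up:
        candidates = up
    # 3. Soft preference narrows the set but never empties it.
    if preferred:
        pref = [n for n in candidates if n in preferred]
        if pref:
            candidates = pref
    # 4. Basic CRS tiebreak: fewest HA-managed resources wins.
    return min(candidates, key=lambda n: (load.get(n, 0), n))
```

Under this model, scenario 2 would resolve to node1: the keep-separate rule excludes node2 (the sibling VM's node), maintenance excludes node3, and node1 remains.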

Thank you for your time.