HA affinity problem (?)

FrancisS

Well-Known Member
Apr 26, 2019
Hello,

I have a problem with HA affinity on PVE 9.1.2: I have two VMs with the resource affinity rule "Keep Together" and a node affinity rule for "pve1".

When I put "pve1" into maintenance, one VM migrates to "pve2" and the other to "pve3", not to the same node.

Best regards.
Francis
 
Hi Francis,

working as intended here with 9.1.2.
Both guests are migrated together to node2 when node1 is set to maintenance.

Here are my rules:

Code:
root@node1:~# cat /etc/pve/ha/rules.cfg
resource-affinity: ha-rule-91edaaa7-807b
        affinity positive
        resources vm:100,vm:102

node-affinity: ha-rule-e9b7994a-ecf1
        nodes node1
        resources vm:100,vm:102
        strict 0

If I disable my "Keep Together" rule then I get your behavior with migration to different nodes.

Is rule "enabled"?
Are correct vm ids in rule?

Are all referenced VM IDs still HA resources?
You can create a rule and then delete one of the HA resources it references. That will "break" the rule. Re-adding the deleted HA resource will not repair the affinity rule; it will stay "broken" with a warning sign in the web GUI.
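Such a consistency check can be scripted. Here is a minimal sketch, assuming the rules.cfg format shown above and plain "ha-manager status" output; "find_broken_rules" is a hypothetical helper for illustration, not part of PVE:

```python
import re

def find_broken_rules(rules_cfg: str, status_output: str) -> dict:
    """Map rule name -> set of referenced resources missing from HA status."""
    # Resources currently under HA management, e.g. "service vm:102 (pve1, started)"
    managed = set(re.findall(r"^service (\S+)", status_output, re.MULTILINE))

    broken = {}
    rule = None
    for line in rules_cfg.splitlines():
        header = re.match(r"^(resource-affinity|node-affinity): (\S+)", line)
        if header:
            rule = header.group(2)
        elif rule and line.strip().startswith("resources"):
            refs = set(line.split(None, 1)[1].split(","))
            missing = refs - managed
            if missing:
                broken[rule] = missing
    return broken
```

Feeding it the contents of /etc/pve/ha/rules.cfg and the output of "ha-manager status" would list any rule that references a VM ID no longer managed by HA.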

BR
Marcus
 
Hello Marcus,

Thank you, I have this:

Code:
node-affinity: ha-rule-57390649-e233
        nodes pve1
        resources vm:102,vm:104,vm:110
        strict 0

resource-affinity: ha-rule-b994d330-5ea0
        affinity positive
        resources vm:104,vm:110

Best regards.
Francis
 
Marcus,

Code:
# ha-manager status
quorum OK
master pve1 (active, Wed Dec 17 14:28:02 2025)
lrm pve1 (active, Wed Dec 17 14:28:03 2025)
lrm pve2 (active, Wed Dec 17 14:28:03 2025)
lrm pve3 (active, Wed Dec 17 14:28:01 2025)
service vm:102 (pve1, started)
service vm:104 (pve1, started)
service vm:110 (pve1, started)

# ha-mnt-on (script)

# ha-manager status
quorum OK
master pve1 (active, Wed Dec 17 14:32:33 2025)
lrm pve1 (maintenance mode, Wed Dec 17 14:32:31 2025)
lrm pve2 (active, Wed Dec 17 14:32:32 2025)
lrm pve3 (active, Wed Dec 17 14:32:31 2025)
service vm:102 (pve1, migrate)
service vm:104 (pve3, starting)
service vm:110 (pve2, started)

I have more VMs.

Best regards.
Francis
 
Hi Francis,

it is still working as expected with 3 guests here on my side.

I have a node affinity rule for "node1" covering IDs 100, 101, 102, and two of them (101 and 102) in a positive resource affinity rule. The config looks like yours, just with different IDs and hostnames.

All 3 guests were running on node1; I enabled maintenance mode on node1.
Migration actions:
vm100 -> node2
vm101+102 -> node3

So something strange is going on with your system.

BR
Marcus
 
Hi Marcus,

Perhaps you just got "lucky" that vm101+102 migrated to node3?

Is there a way to debug the HA "affinity" decisions?

Best regards.
Francis
 
Hi!

I have a problem with HA affinity on PVE 9.1.2: I have two VMs with the resource affinity rule "Keep Together" and a node affinity rule for "pve1".

When I put "pve1" into maintenance, one VM migrates to "pve2" and the other to "pve3", not to the same node.
I have recreated your exact setup as described by the status output and rules config above and couldn't reproduce this either.

What should happen is that as soon as pve1 is put into maintenance mode, vm:102 selects a new node (pve2 here, as it's empty), vm:104 selects another node (pve3, as it's also empty and the HA CRM proceeds in a round-robin next-fit fashion with the basic scheduler), and vm:110 follows vm:104 to pve3, as these two are in a positive resource affinity rule.

As the node affinity rule is non-strict, it falls back to {pve2, pve3} as the possible nodes for all three. If it were strict, all HA resources would stay on pve1 even though pve1 is in maintenance mode.
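The selection logic described above can be modeled roughly like this. This is only an illustrative sketch of the round-robin next-fit placement with positive-affinity "following", not the actual HA CRM code:

```python
from itertools import cycle

def plan_migrations(services, nodes, positive_groups):
    """Assign services leaving a node, round-robin over target nodes,
    keeping each positive-affinity group together on one node.

    services: ordered list like ["vm:102", "vm:104", "vm:110"]
    nodes: candidate target nodes in order, e.g. ["pve2", "pve3"]
    positive_groups: list of sets, e.g. [{"vm:104", "vm:110"}]
    """
    targets = cycle(nodes)
    placement = {}
    for svc in services:
        # If another member of this service's positive group was already
        # placed, follow it instead of taking the next round-robin node.
        group = next((g for g in positive_groups if svc in g), None)
        if group:
            done = [placement[s] for s in group if s in placement]
            if done:
                placement[svc] = done[0]
                continue
        placement[svc] = next(targets)
    return placement
```

With the three services and two empty target nodes from this thread, the model reproduces the expected outcome: vm:102 -> pve2, vm:104 -> pve3, and vm:110 following vm:104 to pve3.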

Can you post the output of journalctl -u pve-ha-crm for the exact situation you posted above? That would help in investigating this issue. For reference, here is what happens in my test setup:

Code:
adding new service 'vm:102' on node 'pve1'
adding new service 'vm:104' on node 'pve1'
adding new service 'vm:110' on node 'pve1'
service 'vm:102': state changed from 'request_start' to 'started'  (node = pve1)
service 'vm:104': state changed from 'request_start' to 'started'  (node = pve1)
service 'vm:110': state changed from 'request_start' to 'started'  (node = pve1)
status change wait_for_quorum => slave
status change wait_for_quorum => slave
node 'pve1': state changed from 'online' => 'maintenance'
migrate service 'vm:102' to node 'pve2' (running)
service 'vm:102': state changed from 'started' to 'migrate'  (node = pve1, target = pve2)
migrate service 'vm:104' to node 'pve3' (running)
service 'vm:104': state changed from 'started' to 'migrate'  (node = pve1, target = pve3)
migrate service 'vm:110' to node 'pve3' (running)
service 'vm:110': state changed from 'started' to 'migrate'  (node = pve1, target = pve3)
service 'vm:102': state changed from 'migrate' to 'started'  (node = pve2)
service 'vm:104': state changed from 'migrate' to 'started'  (node = pve3)
service 'vm:110': state changed from 'migrate' to 'started'  (node = pve3)
 
Hi Daniel,

Thank you.

Sorry, I removed some lines from the logs; I have more than 3 VMs, and the nodes (those are not the real names) are not empty.

I am busy at the moment; I will rerun the test as soon as possible.

Best regards
Francis
 
Thanks!

If possible, it would be great to have a more complete reproducer to investigate the issue. The names can be changed; the only important part is that the changed names keep the same alphabetical ordering (e.g. SN140 -> pve2, PVE003 -> pve1).
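For the renaming, a tiny helper (illustrative only) that maps real node names to pve1..pveN while preserving their alphabetical order:

```python
def anonymize_nodes(names):
    """Map real node names to pve1..pveN, keeping alphabetical order intact."""
    # Sorting first guarantees the renamed set sorts the same way as the originals.
    return {real: f"pve{i + 1}" for i, real in enumerate(sorted(set(names)))}
```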
 
Hi Daniel,

Options "Cluster Resource Scheduling" = Default

HA status with all VMs; node names renamed.
Code:
# ha-manager status
quorum OK
master pve1 (active, Thu Dec 18 12:22:25 2025)
lrm pve1 (active, Thu Dec 18 12:22:19 2025)
lrm pve2 (active, Thu Dec 18 12:22:17 2025)
lrm pve3 (active, Thu Dec 18 12:22:22 2025)
service vm:100 (pve3, started)
service vm:101 (pve2, started)
service vm:102 (pve1, started)
service vm:104 (pve1, started)
service vm:106 (pve2, started)
service vm:108 (pve3, started)
service vm:110 (pve1, started)
service vm:112 (pve3, started)
service vm:113 (pve2, started)

# journalctl -u pve-ha-crm | grep "Dec 18"
Dec 18 12:03:23 pve1 pve-ha-crm[2543]: node 'pve1': state changed from 'online' => 'maintenance'
Dec 18 12:03:23 pve1 pve-ha-crm[2543]: migrate service 'vm:102' to node 'pve2' (running)
Dec 18 12:03:23 pve1 pve-ha-crm[2543]: service 'vm:102': state changed from 'started' to 'migrate' (node = pve1, target = pve2)
Dec 18 12:03:23 pve1 pve-ha-crm[2543]: migrate service 'vm:104' to node 'pve3' (running)
Dec 18 12:03:23 pve1 pve-ha-crm[2543]: service 'vm:104': state changed from 'started' to 'migrate' (node = pve1, target = pve3)
Dec 18 12:03:23 pve1 pve-ha-crm[2543]: migrate service 'vm:110' to node 'pve2' (running)
Dec 18 12:03:23 pve1 pve-ha-crm[2543]: service 'vm:110': state changed from 'started' to 'migrate' (node = pve1, target = pve2)
Dec 18 12:04:04 pve1 pve-ha-crm[2543]: service 'vm:110': state changed from 'migrate' to 'started' (node = pve2)
Dec 18 12:07:04 pve1 pve-ha-crm[2543]: service 'vm:104': state changed from 'migrate' to 'started' (node = pve3)
Dec 18 12:07:44 pve1 pve-ha-crm[2543]: service 'vm:102': state changed from 'migrate' to 'started' (node = pve2)
Dec 18 12:15:54 pve1 pve-ha-crm[2543]: node 'pve1': state changed from 'maintenance' => 'online'
Dec 18 12:15:54 pve1 pve-ha-crm[2543]: moving service 'vm:102' back to 'pve1', node came back from maintenance.
Dec 18 12:15:54 pve1 pve-ha-crm[2543]: migrate service 'vm:102' to node 'pve1' (running)
Dec 18 12:15:54 pve1 pve-ha-crm[2543]: service 'vm:102': state changed from 'started' to 'migrate' (node = pve2, target = pve1)
Dec 18 12:15:54 pve1 pve-ha-crm[2543]: moving service 'vm:104' back to 'pve1', node came back from maintenance.
Dec 18 12:15:54 pve1 pve-ha-crm[2543]: migrate service 'vm:104' to node 'pve1' (running)
Dec 18 12:15:54 pve1 pve-ha-crm[2543]: service 'vm:104': state changed from 'started' to 'migrate' (node = pve3, target = pve1)
Dec 18 12:15:54 pve1 pve-ha-crm[2543]: moving service 'vm:110' back to 'pve1', node came back from maintenance.
Dec 18 12:15:54 pve1 pve-ha-crm[2543]: migrate service 'vm:110' to node 'pve1' (running)
Dec 18 12:15:54 pve1 pve-ha-crm[2543]: service 'vm:110': state changed from 'started' to 'migrate' (node = pve2, target = pve1)
Dec 18 12:16:35 pve1 pve-ha-crm[2543]: service 'vm:110': state changed from 'migrate' to 'started' (node = pve1)
Dec 18 12:19:25 pve1 pve-ha-crm[2543]: service 'vm:104': state changed from 'migrate' to 'started' (node = pve1)
Dec 18 12:20:15 pve1 pve-ha-crm[2543]: service 'vm:102': state changed from 'migrate' to 'started' (node = pve1)

Best regards.
Francis