VMs migrated back to node in maintenance mode

BretO

Member
Nov 8, 2023
I am doing some testing on the three nodes in my cluster. I put node 1 into maintenance mode, the VMs migrated over to the other nodes, and I then shut down node 1. When I powered it back on, the VMs migrated back to node 1 even though it is still in maintenance mode. After several minutes the VMs then migrated over to the other nodes again. I guess it finally realized node 1 is in maintenance mode and shouldn't have those VMs yet?
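For reference, the CLI equivalent of what I did would be roughly this (node name from my cluster, exact commands from memory, so treat it as a sketch rather than the exact invocation):

# put node 1 into maintenance mode; the HA manager drains its guests to the other nodes
ha-manager crm-command node-maintenance enable prox01
# once the node is empty, shut it down (run on prox01 itself)
shutdown -h now
# after powering prox01 back on, check that it is still flagged as in maintenance
ha-manager status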

I'm guessing that as soon as the HA manager saw node 1 back online it migrated those VMs back, and then the maintenance process kicked in and moved them off again?

Is this the correct behavior, or should it realize maintenance mode is still on before trying to migrate those back to node 1?
I tested it again and so far it only did this the one time.

All nodes are on version 8.4.14.

Node 1 shows in the UI and on the CLI that it is still in maintenance mode.

root@prox03:~# ha-manager status
quorum OK
master prox03 (active, Thu Nov 6 09:18:12 2025)
lrm prox01 (maintenance mode, Thu Nov 6 09:18:08 2025)
lrm prox02 (active, Thu Nov 6 09:18:11 2025)
lrm prox03 (active, Thu Nov 6 09:18:03 2025)
 
Hi,
yes, that sounds like there might be a race window during boot where the maintenance mode was not correctly honored. Could you provide the system journal from all three nodes from around the time of the issue? (Just prox01 and prox03 should be enough if prox03 was the master at the time.)
 
Sorry, just getting back to this... had unexpected stuff come up. Are you referring to the System Logs? If so, I'll try to narrow down that data; if not, I'm about to do some more maintenance, so I'll see if it happens again.
 
Are you referring to the System Logs?
Yes. In particular the journal which includes the logs from the HA services (accessed with journalctl).
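For example, something along these lines should capture the relevant window on each node (the timestamps here are just placeholders, adjust them to the time of the unexpected migration):

journalctl -u pve-ha-crm -u pve-ha-lrm --since "2025-11-06 09:00:00" --until "2025-11-06 09:30:00"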
 
Same situation here. I put the node into maintenance mode with the command ha-manager crm-command node-maintenance enable <node>
and then tried to drain the node:
Jan 31 22:26:17 pvesrv3 pvedaemon[1281]: <root@pam> starting task UPID:pvesrv3:000A992A:039A1562:697E5759:migrateall::root@pam:
Jan 31 22:26:17 pvesrv3 pvedaemon[694570]: <root@pam> starting task UPID:pvesrv3:000A992B:039A1565:697E5759:hamigrate:101:root@pam:
Jan 31 22:26:20 pvesrv3 pvedaemon[694570]: <root@pam> starting task UPID:pvesrv3:000A997B:039A1694:697E575C:hamigrate:105:root@pam:
Jan 31 22:26:23 pvesrv3 pvedaemon[694570]: <root@pam> starting task UPID:pvesrv3:000A99A7:039A17C2:697E575F:hamigrate:123:root@pam:
Jan 31 22:26:24 pvesrv3 pvedaemon[694695]: command 'ha-manager migrate vm:123 pvesrv1' failed: exit code 2
Jan 31 22:26:26 pvesrv3 pvedaemon[1281]: <root@pam> end task UPID:pvesrv3:000A992A:039A1562:697E5759:migrateall::root@pam: OK
Jan 31 22:26:29 pvesrv3 pmxcfs[1085]: [status] notice: received log
Jan 31 22:26:44 pvesrv3 pmxcfs[1085]: [status] notice: received log
Jan 31 22:26:59 pvesrv3 pmxcfs[1085]: [status] notice: received log
Jan 31 22:27:14 pvesrv3 pmxcfs[1085]: [status] notice: received log
Jan 31 22:27:29 pvesrv3 pmxcfs[1085]: [status] notice: received log
Jan 31 22:27:44 pvesrv3 pmxcfs[1085]: [status] notice: received log
Jan 31 22:27:59 pvesrv3 pmxcfs[1085]: [status] notice: received log
Jan 31 22:28:14 pvesrv3 pmxcfs[1085]: [status] notice: received log
Jan 31 22:28:28 pvesrv3 pvedaemon[1281]: <root@pam> starting task UPID:pvesrv3:000A9E45:039A488D:697E57DC:migrateall::root@pam:
Jan 31 22:28:28 pvesrv3 pvedaemon[695877]: <root@pam> starting task UPID:pvesrv3:000A9E46:039A4890:697E57DC:hamigrate:101:root@pam:
Jan 31 22:28:29 pvesrv3 pmxcfs[1085]: [status] notice: received log
Jan 31 22:28:31 pvesrv3 pvedaemon[695877]: <root@pam> starting task UPID:pvesrv3:000A9E98:039A49BF:697E57DF:hamigrate:105:root@pam:
Jan 31 22:28:34 pvesrv3 pvedaemon[695877]: <root@pam> starting task UPID:pvesrv3:000A9EC2:039A4AEE:697E57E2:hamigrate:123:root@pam:
Jan 31 22:28:37 pvesrv3 pvedaemon[1281]: <root@pam> end task UPID:pvesrv3:000A9E45:039A488D:697E57DC:migrateall::root@pam: OK
Jan 31 22:28:44 pvesrv3 pmxcfs[1085]: [status] notice: received log
Jan 31 22:28:59 pvesrv3 pmxcfs[1085]: [status] notice: received log
Jan 31 22:29:14 pvesrv3 pmxcfs[1085]: [status] notice: received log
Jan 31 22:29:29 pvesrv3 pmxcfs[1085]: [status] notice: received log
But nothing happened; the only error was caused by an affinity rule. Even after disabling the HA rules nothing changed: all VMs are still on the node.
 
Hi!

Could you also send the output of journalctl -u pve-ha-crm -u pve-ha-lrm --since '2026-01-31 22:25:00' --until '2026-01-31 22:30:00' on the involved nodes (especially the one running the HA manager)? This shows explicitly when node maintenance mode is enabled/disabled, as well as which migrations are started and why.

Additionally, is there any other external program issuing the migrateall commands?
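If the guests stay on the node even though maintenance mode is reported as active, the following output taken at that moment would also be helpful (just a suggestion, not strictly required), since it shows the requested versus the current state of each HA-managed resource:

# detailed CRM/LRM view, including the state of every HA-managed service
ha-manager status --verbose
# list the configured HA resources and their requested states
ha-manager config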
 
Sorry, but the node was decommissioned already. All "migrate all" commands were issued through the web UI; there were no external apps doing that.
Anyway, the workaround is: disable maintenance mode, disable all HA rules, migrate the VMs, and then turn maintenance mode back on.
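Roughly the commands behind that workaround, with my node and VM names just as examples and written from memory, so treat it as a sketch (the HA rules themselves I disabled in the web UI):

# drop maintenance mode so migrations off the node go through again
ha-manager crm-command node-maintenance disable pvesrv3
# move the HA-managed guests off by hand, e.g. for VM 123
ha-manager migrate vm:123 pvesrv1
# once the node is empty, turn maintenance mode back on so HA does not move anything back
ha-manager crm-command node-maintenance enable pvesrv3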