Migration failed and keeps retrying in a loop

Nemesiz

Hello,

I updated one node and wanted to reboot it. A single VM was running on this node at the time. HA started the migration process, but it failed with:

Code:
task started by HA resource agent
2026-05-18 09:29:24 conntrack state migration not supported or disabled, active connections might get dropped
2026-05-18 09:29:25 use dedicated network address for sending migration traffic (10.10.8.5)
2026-05-18 09:29:25 starting migration of VM 100 to node 'nmz-cl-2' (10.10.8.5)
2026-05-18 09:29:25 starting VM 100 on remote node 'nmz-cl-2'
2026-05-18 09:29:26 [nmz-cl-2] Installed QEMU version '10.1.2' is too old to run machine type 'pc-i440fx-11.0+pve0', please upgrade node 'nmz-cl-2'
2026-05-18 09:29:26 ERROR: online migrate failure - remote command failed with exit code 255
2026-05-18 09:29:26 aborting phase 2 - cleanup resources
2026-05-18 09:29:26 migrate_cancel
2026-05-18 09:29:28 ERROR: migration finished with problems (duration 00:00:04)
TASK ERROR: migration problems

I know node1 and node2 were running different versions of QEMU, but why did HA not try to migrate to the other node (node3) instead of retrying the migration to the same node2 in a loop?
 
Hi!

In general, the cluster nodes should have the same package versions installed to function properly.
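To illustrate why the version mismatch blocks the migration, here is a minimal sketch of the kind of comparison implied by the error message above. It is not the actual Proxmox VE check; the function names and the parsing rules are assumptions made for illustration only.

```python
import re


def parse_version(text):
    """Return all numeric components of a version string as a tuple of ints."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def machine_type_version(machine_type):
    """Extract (major, minor) from a versioned machine type.

    Example input: 'pc-i440fx-11.0+pve0'. Returns None for an unversioned
    alias such as 'q35'. (Hypothetical helper, not a Proxmox API.)
    """
    match = re.search(r"-(\d+)\.(\d+)(?:\+pve\d+)?$", machine_type)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2)))


def can_run(installed_qemu, machine_type):
    """Assume a node can run a machine type if its QEMU major.minor is not older."""
    needed = machine_type_version(machine_type)
    if needed is None:
        return True
    return parse_version(installed_qemu)[:2] >= needed


# Values taken from the task log in the first post.
print(can_run("10.1.2", "pc-i440fx-11.0+pve0"))
```

With the values from the task log, the installed QEMU 10.1 is older than the 11.0 required by the machine type, so the target node has to refuse the incoming VM.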

The HA Manager also makes some basic assumptions, as described in the relevant documentation section [0]. Specifically, if some nodes cannot satisfy a dependency of a VM (here: a sufficiently recent QEMU version), this constraint must be encoded in a node affinity rule so that the HA Manager can take it into account.

Otherwise, the HA Manager should retry the migration to another node. Can you provide more information with the logs from journalctl -u pve-ha-crm -u pve-ha-lrm for all nodes in that time interval? Are there any affinity rules present?

[0] https://pve.proxmox.com/pve-docs/chapter-ha-manager.html#ha_manager_resources
 
node1
Code:
May 18 09:28:13 nmz-cl-1 systemd[1]: Stopping pve-ha-lrm.service - PVE Local HA Resource Manager Daemon...
May 18 09:28:14 nmz-cl-1 pve-ha-lrm[754284]: received signal TERM
May 18 09:28:14 nmz-cl-1 pve-ha-lrm[754284]: got shutdown request with shutdown policy 'migrate'
May 18 09:28:14 nmz-cl-1 pve-ha-lrm[754284]: reboot LRM, doing maintenance, removing this node from active list
May 18 09:28:22 nmz-cl-1 pve-ha-lrm[754284]: status change active => maintenance
May 18 09:28:23 nmz-cl-1 pve-ha-lrm[763568]: <root@pam> starting task UPID:nmz-cl-1:000BA6B1:0211CB00:6A0ADBB7:qmigrate:100:root@pam:
May 18 09:28:28 nmz-cl-1 pve-ha-lrm[763568]: Task 'UPID:nmz-cl-1:000BA6B1:0211CB00:6A0ADBB7:qmigrate:100:root@pam:' still active, waiting
May 18 09:28:28 nmz-cl-1 pve-ha-lrm[763569]: migration problems
May 18 09:28:28 nmz-cl-1 pve-ha-lrm[763568]: <root@pam> end task UPID:nmz-cl-1:000BA6B1:0211CB00:6A0ADBB7:qmigrate:100:root@pam: migration problems
May 18 09:28:28 nmz-cl-1 pve-ha-lrm[763568]: service vm:100 not moved (migration error)
May 18 09:28:33 nmz-cl-1 pve-ha-lrm[763586]: <root@pam> starting task UPID:nmz-cl-1:000BA6C3:0211CEF2:6A0ADBC1:qmigrate:100:root@pam:
May 18 09:28:37 nmz-cl-1 pve-ha-lrm[763587]: migration problems
May 18 09:28:37 nmz-cl-1 pve-ha-lrm[763586]: <root@pam> end task UPID:nmz-cl-1:000BA6C3:0211CEF2:6A0ADBC1:qmigrate:100:root@pam: migration problems
May 18 09:28:37 nmz-cl-1 pve-ha-lrm[763586]: service vm:100 not moved (migration error)
May 18 09:28:43 nmz-cl-1 pve-ha-lrm[763605]: <root@pam> starting task UPID:nmz-cl-1:000BA6D6:0211D31A:6A0ADBCB:qmigrate:100:root@pam:
May 18 09:28:48 nmz-cl-1 pve-ha-lrm[763606]: migration problems
May 18 09:28:48 nmz-cl-1 pve-ha-lrm[763605]: <root@pam> end task UPID:nmz-cl-1:000BA6D6:0211D31A:6A0ADBCB:qmigrate:100:root@pam: migration problems
May 18 09:28:48 nmz-cl-1 pve-ha-lrm[763605]: service vm:100 not moved (migration error)
May 18 09:28:53 nmz-cl-1 pve-ha-lrm[763621]: <root@pam> starting task UPID:nmz-cl-1:000BA6E6:0211D6D9:6A0ADBD5:qmigrate:100:root@pam:
May 18 09:28:57 nmz-cl-1 pve-ha-lrm[763622]: migration problems
May 18 09:28:57 nmz-cl-1 pve-ha-lrm[763621]: <root@pam> end task UPID:nmz-cl-1:000BA6E6:0211D6D9:6A0ADBD5:qmigrate:100:root@pam: migration problems
May 18 09:28:57 nmz-cl-1 pve-ha-lrm[763621]: service vm:100 not moved (migration error)
May 18 09:29:03 nmz-cl-1 pve-ha-lrm[763639]: <root@pam> starting task UPID:nmz-cl-1:000BA6F8:0211DAF4:6A0ADBDF:qmigrate:100:root@pam:
May 18 09:29:08 nmz-cl-1 pve-ha-lrm[763640]: migration problems
May 18 09:29:08 nmz-cl-1 pve-ha-lrm[763639]: <root@pam> end task UPID:nmz-cl-1:000BA6F8:0211DAF4:6A0ADBDF:qmigrate:100:root@pam: migration problems
May 18 09:29:08 nmz-cl-1 pve-ha-lrm[763639]: service vm:100 not moved (migration error)
May 18 09:29:13 nmz-cl-1 pve-ha-lrm[763653]: <root@pam> starting task UPID:nmz-cl-1:000BA707:0211DEB4:6A0ADBE9:qmigrate:100:root@pam:
May 18 09:29:17 nmz-cl-1 pve-ha-lrm[763655]: migration problems
May 18 09:29:17 nmz-cl-1 pve-ha-lrm[763653]: <root@pam> end task UPID:nmz-cl-1:000BA707:0211DEB4:6A0ADBE9:qmigrate:100:root@pam: migration problems
May 18 09:29:17 nmz-cl-1 pve-ha-lrm[763653]: service vm:100 not moved (migration error)
May 18 09:29:24 nmz-cl-1 pve-ha-lrm[763673]: <root@pam> starting task UPID:nmz-cl-1:000BA71A:0211E2D4:6A0ADBF4:qmigrate:100:root@pam:
May 18 09:29:28 nmz-cl-1 pve-ha-lrm[763674]: migration problems
May 18 09:29:28 nmz-cl-1 pve-ha-lrm[763673]: <root@pam> end task UPID:nmz-cl-1:000BA71A:0211E2D4:6A0ADBF4:qmigrate:100:root@pam: migration problems
May 18 09:29:28 nmz-cl-1 pve-ha-lrm[763673]: service vm:100 not moved (migration error)
May 18 09:29:33 nmz-cl-1 pve-ha-lrm[763684]: <root@pam> starting task UPID:nmz-cl-1:000BA725:0211E68F:6A0ADBFD:qmigrate:100:root@pam:
May 18 09:29:38 nmz-cl-1 pve-ha-lrm[763685]: migration problems
May 18 09:29:38 nmz-cl-1 pve-ha-lrm[763684]: <root@pam> end task UPID:nmz-cl-1:000BA725:0211E68F:6A0ADBFD:qmigrate:100:root@pam: migration problems
May 18 09:29:38 nmz-cl-1 pve-ha-lrm[763684]: service vm:100 not moved (migration error)
May 18 09:29:43 nmz-cl-1 pve-ha-lrm[763702]: <root@pam> starting task UPID:nmz-cl-1:000BA737:0211EA4F:6A0ADC07:qmigrate:100:root@pam:
May 18 09:29:47 nmz-cl-1 pve-ha-lrm[763703]: migration problems
May 18 09:29:47 nmz-cl-1 pve-ha-lrm[763702]: <root@pam> end task UPID:nmz-cl-1:000BA737:0211EA4F:6A0ADC07:qmigrate:100:root@pam: migration problems
May 18 09:29:47 nmz-cl-1 pve-ha-lrm[763702]: service vm:100 not moved (migration error)
May 18 09:29:53 nmz-cl-1 pve-ha-lrm[763720]: <root@pam> starting task UPID:nmz-cl-1:000BA749:0211EE71:6A0ADC11:qmigrate:100:root@pam:
May 18 09:29:58 nmz-cl-1 pve-ha-lrm[763721]: migration problems
May 18 09:29:58 nmz-cl-1 pve-ha-lrm[763720]: <root@pam> end task UPID:nmz-cl-1:000BA749:0211EE71:6A0ADC11:qmigrate:100:root@pam: migration problems
May 18 09:29:58 nmz-cl-1 pve-ha-lrm[763720]: service vm:100 not moved (migration error)
May 18 09:30:03 nmz-cl-1 pve-ha-lrm[763737]: <root@pam> starting task UPID:nmz-cl-1:000BA75A:0211F22E:6A0ADC1B:qmigrate:100:root@pam:
May 18 09:30:04 nmz-cl-1 pve-ha-lrm[763738]: VM 100 qmp command failed - VM 100 qmp command 'query-backup' failed - client closed connection
May 18 09:30:04 nmz-cl-1 pve-ha-lrm[763738]: VM 100 qmp command failed - VM 100 not running
May 18 09:30:05 nmz-cl-1 pve-ha-lrm[763738]: failed to connect to DBus system bus: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus securi>
May 18 09:30:06 nmz-cl-1 pve-ha-lrm[763737]: <root@pam> end task UPID:nmz-cl-1:000BA75A:0211F22E:6A0ADC1B:qmigrate:100:root@pam: OK
May 18 09:30:13 nmz-cl-1 pve-ha-lrm[754284]: watchdog closed (disabled)
May 18 09:30:13 nmz-cl-1 pve-ha-lrm[754284]: server stopped
May 18 09:30:14 nmz-cl-1 systemd[1]: pve-ha-lrm.service: Deactivated successfully.
May 18 09:30:14 nmz-cl-1 systemd[1]: Stopped pve-ha-lrm.service - PVE Local HA Resource Manager Daemon.
May 18 09:30:14 nmz-cl-1 systemd[1]: pve-ha-lrm.service: Consumed 5.198s CPU time, 252.5M memory peak.
May 18 09:30:14 nmz-cl-1 systemd[1]: Stopping pve-ha-crm.service - PVE Cluster HA Resource Manager Daemon...
May 18 09:30:15 nmz-cl-1 pve-ha-crm[754406]: received signal TERM
May 18 09:30:15 nmz-cl-1 pve-ha-crm[754406]: server received shutdown request
May 18 09:30:17 nmz-cl-1 pve-ha-crm[754406]: server stopped
May 18 09:30:18 nmz-cl-1 systemd[1]: pve-ha-crm.service: Deactivated successfully.
May 18 09:30:18 nmz-cl-1 systemd[1]: Stopped pve-ha-crm.service - PVE Cluster HA Resource Manager Daemon.
May 18 09:30:18 nmz-cl-1 systemd[1]: pve-ha-crm.service: Consumed 2.616s CPU time, 234.5M memory peak.

node2
Code:
May 14 13:23:39 nmz-cl-2 pve-ha-crm[3921]: got crm command: migrate vm:100 nmz-cl-1
May 14 13:23:39 nmz-cl-2 pve-ha-crm[3921]: migrate service 'vm:100' to node 'nmz-cl-1'
May 14 13:23:39 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100': state changed from 'started' to 'migrate'  (node = nmz-cl-2, target = nmz-cl-1)
May 14 13:23:46 nmz-cl-2 pve-ha-lrm[359278]: <root@pam> starting task UPID:nmz-cl-2:00057B72:0546250C:6A05CCE2:qmigrate:100:root@pam:
May 14 13:23:51 nmz-cl-2 pve-ha-lrm[359278]: Task 'UPID:nmz-cl-2:00057B72:0546250C:6A05CCE2:qmigrate:100:root@pam:' still active, waiting
May 14 13:23:56 nmz-cl-2 pve-ha-lrm[359278]: Task 'UPID:nmz-cl-2:00057B72:0546250C:6A05CCE2:qmigrate:100:root@pam:' still active, waiting
May 14 13:23:58 nmz-cl-2 pve-ha-lrm[359278]: <root@pam> end task UPID:nmz-cl-2:00057B72:0546250C:6A05CCE2:qmigrate:100:root@pam: OK
May 14 13:23:59 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100': state changed from 'migrate' to 'started'  (node = nmz-cl-1)
May 18 09:25:39 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100': state changed from 'started' to 'freeze'
May 18 09:25:49 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100': state changed from 'freeze' to 'started'
May 18 09:28:19 nmz-cl-2 pve-ha-crm[3921]: node 'nmz-cl-1': state changed from 'online' => 'maintenance'
May 18 09:28:19 nmz-cl-2 pve-ha-crm[3921]: migrate service 'vm:100' to node 'nmz-cl-2' (running)
May 18 09:28:19 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100': state changed from 'started' to 'migrate'  (node = nmz-cl-1, target = nmz-cl-2)
May 18 09:28:29 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100' - migration failed (exit code 1)
May 18 09:28:29 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100': state changed from 'migrate' to 'started'  (node = nmz-cl-1)
May 18 09:28:29 nmz-cl-2 pve-ha-crm[3921]: migrate service 'vm:100' to node 'nmz-cl-2' (running)
May 18 09:28:29 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100': state changed from 'started' to 'migrate'  (node = nmz-cl-1, target = nmz-cl-2)
May 18 09:28:39 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100' - migration failed (exit code 1)
May 18 09:28:39 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100': state changed from 'migrate' to 'started'  (node = nmz-cl-1)
May 18 09:28:39 nmz-cl-2 pve-ha-crm[3921]: migrate service 'vm:100' to node 'nmz-cl-2' (running)
May 18 09:28:39 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100': state changed from 'started' to 'migrate'  (node = nmz-cl-1, target = nmz-cl-2)
May 18 09:28:49 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100' - migration failed (exit code 1)
May 18 09:28:49 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100': state changed from 'migrate' to 'started'  (node = nmz-cl-1)
May 18 09:28:49 nmz-cl-2 pve-ha-crm[3921]: migrate service 'vm:100' to node 'nmz-cl-2' (running)
May 18 09:28:49 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100': state changed from 'started' to 'migrate'  (node = nmz-cl-1, target = nmz-cl-2)
May 18 09:28:59 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100' - migration failed (exit code 1)
May 18 09:28:59 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100': state changed from 'migrate' to 'started'  (node = nmz-cl-1)
May 18 09:28:59 nmz-cl-2 pve-ha-crm[3921]: migrate service 'vm:100' to node 'nmz-cl-2' (running)
May 18 09:28:59 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100': state changed from 'started' to 'migrate'  (node = nmz-cl-1, target = nmz-cl-2)
May 18 09:29:09 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100' - migration failed (exit code 1)
May 18 09:29:09 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100': state changed from 'migrate' to 'started'  (node = nmz-cl-1)
May 18 09:29:09 nmz-cl-2 pve-ha-crm[3921]: migrate service 'vm:100' to node 'nmz-cl-2' (running)
May 18 09:29:09 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100': state changed from 'started' to 'migrate'  (node = nmz-cl-1, target = nmz-cl-2)
May 18 09:29:19 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100' - migration failed (exit code 1)
May 18 09:29:19 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100': state changed from 'migrate' to 'started'  (node = nmz-cl-1)
May 18 09:29:19 nmz-cl-2 pve-ha-crm[3921]: migrate service 'vm:100' to node 'nmz-cl-2' (running)
May 18 09:29:19 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100': state changed from 'started' to 'migrate'  (node = nmz-cl-1, target = nmz-cl-2)
May 18 09:29:29 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100' - migration failed (exit code 1)
May 18 09:29:29 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100': state changed from 'migrate' to 'started'  (node = nmz-cl-1)
May 18 09:29:29 nmz-cl-2 pve-ha-crm[3921]: migrate service 'vm:100' to node 'nmz-cl-2' (running)
May 18 09:29:29 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100': state changed from 'started' to 'migrate'  (node = nmz-cl-1, target = nmz-cl-2)
May 18 09:29:39 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100' - migration failed (exit code 1)
May 18 09:29:39 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100': state changed from 'migrate' to 'started'  (node = nmz-cl-1)
May 18 09:29:39 nmz-cl-2 pve-ha-crm[3921]: migrate service 'vm:100' to node 'nmz-cl-2' (running)
May 18 09:29:39 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100': state changed from 'started' to 'migrate'  (node = nmz-cl-1, target = nmz-cl-2)
May 18 09:29:49 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100' - migration failed (exit code 1)
May 18 09:29:49 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100': state changed from 'migrate' to 'started'  (node = nmz-cl-1)
May 18 09:29:49 nmz-cl-2 pve-ha-crm[3921]: migrate service 'vm:100' to node 'nmz-cl-2' (running)
May 18 09:29:49 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100': state changed from 'started' to 'migrate'  (node = nmz-cl-1, target = nmz-cl-2)
May 18 09:29:59 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100' - migration failed (exit code 1)
May 18 09:29:59 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100': state changed from 'migrate' to 'started'  (node = nmz-cl-1)
May 18 09:29:59 nmz-cl-2 pve-ha-crm[3921]: migrate service 'vm:100' to node 'nmz-cl-2' (running)
May 18 09:29:59 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100': state changed from 'started' to 'migrate'  (node = nmz-cl-1, target = nmz-cl-2)
May 18 09:30:09 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100': state changed from 'migrate' to 'started'  (node = nmz-cl-2)
May 18 09:30:16 nmz-cl-2 pve-ha-lrm[947372]: starting service vm:100
May 18 09:30:16 nmz-cl-2 pve-ha-lrm[947372]: <root@pam> starting task UPID:nmz-cl-2:000E74B0:07401C52:6A0ADC28:qmstart:100:root@pam:
May 18 09:30:16 nmz-cl-2 pve-ha-lrm[947376]: start VM 100: UPID:nmz-cl-2:000E74B0:07401C52:6A0ADC28:qmstart:100:root@pam:
May 18 09:30:19 nmz-cl-2 pve-ha-lrm[947376]: VM 100 started with PID 947449.
May 18 09:30:19 nmz-cl-2 pve-ha-lrm[947372]: <root@pam> end task UPID:nmz-cl-2:000E74B0:07401C52:6A0ADC28:qmstart:100:root@pam: OK
May 18 09:30:19 nmz-cl-2 pve-ha-lrm[947372]: service status vm:100 started

node3: no log entries at all.

As for affinity rules: only the default one that was created automatically, covering all nodes, with no priorities and not strict.

P.S. The VM only migrated once I shut it down from inside the VM.
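For anyone wanting to check their own logs for the same pattern, the following is a rough sketch that counts how often the CRM picked each migration target and how many of those attempts failed. The regular expressions are assumptions based only on the log lines quoted in this thread, and `summarize` is a hypothetical helper, not part of any Proxmox tooling.

```python
import re
from collections import Counter

# Patterns derived from the pve-ha-crm lines quoted above (assumed format).
MIGRATE_RE = re.compile(
    r"pve-ha-crm\[\d+\]: migrate service '([^']+)' to node '([^']+)'"
)
FAILED_RE = re.compile(
    r"pve-ha-crm\[\d+\]: service '([^']+)' - migration failed"
)


def summarize(lines):
    """Count migration attempts per (service, target) and failures per service."""
    attempts = Counter()
    failures = Counter()
    for line in lines:
        match = MIGRATE_RE.search(line)
        if match:
            attempts[(match.group(1), match.group(2))] += 1
            continue
        match = FAILED_RE.search(line)
        if match:
            failures[match.group(1)] += 1
    return attempts, failures


# A short excerpt of the node2 log from this thread.
SAMPLE = """\
May 18 09:28:19 nmz-cl-2 pve-ha-crm[3921]: migrate service 'vm:100' to node 'nmz-cl-2' (running)
May 18 09:28:19 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100': state changed from 'started' to 'migrate'  (node = nmz-cl-1, target = nmz-cl-2)
May 18 09:28:29 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100' - migration failed (exit code 1)
May 18 09:28:29 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100': state changed from 'migrate' to 'started'  (node = nmz-cl-1)
May 18 09:28:29 nmz-cl-2 pve-ha-crm[3921]: migrate service 'vm:100' to node 'nmz-cl-2' (running)
May 18 09:28:29 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100': state changed from 'started' to 'migrate'  (node = nmz-cl-1, target = nmz-cl-2)
May 18 09:28:39 nmz-cl-2 pve-ha-crm[3921]: service 'vm:100' - migration failed (exit code 1)
"""

attempts, failures = summarize(SAMPLE.splitlines())
for (service, target), count in attempts.items():
    print(f"{service} -> {target}: {count} attempts, {failures[service]} failed")
```

Run against the full node2 log above, the same counting shows every attempt on May 18 targeting nmz-cl-2 and none targeting a third node, which is the loop described in the first post.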