Hello guys,
I am reinstalling two Proxmox hypervisors on new hardware, so I need to migrate all VMs off the old hypervisors. I lowered the HA priority (from 0 to 1) of the given hosts, and the VMs started to migrate away (as expected). Unfortunately, the migration of a single VM (ID 204) to the target hypervisor failed (unable to allocate enough memory, which is completely OK and expected), but the CRM daemon didn't try to choose another hypervisor from the cluster and got stuck in an endless loop.
The migration task details were more or less as expected (migration started, VM failed to start on the target host, migration canceled):
Code:
task started by HA resource agent
2022-09-12 13:05:46 use dedicated network address for sending migration traffic (10.30.40.34)
2022-09-12 13:05:47 starting migration of VM 204 to node 'ovirt17' (10.30.40.34)
2022-09-12 13:05:47 starting VM 204 on remote node 'ovirt17'
2022-09-12 13:05:50 [ovirt17] kvm: cannot set up guest memory 'pc.ram': Cannot allocate memory
2022-09-12 13:05:51 [ovirt17] start failed: QEMU exited with code 1
2022-09-12 13:05:51 ERROR: online migrate failure - remote command failed with exit code 255
2022-09-12 13:05:51 aborting phase 2 - cleanup resources
2022-09-12 13:05:51 migrate_cancel
2022-09-12 13:05:53 ERROR: migration finished with problems (duration 00:00:07)
TASK ERROR: migration problems
The VM is in the HA group 'cluster', which consists of every hypervisor in the cluster, so there is plenty of room in the cluster as a whole.
Code:
-> ha-manager config | grep vm:204 -A 4
vm:204
group cluster
max_relocate 2
state started
The VM was in the 'migrate' state, as expected:
Code:
-> ha-manager status | grep 204
service vm:204 (ovirt2, migrate)
I took a look at the CRM status file, where I cannot see anything wrong:
Code:
-> cat /etc/pve/ha/manager_status | jq '.service_status."vm:204"'
{
"target": "ovirt17",
"uid": "kZY7FM3JZ5u3yiFZxs5J0w",
"node": "ovirt8",
"state": "migrate"
}
Here is some log output from the active CRM:
Code:
-> journalctl -u pve-ha-crm | cat
-- Logs begin at Sun 2022-09-11 21:22:10 CEST, end at Mon 2022-09-12 13:26:20 CEST. --
Sep 12 12:50:01 ovirt15 pve-ha-crm[10330]: migrate service 'vm:113' to node 'ovirt16' (running)
Sep 12 12:50:01 ovirt15 pve-ha-crm[10330]: service 'vm:113': state changed from 'started' to 'migrate' (node = ovirt7, target = ovirt16)
Sep 12 12:50:01 ovirt15 pve-ha-crm[10330]: migrate service 'vm:115' to node 'ovirt3' (running)
Sep 12 12:50:01 ovirt15 pve-ha-crm[10330]: service 'vm:115': state changed from 'started' to 'migrate' (node = ovirt8, target = ovirt3)
Sep 12 12:50:01 ovirt15 pve-ha-crm[10330]: migrate service 'vm:118' to node 'ovirt9' (running)
Sep 12 12:50:01 ovirt15 pve-ha-crm[10330]: service 'vm:118': state changed from 'started' to 'migrate' (node = ovirt7, target = ovirt9)
Sep 12 12:50:01 ovirt15 pve-ha-crm[10330]: migrate service 'vm:130' to node 'ovirt16' (running)
Sep 12 12:50:01 ovirt15 pve-ha-crm[10330]: service 'vm:130': state changed from 'started' to 'migrate' (node = ovirt7, target = ovirt16)
Sep 12 12:50:01 ovirt15 pve-ha-crm[10330]: migrate service 'vm:135' to node 'ovirt3' (running)
Sep 12 12:50:01 ovirt15 pve-ha-crm[10330]: service 'vm:135': state changed from 'started' to 'migrate' (node = ovirt8, target = ovirt3)
Sep 12 12:50:01 ovirt15 pve-ha-crm[10330]: migrate service 'vm:149' to node 'ovirt4' (running)
Sep 12 12:50:01 ovirt15 pve-ha-crm[10330]: service 'vm:149': state changed from 'started' to 'migrate' (node = ovirt7, target = ovirt4)
Sep 12 12:50:01 ovirt15 pve-ha-crm[10330]: migrate service 'vm:151' to node 'ovirt9' (running)
Sep 12 12:50:01 ovirt15 pve-ha-crm[10330]: service 'vm:151': state changed from 'started' to 'migrate' (node = ovirt7, target = ovirt9)
Sep 12 12:50:01 ovirt15 pve-ha-crm[10330]: migrate service 'vm:168' to node 'ovirt13' (running)
Sep 12 12:50:01 ovirt15 pve-ha-crm[10330]: service 'vm:168': state changed from 'started' to 'migrate' (node = ovirt8, target = ovirt13)
Sep 12 12:50:01 ovirt15 pve-ha-crm[10330]: migrate service 'vm:196' to node 'ovirt14' (running)
Sep 12 12:50:01 ovirt15 pve-ha-crm[10330]: service 'vm:196': state changed from 'started' to 'migrate' (node = ovirt8, target = ovirt14)
Sep 12 12:50:01 ovirt15 pve-ha-crm[10330]: migrate service 'vm:198' to node 'ovirt15' (running)
Sep 12 12:50:01 ovirt15 pve-ha-crm[10330]: service 'vm:198': state changed from 'started' to 'migrate' (node = ovirt7, target = ovirt15)
Sep 12 12:50:01 ovirt15 pve-ha-crm[10330]: migrate service 'vm:199' to node 'ovirt16' (running)
Sep 12 12:50:01 ovirt15 pve-ha-crm[10330]: service 'vm:199': state changed from 'started' to 'migrate' (node = ovirt8, target = ovirt16)
Sep 12 12:50:01 ovirt15 pve-ha-crm[10330]: migrate service 'vm:204' to node 'ovirt17' (running)
Sep 12 12:50:01 ovirt15 pve-ha-crm[10330]: service 'vm:204': state changed from 'started' to 'migrate' (node = ovirt8, target = ovirt17)
Sep 12 12:50:01 ovirt15 pve-ha-crm[10330]: migrate service 'vm:206' to node 'ovirt3' (running)
Sep 12 12:50:01 ovirt15 pve-ha-crm[10330]: service 'vm:206': state changed from 'started' to 'migrate' (node = ovirt8, target = ovirt3)
Sep 12 12:50:01 ovirt15 pve-ha-crm[10330]: migrate service 'vm:209' to node 'ovirt4' (running)
Sep 12 12:50:01 ovirt15 pve-ha-crm[10330]: service 'vm:209': state changed from 'started' to 'migrate' (node = ovirt7, target = ovirt4)
Sep 12 12:50:01 ovirt15 pve-ha-crm[10330]: migrate service 'vm:220' to node 'ovirt9' (running)
Sep 12 12:50:01 ovirt15 pve-ha-crm[10330]: service 'vm:220': state changed from 'started' to 'migrate' (node = ovirt7, target = ovirt9)
Sep 12 12:50:01 ovirt15 pve-ha-crm[10330]: migrate service 'vm:221' to node 'ovirt10' (running)
Sep 12 12:50:01 ovirt15 pve-ha-crm[10330]: service 'vm:221': state changed from 'started' to 'migrate' (node = ovirt7, target = ovirt10)
Sep 12 12:50:01 ovirt15 pve-ha-crm[10330]: migrate service 'vm:224' to node 'ovirt11' (running)
Sep 12 12:50:01 ovirt15 pve-ha-crm[10330]: service 'vm:224': state changed from 'started' to 'migrate' (node = ovirt8, target = ovirt11)
Sep 12 12:50:21 ovirt15 pve-ha-crm[10330]: service 'vm:113': state changed from 'migrate' to 'started' (node = ovirt16)
Sep 12 12:50:21 ovirt15 pve-ha-crm[10330]: service 'vm:118': state changed from 'migrate' to 'started' (node = ovirt9)
Sep 12 12:50:33 ovirt15 pve-ha-crm[10330]: service 'vm:115': state changed from 'migrate' to 'started' (node = ovirt3)
Sep 12 12:50:33 ovirt15 pve-ha-crm[10330]: service 'vm:130': state changed from 'migrate' to 'started' (node = ovirt16)
Sep 12 12:50:33 ovirt15 pve-ha-crm[10330]: service 'vm:135': state changed from 'migrate' to 'started' (node = ovirt3)
Sep 12 12:50:33 ovirt15 pve-ha-crm[10330]: service 'vm:168': state changed from 'migrate' to 'started' (node = ovirt13)
Sep 12 12:50:43 ovirt15 pve-ha-crm[10330]: service 'vm:204' - migration failed (exit code 1)
Sep 12 12:50:43 ovirt15 pve-ha-crm[10330]: service 'vm:204': state changed from 'migrate' to 'started' (node = ovirt8)
Sep 12 12:50:43 ovirt15 pve-ha-crm[10330]: migrate service 'vm:204' to node 'ovirt17' (running)
Sep 12 12:50:43 ovirt15 pve-ha-crm[10330]: service 'vm:204': state changed from 'started' to 'migrate' (node = ovirt8, target = ovirt17)
Sep 12 12:50:53 ovirt15 pve-ha-crm[10330]: service 'vm:149': state changed from 'migrate' to 'started' (node = ovirt4)
Sep 12 12:50:53 ovirt15 pve-ha-crm[10330]: service 'vm:151': state changed from 'migrate' to 'started' (node = ovirt9)
Sep 12 12:50:53 ovirt15 pve-ha-crm[10330]: service 'vm:196': state changed from 'migrate' to 'started' (node = ovirt14)
Sep 12 12:50:53 ovirt15 pve-ha-crm[10330]: service 'vm:198': state changed from 'migrate' to 'started' (node = ovirt15)
Sep 12 12:51:03 ovirt15 pve-ha-crm[10330]: service 'vm:199': state changed from 'migrate' to 'started' (node = ovirt16)
Sep 12 12:51:03 ovirt15 pve-ha-crm[10330]: service 'vm:204' - migration failed (exit code 1)
Sep 12 12:51:03 ovirt15 pve-ha-crm[10330]: service 'vm:204': state changed from 'migrate' to 'started' (node = ovirt8)
Sep 12 12:51:03 ovirt15 pve-ha-crm[10330]: migrate service 'vm:204' to node 'ovirt17' (running)
Sep 12 12:51:03 ovirt15 pve-ha-crm[10330]: service 'vm:204': state changed from 'started' to 'migrate' (node = ovirt8, target = ovirt17)
Sep 12 12:51:13 ovirt15 pve-ha-crm[10330]: service 'vm:206': state changed from 'migrate' to 'started' (node = ovirt3)
Sep 12 12:51:13 ovirt15 pve-ha-crm[10330]: service 'vm:220': state changed from 'migrate' to 'started' (node = ovirt9)
Sep 12 12:51:23 ovirt15 pve-ha-crm[10330]: service 'vm:204' - migration failed (exit code 1)
Sep 12 12:51:23 ovirt15 pve-ha-crm[10330]: service 'vm:204': state changed from 'migrate' to 'started' (node = ovirt8)
Sep 12 12:51:23 ovirt15 pve-ha-crm[10330]: service 'vm:221': state changed from 'migrate' to 'started' (node = ovirt10)
Sep 12 12:51:23 ovirt15 pve-ha-crm[10330]: service 'vm:224': state changed from 'migrate' to 'started' (node = ovirt11)
Sep 12 12:51:24 ovirt15 pve-ha-crm[10330]: migrate service 'vm:204' to node 'ovirt17' (running)
Sep 12 12:51:24 ovirt15 pve-ha-crm[10330]: service 'vm:204': state changed from 'started' to 'migrate' (node = ovirt8, target = ovirt17)
Sep 12 12:51:43 ovirt15 pve-ha-crm[10330]: service 'vm:204' - migration failed (exit code 1)
Sep 12 12:51:43 ovirt15 pve-ha-crm[10330]: service 'vm:204': state changed from 'migrate' to 'started' (node = ovirt8)
Sep 12 12:51:43 ovirt15 pve-ha-crm[10330]: service 'vm:209': state changed from 'migrate' to 'started' (node = ovirt4)
Sep 12 12:51:43 ovirt15 pve-ha-crm[10330]: migrate service 'vm:204' to node 'ovirt17' (running)
Sep 12 12:51:43 ovirt15 pve-ha-crm[10330]: service 'vm:204': state changed from 'started' to 'migrate' (node = ovirt8, target = ovirt17)
Sep 12 12:52:03 ovirt15 pve-ha-crm[10330]: service 'vm:204' - migration failed (exit code 1)
Sep 12 12:52:03 ovirt15 pve-ha-crm[10330]: service 'vm:204': state changed from 'migrate' to 'started' (node = ovirt8)
Sep 12 12:52:03 ovirt15 pve-ha-crm[10330]: migrate service 'vm:204' to node 'ovirt17' (running)
Sep 12 12:52:03 ovirt15 pve-ha-crm[10330]: service 'vm:204': state changed from 'started' to 'migrate' (node = ovirt8, target = ovirt17)
Sep 12 12:52:23 ovirt15 pve-ha-crm[10330]: service 'vm:204' - migration failed (exit code 1)
Sep 12 12:52:23 ovirt15 pve-ha-crm[10330]: service 'vm:204': state changed from 'migrate' to 'started' (node = ovirt8)
Sep 12 12:52:23 ovirt15 pve-ha-crm[10330]: migrate service 'vm:204' to node 'ovirt17' (running)
Sep 12 12:52:23 ovirt15 pve-ha-crm[10330]: service 'vm:204': state changed from 'started' to 'migrate' (node = ovirt8, target = ovirt17)
Sep 12 12:52:43 ovirt15 pve-ha-crm[10330]: service 'vm:204' - migration failed (exit code 1)
Sep 12 12:52:43 ovirt15 pve-ha-crm[10330]: service 'vm:204': state changed from 'migrate' to 'started' (node = ovirt8)
Sep 12 12:52:44 ovirt15 pve-ha-crm[10330]: migrate service 'vm:204' to node 'ovirt17' (running)
Sep 12 12:52:44 ovirt15 pve-ha-crm[10330]: service 'vm:204': state changed from 'started' to 'migrate' (node = ovirt8, target = ovirt17)
Sep 12 12:53:03 ovirt15 pve-ha-crm[10330]: service 'vm:204' - migration failed (exit code 1)
Sep 12 12:53:03 ovirt15 pve-ha-crm[10330]: service 'vm:204': state changed from 'migrate' to 'started' (node = ovirt8)
Sep 12 12:53:03 ovirt15 pve-ha-crm[10330]: migrate service 'vm:204' to node 'ovirt17' (running)
Sep 12 12:53:03 ovirt15 pve-ha-crm[10330]: service 'vm:204': state changed from 'started' to 'migrate' (node = ovirt8, target = ovirt17)
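To make the loop easier to see, the log above can be summarized per service. The following is a minimal Python sketch, assuming the journal line format shown above; `summarize_migrations` and the two regular expressions are hypothetical helpers written for illustration, not part of Proxmox or `ha-manager`. It counts the "migration failed" messages per service and collects the target nodes the CRM tried.

```python
import re
from collections import Counter

# Matches CRM lines such as:
#   ... pve-ha-crm[10330]: service 'vm:204' - migration failed (exit code 1)
FAILED_RE = re.compile(r"service '([^']+)' - migration failed")
# Matches CRM lines such as:
#   ... pve-ha-crm[10330]: migrate service 'vm:204' to node 'ovirt17' (running)
TARGET_RE = re.compile(r"migrate service '([^']+)' to node '([^']+)'")


def summarize_migrations(lines):
    """Return (failure count per service, set of target nodes tried per service)."""
    failures = Counter()
    targets = {}
    for line in lines:
        failed = FAILED_RE.search(line)
        if failed:
            failures[failed.group(1)] += 1
            continue
        target = TARGET_RE.search(line)
        if target:
            targets.setdefault(target.group(1), set()).add(target.group(2))
    return failures, targets


# A few lines copied from the journal excerpt above.
sample = [
    "Sep 12 12:50:43 ovirt15 pve-ha-crm[10330]: service 'vm:204' - migration failed (exit code 1)",
    "Sep 12 12:50:43 ovirt15 pve-ha-crm[10330]: service 'vm:204': state changed from 'migrate' to 'started' (node = ovirt8)",
    "Sep 12 12:50:43 ovirt15 pve-ha-crm[10330]: migrate service 'vm:204' to node 'ovirt17' (running)",
    "Sep 12 12:51:03 ovirt15 pve-ha-crm[10330]: service 'vm:204' - migration failed (exit code 1)",
    "Sep 12 12:51:03 ovirt15 pve-ha-crm[10330]: migrate service 'vm:204' to node 'ovirt17' (running)",
]

failures, targets = summarize_migrations(sample)
for service, count in failures.items():
    print(service, count, sorted(targets.get(service, ())))
```

To run it against the live journal, the `sample` list could be replaced by `sys.stdin` and the output of `journalctl -u pve-ha-crm` piped in. In the excerpt above, every retry for vm:204 targets the same node, ovirt17.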
As a last resort, I tried stopping the leader CRM to force a daemon re-election, but nothing really changed and the new leader just continued the migration loop.
I am using an older version of Proxmox:
Code:
-> pveversion
pve-manager/6.4-13/9f411e79 (running kernel: 5.4.174-2-pve)
Do you have any clue what I might be doing wrong? I am pretty sure this worked absolutely flawlessly before, so this migration loop was quite a surprise to me, but I am unable to find any misconfiguration.