Migration not working

kurdam

Active Member
Sep 29, 2020
Hi, I have a curious problem this morning. Yesterday I changed an iSCSI timeout in /etc/iscsi/iscsid.conf:
node.conn[0].timeo.login_timeout = 60 (previously 15)
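
For anyone wanting to confirm that the edited value is the one actually present in the file, here is a minimal Python sketch. It assumes iscsid.conf uses plain "key = value" lines with "#" comments; `read_iscsid_setting` and the sample text are hypothetical, for illustration only, and not part of open-iscsi.

```python
# Hypothetical helper (not part of open-iscsi): scans iscsid.conf-style text
# made of "key = value" lines and "#" comments, and returns the value of one key.
def read_iscsid_setting(conf_text, key):
    value = None
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, raw = line.partition("=")
        if sep and name.strip() == key:
            value = raw.strip()  # in this sketch, the last occurrence wins
    return value


# Illustrative excerpt; on a real node you would read /etc/iscsi/iscsid.conf instead.
sample = """
# iscsid.conf excerpt (illustrative)
node.conn[0].timeo.login_timeout = 60
"""

print(read_iscsid_setting(sample, "node.conn[0].timeo.login_timeout"))
```

This only checks the file contents; it says nothing about which value a running iSCSI session is actually using.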

We have this setting on our VMware clusters so that the VMs don't crash if there is a brief disconnect on our iSCSI network.
What I did was migrate all the VMs off the host, change the config, then reboot the node.

This morning, I tried to migrate the machines back to their node. Unfortunately, the migration just hangs and doesn't progress.

I don't think it is linked to my modifications, but I'm giving you the context. It's happening on all my nodes and I can't migrate anything (hot or cold).

This is what the migrate window is showing:

2021-03-11 10:23:14 use dedicated network address for sending migration traffic (10.10.1.4)
2021-03-11 10:23:14 starting migration of VM 506 to node 'pve' (10.10.1.4)
2021-03-11 10:23:54 ERROR: Failed to sync data - rbd error: interrupted by signal
2021-03-11 10:23:54 aborting phase 1 - cleanup resources
2021-03-11 10:23:54 ERROR: migration aborted (duration 00:00:40): Failed to sync data - rbd error: interrupted by signal
TASK ERROR: migration aborted
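
A small Python sketch for reading the task log above: it pulls out the ERROR lines and the time between the first and last timestamped entry. The `summarize` function is a hypothetical helper, and the log text is copied from the output above.

```python
from datetime import datetime

# Task log copied from the migrate window output.
LOG = """\
2021-03-11 10:23:14 use dedicated network address for sending migration traffic (10.10.1.4)
2021-03-11 10:23:14 starting migration of VM 506 to node 'pve' (10.10.1.4)
2021-03-11 10:23:54 ERROR: Failed to sync data - rbd error: interrupted by signal
2021-03-11 10:23:54 aborting phase 1 - cleanup resources
2021-03-11 10:23:54 ERROR: migration aborted (duration 00:00:40): Failed to sync data - rbd error: interrupted by signal
TASK ERROR: migration aborted
"""


def summarize(log_text):
    """Return (elapsed_seconds, error_messages) for timestamped log lines."""
    stamps, errors = [], []
    for line in log_text.splitlines():
        parts = line.split(" ", 2)  # date, time, message
        if len(parts) < 3:
            continue
        try:
            ts = datetime.strptime(parts[0] + " " + parts[1], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # line has no leading timestamp, e.g. "TASK ERROR: ..."
        stamps.append(ts)
        if parts[2].startswith("ERROR:"):
            errors.append(parts[2])
    elapsed = (stamps[-1] - stamps[0]).total_seconds() if stamps else 0.0
    return elapsed, errors


elapsed, errors = summarize(LOG)
print(elapsed)
for message in errors:
    print(message)
```

Both ERROR lines mention an rbd error, which suggests the failure happens at the storage (Ceph RBD) layer rather than on the migration network, although the log alone does not prove that.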

-Traffic is flowing between all the nodes on the 10.10.1.0/24 network (I pinged them to verify).
-I did some research before opening this thread and saw some posts about HA, from which I tried just about everything.
-Some other posts talked about ticking or unticking the "shared" box on my storage. It didn't seem to change a thing either.

I'm starting to run out of ideas and I could use some help.

The only thing that looks out of place (though I'm not even sure about it) is this:
Captureproxmox.PNG

My cluster consists of 4 nodes: pve, pve1, pve2 and pve3. I'm having trouble understanding why, on the Datacenter -> HA tab, pve2 is listed twice and pve1 is idle. All the nodes are on the latest version (I've just checked).

Thank you in advance.
 
Hi,

I found what was going on. It looks like Ceph was interfering with my migration; I had to remove it entirely.

I would like to understand why, though.
 