Hi, I have a curious problem this morning. Yesterday I changed an iSCSI timeout in /etc/iscsi/iscsid.conf:
node.conn[0].timeo.login_timeout = 60 (previously 15)
We have this setting on our VMware clusters so that the VMs don't crash if there is a brief disconnect on our iSCSI network.
What I did was migrate all the VMs off the host, change the config, then reboot the node.
This morning I tried to migrate the machines back to their node. Unfortunately, the migration just hangs and doesn't progress.
I don't think it is linked to my modifications, but I'm giving you the context. It's happening on all my nodes and I can't migrate anything (hot or cold).
This is what the migrate window is showing:
2021-03-11 10:23:14 use dedicated network address for sending migration traffic (10.10.1.4)
2021-03-11 10:23:14 starting migration of VM 506 to node 'pve' (10.10.1.4)
2021-03-11 10:23:54 ERROR: Failed to sync data - rbd error: interrupted by signal
2021-03-11 10:23:54 aborting phase 1 - cleanup resources
2021-03-11 10:23:54 ERROR: migration aborted (duration 00:00:40): Failed to sync data - rbd error: interrupted by signal
TASK ERROR: migration aborted
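In case it helps anyone comparing several failed tasks, here is a minimal sketch that pulls the timestamped ERROR lines out of a task log like the one above. The function name `extract_errors` and the assumed line shape ("date time ERROR: message") are my own and only based on this one log, so treat it as an illustration rather than anything Proxmox provides.

```python
import re

# Task log copied from the migrate window above.
LOG = """\
2021-03-11 10:23:14 use dedicated network address for sending migration traffic (10.10.1.4)
2021-03-11 10:23:14 starting migration of VM 506 to node 'pve' (10.10.1.4)
2021-03-11 10:23:54 ERROR: Failed to sync data - rbd error: interrupted by signal
2021-03-11 10:23:54 aborting phase 1 - cleanup resources
2021-03-11 10:23:54 ERROR: migration aborted (duration 00:00:40): Failed to sync data - rbd error: interrupted by signal
TASK ERROR: migration aborted
"""

# Assumed line shape: "YYYY-MM-DD HH:MM:SS ERROR: message".
ERROR_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ERROR: (.*)$")


def extract_errors(log_text):
    """Return (timestamp, message) pairs for every timestamped ERROR line."""
    errors = []
    for line in log_text.splitlines():
        match = ERROR_RE.match(line.strip())
        if match:
            errors.append((match.group(1), match.group(2)))
    return errors


if __name__ == "__main__":
    for timestamp, message in extract_errors(LOG):
        print(timestamp, message)
```

The final "TASK ERROR" line has no timestamp, so this sketch deliberately skips it; both remaining errors point at the same rbd sync failure.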
-Traffic is flowing between all the nodes on the 10.10.1.0/24 network (I pinged them to verify).
-I did some research before opening this thread and saw some posts about HA; I tried just about everything they suggested.
-Some other posts talked about ticking or unticking the shared box on my storage. It didn't seem to change a damn thing either.
I'm starting to run out of ideas and I could use some help.
The only thing that looks out of place (though I'm not even sure it is) is this:
My cluster consists of 4 nodes: pve, pve1, pve2 and pve3. I'm having trouble understanding why, on the Datacenter -> HA tab, pve2 is listed twice and pve1 is idle. All the nodes are on the latest version (I've just checked).
Thank you in advance.