[SOLVED] Cannot migrate due to full name in cluster environment

Whatever

I have a 5-node PVE cluster. I'm experiencing a problem with migration from one particular node to another particular one.
The error message is: no such cluster node 'pve02A' (500)

I've checked pvecm on all the nodes and got:


root@pve01A:~# pvecm nodes

Membership information
----------------------
Nodeid Votes Name
1 1 pve01A (local)
2 1 pve01B
3 1 pve01C
4 1 pve01D
5 1 pve02A

root@pve01B:~# pvecm nodes

Membership information
----------------------
Nodeid Votes Name
1 1 pve01A
2 1 pve01B (local)
3 1 pve01C
4 1 pve01D
5 1 pve02A.domain.local

root@pve01C:~# pvecm nodes

Membership information
----------------------
Nodeid Votes Name
1 1 pve01A
2 1 pve01B
3 1 pve01C (local)
4 1 pve01D
5 1 pve02A


So, the problem is that on pve01B the fifth node in the cluster list is in FQDN format.

Any ideas how this could be fixed?
 
Check /etc/pve/corosync.conf and /etc/hosts
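
For example, something like this (just a sketch, using the hostname from your output; adjust to your setup):

root@pve01B:~# grep -B1 -A3 pve02A /etc/pve/corosync.conf   # the node entry should use the same short name as on the other nodes
root@pve01B:~# grep pve02A /etc/hosts                       # short name and FQDN should resolve to the same address

The name shown by pvecm usually comes from one of those two places.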
 
When I tried to connect from pve01B to pve02A, I got a warning message:
Warning: the ECDSA host key for '' differs from the key for the IP address ''

After deleting the offending key from known_hosts and restarting the corosync daemon, the problem was gone!
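
For the record, roughly these commands (a sketch; depending on the setup the stale key may live in root's ~/.ssh/known_hosts or in the cluster-wide /etc/pve/priv/known_hosts, in which case pass that path with -f):

root@pve01B:~# ssh-keygen -R pve02A          # remove the stale host key entry
root@pve01B:~# systemctl restart corosync    # then restart the corosync daemon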
Thanks for the hint!
 
Hi,
I have the same problem with Proxmox 4.1.

root@hn46:~# pvecm nodes

Membership information
----------------------
Nodeid Votes Name
2 1 hn42
3 1 hn43
1 1 hn45
4 1 hn46 (local)
5 1 hn47.in****ax.hu (!!!)

root@hn47:~# pvecm nodes

Membership information
----------------------
Nodeid Votes Name
2 1 hn42
3 1 hn43
1 1 hn45
4 1 hn46
5 1 hn47 (local)

hn47 is a newly added node in the cluster.

If I try to migrate from hn46 to hn47, it gives me the error: no such cluster node 'hn47' (500)
I can migrate from hn46 to hn45 and then to hn47.
If I migrate from hn47 to hn46, the migration process hangs after the copy. I cannot access the web GUI on hn47 any more, and the other nodes' web GUIs show all other nodes as offline. After rebooting (power cycling) hn47, everything works again and I can start the migrated VM on hn46 after unlocking it.

I checked the files mentioned above and they seem to be OK.
I can ssh from hn46 to hn47 and back without any error.

On hn46's web GUI, hn47 is missing, but the other nodes' web GUIs show all nodes.

I found two strange things.

In corosync.conf:

totem {
  cluster_name: I*****X
  config_version: 9
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: xx.xx.xx.59
    ringnumber: 0
  }
}

The bindnetaddr IP address is the address of a node which has already been removed from the cluster. I don't know why it is there.
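
If I understand it correctly, fixing that would mean editing the file under /etc/pve (so it propagates to all nodes), bumping config_version and restarting corosync, roughly like this (I have not tried it yet):

root@hn46:~# cp /etc/pve/corosync.conf /root/corosync.conf.bak   # keep a backup first
root@hn46:~# nano /etc/pve/corosync.conf                         # correct bindnetaddr, increment config_version
root@hn46:~# systemctl restart corosync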

The second thing is that the config dirs of already removed nodes are still in /etc/pve/nodes/. Could this cause any problems?

Thanks for any help.
 
It seems you have a slightly different issue than mine.
Try checking /etc/hosts.

An empty folder with a deleted node name should not be the problem.
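
You could also compare what the API on each node returns, e.g. (a sketch; pvesh should be available on 4.1):

root@hn46:~# pvesh get /nodes    # should list hn47 under the same short name as pvecm nodes

If hn47 is missing or listed differently here, that would explain the 'no such cluster node' error from the migration check.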
 
Thanks for your fast response.

I already checked /etc/hosts; I think they are OK:

hn46:
127.0.0.1 localhost.localdomain localhost
xx.xx.xx.76 hn46.in****ax.hu hn46 pvelocalhost

hn47:
127.0.0.1 localhost.localdomain localhost
xx.xx.xx.77 hn47.in****ax.hu hn47 pvelocalhost
 
Restarting corosync on hn46 solved this problem.
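
For anyone finding this later, that was simply (service name as on PVE 4.x):

root@hn46:~# systemctl restart corosync   # after this, migration to hn47 worked again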

So something went wrong when I added hn47 to the cluster, but I don't know what.
 