no such cluster node 'nodename' (500) [SOLVED]

Dmitry Pronyaev

New Member
Jan 9, 2019
Hi all!

I have a cluster consisting of 3 nodes:
pve01
pve02
pve03

running Proxmox Virtual Environment 5.1-51

Recently I've added a 4th node, pve04, running 5.3-6.

It appeared in the web interface and is operational, but I have one problem:
I can migrate guests from pve01 and pve03 to pve04, but when I try to migrate from pve02, I get the error "No such cluster node 'pve04' (500)".

I have the same /etc/hosts on all nodes, looking like this:

127.0.0.1 localhost.localdomain localhost
172.20.71.111 pve01.virt.tul.ztlc.net pve01 pvelocalhost
172.20.71.112 pve02.virt.tul.ztlc.net pve02
172.20.71.113 pve03.virt.tul.ztlc.net pve03
172.20.71.114 pve04.virt.tul.ztlc.net pve04
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
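For completeness, a quick way to confirm the file really is identical everywhere is to compare it from one node (this assumes passwordless root SSH between the nodes, which a PVE cluster normally sets up):

Code:
# compare /etc/hosts across all nodes (assumes root SSH between the nodes)
for n in pve01 pve02 pve03 pve04; do
    echo -n "$n: "; ssh root@$n md5sum /etc/hosts
done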



The pvecm nodes output looks like this

Membership information
----------------------
    Nodeid      Votes Name
         1          1 pve01 (local)
         3          1 pve02
         2          1 pve03
         4          1 pve04


on all nodes, but on pve02 it looks like this:
Membership information
----------------------
    Nodeid      Votes Name
         1          1 pve01
         3          1 pve02 (local)
         2          1 pve03
         4          1 pve04.virt.tul.ztlc.net


/etc/pve/.members on pve01 (on pve03 and pve04 this file also contains 4 nodes):
{
"nodename": "pve01",
"version": 21,
"cluster": { "name": "PVE-CLUSTER", "version": 4, "nodes": 4, "quorate": 1 },
"nodelist": {
"pve01": { "id": 1, "online": 1, "ip": "172.20.71.111"},
"pve02": { "id": 3, "online": 1, "ip": "172.20.71.112"},
"pve04": { "id": 4, "online": 1, "ip": "172.20.71.114"},
"pve03": { "id": 2, "online": 1, "ip": "172.20.71.113"}
}
}


on pve02 there is no entry for the 4th node:
{
"nodename": "pve02",
"version": 28,
"cluster": { "name": "PVE-CLUSTER", "version": 3, "nodes": 3, "quorate": 1 },
"nodelist": {
"pve01": { "id": 1, "online": 1, "ip": "172.20.71.111"},
"pve02": { "id": 3, "online": 1, "ip": "172.20.71.112"},
"pve03": { "id": 2, "online": 1, "ip": "172.20.71.113"}
}
}
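Since /etc/pve/.members is generated at runtime by pmxcfs (as far as I know it cannot be edited directly), a quick check of which nodes are missing the pve04 entry could look like this (again assuming root SSH between the nodes):

Code:
# count how often pve04 appears in the runtime membership file on each node
for n in pve01 pve02 pve03 pve04; do
    echo -n "$n: "; ssh root@$n grep -c pve04 /etc/pve/.members
done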

/etc/pve/corosync.conf on all 4 nodes (including pve02!) contains entries for pve01-pve04:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: pve01
  }
  node {
    name: pve02
    nodeid: 3
    quorum_votes: 1
    ring0_addr: pve02
  }
  node {
    name: pve03
    nodeid: 2
    quorum_votes: 1
    ring0_addr: pve03
  }
  node {
    name: pve04
    nodeid: 4
    quorum_votes: 1
    ring0_addr: pve04
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: PVE-CLUSTER
  config_version: 4
  interface {
    bindnetaddr: 172.20.71.111
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}
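If it matters, the running corosync daemon can also report the nodelist it actually loaded (it only re-reads corosync.conf on restart or reload), so it may be worth comparing the runtime view on pve02 against the file (corosync-cmapctl ships with corosync, if I'm not mistaken):

Code:
# show the nodelist as the running corosync daemon sees it (run on pve02 and on a working node)
corosync-cmapctl | grep nodelist.node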



So, for some reason pve02 doesn't recognize pve04 as a member of the cluster. How can I fix it?
 
the `/etc/pve/.members` file should be the same across all nodes in a working cluster - check the logs for entries from `pmxcfs` and `corosync` (pve-cluster.service and corosync.service are the corresponding units).
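For example, something like this shows the recent entries of both units on a node in one go (adjust the --since date to roughly when pve04 was joined):

Code:
# recent log entries from pve-cluster (pmxcfs) and corosync on this node
journalctl -u pve-cluster -u corosync --since "2018-12-28" --no-pager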
 

There are no syslog entries with the keywords "corosync" or "pmxcfs" on any node. The service status also looks fine on every node (here is the status from pve04):

root@pve04:~# systemctl status pve-cluster
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2018-12-28 12:31:30 MSK; 1 weeks 5 days ago
Process: 7112 ExecStartPost=/usr/bin/pvecm updatecerts --silent (code=exited, status=0/SUCCESS)
Process: 6927 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
Main PID: 6996 (pmxcfs)
Tasks: 7 (limit: 7372)
Memory: 74.0M
CPU: 29min 17.577s
CGroup: /system.slice/pve-cluster.service
└─6996 /usr/bin/pmxcfs

Jan 09 12:05:26 pve04 pmxcfs[6996]: [dcdb] notice: data verification successful
Jan 09 12:13:18 pve04 pmxcfs[6996]: [status] notice: received log
Jan 09 12:13:18 pve04 pmxcfs[6996]: [status] notice: received log
Jan 09 12:28:19 pve04 pmxcfs[6996]: [status] notice: received log
Jan 09 12:28:20 pve04 pmxcfs[6996]: [status] notice: received log
Jan 09 12:43:21 pve04 pmxcfs[6996]: [status] notice: received log
Jan 09 12:43:22 pve04 pmxcfs[6996]: [status] notice: received log
Jan 09 12:58:23 pve04 pmxcfs[6996]: [status] notice: received log
Jan 09 12:58:23 pve04 pmxcfs[6996]: [status] notice: received log
Jan 09 13:05:26 pve04 pmxcfs[6996]: [dcdb] notice: data verification successful
root@pve04:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2018-12-28 12:31:30 MSK; 1 weeks 5 days ago
Docs: man:corosync
man:corosync.conf
man:corosync_overview
Main PID: 7117 (corosync)
Tasks: 2 (limit: 7372)
Memory: 46.2M
CPU: 4h 21min 25.788s
CGroup: /system.slice/corosync.service
└─7117 /usr/sbin/corosync -f

Dec 28 12:31:32 pve04 corosync[7117]: [CPG ] downlist left_list: 0 received
Dec 28 12:31:32 pve04 corosync[7117]: [CPG ] downlist left_list: 0 received
Dec 28 12:31:32 pve04 corosync[7117]: warning [CPG ] downlist left_list: 0 received
Dec 28 12:31:32 pve04 corosync[7117]: [CPG ] downlist left_list: 0 received
Dec 28 12:31:32 pve04 corosync[7117]: notice [QUORUM] This node is within the primary component and will provide service.
Dec 28 12:31:32 pve04 corosync[7117]: notice [QUORUM] Members[4]: 1 3 2 4
Dec 28 12:31:32 pve04 corosync[7117]: notice [MAIN ] Completed service synchronization, ready to provide service.
Dec 28 12:31:32 pve04 corosync[7117]: [QUORUM] This node is within the primary component and will provide service.
Dec 28 12:31:32 pve04 corosync[7117]: [QUORUM] Members[4]: 1 3 2 4
Dec 28 12:31:32 pve04 corosync[7117]: [MAIN ] Completed service synchronization, ready to provide service.
 
are you sure this (and the other) ring address(es) resolve to the correct IP? Or is there a wrong entry in any node's /etc/hosts? Maybe set the ring0_addr directly to the IP used (see: https://pve.proxmox.com/pve-docs/chapter-pvecm.html#edit-corosync-conf) and restart corosync and pve-cluster on all nodes:
Code:
systemctl restart corosync pve-cluster
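
Roughly, following the procedure from the linked chapter, the edit could look like this (just a sketch, please double-check against the docs before applying it):

Code:
# edit a copy first; /etc/pve is replicated by pmxcfs, so the change propagates to all nodes
cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new
# in corosync.conf.new: replace each ring0_addr hostname with the node's IP,
#   e.g.  ring0_addr: pve04  ->  ring0_addr: 172.20.71.114
# and increase config_version in the totem section (here: 4 -> 5)
mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf
systemctl restart corosync pve-cluster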

I've tried to ping pve04 from all 4 nodes:

root@pve01:~# ping pve04
PING pve04.virt.tul.ztlc.net (172.20.71.114) 56(84) bytes of data.
64 bytes from pve04.virt.tul.ztlc.net (172.20.71.114): icmp_seq=1 ttl=64 time=1.68 ms

root@pve02:~# ping pve04
PING pve04.virt.tul.ztlc.net (172.20.71.114) 56(84) bytes of data.
64 bytes from pve04.virt.tul.ztlc.net (172.20.71.114): icmp_seq=1 ttl=64 time=0.195 ms

root@pve03:~# ping pve04
PING pve04.virt.tul.ztlc.net (172.20.71.114) 56(84) bytes of data.
64 bytes from pve04.virt.tul.ztlc.net (172.20.71.114): icmp_seq=1 ttl=64 time=0.151 ms

root@pve04:~# ping pve04
PING pve04.virt.tul.ztlc.net (172.20.71.114) 56(84) bytes of data.
64 bytes from pve04.virt.tul.ztlc.net (172.20.71.114): icmp_seq=1 ttl=64 time=0.030 ms


The name "pve04" resolves to 172.20.71.114 correctly on every node, so I don't think the problem is DNS or a wrong /etc/hosts entry.
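
For reference, corosync itself also reports which address it bound ring 0 to, which can be compared between pve02 and the other nodes (corosync-cfgtool is part of the corosync package, if I'm not mistaken):

Code:
# print the local node id and ring 0 address/status as corosync sees it
corosync-cfgtool -s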

Should I restart the corosync and pve-cluster services in this case?
 
I've updated all nodes to the same version (one by one, migrating all VMs to other nodes before the upgrade) and rebooted each node. Now all nodes work with each other. Thanks for the help!
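
For anyone hitting the same issue: the per-node steps boil down to something like the following (assuming the standard PVE 5.x repositories; migrate the guests away from the node first):

Code:
# on each node, one at a time
apt update && apt dist-upgrade
pveversion -v   # afterwards, confirm all nodes report the same versions
reboot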
 
are you sure this (and the other) ring address(es) resolve to the correct IP? Or is there a wrong entry in any node's /etc/hosts? Maybe set the ring0_addr directly to the IP used (see: https://pve.proxmox.com/pve-docs/chapter-pvecm.html#edit-corosync-conf) and restart corosync and pve-cluster on all nodes:
Code:
systemctl restart corosync pve-cluster
I had the same problem with a newly added node. After the restart of corosync, the old existing nodes recognized the new node and the migration works...