no such cluster node 'nodename' (500) [SOLVED]

Dmitry Pronyaev

New Member
Jan 9, 2019
Hi all!

I have a cluster consisting of 3 nodes:
pve01
pve02
pve03

running Proxmox Virtual Environment 5.1-51

Recently I've added a 4th node, pve04, running 5.3-6.

It appeared in the web interface and is operational, but I have one problem:
I can migrate guests from pve01 and pve03 to pve04, but when I try to migrate from pve02, I get the error "No such cluster node 'pve04' (500)".

I have the same /etc/hosts on all nodes, looking like this:

127.0.0.1 localhost.localdomain localhost
172.20.71.111 pve01.virt.tul.ztlc.net pve01 pvelocalhost
172.20.71.112 pve02.virt.tul.ztlc.net pve02
172.20.71.113 pve03.virt.tul.ztlc.net pve03
172.20.71.114 pve04.virt.tul.ztlc.net pve04
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
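For completeness, a quick way to confirm the file really is identical everywhere is to compare it from one node (this assumes passwordless root SSH between the nodes, which a PVE cluster normally sets up):

Code:
# compare /etc/hosts across all nodes (assumes root SSH between the nodes)
for n in pve01 pve02 pve03 pve04; do
    echo -n "$n: "; ssh root@$n md5sum /etc/hosts
done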



The pvecm nodes output looks like this

Membership information
----------------------
    Nodeid      Votes Name
         1          1 pve01 (local)
         3          1 pve02
         2          1 pve03
         4          1 pve04


on all nodes, but on pve02 it looks like this:
Membership information
----------------------
    Nodeid      Votes Name
         1          1 pve01
         3          1 pve02 (local)
         2          1 pve03
         4          1 pve04.virt.tul.ztlc.net


/etc/pve/.members on pve01 (on pve03 and pve04 this file also contains 4 nodes):
{
"nodename": "pve01",
"version": 21,
"cluster": { "name": "PVE-CLUSTER", "version": 4, "nodes": 4, "quorate": 1 },
"nodelist": {
"pve01": { "id": 1, "online": 1, "ip": "172.20.71.111"},
"pve02": { "id": 3, "online": 1, "ip": "172.20.71.112"},
"pve04": { "id": 4, "online": 1, "ip": "172.20.71.114"},
"pve03": { "id": 2, "online": 1, "ip": "172.20.71.113"}
}
}


on pve02 there is no entry for the 4th node:
{
"nodename": "pve02",
"version": 28,
"cluster": { "name": "PVE-CLUSTER", "version": 3, "nodes": 3, "quorate": 1 },
"nodelist": {
"pve01": { "id": 1, "online": 1, "ip": "172.20.71.111"},
"pve02": { "id": 3, "online": 1, "ip": "172.20.71.112"},
"pve03": { "id": 2, "online": 1, "ip": "172.20.71.113"}
}
}
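Since /etc/pve/.members is generated at runtime by pmxcfs (as far as I know it cannot be edited directly), a quick check of which nodes are missing the pve04 entry could look like this (again assuming root SSH between the nodes):

Code:
# count how often pve04 appears in the runtime membership file on each node
for n in pve01 pve02 pve03 pve04; do
    echo -n "$n: "; ssh root@$n grep -c pve04 /etc/pve/.members
done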

/etc/pve/corosync.conf on all 4 nodes (including pve02!) contains entries for pve01-pve04:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: pve01
  }
  node {
    name: pve02
    nodeid: 3
    quorum_votes: 1
    ring0_addr: pve02
  }
  node {
    name: pve03
    nodeid: 2
    quorum_votes: 1
    ring0_addr: pve03
  }
  node {
    name: pve04
    nodeid: 4
    quorum_votes: 1
    ring0_addr: pve04
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: PVE-CLUSTER
  config_version: 4
  interface {
    bindnetaddr: 172.20.71.111
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}
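If it matters, the running corosync daemon can also report the nodelist it actually loaded (it only re-reads corosync.conf on restart or reload), so it may be worth comparing the runtime view on pve02 against the file (corosync-cmapctl ships with corosync, if I'm not mistaken):

Code:
# show the nodelist as the running corosync daemon sees it (run on pve02 and on a working node)
corosync-cmapctl | grep nodelist.node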



So, for some reason pve02 doesn't recognize pve04 as a member of the cluster. How can I fix it?
 
the `/etc/pve/.members` file should be the same across all nodes in a working cluster - check the logs for entries from `pmxcfs` and `corosync` (pve-cluster.service and corosync.service are the corresponding units).
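For example, something like this shows the recent entries of both units on a node in one go (adjust the --since date to roughly when pve04 was joined):

Code:
# recent log entries from pve-cluster (pmxcfs) and corosync on this node
journalctl -u pve-cluster -u corosync --since "2018-12-28" --no-pager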
 

There are no syslog entries with the keywords "corosync" or "pmxcfs" on any node. The service status also looks fine on every node (here is the status from pve04):

root@pve04:~# systemctl status pve-cluster
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2018-12-28 12:31:30 MSK; 1 weeks 5 days ago
Process: 7112 ExecStartPost=/usr/bin/pvecm updatecerts --silent (code=exited, status=0/SUCCESS)
Process: 6927 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
Main PID: 6996 (pmxcfs)
Tasks: 7 (limit: 7372)
Memory: 74.0M
CPU: 29min 17.577s
CGroup: /system.slice/pve-cluster.service
└─6996 /usr/bin/pmxcfs

Jan 09 12:05:26 pve04 pmxcfs[6996]: [dcdb] notice: data verification successful
Jan 09 12:13:18 pve04 pmxcfs[6996]: [status] notice: received log
Jan 09 12:13:18 pve04 pmxcfs[6996]: [status] notice: received log
Jan 09 12:28:19 pve04 pmxcfs[6996]: [status] notice: received log
Jan 09 12:28:20 pve04 pmxcfs[6996]: [status] notice: received log
Jan 09 12:43:21 pve04 pmxcfs[6996]: [status] notice: received log
Jan 09 12:43:22 pve04 pmxcfs[6996]: [status] notice: received log
Jan 09 12:58:23 pve04 pmxcfs[6996]: [status] notice: received log
Jan 09 12:58:23 pve04 pmxcfs[6996]: [status] notice: received log
Jan 09 13:05:26 pve04 pmxcfs[6996]: [dcdb] notice: data verification successful
root@pve04:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2018-12-28 12:31:30 MSK; 1 weeks 5 days ago
Docs: man:corosync
man:corosync.conf
man:corosync_overview
Main PID: 7117 (corosync)
Tasks: 2 (limit: 7372)
Memory: 46.2M
CPU: 4h 21min 25.788s
CGroup: /system.slice/corosync.service
└─7117 /usr/sbin/corosync -f

Dec 28 12:31:32 pve04 corosync[7117]: [CPG ] downlist left_list: 0 received
Dec 28 12:31:32 pve04 corosync[7117]: [CPG ] downlist left_list: 0 received
Dec 28 12:31:32 pve04 corosync[7117]: warning [CPG ] downlist left_list: 0 received
Dec 28 12:31:32 pve04 corosync[7117]: [CPG ] downlist left_list: 0 received
Dec 28 12:31:32 pve04 corosync[7117]: notice [QUORUM] This node is within the primary component and will provide service.
Dec 28 12:31:32 pve04 corosync[7117]: notice [QUORUM] Members[4]: 1 3 2 4
Dec 28 12:31:32 pve04 corosync[7117]: notice [MAIN ] Completed service synchronization, ready to provide service.
Dec 28 12:31:32 pve04 corosync[7117]: [QUORUM] This node is within the primary component and will provide service.
Dec 28 12:31:32 pve04 corosync[7117]: [QUORUM] Members[4]: 1 3 2 4
Dec 28 12:31:32 pve04 corosync[7117]: [MAIN ] Completed service synchronization, ready to provide service.
 
are you sure this (and the other) ring address(es) resolve to the correct IP? Or is there a wrong entry in any node's /etc/hosts? Maybe set the ring0_addr directly to the IP used (see: https://pve.proxmox.com/pve-docs/chapter-pvecm.html#edit-corosync-conf) and restart corosync and pve-cluster on all nodes:
Code:
systemctl restart corosync pve-cluster
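
Roughly, following the procedure from the linked chapter, the edit could look like this (just a sketch, please double-check against the docs before applying it):

Code:
# edit a copy first; /etc/pve is replicated by pmxcfs, so the change propagates to all nodes
cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new
# in corosync.conf.new: replace each ring0_addr hostname with the node's IP,
#   e.g.  ring0_addr: pve04  ->  ring0_addr: 172.20.71.114
# and increase config_version in the totem section (here: 4 -> 5)
mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf
systemctl restart corosync pve-cluster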

I've tried to ping pve04 from all 4 nodes:

root@pve01:~# ping pve04
PING pve04.virt.tul.ztlc.net (172.20.71.114) 56(84) bytes of data.
64 bytes from pve04.virt.tul.ztlc.net (172.20.71.114): icmp_seq=1 ttl=64 time=1.68 ms

root@pve02:~# ping pve04
PING pve04.virt.tul.ztlc.net (172.20.71.114) 56(84) bytes of data.
64 bytes from pve04.virt.tul.ztlc.net (172.20.71.114): icmp_seq=1 ttl=64 time=0.195 ms

root@pve03:~# ping pve04
PING pve04.virt.tul.ztlc.net (172.20.71.114) 56(84) bytes of data.
64 bytes from pve04.virt.tul.ztlc.net (172.20.71.114): icmp_seq=1 ttl=64 time=0.151 ms

root@pve04:~# ping pve04
PING pve04.virt.tul.ztlc.net (172.20.71.114) 56(84) bytes of data.
64 bytes from pve04.virt.tul.ztlc.net (172.20.71.114): icmp_seq=1 ttl=64 time=0.030 ms


The name "pve04" resolves to 172.20.71.114 correctly on every node, so I don't think the problem is DNS or a wrong /etc/hosts entry.
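
For reference, corosync itself also reports which address it bound ring 0 to, which can be compared between pve02 and the other nodes (corosync-cfgtool is part of the corosync package, if I'm not mistaken):

Code:
# print the local node id and ring 0 address/status as corosync sees it
corosync-cfgtool -s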

Should I restart the corosync and pve-cluster services in this case?
 
I've updated all nodes to the same version (one by one, migrating all VMs to other nodes before the upgrade) and rebooted each node. Now all nodes work with each other. Thanks for the help!
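
For anyone hitting the same issue: the per-node steps boil down to something like the following (assuming the standard PVE 5.x repositories; migrate the guests away from the node first):

Code:
# on each node, one at a time
apt update && apt dist-upgrade
pveversion -v   # afterwards, confirm all nodes report the same versions
reboot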
 
are you sure this (and the other) ring address(es) resolve to the correct IP? Or is there a wrong entry in any node's /etc/hosts? Maybe set the ring0_addr directly to the IP used (see: https://pve.proxmox.com/pve-docs/chapter-pvecm.html#edit-corosync-conf) and restart corosync and pve-cluster on all nodes:
Code:
systemctl restart corosync pve-cluster
I had the same problem with a newly added node. After the restart of corosync, the old existing nodes recognized the new node and the migration works...