Proxmox Cluster Issue - Join Information is not visible

arup647

New Member
Jan 19, 2023
Dear Members,
Greetings.

We have a running cluster of 12 nodes, of which 3 are having some hardware issues.

We have observed that in the Datacenter → Cluster view, the Join Cluster tab is greyed out, and because of this we are unable to add any nodes.

I have attached a screenshot.

regards
Arup
 

Attachments

  • error.png (42.8 KB)
Hi,
I assume this is a node which is already part of the cluster (as you see the members below). You have to join the cluster from the new node, which is not part of the cluster yet.

Edit: Ah, sorry, you mean you cannot get the cluster join information. Can you check the status of the services with systemctl status pve-cluster corosync?
 
Hi,
I assume this is a node which is already part of the cluster (as you see the members below). You have to join the cluster from the new node, which is not part of the cluster yet.
Thanks, Chris, for the quick reply. But where will we get the "join information" needed to join the existing cluster?
 
Can you check the cluster status with pvecm status?
 
Can you check the cluster status with pvecm status?
Please find the output below:

root@dellr6525-E2-28tb-dc-node1:~# pvecm status
Cluster information
-------------------
Name: asdc-dc
Config Version: 20
Transport: knet
Secure auth: on

Quorum information
------------------
Date: Thu Jan 19 16:15:14 2023
Quorum provider: corosync_votequorum
Nodes: 17
Node ID: 0x0000000d
Ring ID: 1.12a
Quorate: Yes

Votequorum information
----------------------
Expected votes: 20
Highest expected: 20
Total votes: 17
Quorum: 11
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 192.168.14.11
0x00000002 1 192.168.14.12
0x00000003 1 192.168.14.13
0x00000004 1 192.168.14.14
0x00000005 1 192.168.14.15
0x00000006 1 192.168.14.16
0x00000007 1 192.168.14.17
0x00000008 1 192.168.14.18
0x00000009 1 192.168.14.19
0x0000000a 1 192.168.14.20
0x0000000b 1 192.168.14.21
0x0000000c 1 192.168.14.22
0x0000000d 1 192.168.14.36 (local)
0x0000000e 1 192.168.14.37
0x00000010 1 192.168.14.44
0x00000011 1 192.168.14.45
0x00000014 1 192.168.14.49
root@dellr6525-E2-28tb-dc-node1:~#
root@dellr6525-E2-28tb-dc-node1:~# systemctl status pve-cluster corosync
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2023-01-19 10:58:16 IST; 5h 36min ago
Process: 1132192 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
Main PID: 1132193 (pmxcfs)
Tasks: 8 (limit: 618545)
Memory: 56.2M
CPU: 36.389s
CGroup: /system.slice/pve-cluster.service
└─1132193 /usr/bin/pmxcfs

Jan 19 15:58:49 dellr6525-E2-28tb-dc-node1 pmxcfs[1132193]: [status] notice: received log
Jan 19 16:11:49 dellr6525-E2-28tb-dc-node1 pmxcfs[1132193]: [status] notice: received log
Jan 19 16:13:16 dellr6525-E2-28tb-dc-node1 pmxcfs[1132193]: [status] notice: received log
Jan 19 16:14:49 dellr6525-E2-28tb-dc-node1 pmxcfs[1132193]: [status] notice: received log
Jan 19 16:15:08 dellr6525-E2-28tb-dc-node1 pmxcfs[1132193]: [status] notice: received log
Jan 19 16:15:08 dellr6525-E2-28tb-dc-node1 pmxcfs[1132193]: [status] notice: received log
Jan 19 16:27:31 dellr6525-E2-28tb-dc-node1 pmxcfs[1132193]: [dcdb] notice: data verification successful
Jan 19 16:27:49 dellr6525-E2-28tb-dc-node1 pmxcfs[1132193]: [status] notice: received log
Jan 19 16:28:16 dellr6525-E2-28tb-dc-node1 pmxcfs[1132193]: [status] notice: received log
Jan 19 16:30:49 dellr6525-E2-28tb-dc-node1 pmxcfs[1132193]: [status] notice: received log

● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2023-01-13 16:44:28 IST; 5 days ago
Docs: man:corosync
man:corosync.conf
man:corosync_overview
Main PID: 3549 (corosync)
Tasks: 9 (limit: 618545)
Memory: 181.2M
CPU: 1h 48min 13.354s
CGroup: /system.slice/corosync.service
└─3549 /usr/sbin/corosync -f

Jan 19 13:25:02 dellr6525-E2-28tb-dc-node1 corosync[3549]: [KNET ] host: host: 11 (passive) best link: 0 (pri: 1)
Jan 19 13:25:02 dellr6525-E2-28tb-dc-node1 corosync[3549]: [KNET ] pmtud: Global data MTU changed to: 1397
Jan 19 14:12:49 dellr6525-E2-28tb-dc-node1 corosync[3549]: [KNET ] link: host: 8 link: 0 is down
Jan 19 14:12:49 dellr6525-E2-28tb-dc-node1 corosync[3549]: [KNET ] host: host: 8 (passive) best link: 0 (pri: 1)
Jan 19 14:12:49 dellr6525-E2-28tb-dc-node1 corosync[3549]: [KNET ] host: host: 8 has no active links
Jan 19 14:12:57 dellr6525-E2-28tb-dc-node1 corosync[3549]: [KNET ] rx: host: 8 link: 0 is up
Jan 19 14:12:57 dellr6525-E2-28tb-dc-node1 corosync[3549]: [KNET ] link: Resetting MTU for link 0 because host 8 joined
Jan 19 14:12:57 dellr6525-E2-28tb-dc-node1 corosync[3549]: [KNET ] host: host: 8 (passive) best link: 0 (pri: 1)
Jan 19 14:12:57 dellr6525-E2-28tb-dc-node1 corosync[3549]: [KNET ] pmtud: Global data MTU changed to: 1397
Jan 19 15:10:04 dellr6525-E2-28tb-dc-node1 corosync[3549]: [TOTEM ] Retransmit List: 181c04
root@dellr6525-E2-28tb-dc-node1:~#
 
I remember such an issue with an outdated version, so please provide the output of:

> pveversion -v

Are you logged in as root?
 
Can you check the output of pvesh get /cluster/config/join?
 
I remember such an issue with an outdated version, so please provide the output of:

> pveversion -v

Are you logged in as root?
Yes, logged in as root.

root@dellr6525-E2-28tb-dc-node1:~# pveversion -v
proxmox-ve: 7.3-1 (running kernel: 5.15.74-1-pve)
pve-manager: 7.3-3 (running version: 7.3-3/c3928077)
pve-kernel-5.15: 7.2-14
pve-kernel-helper: 7.2-14
pve-kernel-5.15.74-1-pve: 5.15.74-1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-8
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-5
libpve-storage-perl: 7.2-12
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.7-1
proxmox-backup-file-restore: 2.2.7-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.3
pve-cluster: 7.3-1
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-7
pve-firmware: 3.5-6
pve-ha-manager: 3.5.1
pve-i18n: 2.8-1
pve-qemu-kvm: 7.1.0-4
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-1
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+2
vncterm: 1.7-1
zfsutils-linux: 2.1.6-pve1
root@dellr6525-E2-28tb-dc-node1:~#

 
Please find the output of "pvesh get /cluster/config/join":

root@dellr6525-E2-28tb-dc-node1:~# pvesh get /cluster/config/join
hostname lookup 'dellr7425-E1-600GB-dc-node4' failed - failed to get address info for: dellr7425-E1-600GB-dc-node4: Name or service not known
root@dellr6525-E2-28tb-dc-node1:~#
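For reference, a minimal way to narrow this down, assuming the failing name from the error above is the relevant one: test the lookup directly on the node, first through NSS (DNS and /etc/hosts) and then against the hosts file itself.

getent hosts dellr7425-E1-600GB-dc-node4   # resolve via NSS (DNS, /etc/hosts)
grep dellr7425 /etc/hosts                  # check the static hosts file directly

If neither returns an address, only the cluster filesystem could have supplied the IP for that node.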
 
Please find the output of "pvesh get /cluster/config/join":

root@dellr6525-E2-28tb-dc-node1:~# pvesh get /cluster/config/join
hostname lookup 'dellr7425-E1-600GB-dc-node4' failed - failed to get address info for: dellr7425-E1-600GB-dc-node4: Name or service not known
root@dellr6525-E2-28tb-dc-node1:~#
Did you change the name of one of the nodes in the cluster? Can you please provide the /etc/pve/corosync.conf
 
Did you change the name of one of the nodes in the cluster? Can you please provide the /etc/pve/corosync.conf
No change in the node names from our side. The only issue with those 3 nodes is that they have network cable problems, so they are disconnected from the network, which is why the Proxmox dashboard shows them in red with a cross mark. Please find the requested output:

root@dellr6525-E2-28tb-dc-node1:~# cat /etc/pve/corosync.conf
logging {
debug: off
to_syslog: yes
}

nodelist {
node {
name: dellr-7425-E1-600GB-dc-node6
nodeid: 19
quorum_votes: 1
ring0_addr: 192.168.14.48
}
node {
name: dellr-7425-E1-600GB-dc-node7
nodeid: 20
quorum_votes: 1
ring0_addr: 192.168.14.49
}
node {
name: dellr6525-E2-28tb-dc-node1
nodeid: 13
quorum_votes: 1
ring0_addr: 192.168.14.36
}
node {
name: dellr6525-E2-28tb-dc-node2
nodeid: 14
quorum_votes: 1
ring0_addr: 192.168.14.37
}
node {
name: dellr6525-E5-950GB-dc-node1
nodeid: 1
quorum_votes: 1
ring0_addr: 192.168.14.11
}
node {
name: dellr6525-E5-950GB-dc-node11
nodeid: 11
quorum_votes: 1
ring0_addr: 192.168.14.21
}
node {
name: dellr6525-E5-950GB-dc-node12
nodeid: 12
quorum_votes: 1
ring0_addr: 192.168.14.22
}
node {
name: dellr6525-E5-950GB-dc-node2
nodeid: 2
quorum_votes: 1
ring0_addr: 192.168.14.12
}
node {
name: dellr6525-E5-950GB-dc-node3
nodeid: 3
quorum_votes: 1
ring0_addr: 192.168.14.13
}
node {
name: dellr6525-E5-950GB-dc-node4
nodeid: 4
quorum_votes: 1
ring0_addr: 192.168.14.14
}
node {
name: dellr6525-E5-950GB-dc-node5
nodeid: 5
quorum_votes: 1
ring0_addr: 192.168.14.15
}
node {
name: dellr6525-E5-950GB-dc-node6
nodeid: 6
quorum_votes: 1
ring0_addr: 192.168.14.16
}
node {
name: dellr6525-E5-950GB-dc-node7
nodeid: 7
quorum_votes: 1
ring0_addr: 192.168.14.17
}
node {
name: dellr6525-E5-950GB-dc-node8
nodeid: 8
quorum_votes: 1
ring0_addr: 192.168.14.18
}
node {
name: dellr6526-E5-950GB-dc-node10
nodeid: 10
quorum_votes: 1
ring0_addr: 192.168.14.20
}
node {
name: dellr6526-E5-950GB-dc-node9
nodeid: 9
quorum_votes: 1
ring0_addr: 192.168.14.19
}
node {
name: dellr7425-E1-600GB-dc-node1
nodeid: 15
quorum_votes: 1
ring0_addr: 192.168.14.43
}
node {
name: dellr7425-E1-600GB-dc-node2
nodeid: 16
quorum_votes: 1
ring0_addr: 192.168.14.44
}
node {
name: dellr7425-E1-600GB-dc-node3
nodeid: 17
quorum_votes: 1
ring0_addr: 192.168.14.45
}
node {
name: dellr7425-E1-600GB-dc-node4
nodeid: 18
quorum_votes: 1
ring0_addr: 192.168.14.46
}
}

quorum {
provider: corosync_votequorum
}

totem {
cluster_name: asdc-dc
config_version: 20
interface {
linknumber: 0
}
ip_version: ipv4-6
link_mode: passive
secauth: on
version: 2
}

root@dellr6525-E2-28tb-dc-node1:~#
 
The issue here seems to be that the node name is probably not known to the cluster filesystem, because that node has not been online since a restart of the cluster. So the easiest fix is to bring all nodes in the cluster back up before joining new ones.
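A minimal way to verify that, assuming the .members file exposed by pmxcfs is available on this installation: it lists the node names the cluster filesystem currently knows, each with an online flag and, where known, an IP.

cat /etc/pve/.members   # nodes known to pmxcfs, with online status and (if known) IP

A configured node that is missing here, or listed without an IP, would explain the failed lookup for dellr7425-E1-600GB-dc-node4.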
 
The issue here seems to be that the node name is probably not known to the cluster filesystem, because that node has not been online since a restart of the cluster. So the easiest fix is to bring all nodes in the cluster back up before joining new ones.
Just curious, how does one recover from this when a node is completely bricked and therefore can't be brought back online?
 
Just curious, how does one recover from this when a node is completely bricked and therefore can't be brought back online?
Well, in that case you would remove it from the cluster, I suppose. You don't want to keep a bricked node as part of the cluster.
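The usual CLI route for that, per the Proxmox VE documentation, is pvecm delnode, run on a node that is still a cluster member while the dead node stays powered off (the node name below is a placeholder):

pvecm delnode <nodename>   # remove the dead node from the cluster configuration

The removed node should not be powered on again in the cluster network with its old configuration; it would need to be reinstalled before rejoining.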
 
Yes, the issue here is that the node tries to resolve the IP of the other node by looking it up via the cluster filesystem. Since that fails, and the fallback lookup by node name fails as well, the API call that builds the join information fails, which results in the greyed-out panel. So once a node is no longer part of the cluster, there is no need to look up its address.

On a side note: independent of this issue, a join via the CLI should still work in this case.
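A sketch of that CLI join, assuming it is run on the new node and that 192.168.14.11 stands in for any reachable existing cluster member:

pvecm add 192.168.14.11   # run on the new node; prompts for the root password of the existing member

This matches the note above: the join itself does not depend on generating the join information in the GUI.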
 
