Proxmox Cluster Issue - Join Information is not visible

arup647 · New Member · Jan 19, 2023
Dear Members,
Greetings!

We have a running cluster of 12 nodes, out of which 3 nodes are having some hardware issues.

We have observed that in the Datacenter → Cluster view the Join Cluster tab is greyed out, and because of this we are unable to add any nodes.

I have attached a screenshot.

Regards,
Arup
 

Attachments: error.png
Hi,
I assume this is a node which is already part of the cluster (as you see the members below). You have to join the cluster from the new node, which is not part of the cluster yet.

Edit: Ah sorry, you mean you cannot get the cluster join information. Can you check the status with systemctl status pve-cluster corosync?
 
Hi,
I assume this is a node which is already part of the cluster (as you see the members below). You have to join the cluster from the new node, which is not part of the cluster yet.
Thanks, Chris, for the quick reply. But where do we get the "Join Information" needed to join the existing cluster?
 
Can you check the cluster status with pvecm status?
 
Can you check the cluster status with pvecm status?
Please find the output below:

root@dellr6525-E2-28tb-dc-node1:~# pvecm status
Cluster information
-------------------
Name: asdc-dc
Config Version: 20
Transport: knet
Secure auth: on

Quorum information
------------------
Date: Thu Jan 19 16:15:14 2023
Quorum provider: corosync_votequorum
Nodes: 17
Node ID: 0x0000000d
Ring ID: 1.12a
Quorate: Yes

Votequorum information
----------------------
Expected votes: 20
Highest expected: 20
Total votes: 17
Quorum: 11
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 192.168.14.11
0x00000002 1 192.168.14.12
0x00000003 1 192.168.14.13
0x00000004 1 192.168.14.14
0x00000005 1 192.168.14.15
0x00000006 1 192.168.14.16
0x00000007 1 192.168.14.17
0x00000008 1 192.168.14.18
0x00000009 1 192.168.14.19
0x0000000a 1 192.168.14.20
0x0000000b 1 192.168.14.21
0x0000000c 1 192.168.14.22
0x0000000d 1 192.168.14.36 (local)
0x0000000e 1 192.168.14.37
0x00000010 1 192.168.14.44
0x00000011 1 192.168.14.45
0x00000014 1 192.168.14.49
root@dellr6525-E2-28tb-dc-node1:~#
root@dellr6525-E2-28tb-dc-node1:~# systemctl status pve-cluster corosync
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2023-01-19 10:58:16 IST; 5h 36min ago
Process: 1132192 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
Main PID: 1132193 (pmxcfs)
Tasks: 8 (limit: 618545)
Memory: 56.2M
CPU: 36.389s
CGroup: /system.slice/pve-cluster.service
└─1132193 /usr/bin/pmxcfs

Jan 19 15:58:49 dellr6525-E2-28tb-dc-node1 pmxcfs[1132193]: [status] notice: received log
Jan 19 16:11:49 dellr6525-E2-28tb-dc-node1 pmxcfs[1132193]: [status] notice: received log
Jan 19 16:13:16 dellr6525-E2-28tb-dc-node1 pmxcfs[1132193]: [status] notice: received log
Jan 19 16:14:49 dellr6525-E2-28tb-dc-node1 pmxcfs[1132193]: [status] notice: received log
Jan 19 16:15:08 dellr6525-E2-28tb-dc-node1 pmxcfs[1132193]: [status] notice: received log
Jan 19 16:15:08 dellr6525-E2-28tb-dc-node1 pmxcfs[1132193]: [status] notice: received log
Jan 19 16:27:31 dellr6525-E2-28tb-dc-node1 pmxcfs[1132193]: [dcdb] notice: data verification successful
Jan 19 16:27:49 dellr6525-E2-28tb-dc-node1 pmxcfs[1132193]: [status] notice: received log
Jan 19 16:28:16 dellr6525-E2-28tb-dc-node1 pmxcfs[1132193]: [status] notice: received log
Jan 19 16:30:49 dellr6525-E2-28tb-dc-node1 pmxcfs[1132193]: [status] notice: received log

● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2023-01-13 16:44:28 IST; 5 days ago
Docs: man:corosync
man:corosync.conf
man:corosync_overview
Main PID: 3549 (corosync)
Tasks: 9 (limit: 618545)
Memory: 181.2M
CPU: 1h 48min 13.354s
CGroup: /system.slice/corosync.service
└─3549 /usr/sbin/corosync -f

Jan 19 13:25:02 dellr6525-E2-28tb-dc-node1 corosync[3549]: [KNET ] host: host: 11 (passive) best link: 0 (pri: 1)
Jan 19 13:25:02 dellr6525-E2-28tb-dc-node1 corosync[3549]: [KNET ] pmtud: Global data MTU changed to: 1397
Jan 19 14:12:49 dellr6525-E2-28tb-dc-node1 corosync[3549]: [KNET ] link: host: 8 link: 0 is down
Jan 19 14:12:49 dellr6525-E2-28tb-dc-node1 corosync[3549]: [KNET ] host: host: 8 (passive) best link: 0 (pri: 1)
Jan 19 14:12:49 dellr6525-E2-28tb-dc-node1 corosync[3549]: [KNET ] host: host: 8 has no active links
Jan 19 14:12:57 dellr6525-E2-28tb-dc-node1 corosync[3549]: [KNET ] rx: host: 8 link: 0 is up
Jan 19 14:12:57 dellr6525-E2-28tb-dc-node1 corosync[3549]: [KNET ] link: Resetting MTU for link 0 because host 8 joined
Jan 19 14:12:57 dellr6525-E2-28tb-dc-node1 corosync[3549]: [KNET ] host: host: 8 (passive) best link: 0 (pri: 1)
Jan 19 14:12:57 dellr6525-E2-28tb-dc-node1 corosync[3549]: [KNET ] pmtud: Global data MTU changed to: 1397
Jan 19 15:10:04 dellr6525-E2-28tb-dc-node1 corosync[3549]: [TOTEM ] Retransmit List: 181c04
root@dellr6525-E2-28tb-dc-node1:~#
 
I remember such an issue with an outdated version, so please provide your:

> pveversion -v

Are you logged in as root?
 
Can you check the output of pvesh get /cluster/config/join
 
I remember such an issue with an outdated version, so please provide your:

> pveversion -v

Are you logged in as root?
Yes, logged in as root.

root@dellr6525-E2-28tb-dc-node1:~# pveversion -v
proxmox-ve: 7.3-1 (running kernel: 5.15.74-1-pve)
pve-manager: 7.3-3 (running version: 7.3-3/c3928077)
pve-kernel-5.15: 7.2-14
pve-kernel-helper: 7.2-14
pve-kernel-5.15.74-1-pve: 5.15.74-1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-8
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-5
libpve-storage-perl: 7.2-12
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.7-1
proxmox-backup-file-restore: 2.2.7-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.3
pve-cluster: 7.3-1
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-7
pve-firmware: 3.5-6
pve-ha-manager: 3.5.1
pve-i18n: 2.8-1
pve-qemu-kvm: 7.1.0-4
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-1
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+2
vncterm: 1.7-1
zfsutils-linux: 2.1.6-pve1
root@dellr6525-E2-28tb-dc-node1:~#

 
Please find the output for pvesh get /cluster/config/join:

root@dellr6525-E2-28tb-dc-node1:~# pvesh get /cluster/config/join
hostname lookup 'dellr7425-E1-600GB-dc-node4' failed - failed to get address info for: dellr7425-E1-600GB-dc-node4: Name or service not known
root@dellr6525-E2-28tb-dc-node1:~#
 
Please find the output for pvesh get /cluster/config/join:

root@dellr6525-E2-28tb-dc-node1:~# pvesh get /cluster/config/join
hostname lookup 'dellr7425-E1-600GB-dc-node4' failed - failed to get address info for: dellr7425-E1-600GB-dc-node4: Name or service not known
root@dellr6525-E2-28tb-dc-node1:~#
Did you change the name of one of the nodes in the cluster? Can you please provide the /etc/pve/corosync.conf?
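As a quick side check, you could also verify whether that hostname resolves at all on this node, for example (hostname taken from your error message; getent is just the standard resolver test):

getent hosts dellr7425-E1-600GB-dc-node4
grep dellr7425-E1-600GB-dc-node4 /etc/hosts

If neither returns an address, the failed lookup above is expected.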
 
Did you change the name of one of the nodes in the cluster? Can you please provide the /etc/pve/corosync.conf?
No change in the node names from our side. The only issue is with those 3 nodes: they have some network cable issues, so they are disconnected from the network, which is why the Proxmox dashboard shows them in red with a cross mark. Please find the requested output below:

root@dellr6525-E2-28tb-dc-node1:~# cat /etc/pve/corosync.conf
logging {
debug: off
to_syslog: yes
}

nodelist {
node {
name: dellr-7425-E1-600GB-dc-node6
nodeid: 19
quorum_votes: 1
ring0_addr: 192.168.14.48
}
node {
name: dellr-7425-E1-600GB-dc-node7
nodeid: 20
quorum_votes: 1
ring0_addr: 192.168.14.49
}
node {
name: dellr6525-E2-28tb-dc-node1
nodeid: 13
quorum_votes: 1
ring0_addr: 192.168.14.36
}
node {
name: dellr6525-E2-28tb-dc-node2
nodeid: 14
quorum_votes: 1
ring0_addr: 192.168.14.37
}
node {
name: dellr6525-E5-950GB-dc-node1
nodeid: 1
quorum_votes: 1
ring0_addr: 192.168.14.11
}
node {
name: dellr6525-E5-950GB-dc-node11
nodeid: 11
quorum_votes: 1
ring0_addr: 192.168.14.21
}
node {
name: dellr6525-E5-950GB-dc-node12
nodeid: 12
quorum_votes: 1
ring0_addr: 192.168.14.22
}
node {
name: dellr6525-E5-950GB-dc-node2
nodeid: 2
quorum_votes: 1
ring0_addr: 192.168.14.12
}
node {
name: dellr6525-E5-950GB-dc-node3
nodeid: 3
quorum_votes: 1
ring0_addr: 192.168.14.13
}
node {
name: dellr6525-E5-950GB-dc-node4
nodeid: 4
quorum_votes: 1
ring0_addr: 192.168.14.14
}
node {
name: dellr6525-E5-950GB-dc-node5
nodeid: 5
quorum_votes: 1
ring0_addr: 192.168.14.15
}
node {
name: dellr6525-E5-950GB-dc-node6
nodeid: 6
quorum_votes: 1
ring0_addr: 192.168.14.16
}
node {
name: dellr6525-E5-950GB-dc-node7
nodeid: 7
quorum_votes: 1
ring0_addr: 192.168.14.17
}
node {
name: dellr6525-E5-950GB-dc-node8
nodeid: 8
quorum_votes: 1
ring0_addr: 192.168.14.18
}
node {
name: dellr6526-E5-950GB-dc-node10
nodeid: 10
quorum_votes: 1
ring0_addr: 192.168.14.20
}
node {
name: dellr6526-E5-950GB-dc-node9
nodeid: 9
quorum_votes: 1
ring0_addr: 192.168.14.19
}
node {
name: dellr7425-E1-600GB-dc-node1
nodeid: 15
quorum_votes: 1
ring0_addr: 192.168.14.43
}
node {
name: dellr7425-E1-600GB-dc-node2
nodeid: 16
quorum_votes: 1
ring0_addr: 192.168.14.44
}
node {
name: dellr7425-E1-600GB-dc-node3
nodeid: 17
quorum_votes: 1
ring0_addr: 192.168.14.45
}
node {
name: dellr7425-E1-600GB-dc-node4
nodeid: 18
quorum_votes: 1
ring0_addr: 192.168.14.46
}
}

quorum {
provider: corosync_votequorum
}

totem {
cluster_name: asdc-dc
config_version: 20
interface {
linknumber: 0
}
ip_version: ipv4-6
link_mode: passive
secauth: on
version: 2
}

root@dellr6525-E2-28tb-dc-node1:~#
 
The issue here seems to be that the nodename is probably not known to the cluster filesystem because it was never online since a restart of the cluster. So the easiest fix is to bring back up all nodes in the cluster, before joining new ones.
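If you want to double-check that, the cluster filesystem keeps its view of the members in /etc/pve/.members. As a minimal check (assuming the standard pmxcfs layout), you could run:

cat /etc/pve/.members

and see whether the entry for dellr7425-E1-600GB-dc-node4 is missing its IP; that would match the failed lookup above.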
 
The issue here seems to be that the nodename is probably not known to the cluster filesystem because it was never online since a restart of the cluster. So the easiest fix is to bring back up all nodes in the cluster, before joining new ones.
Just curious, how does one recover from this when a node is completely bricked and therefore can't be brought back online?
 
Just curious, how does one recover from this when a node is completely bricked and therefore can't be brought back online?
Well, in that case you would remove it from the cluster, I suppose. You don't want to keep a bricked node as part of the cluster.
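For reference, the removal itself would be done from one of the remaining nodes with pvecm delnode, roughly like this (the nodename here is a placeholder; take it from pvecm nodes):

pvecm delnode <nodename>

Afterwards the leftover /etc/pve/nodes/<nodename> directory can optionally be cleaned up.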
 
Yes, the issue here is that the node tries to resolve the IP of the other node by looking it up via the cluster file system. Since that, and also the fallback lookup by nodename, fails, the API call that fetches the join information fails and results in the greyed-out panel. So if a node is not part of the cluster anymore, there is no need to look up its address.

On a side note: independent of this issue, a join via the CLI should still work in this case.
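For example, on the new node that should join, something along these lines (using the address of any reachable existing member, e.g. 192.168.14.11 from the membership list above):

pvecm add 192.168.14.11

It will ask for the root password of that existing node and then set up corosync on the joining node.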
 
