[SOLVED] New node add fails

Hi Team! Today I tried to add a new host to our PVE cluster (10 nodes).

I used the command:

Code:
 pvecm add [cluster_ip] --link0 [cluster_vlan_ip] --link1 [corosync_vlan_ip]
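For illustration only (placeholder addresses), such a call looks like this and is run on the new node; the first argument is an existing cluster member, and --link0/--link1 are the new node's own addresses for each corosync link:

Code:
# first argument = IP of an existing cluster member; --link0/--link1 = the NEW node's link addresses
pvecm add 172.16.133.134 --link0 172.16.133.140 --link1 10.222.145.15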

Everything ran without errors, but in the Datacenter tree I still see only my 10 existing nodes (the new one is missing).
However, I can see the new node in the Datacenter - Cluster section. "Join Cluster" and the other functions are inactive.
Code:
pvecm status
on the new node shows that it isn't in the cluster, and trying to re-add it to the cluster finishes with an error.

The Proxmox manager version is the same on all nodes.

(screenshot attached)

I tried to delete the node and got this:

Code:
pvecm delnode 20pve04
Could not kill node (error = CS_ERR_NOT_EXIST)
Killing node 11
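
As a side note, the current corosync membership can be checked before a delnode attempt, e.g.:

Code:
pvecm nodes    # list the nodes corosync currently knows about
pvecm status   # quorum and membership overview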
 
Hello,

Do you see the new node's folder under the /etc/pve/nodes/ path?

Can you please check the syslog to see whether there were any errors while the new node was being added?

Which Proxmox VE version does the cluster use (pveversion -v)?
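
A minimal set of commands for these checks (a sketch, assuming standard locations) could be:

Code:
ls /etc/pve/nodes/                        # is there a directory for the new node?
journalctl -b -u corosync -u pve-cluster  # any errors while the node was being added?
pveversion -v                             # compare package versions across nodes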
 
Hello @Moayad
I've already deleted the node, but its folder is still there.
At the moment I have reinstalled the node and want to try re-adding it to the cluster.
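
For reference, once a node has been removed from the cluster, the leftover directory can usually be cleaned up on one of the remaining cluster nodes (a sketch; 20pve04 is the node name from the delnode attempt above):

Code:
ls /etc/pve/nodes/              # confirm which node directories are left over
rm -rf /etc/pve/nodes/20pve04   # remove the stale directory of the removed node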

pveversion output:
Code:
pveversion -v
proxmox-ve: 7.3-1 (running kernel: 5.15.74-1-pve)
pve-manager: 7.3-3 (running version: 7.3-3/c3928077)
pve-kernel-5.15: 7.2-14
pve-kernel-helper: 7.2-14
pve-kernel-5.15.74-1-pve: 5.15.74-1
ceph: 16.2.9-pve1
ceph-fuse: 16.2.9-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-1
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-5
libpve-storage-perl: 7.3-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
openvswitch-switch: 2.15.0+ds1-2+deb11u1
proxmox-backup-client: 2.3.1-1
proxmox-backup-file-restore: 2.3.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.0-1
proxmox-widget-toolkit: 3.5.3
pve-cluster: 7.3-1
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-7
pve-firmware: 3.5-6
pve-ha-manager: 3.5.1
pve-i18n: 2.8-1
pve-qemu-kvm: 7.1.0-4
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-1
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+2
vncterm: 1.7-1
zfsutils-linux: 2.1.6-pve1
 
Hi,

I forgot to ask: did you press Ctrl+Shift+R in the PVE WebUI, or try a different/private browser, when the newly added node did not show up? Maybe the browser had cached content.
 
@Moayad I found the problem. Earlier I had added node 20pve03 using the management IP as link0 and the corosync IP as link1. That was a mistake (we use the management VLAN IP for management, the cluster VLAN IP for the cluster, and the corosync VLAN IP for corosync), so I reconfigured
Code:
corosync.conf
and changed link0 to the correct IP (the cluster VLAN IP). Maybe that was not the right way to do it. So how can I correctly change the IP for node 20pve03 and then add the new node?
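
For context, the usual way to edit the cluster-wide corosync configuration is via the pmxcfs-backed file, roughly like this (a sketch of the standard procedure; the editor and temporary file name are just examples):

Code:
# work on a copy so pmxcfs never picks up a half-edited file
cp /etc/pve/corosync.conf /root/corosync.conf.new
nano /root/corosync.conf.new                        # fix ring0_addr for 20pve03 and increase config_version by 1
cp /root/corosync.conf.new /etc/pve/corosync.conf   # writing it back propagates it to all nodes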

Code:
Dec 02 17:48:48 20pve03 pmxcfs[3446]: [status] notice: received log
Dec 02 17:48:48 20pve03 pmxcfs[3446]: [status] notice: received log
Dec 02 17:48:48 20pve03 sshd[620069]: Accepted publickey for root from 172.16.133.75 port 33170 ssh2: RSA SHA256:1IDKz3mpQb56w5GXd9uHfn1tXX0UD/2y+IDhBebZyBw
Dec 02 17:48:48 20pve03 sshd[620069]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Dec 02 17:48:48 20pve03 systemd-logind[819]: New session 63 of user root.
Dec 02 17:48:48 20pve03 systemd[1]: Started Session 63 of user root.
Dec 02 17:48:48 20pve03 login[620075]: pam_unix(login:session): session opened for user root(uid=0) by root(uid=0)
Dec 02 17:48:48 20pve03 login[620080]: ROOT LOGIN  on '/dev/pts/0' from '172.16.133.75'
Dec 02 17:48:55 20pve03 pmxcfs[3446]: [status] notice: received log
Dec 02 17:49:16 20pve03 sshd[620069]: Received disconnect from 172.16.133.75 port 33170:11: disconnected by user
Dec 02 17:49:16 20pve03 sshd[620069]: Disconnected from user root 172.16.133.75 port 33170
Dec 02 17:49:16 20pve03 sshd[620069]: pam_unix(sshd:session): session closed for user root
Dec 02 17:49:16 20pve03 systemd-logind[819]: Session 63 logged out. Waiting for processes to exit.
Dec 02 17:49:16 20pve03 systemd[1]: session-63.scope: Succeeded.
Dec 02 17:49:16 20pve03 systemd-logind[819]: Removed session 63.
Dec 02 17:49:16 20pve03 pmxcfs[3446]: [status] notice: received log
Dec 02 17:49:16 20pve03 pmxcfs[3446]: [status] notice: received log
Dec 02 17:49:16 20pve03 pmxcfs[3446]: [status] notice: received log
Dec 02 17:49:33 20pve03 pmxcfs[3446]: [status] notice: received log
Dec 02 17:49:33 20pve03 pmxcfs[3446]: [status] notice: received log
Dec 02 17:49:33 20pve03 pmxcfs[3446]: [status] notice: received log
Dec 02 17:49:33 20pve03 sshd[620380]: Accepted publickey for root from 172.16.133.75 port 59660 ssh2: RSA SHA256:1IDKz3mpQb56w5GXd9uHfn1tXX0UD/2y+IDhBebZyBw
Dec 02 17:49:33 20pve03 sshd[620380]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Dec 02 17:49:33 20pve03 systemd-logind[819]: New session 64 of user root.
Dec 02 17:49:33 20pve03 systemd[1]: Started Session 64 of user root.
Dec 02 17:49:33 20pve03 login[620386]: pam_unix(login:session): session opened for user root(uid=0) by root(uid=0)
Dec 02 17:49:33 20pve03 login[620391]: ROOT LOGIN  on '/dev/pts/0' from '172.16.133.75'
Dec 02 17:50:02 20pve03 pmxcfs[3446]: [status] notice: received log
Dec 02 17:50:02 20pve03 pmxcfs[3446]: [status] notice: received log
Dec 02 17:50:02 20pve03 pmxcfs[3446]: [status] notice: received log
Dec 02 17:50:02 20pve03 pmxcfs[3446]: [status] notice: received log
Dec 02 17:50:02 20pve03 pmxcfs[3446]: [status] notice: received log
Dec 02 17:50:02 20pve03 pmxcfs[3446]: [status] notice: received log
Dec 02 17:50:02 20pve03 pmxcfs[3446]: [status] notice: received log
Dec 02 17:50:03 20pve03 pmxcfs[3446]: [status] notice: received log
Dec 02 17:50:04 20pve03 pmxcfs[3446]: [status] notice: received log
Dec 02 17:50:05 20pve03 pmxcfs[3446]: [status] notice: received log
Dec 02 17:50:06 20pve03 pmxcfs[3446]: [status] notice: received log
Dec 02 17:50:07 20pve03 pmxcfs[3446]: [status] notice: received log
Dec 02 17:50:22 20pve03 pmxcfs[3446]: [status] notice: received log
Dec 02 17:50:22 20pve03 pmxcfs[3446]: [status] notice: received log
Dec 02 17:50:22 20pve03 pmxcfs[3446]: [dcdb] notice: wrote new corosync config '/etc/corosync/corosync.conf' (version = 17)
Dec 02 17:50:23 20pve03 corosync[3444]:   [CFG   ] Config reload requested by node 2
Dec 02 17:50:23 20pve03 corosync[3444]:   [TOTEM ] new config has different address for link 0 (addr changed from 172.16.133.77 to 172.16.133.140). Internal value was NOT changed.
Dec 02 17:50:23 20pve03 corosync[3444]:   [CFG   ] Cannot configure new interface definitions: To reconfigure an interface it must be deleted and recreated. A working interface needs to be available to corosync at all times
Dec 02 17:50:23 20pve03 corosync[3444]:   [KNET  ] pmtud: MTU manually set to: 0
Dec 02 17:50:40 20pve03 pmxcfs[3446]: [status] notice: received log
Dec 02 17:51:01 20pve03 pmxcfs[3446]: [status] notice: received log
Dec 02 17:51:01 20pve03 pmxcfs[3446]: [status] notice: received log
Dec 02 17:51:01 20pve03 pmxcfs[3446]: [status] notice: received log
Dec 02 17:51:02 20pve03 pmxcfs[3446]: [status] notice: received log
Dec 02 17:51:02 20pve03 pmxcfs[3446]: [status] notice: received log
Dec 02 17:51:02 20pve03 pmxcfs[3446]: [status] notice: received log
Dec 02 17:51:03 20pve03 pmxcfs[3446]: [status] notice: received log
Dec 02 17:51:03 20pve03 sshd[620380]: Received disconnect from 172.16.133.75 port 59660:11: disconnected by user
Dec 02 17:51:03 20pve03 sshd[620380]: Disconnected from user root 172.16.133.75 port 59660
Dec 02 17:51:03 20pve03 sshd[620380]: pam_unix(sshd:session): session closed for user root
Dec 02 17:51:03 20pve03 systemd-logind[819]: Session 64 logged out. Waiting for processes to exit.
Dec 02 17:51:03 20pve03 systemd[1]: session-64.scope: Succeeded.
Dec 02 17:51:03 20pve03 systemd-logind[819]: Removed session 64.
Dec 02 17:51:03 20pve03 pmxcfs[3446]: [status] notice: received log
Dec 02 17:51:14 20pve03 pmxcfs[3446]: [status] notice: received log
Dec 02 17:51:14 20pve03 pmxcfs[3446]: [status] notice: received log
Dec 02 17:51:14 20pve03 sshd[621078]: Accepted publickey for root from 172.16.133.75 port 50842 ssh2: RSA SHA256:1IDKz3mpQb56w5GXd9uHfn1tXX0UD/2y+IDhBebZyBw
Dec 02 17:51:14 20pve03 sshd[621078]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Dec 02 17:51:14 20pve03 systemd-logind[819]: New session 65 of user root.
Dec 02 17:51:14 20pve03 systemd[1]: Started Session 65 of user root.
Dec 02 17:51:14 20pve03 login[621084]: pam_unix(login:session): session opened for user root(uid=0) by root(uid=0)
Dec 02 17:51:14 20pve03 login[621089]: ROOT LOGIN  on '/dev/pts/0' from '172.16.133.75'
Dec 02 17:51:40 20pve03 pmxcfs[3446]: [status] notice: received log
Dec 02 17:51:54 20pve03 sshd[621078]: Received disconnect from 172.16.133.75 port 50842:11: disconnected by user
Dec 02 17:51:54 20pve03 sshd[621078]: Disconnected from user root 172.16.133.75 port 50842
Dec 02 17:51:54 20pve03 sshd[621078]: pam_unix(sshd:session): session closed for user root
Dec 02 17:51:54 20pve03 systemd-logind[819]: Session 65 logged out. Waiting for processes to exit.
Dec 02 17:51:54 20pve03 systemd[1]: session-65.scope: Succeeded.
Dec 02 17:51:54 20pve03 systemd-logind[819]: Removed session 65.
Dec 02 17:51:54 20pve03 pmxcfs[3446]: [status] notice: received log

The Datacenter - Cluster section now shows the correct IPs for link0 and link1 (after changing the corosync.conf file):
(screenshot attached)

The Datacenter - Summary section shows the correct management IP:
(screenshot attached)
 
@Moayad Hello!
At the moment the problem with adding the new node is:
Code:
Dec 02 17:50:23 20pve03 corosync[3444]:   [CFG   ] Cannot configure new interface definitions: To reconfigure an interface it must be deleted and recreated. A working interface needs to be available to corosync at all times
corosync-cfgtool -s shows that the interface is disconnected:

Code:
root@220pve03:~# corosync-cfgtool -s
Local node ID 10, transport knet
LINK ID 0 udp
        addr    = 172.16.133.140
        status:
                nodeid:          1:     disconnected
                nodeid:          2:     disconnected
                nodeid:          3:     disconnected
                nodeid:          4:     disconnected
                nodeid:          5:     disconnected
                nodeid:          6:     disconnected
                nodeid:          7:     disconnected
                nodeid:          8:     disconnected
                nodeid:          9:     disconnected
                nodeid:         10:     localhost
LINK ID 1 udp
        addr    = 10.222.145.15
        status:
                nodeid:          1:     connected
                nodeid:          2:     connected
                nodeid:          3:     connected
                nodeid:          4:     connected
                nodeid:          5:     connected
                nodeid:          6:     connected
                nodeid:          7:     connected
                nodeid:          8:     connected
                nodeid:          9:     connected
                nodeid:         10:     localhost

The corosync.conf currently uses 2 links, and the hosts file is configured to use the IP addresses from the cluster VLAN.

How can I delete and reconfigure the links to solve the problem?
I tried to edit the corosync.conf file and delete link0 only on the problem node. Should I delete link0 on all nodes and then re-add it?
 
@Moayad
At the moment corosync.conf is configured correctly, but I am still getting the error about the interface having to be deleted and recreated.
Code:
cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: 20pve01
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 172.16.133.136
    ring1_addr: 10.222.145.6
  }
  node {
    name: 20pve02
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 172.16.133.137
    ring1_addr: 10.222.145.7
  }
  node {
    name: 20pve03
    nodeid: 10
    quorum_votes: 1
    ring0_addr: 172.16.133.140
    ring1_addr: 10.222.145.15
  }
  node {
    name: 530pve01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 172.16.133.134
    ring1_addr: 10.222.145.8
  }
  node {
    name: 530pve02
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 172.16.133.135
    ring1_addr: 10.222.145.9
  }
  node {
    name: bf01pve03
    nodeid: 5
    quorum_votes: 1
    ring0_addr: 172.16.133.138
    ring1_addr: 10.222.145.10
  }
  node {
    name: bf01pve04
    nodeid: 6
    quorum_votes: 1
    ring0_addr: 172.16.133.139
    ring1_addr: 10.222.145.11
  }
  node {
    name: bf02pve01
    nodeid: 7
    quorum_votes: 1
    ring0_addr: 172.16.133.153
    ring1_addr: 10.222.145.25
  }
  node {
    name: bf02pve09
    nodeid: 8
    quorum_votes: 1
    ring0_addr: 172.16.133.154
    ring1_addr: 10.222.145.26
  }
  node {
    name: bf02pve12
    nodeid: 9
    quorum_votes: 1
    ring0_addr: 172.16.133.156
    ring1_addr: 10.222.145.27
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: pve-int01
  config_version: 22
  interface {
    linknumber: 0
  }
  interface {
    linknumber: 1
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}

hosts file:
Code:
127.0.0.1    localhost

#DC1 w ceph
172.16.133.134    530pve01.kt.kz    530pve01
172.16.133.135    530pve02.kt.kz    530pve02
172.16.133.136    20pve01.kt.kz    20pve01
172.16.133.137    20pve02.kt.kz    20pve02
172.16.133.140    20pve03.kt.kz    20pve03
172.16.133.141    20pve04.kt.kz    20pve04

#DC1 blade-servers
172.16.133.138    bf01pve03.kt.kz    bf01pve03
172.16.133.139    bf01pve04.kt.kz    bf01pve04

#DC2
172.16.133.153    bf02pve01.kt.kz    bf02pve01
172.16.133.154    bf02pve09.kt.kz    bf02pve09
172.16.133.156    bf02pve12.kt.kz    bf02pve12

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
 
Hello,

thank you for the corosync.conf file!

Did you restart the corosync service after you edited the corosync.conf? systemctl restart corosync.service

Are the IPs in the /etc/pve/corosync.conf correct on all nodes?
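
If corosync did hang on one node, a quick sequence to restart it and verify the links recover might look like this (a sketch, to be run on the affected node):

Code:
systemctl restart corosync.service   # restart corosync on the node that is stuck
corosync-cfgtool -s                  # both links should now report "connected"
pvecm status                         # confirm the node is part of the quorate cluster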
 
Thank you @Moayad. Looks like the corosync service was hanging on one node. After restarting it, the new node was successfully added to the cluster.
 