Problem when adding a node to a cluster on PVE 7.0

CrystalCat

New Member
Jul 23, 2021
I created a cluster with 2 nodes and it worked correctly. But today, when I added a new node (node 3) to the cluster, it did not succeed: the master shows the newly added node as down, even though the node is actually up. Each node has two network adapters, one used for the internet and the other used for the cluster only.
The version is Proxmox Virtual Environment 7.0-8.
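For context, a dedicated cluster NIC like this is normally just a second interface on its own subnet. A minimal /etc/network/interfaces sketch of what I mean (the interface name is illustrative; 10.10.200.1 is master01's cluster address from the config below):

auto eno2
iface eno2 inet static
        address 10.10.200.1/24
        # cluster-only link: no gateway here, and corosync's ring0_addr points at this address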


I tried deleting node04 on the master and reinstalling node04, but the problem is still not solved. The different nodes can ping each other.


(screenshot attached)
The logs on master01:
root@master01:/etc# journalctl -b -u corosync|tail -n 30
Aug 01 20:47:59 master01 corosync[9496]: [QUORUM] Sync members[2]: 1 2
Aug 01 20:47:59 master01 corosync[9496]: [TOTEM ] A new membership (1.1b47) was formed. Members
Aug 01 20:48:01 master01 corosync[9496]: [TOTEM ] Token has not been received in 2738 ms
Aug 01 20:48:05 master01 corosync[9496]: [TOTEM ] Token has not been received in 6389 ms
Aug 01 20:48:07 master01 corosync[9496]: [QUORUM] Sync members[2]: 1 2
Aug 01 20:48:07 master01 corosync[9496]: [TOTEM ] A new membership (1.1b53) was formed. Members
Aug 01 20:48:10 master01 corosync[9496]: [TOTEM ] Token has not been received in 2738 ms
Aug 01 20:48:14 master01 corosync[9496]: [TOTEM ] Token has not been received in 6388 ms
Aug 01 20:48:16 master01 corosync[9496]: [QUORUM] Sync members[2]: 1 2
Aug 01 20:48:16 master01 corosync[9496]: [TOTEM ] A new membership (1.1b5f) was formed. Members
Aug 01 20:48:19 master01 corosync[9496]: [TOTEM ] Token has not been received in 2738 ms
Aug 01 20:48:23 master01 corosync[9496]: [TOTEM ] Token has not been received in 6403 ms
Aug 01 20:48:25 master01 corosync[9496]: [QUORUM] Sync members[2]: 1 2
Aug 01 20:48:25 master01 corosync[9496]: [TOTEM ] A new membership (1.1b6f) was formed. Members
Aug 01 20:48:28 master01 corosync[9496]: [TOTEM ] Token has not been received in 2737 ms
Aug 01 20:48:31 master01 corosync[9496]: [TOTEM ] Token has not been received in 6388 ms
Aug 01 20:48:34 master01 corosync[9496]: [QUORUM] Sync members[2]: 1 2
Aug 01 20:48:34 master01 corosync[9496]: [TOTEM ] A new membership (1.1b7b) was formed. Members
Aug 01 20:48:36 master01 corosync[9496]: [TOTEM ] Token has not been received in 2741 ms
Aug 01 20:48:40 master01 corosync[9496]: [TOTEM ] Token has not been received in 6438 ms
Aug 01 20:48:42 master01 corosync[9496]: [QUORUM] Sync members[2]: 1 2
Aug 01 20:48:42 master01 corosync[9496]: [TOTEM ] A new membership (1.1b8b) was formed. Members
Aug 01 20:48:45 master01 corosync[9496]: [TOTEM ] Token has not been received in 2817 ms
Aug 01 20:48:49 master01 corosync[9496]: [TOTEM ] Token has not been received in 6438 ms
Aug 01 20:48:51 master01 corosync[9496]: [QUORUM] Sync members[2]: 1 2
Aug 01 20:48:51 master01 corosync[9496]: [TOTEM ] A new membership (1.1b9b) was formed. Members
Aug 01 20:48:54 master01 corosync[9496]: [TOTEM ] Token has not been received in 2738 ms
Aug 01 20:48:58 master01 corosync[9496]: [TOTEM ] Token has not been received in 6389 ms
Aug 01 20:49:00 master01 corosync[9496]: [QUORUM] Sync members[2]: 1 2
Aug 01 20:49:00 master01 corosync[9496]: [TOTEM ] A new membership (1.1ba7) was formed. Members
root@master01:/etc#

The logs on node02:
root@node02:~# journalctl -b -u corosync|tail -n 30
Aug 01 22:41:48 node02 corosync[5968]: [QUORUM] Sync members[2]: 1 2
Aug 01 22:41:48 node02 corosync[5968]: [TOTEM ] A new membership (1.1927) was formed. Members
Aug 01 22:41:50 node02 corosync[5968]: [TOTEM ] Token has not been received in 2738 ms
Aug 01 22:41:54 node02 corosync[5968]: [TOTEM ] Token has not been received in 6388 ms
Aug 01 22:41:56 node02 corosync[5968]: [QUORUM] Sync members[2]: 1 2
Aug 01 22:41:56 node02 corosync[5968]: [TOTEM ] A new membership (1.1933) was formed. Members
Aug 01 22:41:59 node02 corosync[5968]: [TOTEM ] Token has not been received in 2737 ms
Aug 01 22:42:03 node02 corosync[5968]: [TOTEM ] Token has not been received in 6389 ms
Aug 01 22:42:05 node02 corosync[5968]: [QUORUM] Sync members[2]: 1 2
Aug 01 22:42:05 node02 corosync[5968]: [TOTEM ] A new membership (1.1947) was formed. Members
Aug 01 22:42:08 node02 corosync[5968]: [TOTEM ] Token has not been received in 2759 ms
Aug 01 22:42:12 node02 corosync[5968]: [TOTEM ] Token has not been received in 6388 ms
Aug 01 22:42:14 node02 corosync[5968]: [QUORUM] Sync members[2]: 1 2
Aug 01 22:42:14 node02 corosync[5968]: [TOTEM ] A new membership (1.1953) was formed. Members
Aug 01 22:42:17 node02 corosync[5968]: [TOTEM ] Token has not been received in 2737 ms
Aug 01 22:42:20 node02 corosync[5968]: [TOTEM ] Token has not been received in 6388 ms
Aug 01 22:42:23 node02 corosync[5968]: [QUORUM] Sync members[2]: 1 2
Aug 01 22:42:23 node02 corosync[5968]: [TOTEM ] A new membership (1.195f) was formed. Members
Aug 01 22:42:25 node02 corosync[5968]: [TOTEM ] Token has not been received in 2738 ms
Aug 01 22:42:29 node02 corosync[5968]: [TOTEM ] Token has not been received in 6438 ms
Aug 01 22:42:31 node02 corosync[5968]: [QUORUM] Sync members[2]: 1 2
Aug 01 22:42:31 node02 corosync[5968]: [TOTEM ] A new membership (1.196f) was formed. Members
Aug 01 22:42:34 node02 corosync[5968]: [TOTEM ] Token has not been received in 2737 ms
Aug 01 22:42:38 node02 corosync[5968]: [TOTEM ] Token has not been received in 6412 ms
Aug 01 22:42:41 node02 corosync[5968]: [QUORUM] Sync members[2]: 1 2
Aug 01 22:42:41 node02 corosync[5968]: [TOTEM ] A new membership (1.197b) was formed. Members
Aug 01 22:42:48 node02 corosync[5968]: [MAIN ] Corosync main process was not scheduled (@1627882968069) for 3542.7036 ms (threshold is 2920.0000 ms). Consider token timeout increase.
Aug 01 22:42:50 node02 corosync[5968]: [QUORUM] Sync members[2]: 1 2
Aug 01 22:42:50 node02 corosync[5968]: [TOTEM ] A new membership (1.1987) was formed. Members
Aug 01 22:42:52 node02 corosync[5968]: [TOTEM ] Token has not been received in 2858 ms

The logs on node04:


root@node04:~# journalctl -b -u corosync|tail -n 30
Aug 01 23:01:03 node04 corosync[4271]: [QUORUM] Members[1]: 3
Aug 01 23:01:03 node04 corosync[4271]: [MAIN ] Completed service synchronization, ready to provide service.
Aug 01 23:01:12 node04 corosync[4271]: [QUORUM] Sync members[1]: 3
Aug 01 23:01:12 node04 corosync[4271]: [TOTEM ] A new membership (3.1fef) was formed. Members
Aug 01 23:01:12 node04 corosync[4271]: [QUORUM] Members[1]: 3
Aug 01 23:01:12 node04 corosync[4271]: [MAIN ] Completed service synchronization, ready to provide service.
Aug 01 23:01:21 node04 corosync[4271]: [QUORUM] Sync members[1]: 3
Aug 01 23:01:21 node04 corosync[4271]: [TOTEM ] A new membership (3.1ffb) was formed. Members
Aug 01 23:01:21 node04 corosync[4271]: [QUORUM] Members[1]: 3
Aug 01 23:01:21 node04 corosync[4271]: [MAIN ] Completed service synchronization, ready to provide service.
Aug 01 23:01:29 node04 corosync[4271]: [QUORUM] Sync members[1]: 3
Aug 01 23:01:29 node04 corosync[4271]: [TOTEM ] A new membership (3.2007) was formed. Members
Aug 01 23:01:29 node04 corosync[4271]: [QUORUM] Members[1]: 3
Aug 01 23:01:29 node04 corosync[4271]: [MAIN ] Completed service synchronization, ready to provide service.
Aug 01 23:01:38 node04 corosync[4271]: [QUORUM] Sync members[1]: 3
Aug 01 23:01:38 node04 corosync[4271]: [TOTEM ] A new membership (3.2013) was formed. Members
Aug 01 23:01:38 node04 corosync[4271]: [QUORUM] Members[1]: 3
Aug 01 23:01:38 node04 corosync[4271]: [MAIN ] Completed service synchronization, ready to provide service.
Aug 01 23:01:47 node04 corosync[4271]: [QUORUM] Sync members[1]: 3
Aug 01 23:01:47 node04 corosync[4271]: [TOTEM ] A new membership (3.2023) was formed. Members
Aug 01 23:01:47 node04 corosync[4271]: [QUORUM] Members[1]: 3
Aug 01 23:01:47 node04 corosync[4271]: [MAIN ] Completed service synchronization, ready to provide service.
Aug 01 23:01:56 node04 corosync[4271]: [QUORUM] Sync members[1]: 3
Aug 01 23:01:56 node04 corosync[4271]: [TOTEM ] A new membership (3.202f) was formed. Members
Aug 01 23:01:56 node04 corosync[4271]: [QUORUM] Members[1]: 3
Aug 01 23:01:56 node04 corosync[4271]: [MAIN ] Completed service synchronization, ready to provide service.
Aug 01 23:02:05 node04 corosync[4271]: [QUORUM] Sync members[1]: 3
Aug 01 23:02:05 node04 corosync[4271]: [TOTEM ] A new membership (3.203b) was formed. Members
Aug 01 23:02:05 node04 corosync[4271]: [QUORUM] Members[1]: 3
Aug 01 23:02:05 node04 corosync[4271]: [MAIN ] Completed service synchronization, ready to provide service.

Configuration:
logging {
debug: off
to_syslog: yes
}

nodelist {
node {
name: master01
nodeid: 1
quorum_votes: 1
ring0_addr: 10.10.200.1
}
node {
name: node02
nodeid: 2
quorum_votes: 1
ring0_addr: 10.10.200.2
}
node {
name: node04
nodeid: 3
quorum_votes: 1
ring0_addr: 10.10.200.4
}
}

quorum {
provider: corosync_votequorum
}

totem {
cluster_name: Dev-LAX
config_version: 5
interface {
linknumber: 0
}
ip_version: ipv4-6
link_mode: passive
secauth: on
version: 2
}
 
Aug 01 22:42:48 node02 corosync[5968]: [MAIN ] Corosync main process was not scheduled (@1627882968069) for 3542.7036 ms (threshold is 2920.0000 ms). Consider token timeout increase.

and the token round trip times don't sound like you have a healthy system/network... are the nodes all on the same local network?
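For reference, that token timeout is a totem parameter in corosync.conf. A minimal sketch based on the totem section posted above (the 10000 ms value is only an example; on PVE you edit /etc/pve/corosync.conf and bump config_version so the change replicates to all nodes):

totem {
  cluster_name: Dev-LAX
  config_version: 6   # incremented from 5 so the edit propagates
  token: 10000        # example timeout in milliseconds
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}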
 
All three nodes are on one Nexus 3064 switch, in the same VLAN... I don't know whether I need to do some configuration on the switch? The three nodes can ping each other with less than 1 ms latency.
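Corosync 3 with knet uses unicast UDP (port 5405 by default), so no multicast/IGMP configuration should be needed on the switch. One quick way to confirm the cluster traffic actually reaches node04 is to watch the cluster NIC there (the interface name is illustrative):

# on node04, watch knet/corosync traffic on the cluster-only interface
tcpdump -ni eno2 udp port 5405
# steady packets from 10.10.200.1 and 10.10.200.2 should show up; if not, the UDP path is broken even though ping works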
 
Here is some info from node04 when I tried again this afternoon. I reinstalled node04 once more, using another address in the same network (10.10.200.6), but it still does not seem to work.

The ping from the master to node04:
root@master01:~# ping 10.10.200.6
PING 10.10.200.6 (10.10.200.6) 56(84) bytes of data.
64 bytes from 10.10.200.6: icmp_seq=1 ttl=64 time=0.066 ms
64 bytes from 10.10.200.6: icmp_seq=2 ttl=64 time=0.051 ms
64 bytes from 10.10.200.6: icmp_seq=3 ttl=64 time=0.050 ms
64 bytes from 10.10.200.6: icmp_seq=4 ttl=64 time=0.057 ms
64 bytes from 10.10.200.6: icmp_seq=5 ttl=64 time=0.056 ms


Task viewer: Join Cluster

Establishing API connection with host '10.10.200.1'
Login succeeded.
check cluster join API version
No cluster network links passed explicitly, fallback to local node IP '10.10.200.6'
Request addition of this node
Join request OK, finishing setup locally
stopping pve-cluster service
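When the join task stalls at this point, the next thing to look at is the service and corosync state on the joining node, for example:

# on the joining node (node04)
systemctl status corosync pve-cluster
journalctl -b -u corosync -u pve-cluster | tail -n 50
pvecm status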
 
The pvecm status on the master (master01):
Cluster information
-------------------
Name: Codec-Dev-LAX
Config Version: 7
Transport: knet
Secure auth: on

Quorum information
------------------
Date: Mon Aug 2 09:48:02 2021
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 0x00000001
Ring ID: 1.2503
Quorate: Yes

Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 2
Quorum: 2
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 10.10.200.1 (local)
0x00000002 1 10.10.200.2

The pvecm status on node04:
Cluster information
-------------------
Name: Codec-Dev-LAX
Config Version: 7
Transport: knet
Secure auth: on

Quorum information
------------------
Date: Mon Aug 2 09:46:44 2021
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 0x00000003
Ring ID: 3.264b
Quorate: No

Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 1
Quorum: 2 Activity blocked
Flags:

Membership information
----------------------
Nodeid Votes Name
0x00000003 1 10.10.200.6 (local)
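Reading the two outputs together: master01 and node02 form a quorate partition with 2 of 3 votes, while node04 only sees itself and is blocked. Since ping is ICMP and corosync runs over UDP (knet), the knet link state says more than ping does; a quick check to run on every node:

corosync-cfgtool -s    # per-link status; the other nodes should be listed as connected
corosync-cfgtool -sb   # same information in brief form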
 
If I add the node to the cluster using SSH, it blocks waiting for quorum:


Are you sure you want to continue connecting (yes/no)? yes
Login succeeded.
check cluster join API version
No cluster network links passed explicitly, fallback to local node IP '10.10.200.6'
Request addition of this node
Join request OK, finishing setup locally
stopping pve-cluster service
backup old database to '/var/lib/pve-cluster/backup/config-1627934015.sql.gz'
waiting for quorum...
 
could you please post the output of
pveversion -v
cat /etc/pve/corosync.conf
corosync-cfgtool -sb

on each node?
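One way to collect those three outputs from every node in one go (cluster addresses taken from the nodelist above; root SSH between the nodes is assumed):

for h in 10.10.200.1 10.10.200.2 10.10.200.6; do
    echo "===== $h ====="
    ssh root@$h 'pveversion -v; echo; cat /etc/pve/corosync.conf; echo; corosync-cfgtool -sb'
done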
 
All nodes have this configuration (/etc/pve/corosync.conf):
logging {
debug: off
to_syslog: yes
}

nodelist {
node {
name: master01
nodeid: 1
quorum_votes: 1
ring0_addr: 10.10.200.1
}
node {
name: node02
nodeid: 2
quorum_votes: 1
ring0_addr: 10.10.200.2
}
node {
name: node04
nodeid: 3
quorum_votes: 1
ring0_addr: 10.10.200.6
}
}

quorum {
provider: corosync_votequorum
}

totem {
cluster_name: Dev-LAX
config_version: 5
interface {
linknumber: 0
}
ip_version: ipv4-6
link_mode: passive
secauth: on
version: 2
}

And the output of pveversion -v:
proxmox-ve: 7.0-2 (running kernel: 5.11.22-1-pve)
pve-manager: 7.0-8 (running version: 7.0-8/b1dbf562)
pve-kernel-5.11: 7.0-3
pve-kernel-helper: 7.0-3
pve-kernel-5.11.22-1-pve: 5.11.22-2
ceph: 16.2.5-pve1
ceph-fuse: 16.2.5-pve1
corosync: 3.1.2-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.0.0-1+pve5
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.21-pve1
libproxmox-acme-perl: 1.1.1
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-4
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-2
libpve-storage-perl: 7.0-7
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-2
lxcfs: 4.0.8-pve1
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.1-1
proxmox-backup-file-restore: 2.0.1-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.2-4
pve-cluster: 7.0-3
pve-container: 4.0-5
pve-docs: 7.0-5
pve-edk2-firmware: 3.20200531-1
pve-firewall: 4.2-2
pve-firmware: 3.2-4
pve-ha-manager: 3.3-1
pve-i18n: 2.4-1
pve-qemu-kvm: 6.0.0-2
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-7
smartmontools: 7.2-1
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.0.4-pve1
This is from before I rebuilt my cluster.

This is my system software configuration.
I solved the problem by separating each node and then rebuilding the cluster. It works well now with 4 nodes. Thanks for your help!
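For anyone hitting the same thing, the separate-and-rejoin workflow with pvecm looks roughly like this (a sketch of the standard commands; the exact steps used here were not posted):

# on a quorate cluster node: remove the stuck node (shut it down or disconnect it from the cluster network first)
pvecm delnode node04

# on the reinstalled node: join again, binding the cluster link to the dedicated network
pvecm add 10.10.200.1 --link0 10.10.200.6

# verify membership afterwards on any node
pvecm status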
 
I've had similar issues, which I was able to fix. In my case, one of the nodes had an incorrect /etc/hosts file which aliased `pvelocalhost` to another node instead of to itself (a copy-paste mistake on my part). I fixed it, restarted corosync on that node, and after that the new node joined and showed up in pvecm status.
I also ran into another issue afterwards: missing SSL certs for the new node. Somehow they got deleted during the unsuccessful node-add attempt, i.e. /etc/pve/nodes/node3/pve-ssl.* was missing. That was fixed by running `pvecm updatecerts --force`.
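For comparison, a sane /etc/hosts on the node itself maps the node's own address to its hostname (and to pvelocalhost, if that alias is used); a minimal sketch with illustrative names:

127.0.0.1   localhost.localdomain localhost
10.10.200.6 node04.example.local node04 pvelocalhost
# pvelocalhost, when present, must resolve to this node's own address, never to another node's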
 