The cluster crashes when the 14th node is added

ngwt

Hello everyone. The cluster nodes run Proxmox version 7.4.3, and the cluster has 13 nodes. When I add the 14th node, the cluster ends up in the state shown in Figure 1. I have to remove the 14th node from the cluster, otherwise the problem persists. Why does this happen, and what is the solution?

The log output while adding the node is:
root@hddmass3:~# tail -f /var/log/daemon.log
Aug 21 19:57:01 hddmass3 corosync[75652]: [TOTEM ] Token has not been received in 8101 ms
Aug 21 19:57:11 hddmass3 corosync[75652]: [TOTEM ] Token has not been received in 18955 ms
Aug 21 19:57:22 hddmass3 corosync[75652]: [TOTEM ] Token has not been received in 29761 ms
Aug 21 19:57:33 hddmass3 corosync[75652]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 19:57:51 hddmass3 corosync[75652]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 19:58:02 hddmass3 corosync[75652]: [TOTEM ] Token has not been received in 18908 ms
Aug 21 19:58:13 hddmass3 corosync[75652]: [TOTEM ] Token has not been received in 29713 ms
Aug 21 19:58:23 hddmass3 corosync[75652]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 19:58:28 hddmass3 systemd[1]: Started Session 16 of user root.
Aug 21 19:58:34 hddmass3 corosync[75652]: [TOTEM ] Token has not been received in 18908 ms
Aug 21 19:58:44 hddmass3 corosync[75652]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 19:59:03 hddmass3 corosync[75652]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 19:59:21 hddmass3 corosync[75652]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 19:59:32 hddmass3 corosync[75652]: [TOTEM ] Token has not been received in 18909 ms
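
To get an overview of warnings like the ones above without reading the log line by line, they can be tallied with a short script. This is only a sketch: the regular expression is written against the line format shown in this thread, and the function name summarize_token_warnings is hypothetical, not part of any Proxmox or corosync tool.

```python
import re
from collections import Counter

# Matches corosync TOTEM warnings in the format pasted in this thread, e.g.
# "Aug 21 19:57:01 hddmass3 corosync[75652]: [TOTEM ] Token has not been received in 8101 ms"
TOKEN_RE = re.compile(
    r"^\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}\s+\S+\s+corosync\[\d+\]:\s+"
    r"\[TOTEM\s*\]\s+Token has not been received in (?P<ms>\d+) ms"
)

def summarize_token_warnings(lines):
    """Return (count, largest value in ms, Counter of values) for token warnings."""
    values = []
    for line in lines:
        match = TOKEN_RE.match(line)
        if match:
            values.append(int(match.group("ms")))
    return len(values), max(values, default=0), Counter(values)

# Sample lines copied from the log above; in practice, pass an open file instead.
sample = [
    "Aug 21 19:57:01 hddmass3 corosync[75652]: [TOTEM ] Token has not been received in 8101 ms",
    "Aug 21 19:57:11 hddmass3 corosync[75652]: [TOTEM ] Token has not been received in 18955 ms",
    "Aug 21 19:57:22 hddmass3 corosync[75652]: [TOTEM ] Token has not been received in 29761 ms",
    "Aug 21 19:58:28 hddmass3 systemd[1]: Started Session 16 of user root.",
]

count, worst, histogram = summarize_token_warnings(sample)
print(count, worst)  # 3 29761
```

Running it against /var/log/daemon.log on several nodes would show whether every node misses the token or only some of them.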
 

Attachments

  • 1.png
It doesn't look good. I ran systemctl stop pve-ha-crm on all nodes, but the same log messages still appear:
root@hddmass3:~# tail -f /var/log/daemon.log
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] pmtud: PMTUD link change for host: 10 link: 0 from 469 to 1397
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] pmtud: PMTUD link change for host: 9 link: 0 from 469 to 1397
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] pmtud: Global data MTU changed to: 1397
Aug 21 21:14:40 hddmass3 pmxcfs[6579]: [dcdb] notice: received sync request (epoch 1/896371/00000013)
Aug 21 21:14:40 hddmass3 pmxcfs[6579]: [status] notice: received sync request (epoch 1/896371/00000013)
Aug 21 21:14:48 hddmass3 corosync[6577]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 21:15:07 hddmass3 corosync[6577]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 21:15:23 hddmass3 systemd[1]: Started Session 3 of user root.
Aug 21 21:15:23 hddmass3 corosync[6577]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 21:15:42 hddmass3 corosync[6577]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 21:16:05 hddmass3 corosync[6577]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 21:16:16 hddmass3 corosync[6577]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 21:16:32 hddmass3 corosync[6577]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 21:16:48 hddmass3 corosync[6577]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 21:17:04 hddmass3 corosync[6577]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 21:17:20 hddmass3 corosync[6577]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 21:17:39 hddmass3 corosync[6577]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 21:17:55 hddmass3 corosync[6577]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 21:18:11 hddmass3 corosync[6577]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 21:18:21 hddmass3 corosync[6577]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 21:18:34 hddmass3 corosync[6577]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 21:18:50 hddmass3 corosync[6577]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 21:19:06 hddmass3 corosync[6577]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 21:19:17 hddmass3 corosync[6577]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 21:19:35 hddmass3 corosync[6577]: [TOTEM ] Token has not been received in 8100 ms
^C
root@hddmass3:~# tail -100 /var/log/daemon.log
Aug 21 21:14:34 hddmass3 corosync[6577]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Aug 21 21:14:34 hddmass3 corosync[6577]: [KNET ] host: host: 3 has no active links
Aug 21 21:14:34 hddmass3 corosync[6577]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 0)
Aug 21 21:14:34 hddmass3 corosync[6577]: [KNET ] host: host: 4 has no active links
Aug 21 21:14:34 hddmass3 corosync[6577]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Aug 21 21:14:34 hddmass3 corosync[6577]: [KNET ] host: host: 4 has no active links
Aug 21 21:14:34 hddmass3 corosync[6577]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Aug 21 21:14:34 hddmass3 corosync[6577]: [KNET ] host: host: 4 has no active links
Aug 21 21:14:35 hddmass3 systemd[1]: Started The Proxmox VE cluster filesystem.
Aug 21 21:14:40 hddmass3 pmxcfs[6579]: [status] notice: update cluster info (cluster name datacenter, version = 18)
Aug 21 21:14:40 hddmass3 pmxcfs[6579]: [dcdb] notice: members: 14/6579
Aug 21 21:14:40 hddmass3 pmxcfs[6579]: [dcdb] notice: all data is up to date
Aug 21 21:14:40 hddmass3 pmxcfs[6579]: [status] notice: members: 14/6579
Aug 21 21:14:40 hddmass3 pmxcfs[6579]: [status] notice: all data is up to date
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] rx: host: 4 link: 0 is up
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] link: Resetting MTU for link 0 because host 4 joined
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] rx: host: 3 link: 0 is up
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] rx: host: 2 link: 0 is up
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] link: Resetting MTU for link 0 because host 2 joined
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] rx: host: 13 link: 0 is up
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] link: Resetting MTU for link 0 because host 13 joined
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] rx: host: 1 link: 0 is up
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] link: Resetting MTU for link 0 because host 1 joined
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] rx: host: 12 link: 0 is up
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] link: Resetting MTU for link 0 because host 12 joined
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] rx: host: 11 link: 0 is up
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] link: Resetting MTU for link 0 because host 11 joined
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] rx: host: 8 link: 0 is up
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] link: Resetting MTU for link 0 because host 8 joined
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] rx: host: 6 link: 0 is up
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] link: Resetting MTU for link 0 because host 6 joined
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] rx: host: 7 link: 0 is up
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] link: Resetting MTU for link 0 because host 7 joined
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] rx: host: 5 link: 0 is up
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] link: Resetting MTU for link 0 because host 5 joined
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] rx: host: 10 link: 0 is up
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] link: Resetting MTU for link 0 because host 10 joined
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] rx: host: 9 link: 0 is up
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] link: Resetting MTU for link 0 because host 9 joined
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] host: host: 13 (passive) best link: 0 (pri: 1)
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] host: host: 12 (passive) best link: 0 (pri: 1)
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] host: host: 11 (passive) best link: 0 (pri: 1)
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] host: host: 8 (passive) best link: 0 (pri: 1)
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] host: host: 7 (passive) best link: 0 (pri: 1)
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] host: host: 10 (passive) best link: 0 (pri: 1)
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] host: host: 9 (passive) best link: 0 (pri: 1)
Aug 21 21:14:40 hddmass3 corosync[6577]: [QUORUM] Sync members[14]: 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Aug 21 21:14:40 hddmass3 corosync[6577]: [QUORUM] Sync joined[13]: 1 2 3 4 5 6 7 8 9 10 11 12 13
Aug 21 21:14:40 hddmass3 corosync[6577]: [TOTEM ] A new membership (1.674b) was formed. Members joined: 1 2 3 4 5 6 7 8 9 10 11 12 13
Aug 21 21:14:40 hddmass3 pmxcfs[6579]: [dcdb] notice: members: 1/896371, 2/49585, 3/248227, 4/695101, 5/803148, 6/780170, 7/2726796, 8/3230847, 9/845827, 10/917885, 11/1340623, 12/1539750, 13/34286, 14/6579
Aug 21 21:14:40 hddmass3 pmxcfs[6579]: [dcdb] notice: starting data syncronisation
Aug 21 21:14:40 hddmass3 pmxcfs[6579]: [status] notice: members: 1/896371, 2/49585, 3/248227, 4/695101, 5/803148, 6/780170, 7/2726796, 8/3230847, 9/845827, 10/917885, 11/1340623, 12/1539750, 13/34286, 14/6579
Aug 21 21:14:40 hddmass3 pmxcfs[6579]: [status] notice: starting data syncronisation
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] pmtud: PMTUD link change for host: 4 link: 0 from 469 to 1397
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] pmtud: PMTUD link change for host: 3 link: 0 from 469 to 1397
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] pmtud: PMTUD link change for host: 2 link: 0 from 469 to 1397
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] pmtud: PMTUD link change for host: 1 link: 0 from 469 to 1397
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] pmtud: PMTUD link change for host: 13 link: 0 from 469 to 1397
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] pmtud: PMTUD link change for host: 12 link: 0 from 469 to 1397
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] pmtud: PMTUD link change for host: 11 link: 0 from 469 to 1397
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] pmtud: PMTUD link change for host: 8 link: 0 from 469 to 1397
Aug 21 21:14:40 hddmass3 corosync[6577]: [QUORUM] This node is within the primary component and will provide service.
Aug 21 21:14:40 hddmass3 corosync[6577]: [QUORUM] Members[14]: 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Aug 21 21:14:40 hddmass3 corosync[6577]: [MAIN ] Completed service synchronization, ready to provide service.
Aug 21 21:14:40 hddmass3 pmxcfs[6579]: [status] notice: node has quorum
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] pmtud: PMTUD link change for host: 7 link: 0 from 469 to 1397
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] pmtud: PMTUD link change for host: 6 link: 0 from 469 to 1397
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] pmtud: PMTUD link change for host: 5 link: 0 from 469 to 1397
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] pmtud: PMTUD link change for host: 10 link: 0 from 469 to 1397
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] pmtud: PMTUD link change for host: 9 link: 0 from 469 to 1397
Aug 21 21:14:40 hddmass3 corosync[6577]: [KNET ] pmtud: Global data MTU changed to: 1397
Aug 21 21:14:40 hddmass3 pmxcfs[6579]: [dcdb] notice: received sync request (epoch 1/896371/00000013)
Aug 21 21:14:40 hddmass3 pmxcfs[6579]: [status] notice: received sync request (epoch 1/896371/00000013)
Aug 21 21:14:48 hddmass3 corosync[6577]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 21:15:07 hddmass3 corosync[6577]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 21:15:23 hddmass3 systemd[1]: Started Session 3 of user root.
Aug 21 21:15:23 hddmass3 corosync[6577]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 21:15:42 hddmass3 corosync[6577]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 21:16:05 hddmass3 corosync[6577]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 21:16:16 hddmass3 corosync[6577]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 21:16:32 hddmass3 corosync[6577]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 21:16:48 hddmass3 corosync[6577]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 21:17:04 hddmass3 corosync[6577]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 21:17:20 hddmass3 corosync[6577]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 21:17:39 hddmass3 corosync[6577]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 21:17:55 hddmass3 corosync[6577]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 21:18:11 hddmass3 corosync[6577]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 21:18:21 hddmass3 corosync[6577]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 21:18:34 hddmass3 corosync[6577]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 21:18:50 hddmass3 corosync[6577]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 21:19:06 hddmass3 corosync[6577]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 21:19:17 hddmass3 corosync[6577]: [TOTEM ] Token has not been received in 8100 ms
Aug 21 21:19:35 hddmass3 corosync[6577]: [TOTEM ] Token has not been received in 8100 ms
Also, my cluster has no HA configured at all.
 
Configure:
token: 15000
token_retransmits_before_loss_const: 10

in the totem {} section of /etc/pve/corosync.conf, then restart the corosync service on all nodes and check again.
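
To see what the suggested token value would change, here is a rough calculation. It assumes the formula from the corosync.conf(5) man page for nodelists with at least three nodes (runtime timeout = token + (nodes - 2) * token_coefficient), the documented defaults token = 3000 ms, token_coefficient = 650 ms and token_warning = 75 percent, and that the cluster in this thread had not overridden any of them. The thread does not show the actual corosync.conf, so treat those inputs as assumptions; the function names are made up for illustration.

```python
def effective_token_timeout_ms(token_ms, nodes, token_coefficient_ms=650):
    """Runtime token timeout for a nodelist, per the corosync.conf(5) formula."""
    if nodes < 3:
        # The coefficient only applies when the nodelist has at least 3 nodes.
        return token_ms
    return token_ms + (nodes - 2) * token_coefficient_ms

def warning_threshold_ms(timeout_ms, token_warning_pct=75):
    """Point at which 'Token has not been received' is logged (integer ms)."""
    return timeout_ms * token_warning_pct // 100

# Assumed default token (3000 ms) versus the value suggested above (15000 ms),
# both for a 14-node cluster.
for label, token in (("default", 3000), ("suggested", 15000)):
    timeout = effective_token_timeout_ms(token, nodes=14)
    print(label, timeout, warning_threshold_ms(timeout))
# default 10800 8100
# suggested 22800 17100
```

Under these assumptions, the 8100 ms figure in the logs matches a 14-node cluster running with the default token, and token: 15000 would raise the runtime timeout to about 22.8 seconds. That gives the token more time to circulate, but it also means a real node failure takes longer to detect.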
 
Is this a bug in this version?
None that I am aware of, but it may simply be the limit of what your network can handle without further tuning of the configuration.
 
