Corosync breaks with third node

rjoensen

Hello,


I have three Proxmox nodes in a cluster.

The cluster worked fine until about 4 hours ago, when the connection between the nodes broke.

I have narrowed the issue down to the third node, which is causing havoc at the corosync level.

If I stop corosync on the third node, the other two nodes join together just fine, but when I start corosync on it again, this happens:

Code:
root@pve3-ams:/var/log# service corosync start
root@pve3-ams:/var/log# tail -f /var/log/daemon.log | grep corosync
Nov 27 19:19:30 ams-pve3 corosync[5605]:   [KNET  ] pmtud: Global data MTU changed to: 1366
Nov 27 19:19:34 ams-pve3 corosync[5605]:   [TOTEM ] A new membership (3:83040) was formed. Members
Nov 27 19:19:34 ams-pve3 corosync[5605]:   [CPG   ] downlist left_list: 0 received
Nov 27 19:19:34 ams-pve3 corosync[5605]:   [QUORUM] Members[1]: 3
Nov 27 19:19:34 ams-pve3 corosync[5605]:   [MAIN  ] Completed service synchronization, ready to provide service.
Nov 27 19:19:37 ams-pve3 corosync[5605]:   [TOTEM ] A new membership (3:83052) was formed. Members
Nov 27 19:19:37 ams-pve3 corosync[5605]:   [CPG   ] downlist left_list: 0 received
Nov 27 19:19:37 ams-pve3 corosync[5605]:   [QUORUM] Members[1]: 3
Nov 27 19:19:37 ams-pve3 corosync[5605]:   [MAIN  ] Completed service synchronization, ready to provide service.
Nov 27 19:19:41 ams-pve3 corosync[5605]:   [TOTEM ] A new membership (3:83064) was formed. Members
Nov 27 19:19:41 ams-pve3 corosync[5605]:   [CPG   ] downlist left_list: 0 received
Nov 27 19:19:41 ams-pve3 corosync[5605]:   [QUORUM] Members[1]: 3
Nov 27 19:19:41 ams-pve3 corosync[5605]:   [MAIN  ] Completed service synchronization, ready to provide service.
Nov 27 19:19:44 ams-pve3 corosync[5605]:   [TOTEM ] A new membership (3:83076) was formed. Members
Nov 27 19:19:44 ams-pve3 corosync[5605]:   [CPG   ] downlist left_list: 0 received
Nov 27 19:19:44 ams-pve3 corosync[5605]:   [QUORUM] Members[1]: 3
Nov 27 19:19:44 ams-pve3 corosync[5605]:   [MAIN  ] Completed service synchronization, ready to provide service.
Nov 27 19:19:47 ams-pve3 corosync[5605]:   [TOTEM ] A new membership (3:83088) was formed. Members
Nov 27 19:19:47 ams-pve3 corosync[5605]:   [CPG   ] downlist left_list: 0 received
Nov 27 19:19:47 ams-pve3 corosync[5605]:   [QUORUM] Members[1]: 3
Nov 27 19:19:47 ams-pve3 corosync[5605]:   [MAIN  ] Completed service synchronization, ready to provide service.
Nov 27 19:19:51 ams-pve3 corosync[5605]:   [TOTEM ] A new membership (3:83100) was formed. Members
Nov 27 19:19:51 ams-pve3 corosync[5605]:   [CPG   ] downlist left_list: 0 received
Nov 27 19:19:51 ams-pve3 corosync[5605]:   [QUORUM] Members[1]: 3
Nov 27 19:19:51 ams-pve3 corosync[5605]:   [MAIN  ] Completed service synchronization, ready to provide service.

^C

root@pve3-ams:/var/log# service corosync stop
root@pve3-ams:/var/log#

Code:
root@pve2-ams:~# tail -f /var/log/daemon.log | grep corosync
Nov 28 06:15:40 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1615 ms
Nov 28 06:15:42 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 3266 ms
Nov 28 06:15:42 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:82972) was formed. Members
Nov 28 06:15:43 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1238 ms
Nov 28 06:15:45 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 2888 ms
Nov 28 06:15:46 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:82984) was formed. Members
Nov 28 06:15:46 ams-pve2 corosync[36088]:   [CPG   ] downlist left_list: 0 received
Nov 28 06:15:47 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1238 ms
Nov 28 06:15:48 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 2888 ms
Nov 28 06:15:49 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:82996) was formed. Members
Nov 28 06:15:50 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1237 ms
Nov 28 06:15:52 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 2888 ms
Nov 28 06:15:52 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:83008) was formed. Members
Nov 28 06:15:54 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1238 ms
Nov 28 06:15:55 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 2888 ms
Nov 28 06:15:56 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:83020) was formed. Members
Nov 28 06:15:57 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1238 ms
Nov 28 06:15:59 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:83028) was formed. Members
Nov 28 06:15:59 ams-pve2 corosync[36088]:   [CPG   ] downlist left_list: 0 received
Nov 28 06:15:59 ams-pve2 corosync[36088]:   [CPG   ] downlist left_list: 0 received
Nov 28 06:15:59 ams-pve2 corosync[36088]:   [QUORUM] Members[2]: 1 2
Nov 28 06:15:59 ams-pve2 corosync[36088]:   [MAIN  ] Completed service synchronization, ready to provide service.
Nov 28 06:19:42 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1238 ms
Nov 28 06:19:44 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 2888 ms
Nov 28 06:19:44 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:83076) was formed. Members
Nov 28 06:19:44 ams-pve2 corosync[36088]:   [CPG   ] downlist left_list: 0 received
Nov 28 06:19:45 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1237 ms
Nov 28 06:19:47 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 2887 ms
Nov 28 06:19:47 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:83088) was formed. Members
Nov 28 06:19:47 ams-pve2 corosync[36088]:   [CPG   ] downlist left_list: 0 received
Nov 28 06:19:49 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1238 ms
Nov 28 06:19:50 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 2888 ms
Nov 28 06:19:51 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:83100) was formed. Members
Nov 28 06:19:52 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1238 ms
Nov 28 06:19:54 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 2888 ms
Nov 28 06:19:54 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:83112) was formed. Members
Nov 28 06:19:55 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1238 ms
Nov 28 06:19:57 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 2889 ms
Nov 28 06:19:58 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:83124) was formed. Members
Nov 28 06:19:59 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1238 ms
Nov 28 06:20:00 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 2888 ms
Nov 28 06:20:01 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:83136) was formed. Members
Nov 28 06:20:02 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1237 ms
Nov 28 06:20:04 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 2888 ms
Nov 28 06:20:04 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:83148) was formed. Members
Nov 28 06:20:04 ams-pve2 corosync[36088]:   [CPG   ] downlist left_list: 0 received
Nov 28 06:20:06 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1237 ms
Nov 28 06:20:07 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 2888 ms
Nov 28 06:20:08 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:83160) was formed. Members
Nov 28 06:20:08 ams-pve2 corosync[36088]:   [CPG   ] downlist left_list: 0 received
Nov 28 06:20:09 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1238 ms
Nov 28 06:20:11 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 2889 ms
Nov 28 06:20:11 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:83172) was formed. Members
Nov 28 06:20:12 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1238 ms
Nov 28 06:20:14 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 2888 ms
Nov 28 06:20:14 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:83184) was formed. Members
Nov 28 06:20:16 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1238 ms
Nov 28 06:20:17 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 2889 ms
Nov 28 06:20:18 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:83196) was formed. Members
Nov 28 06:20:18 ams-pve2 corosync[36088]:   [CPG   ] downlist left_list: 0 received
Nov 28 06:20:19 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1238 ms
Nov 28 06:20:21 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 2888 ms
Nov 28 06:20:21 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:83208) was formed. Members
Nov 28 06:20:22 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1238 ms
Nov 28 06:20:24 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 2888 ms
Nov 28 06:20:25 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:83220) was formed. Members
Nov 28 06:20:26 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1238 ms
Nov 28 06:20:28 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:83228) was formed. Members
Nov 28 06:20:28 ams-pve2 corosync[36088]:   [CPG   ] downlist left_list: 0 received
Nov 28 06:20:28 ams-pve2 corosync[36088]:   [CPG   ] downlist left_list: 0 received
Nov 28 06:20:28 ams-pve2 corosync[36088]:   [QUORUM] Members[2]: 1 2
Nov 28 06:20:28 ams-pve2 corosync[36088]:   [MAIN  ] Completed service synchronization, ready to provide service.


Any ideas how to resolve this? I am not seeing any obvious errors. Corosync runs on a separate, dedicated 1G VLAN, and all nodes are connected to the same switch. This broke without any apparent trigger; no updates have been applied in the last month.
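To put numbers on the flapping visible in the logs above, the membership changes and token timeouts can be counted per log file. This is only a minimal sketch, assuming the log is at /var/log/daemon.log as in the commands above; the function name `summarize_corosync_log` is made up for illustration and is not a corosync tool.

```shell
# Summarize corosync membership flapping in a syslog-style log file.
# summarize_corosync_log is a hypothetical helper, not part of corosync.
summarize_corosync_log() {
    logfile="${1:-/var/log/daemon.log}"
    awk '
        # Only look at lines written by corosync.
        !/corosync/ { next }
        /A new membership/            { memberships++ }
        /Token has not been received/ { timeouts++ }
        # Remember the most recent quorum view, e.g. "Members[2]: 1 2".
        match($0, /Members\[[0-9]+\]: .*/) { last = substr($0, RSTART, RLENGTH) }
        END {
            printf "new memberships: %d\n", memberships
            printf "token timeouts: %d\n", timeouts
            printf "last quorum view: %s\n", last
        }
    ' "$logfile"
}
```

Running `summarize_corosync_log /var/log/daemon.log` on each node makes it easier to compare how often each one re-forms its membership.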
 
Can you show your /etc/network/interfaces (obfuscate any public IPs) and your /etc/corosync/corosync.conf files?
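For anyone collecting the same two files, here is a minimal sketch of one way to mask public IPv4 addresses before posting. The helper name `mask_public_ips` is hypothetical, and it assumes that only the RFC 1918 and loopback ranges count as private; the output should still be checked by hand before sharing.

```shell
# Replace every IPv4 address outside the private/loopback ranges with <snipped>.
# mask_public_ips is a hypothetical helper; it reads the given files, or stdin.
mask_public_ips() {
    awk '{
        out = ""; rest = $0
        # Walk through the line one dotted-quad at a time.
        while (match(rest, /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/)) {
            ip = substr(rest, RSTART, RLENGTH)
            out = out substr(rest, 1, RSTART - 1)
            if (ip ~ /^(10\.|127\.|192\.168\.|172\.(1[6-9]|2[0-9]|3[01])\.)/)
                out = out ip            # private or loopback: keep as-is
            else
                out = out "<snipped>"   # anything else: mask it
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest
    }' "$@"
}
```

Usage would look like `mask_public_ips /etc/network/interfaces /etc/corosync/corosync.conf`.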
 
Hello,


Third node:

/etc/network/interfaces:

auto lo
iface lo inet loopback

iface eno1 inet manual
#PUBLIC

iface eno2 inet manual
#PRIVATE

auto vmbr0
iface vmbr0 inet static
        address <snipped>
        netmask 29
        gateway <snipped>
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0
#PUBLIC

auto vmbr10
iface vmbr10 inet static
        address 10.65.0.33
        netmask 24
        bridge-ports eno2
        bridge-stp off
        bridge-fd 0
#PRIVATE

/etc/corosync/corosync.conf:

logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve1-ams
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.65.0.11
  }
  node {
    name: pve2-ams
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.65.0.22
  }
  node {
    name: pve3-ams
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.65.0.33
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: NK-AMS-DC01
  config_version: 4
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  secauth: on
  version: 2
}
 
Hello,

Second node:

/etc/network/interfaces:

auto lo
iface lo inet loopback

iface eno1 inet manual
#PUBLIC

iface eno2 inet manual
#PRIVATE

auto vmbr0
iface vmbr0 inet static
        address <snipped>
        netmask 29
        gateway <snipped>
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0
#PUBLIC

auto vmbr10
iface vmbr10 inet static
        address 10.65.0.22
        netmask 24
        bridge-ports eno2
        bridge-stp off
        bridge-fd 0
#PRIVATE

/etc/corosync/corosync.conf:

logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve1-ams
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.65.0.11
  }
  node {
    name: pve2-ams
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.65.0.22
  }
  node {
    name: pve3-ams
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.65.0.33
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: NK-AMS-DC01
  config_version: 4
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  secauth: on
  version: 2
}
 
Hello

First node:

/etc/network/interfaces:

auto lo
iface lo inet loopback

iface eno1 inet manual
#PUBLIC

iface eno2 inet manual
#PRIVATE

auto vmbr0
iface vmbr0 inet static
        address <snipped>
        netmask 29
        gateway <snipped>
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0
#PUBLIC

auto vmbr10
iface vmbr10 inet static
        address 10.65.0.11
        netmask 24
        bridge-ports eno2
        bridge-stp off
        bridge-fd 0
#PRIVATE

/etc/corosync/corosync.conf:

logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve1-ams
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.65.0.11
  }
  node {
    name: pve2-ams
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.65.0.22
  }
  node {
    name: pve3-ams
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.65.0.33
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: NK-AMS-DC01
  config_version: 4
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  secauth: on
  version: 2
}
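
Since all three nodes carry the same nodelist, one quick sanity check is to ping every ring0_addr from each node. This is a minimal sketch: the helper names are hypothetical, and a successful ping only shows basic IPv4 reachability on the corosync VLAN, not that the corosync traffic itself gets through.

```shell
# Print every ring0_addr found in a corosync.conf-style file, one per line.
# Both function names are hypothetical helpers, not corosync or Proxmox tools.
list_ring_addrs() {
    awk '$1 == "ring0_addr:" { print $2 }' "${1:-/etc/corosync/corosync.conf}"
}

# Ping each ring address three times and report which ones answer.
check_ring_addrs() {
    for addr in $(list_ring_addrs "$1"); do
        if ping -c 3 -W 1 "$addr" >/dev/null 2>&1; then
            echo "$addr reachable"
        else
            echo "$addr NOT reachable"
        fi
    done
}
```

Calling `check_ring_addrs` with no argument reads /etc/corosync/corosync.conf; running it on all three nodes shows whether any one direction is failing.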
 
Hello,

I disabled IPv6 via sysctl, restarted corosync on all nodes, and then restarted pve-cluster on node 3. After that, all three nodes came back into the cluster.
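
The steps above can be sketched roughly as follows. The exact sysctl keys are an assumption, since the post only says that IPv6 was disabled in sysctl; the `run` wrapper and the function names are hypothetical, and by default the script only prints what it would do.

```shell
# Dry-run by default: set DRY_RUN=0 to actually execute the commands (as root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Assumed sysctl keys for disabling IPv6; the original post does not name them.
disable_ipv6() {
    run sysctl -w net.ipv6.conf.all.disable_ipv6=1
    run sysctl -w net.ipv6.conf.default.disable_ipv6=1
}

# Step for every node: disable IPv6, then restart corosync.
restart_corosync_node() {
    disable_ipv6
    run systemctl restart corosync
}

# Step for the node that fails to rejoin (node 3 here), after the step above.
rejoin_node() {
    run systemctl restart pve-cluster
    run pvecm status
}
```

On each node that would be `restart_corosync_node`, followed by `rejoin_node` on node 3. Note that `sysctl -w` only changes the running kernel, so the setting would also need to go into a sysctl configuration file to survive a reboot.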

Continuing to monitor.

The issue appeared out of nowhere after 6 months of uptime without any problems.

Thanks,
Ragnar