Corosync breaks with third node

rjoensen

Hello,


I have three Proxmox nodes in a cluster.


This cluster worked fine until about 4 hours ago, when the bond between the nodes broke.


I have narrowed the issue down to the third node, which is causing havoc at the corosync level.


If I stop corosync on the third node, the other two nodes join together just fine, but when I start corosync on the third node again, this happens:

Code:
root@pve3-ams:/var/log# service corosync start
root@pve3-ams:/var/log# tail -f /var/log/daemon.log | grep corosync
Nov 27 19:19:30 ams-pve3 corosync[5605]:   [KNET  ] pmtud: Global data MTU changed to: 1366
Nov 27 19:19:34 ams-pve3 corosync[5605]:   [TOTEM ] A new membership (3:83040) was formed. Members
Nov 27 19:19:34 ams-pve3 corosync[5605]:   [CPG   ] downlist left_list: 0 received
Nov 27 19:19:34 ams-pve3 corosync[5605]:   [QUORUM] Members[1]: 3
Nov 27 19:19:34 ams-pve3 corosync[5605]:   [MAIN  ] Completed service synchronization, ready to provide service.
Nov 27 19:19:37 ams-pve3 corosync[5605]:   [TOTEM ] A new membership (3:83052) was formed. Members
Nov 27 19:19:37 ams-pve3 corosync[5605]:   [CPG   ] downlist left_list: 0 received
Nov 27 19:19:37 ams-pve3 corosync[5605]:   [QUORUM] Members[1]: 3
Nov 27 19:19:37 ams-pve3 corosync[5605]:   [MAIN  ] Completed service synchronization, ready to provide service.
Nov 27 19:19:41 ams-pve3 corosync[5605]:   [TOTEM ] A new membership (3:83064) was formed. Members
Nov 27 19:19:41 ams-pve3 corosync[5605]:   [CPG   ] downlist left_list: 0 received
Nov 27 19:19:41 ams-pve3 corosync[5605]:   [QUORUM] Members[1]: 3
Nov 27 19:19:41 ams-pve3 corosync[5605]:   [MAIN  ] Completed service synchronization, ready to provide service.
Nov 27 19:19:44 ams-pve3 corosync[5605]:   [TOTEM ] A new membership (3:83076) was formed. Members
Nov 27 19:19:44 ams-pve3 corosync[5605]:   [CPG   ] downlist left_list: 0 received
Nov 27 19:19:44 ams-pve3 corosync[5605]:   [QUORUM] Members[1]: 3
Nov 27 19:19:44 ams-pve3 corosync[5605]:   [MAIN  ] Completed service synchronization, ready to provide service.
Nov 27 19:19:47 ams-pve3 corosync[5605]:   [TOTEM ] A new membership (3:83088) was formed. Members
Nov 27 19:19:47 ams-pve3 corosync[5605]:   [CPG   ] downlist left_list: 0 received
Nov 27 19:19:47 ams-pve3 corosync[5605]:   [QUORUM] Members[1]: 3
Nov 27 19:19:47 ams-pve3 corosync[5605]:   [MAIN  ] Completed service synchronization, ready to provide service.
Nov 27 19:19:51 ams-pve3 corosync[5605]:   [TOTEM ] A new membership (3:83100) was formed. Members
Nov 27 19:19:51 ams-pve3 corosync[5605]:   [CPG   ] downlist left_list: 0 received
Nov 27 19:19:51 ams-pve3 corosync[5605]:   [QUORUM] Members[1]: 3
Nov 27 19:19:51 ams-pve3 corosync[5605]:   [MAIN  ] Completed service synchronization, ready to provide service.

^C

root@pve3-ams:/var/log# service corosync stop
root@pve3-ams:/var/log#

Code:
root@pve2-ams:~# tail -f /var/log/daemon.log | grep corosync
Nov 28 06:15:40 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1615 ms
Nov 28 06:15:42 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 3266 ms
Nov 28 06:15:42 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:82972) was formed. Members
Nov 28 06:15:43 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1238 ms
Nov 28 06:15:45 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 2888 ms
Nov 28 06:15:46 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:82984) was formed. Members
Nov 28 06:15:46 ams-pve2 corosync[36088]:   [CPG   ] downlist left_list: 0 received
Nov 28 06:15:47 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1238 ms
Nov 28 06:15:48 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 2888 ms
Nov 28 06:15:49 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:82996) was formed. Members
Nov 28 06:15:50 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1237 ms
Nov 28 06:15:52 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 2888 ms
Nov 28 06:15:52 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:83008) was formed. Members
Nov 28 06:15:54 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1238 ms
Nov 28 06:15:55 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 2888 ms
Nov 28 06:15:56 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:83020) was formed. Members
Nov 28 06:15:57 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1238 ms
Nov 28 06:15:59 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:83028) was formed. Members
Nov 28 06:15:59 ams-pve2 corosync[36088]:   [CPG   ] downlist left_list: 0 received
Nov 28 06:15:59 ams-pve2 corosync[36088]:   [CPG   ] downlist left_list: 0 received
Nov 28 06:15:59 ams-pve2 corosync[36088]:   [QUORUM] Members[2]: 1 2
Nov 28 06:15:59 ams-pve2 corosync[36088]:   [MAIN  ] Completed service synchronization, ready to provide service.
Nov 28 06:19:42 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1238 ms
Nov 28 06:19:44 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 2888 ms
Nov 28 06:19:44 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:83076) was formed. Members
Nov 28 06:19:44 ams-pve2 corosync[36088]:   [CPG   ] downlist left_list: 0 received
Nov 28 06:19:45 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1237 ms
Nov 28 06:19:47 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 2887 ms
Nov 28 06:19:47 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:83088) was formed. Members
Nov 28 06:19:47 ams-pve2 corosync[36088]:   [CPG   ] downlist left_list: 0 received
Nov 28 06:19:49 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1238 ms
Nov 28 06:19:50 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 2888 ms
Nov 28 06:19:51 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:83100) was formed. Members
Nov 28 06:19:52 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1238 ms
Nov 28 06:19:54 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 2888 ms
Nov 28 06:19:54 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:83112) was formed. Members
Nov 28 06:19:55 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1238 ms
Nov 28 06:19:57 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 2889 ms
Nov 28 06:19:58 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:83124) was formed. Members
Nov 28 06:19:59 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1238 ms
Nov 28 06:20:00 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 2888 ms
Nov 28 06:20:01 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:83136) was formed. Members
Nov 28 06:20:02 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1237 ms
Nov 28 06:20:04 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 2888 ms
Nov 28 06:20:04 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:83148) was formed. Members
Nov 28 06:20:04 ams-pve2 corosync[36088]:   [CPG   ] downlist left_list: 0 received
Nov 28 06:20:06 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1237 ms
Nov 28 06:20:07 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 2888 ms
Nov 28 06:20:08 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:83160) was formed. Members
Nov 28 06:20:08 ams-pve2 corosync[36088]:   [CPG   ] downlist left_list: 0 received
Nov 28 06:20:09 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1238 ms
Nov 28 06:20:11 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 2889 ms
Nov 28 06:20:11 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:83172) was formed. Members
Nov 28 06:20:12 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1238 ms
Nov 28 06:20:14 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 2888 ms
Nov 28 06:20:14 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:83184) was formed. Members
Nov 28 06:20:16 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1238 ms
Nov 28 06:20:17 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 2889 ms
Nov 28 06:20:18 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:83196) was formed. Members
Nov 28 06:20:18 ams-pve2 corosync[36088]:   [CPG   ] downlist left_list: 0 received
Nov 28 06:20:19 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1238 ms
Nov 28 06:20:21 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 2888 ms
Nov 28 06:20:21 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:83208) was formed. Members
Nov 28 06:20:22 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1238 ms
Nov 28 06:20:24 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 2888 ms
Nov 28 06:20:25 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:83220) was formed. Members
Nov 28 06:20:26 ams-pve2 corosync[36088]:   [TOTEM ] Token has not been received in 1238 ms
Nov 28 06:20:28 ams-pve2 corosync[36088]:   [TOTEM ] A new membership (1:83228) was formed. Members
Nov 28 06:20:28 ams-pve2 corosync[36088]:   [CPG   ] downlist left_list: 0 received
Nov 28 06:20:28 ams-pve2 corosync[36088]:   [CPG   ] downlist left_list: 0 received
Nov 28 06:20:28 ams-pve2 corosync[36088]:   [QUORUM] Members[2]: 1 2
Nov 28 06:20:28 ams-pve2 corosync[36088]:   [MAIN  ] Completed service synchronization, ready to provide service.


Any ideas how to resolve this? I am seeing no obvious errors. Corosync runs on a separate, dedicated 1G VLAN, and all nodes are connected to the same switch. This just randomly broke; no updates whatsoever were done in the last month.
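
The two log excerpts above show the same pattern from both sides: node 3 keeps forming a membership on its own (Members[1]: 3), while node 2 logs token timeouts and new memberships until it settles on Members[2]: 1 2. A minimal sketch for counting these events in a log, assuming the log format shown above; summarize_corosync is a hypothetical helper name, not a Proxmox or corosync tool:

```shell
# Summarise corosync membership churn from log lines read on stdin.
# summarize_corosync is a hypothetical helper, not part of corosync or Proxmox.
summarize_corosync() {
    awk '
        /\[TOTEM \] A new membership/            { memberships++ }
        /\[TOTEM \] Token has not been received/ { timeouts++ }
        /\[QUORUM\] Members\[/ {
            # Keep only the "Members[N]: ..." part of the last quorum line.
            sub(/.*\[QUORUM\] /, "")
            last = $0
        }
        END {
            printf "new memberships: %d\n", memberships
            printf "token timeouts:  %d\n", timeouts
            printf "last quorum:     %s\n", (last == "" ? "none seen" : last)
        }
    '
}
```

It could be used as `grep corosync /var/log/daemon.log | summarize_corosync` on each node. A high membership count together with a quorum line that never lists all three node IDs matches the flapping described in this post.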
 
Can you show your /etc/network/interfaces (obfuscate any public IPs) and your /etc/corosync/corosync.conf files?
 
Hello,


Third node:

/etc/network/interfaces:

auto lo
iface lo inet loopback

iface eno1 inet manual
#PUBLIC

iface eno2 inet manual
#PRIVATE

auto vmbr0
iface vmbr0 inet static
    address <snipped>
    netmask 29
    gateway <snipped>
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0
#PUBLIC

auto vmbr10
iface vmbr10 inet static
    address 10.65.0.33
    netmask 24
    bridge-ports eno2
    bridge-stp off
    bridge-fd 0
#PRIVATE

/etc/corosync/corosync.conf:

logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve1-ams
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.65.0.11
  }
  node {
    name: pve2-ams
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.65.0.22
  }
  node {
    name: pve3-ams
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.65.0.33
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: NK-AMS-DC01
  config_version: 4
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  secauth: on
  version: 2
}
 
Hello,

Second node:

/etc/network/interfaces:

auto lo
iface lo inet loopback

iface eno1 inet manual
#PUBLIC

iface eno2 inet manual
#PRIVATE

auto vmbr0
iface vmbr0 inet static
    address <snipped>
    netmask 29
    gateway <snipped>
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0
#PUBLIC

auto vmbr10
iface vmbr10 inet static
    address 10.65.0.22
    netmask 24
    bridge-ports eno2
    bridge-stp off
    bridge-fd 0
#PRIVATE

/etc/corosync/corosync.conf:

logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve1-ams
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.65.0.11
  }
  node {
    name: pve2-ams
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.65.0.22
  }
  node {
    name: pve3-ams
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.65.0.33
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: NK-AMS-DC01
  config_version: 4
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  secauth: on
  version: 2
}
 
Hello

First node:


/etc/network/interfaces:

auto lo
iface lo inet loopback

iface eno1 inet manual
#PUBLIC

iface eno2 inet manual
#PRIVATE

auto vmbr0
iface vmbr0 inet static
    address <snipped>
    netmask 29
    gateway <snipped>
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0
#PUBLIC

auto vmbr10
iface vmbr10 inet static
    address 10.65.0.11
    netmask 24
    bridge-ports eno2
    bridge-stp off
    bridge-fd 0
#PRIVATE

/etc/corosync/corosync.conf:

logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve1-ams
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.65.0.11
  }
  node {
    name: pve2-ams
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.65.0.22
  }
  node {
    name: pve3-ams
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.65.0.33
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: NK-AMS-DC01
  config_version: 4
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  secauth: on
  version: 2
}
 
Hello,

I disabled IPv6 via sysctl, restarted corosync on all nodes, and then restarted pve-cluster on node 3; after that, all three nodes rejoined the cluster.

Continuing to monitor.

This issue appeared at random after 6 months of uptime with no issues whatsoever.

Thanks,
Ragnar
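
For anyone retracing the steps in the post above, here is a rough sketch of them as a dry-run script. The post does not say which sysctl keys were changed, so the two net.ipv6.conf.*.disable_ipv6 keys below are an assumption, and run and recover_node are hypothetical helper names. Nothing is executed unless APPLY=1 is set.

```shell
# Dry-run wrapper: prints the command unless APPLY=1 is set in the environment.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        printf 'would run: %s\n' "$*"
    fi
}

# recover_node is a hypothetical helper mirroring the steps from the post.
recover_node() {
    # Assumption: these are the keys meant by "disabled IPv6 via sysctl".
    run sysctl -w net.ipv6.conf.all.disable_ipv6=1
    run sysctl -w net.ipv6.conf.default.disable_ipv6=1
    # The post restarted corosync on every node.
    run systemctl restart corosync
    # The post restarted pve-cluster only on node 3, so this step is opt-in.
    if [ "${1:-}" = "restart-pve-cluster" ]; then
        run systemctl restart pve-cluster
    fi
}
```

Calling `recover_node` on nodes 1 and 2 and `recover_node restart-pve-cluster` on node 3 would follow the order described in the post. Note that `sysctl -w` only changes the running kernel; the setting is lost on reboot unless it is also written to /etc/sysctl.conf or a file under /etc/sysctl.d/.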
 
