[TOTEM ] Retransmit List

Dexoid

New Member
Oct 23, 2016
Hello.
I have a 4-node cluster. Some of the nodes have been upgraded to version 5.1, and I noticed the following messages in the log:
Nov 27 11:18:47 pve0 corosync[6600]: [TOTEM ] Retransmit List: 85cc
Nov 27 11:18:47 pve0 corosync[6600]: [TOTEM ] Retransmit List: 85cc
Nov 27 11:19:17 pve0 corosync[6600]: [TOTEM ] Retransmit List: 8672 8673
Nov 27 11:19:28 pve0 corosync[6600]: [TOTEM ] Retransmit List: 86b1
...
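To check whether the retransmits cluster at particular times, for example during busy hours, you can count them per minute. Below is a minimal Python sketch that pulls the hex sequence numbers out of syslog lines like the ones above. The regular expression is an assumption fitted to these sample lines only, and the script name in the usage comment is hypothetical.

```python
import re
import sys
from collections import Counter

# Assumption: fitted to the sample syslog lines above, not to every corosync log format.
RETRANSMIT_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s+(?P<host>\S+)\s+corosync\[\d+\]:\s+"
    r"\[TOTEM \]\s+Retransmit List:\s+(?P<seqs>[0-9a-f]+(?:\s+[0-9a-f]+)*)$"
)


def parse_retransmits(lines):
    """Return (timestamp, host, [sequence numbers]) for each Retransmit List line."""
    events = []
    for line in lines:
        m = RETRANSMIT_RE.match(line.strip())
        if m:
            # Sequence numbers are logged in hex, e.g. "85cc".
            seqs = [int(tok, 16) for tok in m.group("seqs").split()]
            events.append((m.group("stamp"), m.group("host"), seqs))
    return events


def per_minute(events):
    """Count retransmitted sequence numbers per minute ("Mon DD HH:MM")."""
    counts = Counter()
    for stamp, _host, seqs in events:
        counts[stamp[:-3]] += len(seqs)  # drop the ":SS" seconds
    return counts


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 retransmits.py /var/log/syslog
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            print(path, dict(per_minute(parse_retransmits(fh))))
```

Lines that do not match, such as other corosync messages, are simply skipped, so you can feed the whole log file without grepping it first.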
corosync.conf:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve1
    nodeid: 2
    quorum_votes: 1
    ring0_addr: pve1
  }

  node {
    name: pve2
    nodeid: 3
    quorum_votes: 1
    ring0_addr: pve2
  }

  node {
    name: pve4
    nodeid: 5
    quorum_votes: 1
    ring0_addr: pve4
  }

  node {
    name: pve3
    nodeid: 4
    quorum_votes: 1
    ring0_addr: pve3
  }

  node {
    name: pve0
    nodeid: 1
    quorum_votes: 1
    ring0_addr: pve0
  }

}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: www-klass
  config_version: 7
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 192.168.10.253
    ringnumber: 0
  }
}
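A quick sanity check on this config is that every member address lies in the network implied by bindnetaddr. The following is a minimal Python sketch, not a corosync tool. It assumes a /24 netmask, which the post does not state, and takes the node addresses from the membership list in the status output further down.

```python
import ipaddress

# Assumption: /24 netmask; the post does not show the interface configuration.
BINDNETADDR = "192.168.10.253"
PREFIX_LEN = 24

# nodeid -> address, copied from the membership list in the status output.
MEMBERS = {
    1: "192.168.10.253",
    2: "192.168.10.254",
    3: "192.168.10.252",
    4: "192.168.10.251",
    5: "192.168.10.250",
}


def outside_ring_network(bindnetaddr, prefix_len, members):
    """Return the members whose address is not inside bindnetaddr's network."""
    network = ipaddress.ip_interface(f"{bindnetaddr}/{prefix_len}").network
    return {
        nodeid: addr
        for nodeid, addr in members.items()
        if ipaddress.ip_address(addr) not in network
    }


if __name__ == "__main__":
    print(outside_ring_network(BINDNETADDR, PREFIX_LEN, MEMBERS))  # prints {}
```

An empty result means that, under the /24 assumption, all five ring0 addresses share one subnet, so a mismatched network is unlikely to explain the retransmits.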

Quorum information
------------------
Date: Mon Nov 27 11:28:42 2017
Quorum provider: corosync_votequorum
Nodes: 5
Node ID: 0x00000001
Ring ID: 5/1028
Quorate: Yes

Votequorum information
----------------------
Expected votes: 5
Highest expected: 5
Total votes: 5
Quorum: 3
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000005 1 192.168.10.250
0x00000004 1 192.168.10.251
0x00000003 1 192.168.10.252
0x00000001 1 192.168.10.253 (local)
0x00000002 1 192.168.10.254
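The "Quorum: 3" line follows from the expected votes, because votequorum requires a strict majority. Here is a minimal Python sketch of that arithmetic. It assumes one vote per node, as in the nodelist, and default votequorum behaviour (no two_node, last_man_standing or qdevice), so the function names are illustrative only.

```python
def votequorum_quorum(expected_votes: int) -> int:
    """Votes needed for quorum: a strict majority of the expected votes."""
    return expected_votes // 2 + 1


def tolerated_failures(expected_votes: int) -> int:
    """One-vote nodes that can drop out before the cluster loses quorum."""
    return expected_votes - votequorum_quorum(expected_votes)


if __name__ == "__main__":
    expected = 5  # "Expected votes: 5" in the output above
    print(votequorum_quorum(expected), tolerated_failures(expected))  # prints 3 2
```

With five one-vote nodes, the cluster therefore stays quorate as long as no more than two nodes are unreachable at the same time.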


Could this problem be caused by some nodes running version 4.x and others 5.1?
 
On pve1, I tried iptables -A OUTPUT -p udp --dport 5405 -j DROP, and the messages in the logs on the other machines stopped. What could be the reason?
 
You just blocked corosync. pmxcfs has now stopped working on that node, and the node may also reboot because it gets fenced (no quorum).
Code:
netstat -plantu | grep 54
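This also explains why the other machines kept running. With 5 expected votes the quorum is 3, so the four remaining nodes stay quorate while pve1 on its own does not. Below is a minimal Python sketch of that check. It assumes one vote per node and treats pve1 as fully cut off, which is a simplification, and the partition variables are hypothetical labels.

```python
def quorate(expected_votes: int, partition_votes: dict) -> bool:
    """True if the nodes in this partition hold a strict majority of the expected votes."""
    return sum(partition_votes.values()) >= expected_votes // 2 + 1


# Hypothetical split after blocking UDP 5405 on pve1 (one vote per node, per corosync.conf).
isolated = {"pve1": 1}
rest = {"pve0": 1, "pve2": 1, "pve3": 1, "pve4": 1}

print("pve1 alone quorate:", quorate(5, isolated))  # False
print("other nodes quorate:", quorate(5, rest))  # True
```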

Could this problem be caused by some nodes running version 4.x and others 5.1?
Are the PVE 4.x hosts on the latest packages?

Nov 27 11:18:47 pve0 corosync[6600]: [TOTEM ] Retransmit List: 85cc
You should see more than just the retransmit messages. Most likely there is other traffic on the cluster network that is interfering.
 
I managed to find the culprit on Proxmox VE 5.1.42 with kernel 4.13.16-1: it was IPv6 multicast.

I use a cheap D-Link 1GbE switch across 4 Proxmox hosts. After disabling all IPv6 addresses on both the hosts and the guests, the Corosync TOTEM "Retransmit List" spam disappeared from syslog.
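If you suspect the same cause, it helps to know how much of the traffic on the cluster network is IPv6 multicast. Here is a minimal Python sketch that tallies destination addresses by IP version and by multicast vs. non-multicast. How you collect the addresses (for example from a packet capture on the cluster NIC) is up to you, and the sample addresses below are hypothetical.

```python
import ipaddress
from collections import Counter


def classify(addresses):
    """Tally destination addresses by IP version and multicast vs. non-multicast."""
    counts = Counter()
    for text in addresses:
        addr = ipaddress.ip_address(text)
        kind = "multicast" if addr.is_multicast else "non-multicast"
        counts[f"IPv{addr.version} {kind}"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical destinations, e.g. copied from a packet capture.
    sample = ["ff02::1", "ff02::1:ff00:1", "239.192.1.1", "192.168.10.254"]
    print(dict(classify(sample)))
```

A large share of "IPv6 multicast" on a cheap unmanaged switch is a hint that the switch may be flooding that traffic to every port, including the ones corosync depends on.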

I'm not a networking guy, but I hope this saves someone a headache in the future.

Update: the TOTEM retransmits are occurring again during the day. *sigh*
 
