corosync[364738]: [TOTEM ] Retransmit List: ca8 ca9 caa cab ...

dank

Nov 18, 2011
Hi,

I have a cluster of nodes with hardware of differing performance. After a while, the cluster loses nodes (they turn red in the UI), although the VMs keep running. The log file fills with a lot of

corosync[364738]: [TOTEM ] Retransmit List: ca8 ca9 caa cab ...

messages. Google points to http://www.hastexo.com/resources/hints-and-kinks/whats-totem-retransmit-list-all-about-corosync. There was also an earlier post in this forum (link), but it got no solution or answer.

Is there a "Proxmox" way to change the window_size mentioned there, or another way to solve the problem?

I think this is an important problem, since the usual way to build a cluster is to buy a few nodes and increase the number over time.

Regards
 
The thread you linked explains how to set custom values in cluster.conf. Try it.
 
See "man cman" and "man corosync.conf"

You can set the value in cluster.conf with:

<totem window_size="50"/>
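To confirm what actually ended up in the file after editing, a minimal Python sketch can read the totem attributes back. It assumes a cman-style cluster.conf where `<totem>` is a direct child of the root `<cluster>` element; the sample XML and the helper name `totem_setting` are hypothetical, and the fallback of 50 is simply the window_size default from man corosync.conf.

```python
import xml.etree.ElementTree as ET

# Hypothetical cluster.conf excerpt; only the <totem> element matters here.
SAMPLE = """<cluster name="demo" config_version="2">
  <totem window_size="50" token="20000"/>
</cluster>"""


def totem_setting(xml_text, attr, default=None):
    """Return a <totem> attribute as an int, or `default` if it is not set."""
    root = ET.fromstring(xml_text)
    totem = root.find("totem")  # assumes <totem> sits directly under <cluster>
    if totem is None or attr not in totem.attrib:
        return default
    return int(totem.attrib[attr])


print(totem_setting(SAMPLE, "window_size", default=50))
print(totem_setting(SAMPLE, "token"))
print(totem_setting(SAMPLE, "netmtu"))  # not set in the sample
```

Returning `default` instead of raising keeps the check usable on files that only override some of the totem values.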
 
According to man corosync.conf, the default value is already 50:

Code:
window_size
This constant specifies the maximum number of messages that may be sent on one token rotation. If all processors perform equally well, this value could be large (300), which would introduce higher latency from origination to delivery for very large rings. To reduce latency in large rings (16+), the defaults are a safe compromise. If 1 or more slow processor(s) are present among fast processors, window_size should be no larger than 256000 / netmtu to avoid overflow of the kernel receive buffers. The user is notified of this by the display of a retransmit list in the notification logs. There is no loss of data, but performance is reduced when these errors occur. The default is 50 messages.
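The man page bound is plain arithmetic, so it is easy to check against your own netmtu. A small Python sketch (the function name is hypothetical) shows that the default window_size of 50 stays below the bound at netmtu 1500 but would exceed it if netmtu were raised to 9000; per the man page, the bound matters when slow processors are mixed with fast ones.

```python
def max_window_size(netmtu):
    """Upper bound from man corosync.conf: window_size <= 256000 / netmtu."""
    # Floor division, since window_size is a whole number of messages.
    return 256000 // netmtu


DEFAULT_WINDOW_SIZE = 50  # default quoted from man corosync.conf

for mtu in (1500, 9000):
    bound = max_window_size(mtu)
    status = "ok" if DEFAULT_WINDOW_SIZE <= bound else "exceeds bound"
    print(f"netmtu={mtu}: max window_size {bound}, default 50 {status}")
```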

Would it be a better idea to adjust the token value instead, maybe to 20000?

Code:
token
This timeout specifies in milliseconds until a token loss is declared after not receiving a token. This is the time spent detecting a failure of a processor in the current configuration. Reforming a new configuration takes about 50 milliseconds in addition to this timeout. The default is 1000 milliseconds.
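Raising token makes the ring more tolerant of delays, but by the man page's own description it also delays detection of a node that really has failed: detection takes the token timeout, plus about 50 ms to reform. A rough Python sketch (names hypothetical, an estimate rather than a measurement) compares the default with the values mentioned in this thread.

```python
REFORM_MS = 50  # approximate reform time quoted in man corosync.conf


def failover_estimate_ms(token_ms):
    """Rough time from a node failing until a new membership is formed."""
    return token_ms + REFORM_MS


# 1000 is the documented default; the others are values discussed in the thread.
for token in (1000, 4000, 20000, 54000):
    print(f"token={token} ms -> about {failover_estimate_ms(token) / 1000:.2f} s")
```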


http://docs.openstack.org/high-availability-guide/content/_setting_up_corosync.html
https://access.redhat.com/documenta...dministration/s1-creating-cluster-cli-CA.html
 
My token is set to 54000. I cannot remember why, but synchronization has never given me any problems with this setting.
 
Thank you, mir. I will try a bigger value.
Restarting each node after migrating all KVM guests to other nodes will be OK, right?
I fear I may have problems, since some nodes will have different token values for a few hours until the last node is restarted.

I guess the default values are not enough as you scale a Proxmox cluster.
 
It should be safe to restart a node once all VMs have been migrated to other servers.

I think you are right to assume that the default is not enough when scaling. A mix of fast and slow nodes could also be the cause.
 
Did anyone ever figure this out?
We're also seeing TOTEM retransmits, but only on the jumbo-frame ring of our two rings, not on our public network with the default MTU of 1500.
 
Retransmits indicate that the network latency is too high or that packets are lost.

If you see this on an MTU 9000 network, I guess that network is also a storage network?
If so, the retransmits would be normal, because storage traffic raises the latency on the network.
 
Yes, it's a storage network. We're also seeing the same issue on our production cluster, which is configured similarly but runs on different hardware.
But this test lab is pretty idle and the retransmits occur constantly. Both rings are set to netmtu 1500, even though the ring0 bonded NIC is at MTU 9000:

corosync.conf:

logging {
  debug: off
  syslog_facility: local7
  to_syslog: yes
}

nodelist {
  node {
    name: node5
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.0.0.55
    ring1_addr: <public net>.21
  }
  node {
    name: node6
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.0.0.56
    ring1_addr: <public net>.22
  }
  node {
    name: node7
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.0.0.57
    ring1_addr: <public net>.23
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: sprawl5
  config_version: 3
  interface {
    bindnetaddr: 10.0.0.0
    mcastport: 5409
    netmtu: 1500
    ringnumber: 0
    transport: udpu
  }
  interface {
    bindnetaddr: <public net>.0
    mcastport: 5411
    netmtu: 1500
    ringnumber: 1
    transport: udp
  }
  ip_version: ipv4
  rrp_mode: passive
  secauth: on
  version: 2
}
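When comparing configs across nodes or rings, it can help to pull the per-interface values out of corosync.conf programmatically. The following is a simplified Python sketch, not a corosync tool: it assumes the brace syntax shown above ("#" comments, `key: value` pairs, repeated sections such as `node` and `interface` collected into lists) and uses a trimmed copy of the totem section posted above. For each ring it also prints the 256000 / netmtu window_size bound from man corosync.conf.

```python
# Trimmed copy of the totem section from the posted corosync.conf.
CONF = """totem {
  version: 2
  interface {
    ringnumber: 0
    netmtu: 1500
    transport: udpu
  }
  interface {
    ringnumber: 1
    netmtu: 1500
    transport: udp
  }
}"""


def parse_corosync_conf(text):
    """Parse corosync.conf brace syntax into nested dicts (simplified).

    Every section becomes a list of dicts, so repeated sections such as
    `node` or `interface` are all kept; values stay as strings.
    """
    root = {}
    stack = [root]
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.endswith("{"):
            child = {}
            stack[-1].setdefault(line[:-1].strip(), []).append(child)
            stack.append(child)
        elif line == "}":
            stack.pop()
        else:
            key, _, value = line.partition(":")
            stack[-1][key.strip()] = value.strip()
    return root


conf = parse_corosync_conf(CONF)
for iface in conf["totem"][0]["interface"]:
    mtu = int(iface.get("netmtu", "1500"))  # assume 1500 when netmtu is unset
    print(f"ring {iface['ringnumber']}: transport={iface.get('transport')}, "
          f"netmtu={mtu}, window_size bound={256000 // mtu}")
```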

Retransmits seem to occur constantly:

Jun 15 06:25:04 node5 corosync[2126]: [TOTEM ] Retransmit List: 13081e 130820 130822 130824 130826
Jun 15 06:25:04 node5 corosync[2126]: [TOTEM ] Retransmit List: 13081e 130822 130826
Jun 15 06:25:04 node5 corosync[2126]: [TOTEM ] Retransmit List: 13081e 130822 130826
Jun 15 06:25:04 node5 corosync[2126]: [TOTEM ] Retransmit List: 130828
Jun 15 06:25:04 node5 corosync[2126]: [TOTEM ] Retransmit List: 13082c
Jun 15 06:25:04 node5 corosync[2126]: [TOTEM ] Retransmit List: 13082c
Jun 15 06:25:04 node5 corosync[2126]: [TOTEM ] Retransmit List: 13082c
Jun 15 06:25:04 node5 corosync[2126]: [TOTEM ] Retransmit List: 13082e
Jun 15 06:25:04 node5 corosync[2126]: [TOTEM ] Retransmit List: 13082e
Jun 15 06:25:04 node5 corosync[2126]: [TOTEM ] Marking ringid 0 interface 10.0.0.55 FAULTY
Jun 15 06:25:05 node5 corosync[2126]: [TOTEM ] Automatically recovered ring 0
Jun 15 06:25:09 node5 corosync[2126]: [TOTEM ] Retransmit List: 13083e
Jun 15 06:25:09 node5 corosync[2126]: [TOTEM ] Retransmit List: 130840
Jun 15 06:25:09 node5 corosync[2126]: [TOTEM ] Retransmit List: 130840 130842
Jun 15 06:25:09 node5 corosync[2126]: [TOTEM ] Retransmit List: 130842
Jun 15 06:25:09 node5 corosync[2126]: [TOTEM ] Retransmit List: 130843
Jun 15 06:25:09 node5 corosync[2126]: [TOTEM ] Retransmit List: 130843
Jun 15 06:25:14 node5 corosync[2126]: [TOTEM ] Retransmit List: 130850 130852 130854
Jun 15 06:25:14 node5 corosync[2126]: [TOTEM ] Retransmit List: 130850 130852 130854
Jun 15 06:25:14 node5 corosync[2126]: [TOTEM ] Retransmit List: 130857 130859 13085b
...
Jun 15 06:25:24 node5 corosync[2126]: [TOTEM ] Retransmit List: 130893
Jun 15 06:25:24 node5 corosync[2126]: [TOTEM ] Retransmit List: 130893
Jun 15 06:25:24 node5 corosync[2126]: [TOTEM ] Retransmit List: 130893
Jun 15 06:25:25 node5 corosync[2126]: [TOTEM ] Retransmit List: 130896
Jun 15 06:25:25 node5 corosync[2126]: [TOTEM ] Retransmit List: 130896
Jun 15 06:25:25 node5 corosync[2126]: [TOTEM ] Marking ringid 0 interface 10.0.0.55 FAULTY
Jun 15 06:25:26 node5 corosync[2126]: [TOTEM ] Automatically recovered ring 0
Jun 15 06:25:29 node5 corosync[2126]: [TOTEM ] Retransmit List: 1308a3
Jun 15 06:25:29 node5 corosync[2126]: [TOTEM ] Retransmit List: 1308a5
Jun 15 06:25:29 node5 corosync[2126]: [TOTEM ] Retransmit List: 1308a5
Jun 15 06:25:29 node5 corosync[2126]: [TOTEM ] Retransmit List: 1308a6
Jun 15 06:25:29 node5 corosync[2126]: [TOTEM ] Retransmit List: 1308a8
...
Jun 15 11:16:55 node5 corosync[2126]: [TOTEM ] Retransmit List: 147043
Jun 15 11:16:55 node5 corosync[2126]: [TOTEM ] Marking ringid 0 interface 10.0.0.55 FAULTY
Jun 15 11:16:56 node5 corosync[2126]: [TOTEM ] Automatically recovered ring 0
Jun 15 11:16:58 node5 corosync[2126]: [TOTEM ] Retransmit List: 14704b
Jun 15 11:16:58 node5 corosync[2126]: [TOTEM ] Retransmit List: 14704b
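The pattern in this log is bursts of retransmits followed by ring 0 being marked FAULTY and recovered about a second later. To line those bursts up against other activity on the ring (such as storage traffic), a hypothetical Python helper can tally them per minute. It assumes the syslog line format shown above; the regex and function name are illustrative, not part of corosync.

```python
import re
from collections import Counter

# Assumed syslog format, matching the lines posted above:
# "Jun 15 06:25:04 node5 corosync[2126]: [TOTEM ] Retransmit List: 13081e ..."
LINE_RE = re.compile(
    r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) (\S+) corosync\[\d+\]: \[TOTEM \] (.*)$"
)


def summarize(lines):
    """Tally retransmitted message IDs and ring FAULTY events per minute."""
    retransmits = Counter()
    faults = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # not a corosync TOTEM line
        minute = m.group(1)[:-3]  # "Jun 15 06:25:04" -> "Jun 15 06:25"
        msg = m.group(3)
        if msg.startswith("Retransmit List:"):
            # Each hex token after the colon is one message sequence number.
            retransmits[minute] += len(msg.split(":", 1)[1].split())
        elif msg.startswith("Marking ringid") and msg.endswith("FAULTY"):
            faults[minute] += 1
    return retransmits, faults


sample = [
    "Jun 15 06:25:04 node5 corosync[2126]: [TOTEM ] Retransmit List: 13081e 130820 130822 130824 130826",
    "Jun 15 06:25:04 node5 corosync[2126]: [TOTEM ] Retransmit List: 13081e 130822 130826",
    "Jun 15 06:25:04 node5 corosync[2126]: [TOTEM ] Marking ringid 0 interface 10.0.0.55 FAULTY",
    "Jun 15 06:25:05 node5 corosync[2126]: [TOTEM ] Automatically recovered ring 0",
]
retransmits, faults = summarize(sample)
for minute, count in retransmits.items():
    print(minute, "retransmitted IDs:", count, "FAULTY events:", faults[minute])
```

To run it against a real log, pass an open file object instead of `sample`; which file the lines land in depends on your syslog setup (the posted config logs via facility local7).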
 
What NIC HW do you use?
 
Oh, are you asking about the specific chipset?
e1000e: Intel(R) PRO/1000 Network Driver - 3.4.1.1-NAPI

[ 1.872787] e1000e 0000:0b:00.1 eth1: (PCI Express:2.5GT/s:Width x4) 00:1f:29:61:b6:c5
[ 1.872790] e1000e 0000:0b:00.1 eth1: Intel(R) PRO/1000 Network Connection
[ 1.872870] e1000e 0000:0b:00.1 eth1: MAC: 0, PHY: 4, PBA No: D51930-004

[ 2.116367] bnx2 0000:03:00.0 eth0: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz found at mem f8000000, IRQ 16, node addr 00:1e:0b:6f:2f:fc
[ 2.716347] bnx2 0000:05:00.0 eth1: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz found at mem fa000000, IRQ 17, node addr 00:1e:0b:6f:2f:fa
The test lab servers are older HPE DL380 G5s with a bond of 3x 1 Gbit/s NICs (1x onboard, 2x PCI add-on).
Production uses a bond of 2x 10 Gbit/s NICs in HPE DL360 Gen9 servers.
 
Note that you use unicast (transport: udpu), so you need low latency on your switches.

I have a cluster running 20 nodes over unicast, with token: 4000 and an average ping latency of 0.016 ms.
Yes, we want to avoid issues with broadcast, hence unicast.
This is just a 3-node cluster; production has 7 nodes.
What does token at 4000 mean?

Testlab:
ping from node5:
--- node6 ping statistics ---
9 packets transmitted, 9 received, 0% packet loss, time 8155ms
rtt min/avg/max/mdev = 0.167/0.323/0.940/0.221 ms
--- node7 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4098ms
rtt min/avg/max/mdev = 0.156/0.313/0.842/0.266 ms

Production:
ping from n1:
--- n2 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9204ms
rtt min/avg/max/mdev = 0.071/0.130/0.539/0.137 ms
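To compare these numbers side by side with the 0.016 ms average quoted for the 20-node unicast cluster, a small Python sketch can parse the iputils ping summary lines. The regex, function name, and labels are hypothetical; the figures are the ones posted above.

```python
import re

# iputils ping summary format, as in the output above.
RTT_RE = re.compile(
    r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)


def parse_rtt(summary_line):
    """Return (min, avg, max, mdev) in milliseconds from a ping summary line."""
    m = RTT_RE.search(summary_line)
    if m is None:
        raise ValueError(f"not a ping rtt summary: {summary_line!r}")
    return tuple(float(x) for x in m.groups())


REFERENCE_AVG_MS = 0.016  # average quoted for the 20-node unicast cluster

results = {
    "testlab node5->node6": parse_rtt("rtt min/avg/max/mdev = 0.167/0.323/0.940/0.221 ms"),
    "testlab node5->node7": parse_rtt("rtt min/avg/max/mdev = 0.156/0.313/0.842/0.266 ms"),
    "production n1->n2": parse_rtt("rtt min/avg/max/mdev = 0.071/0.130/0.539/0.137 ms"),
}

for path, (rtt_min, rtt_avg, rtt_max, rtt_mdev) in results.items():
    ratio = rtt_avg / REFERENCE_AVG_MS
    print(f"{path}: avg {rtt_avg:.3f} ms (~{ratio:.0f}x the reference), "
          f"max {rtt_max:.3f} ms, mdev {rtt_mdev:.3f} ms")
```

With only a handful of packets per run, the max and mdev columns may say more about occasional latency spikes than the average does.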