[SOLVED] More Fix corosync [TOTEM ] Retransmit List errors

bferrell

Nov 16, 2018
So, I've read this thread and the multicast notes, and I'm still seeing occasional retransmits. I have a 4-node cluster, all on the same VLAN (and they are the only machines on this VLAN). They all have 10G copper ports on the same Ubiquiti USW-XG-16 switch, connected to a USG-XG 10G router with snooping enabled. "omping -c 600 -i 1 -q svr-01 svr-02 svr-03 svr-04" reports 0% packet loss. They can all ping and nslookup the other node names. What else should I check?

Code:
svr-02 :   unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.133/0.273/0.372/0.041
svr-02 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.175/0.296/0.436/0.042
svr-03 :   unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.113/0.209/0.317/0.029
svr-03 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.118/0.230/0.295/0.030
svr-04 :   unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.116/0.212/0.354/0.033
svr-04 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.139/0.222/0.340/0.023
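
For anyone comparing results across nodes, here is a minimal Python sketch that parses omping summary lines like the ones above and flags any row with loss or a high maximum latency. The function names and the 2.0 ms threshold are assumptions for illustration, not part of omping or corosync. Note that "-i 1" sends one packet per second, so a clean result only shows the path works at that low rate.

```python
import re

# Matches omping summary lines such as:
# "svr-02 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.175/0.296/0.436/0.042"
SUMMARY = re.compile(
    r"^(?P<host>\S+)\s*:\s*(?P<kind>unicast|multicast),\s*"
    r"xmt/rcv/%loss = (?P<xmt>\d+)/(?P<rcv>\d+)/(?P<loss>\d+)%[^,]*,\s*"
    r"min/avg/max/std-dev = (?P<min>[\d.]+)/(?P<avg>[\d.]+)/(?P<max>[\d.]+)/(?P<std>[\d.]+)"
)


def parse_omping(text):
    """Return one dict per summary line; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        m = SUMMARY.match(line.strip())
        if not m:
            continue
        rows.append({
            "host": m.group("host"),
            "kind": m.group("kind"),
            "xmt": int(m.group("xmt")),
            "rcv": int(m.group("rcv")),
            "loss": int(m.group("loss")),
            "avg": float(m.group("avg")),
            "max": float(m.group("max")),
        })
    return rows


def suspicious(rows, max_ms=2.0):
    """Rows with any packet loss, or a max latency above max_ms (hypothetical threshold)."""
    return [r for r in rows if r["loss"] > 0 or r["max"] > max_ms]
```

Feeding the six lines above into parse_omping and then suspicious returns an empty list, which matches what I saw: the low-rate test is clean even though corosync still logs retransmits.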
 
I also occasionally get this, which makes HA unusable. I'll try turning off snooping on the UBNT switch to see if that helps.

"error with cfs lock 'file-replication_cfg': no quorum!"
 
We had this problem on and off for a little over a year. We then moved corosync to its own network and it went away for good. Until you set up an isolated corosync network, you'll always be at risk of this problem.
 
Fair enough, I already started looking at that path. I did see a decent walkthrough of someone doing this on YouTube, setting up the new network on a new interface, but I didn't see them add any routing rule, which I think is required, right? I need to brush up on dual-homing...
 
From my vague recollection, once you have set up the additional network interface in Proxmox (IP/subnet), you can point corosync at it in the cluster config file (edit: /etc/pve/corosync.conf) by setting that particular IP for each host. There are a few other settings too. A restart of some services will probably be required.

Repeat for all hosts.

Definitely familiarize yourself with the official wiki.
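
To make the config change concrete, here is a rough Python sketch of the text transformation involved: swap each node's ring0_addr for its address on the new network and bump config_version so the change is picked up. The function name and the name-to-address mapping are hypothetical, and this only rewrites a string. On a real cluster, follow the procedure in the official documentation for editing /etc/pve/corosync.conf rather than scripting it like this.

```python
import re


def retarget_ring0(conf_text, new_addrs):
    """Return conf_text with ring0_addr replaced per node and config_version + 1.

    new_addrs maps a node name (as in "name: svr-01") to its new ring0 address.
    Nodes not listed in new_addrs are left untouched.
    """

    def fix_node(match):
        block = match.group(0)
        name = re.search(r"^\s*name:\s*(\S+)", block, re.MULTILINE)
        if name and name.group(1) in new_addrs:
            addr = new_addrs[name.group(1)]
            block = re.sub(
                r"(^\s*ring0_addr:\s*)\S+",
                lambda m: m.group(1) + addr,
                block,
                flags=re.MULTILINE,
            )
        return block

    # Rewrite each "node { ... }" block ("nodelist {" does not match this pattern).
    out = re.sub(r"\bnode\s*\{[^{}]*\}", fix_node, conf_text)

    # Increment config_version by one.
    out = re.sub(
        r"(^\s*config_version:\s*)(\d+)",
        lambda m: m.group(1) + str(int(m.group(2)) + 1),
        out,
        flags=re.MULTILINE,
    )
    return out
```

For example, retarget_ring0(text, {"svr-01": "1corosync"}) would change only svr-01's ring0_addr and raise config_version by one, assuming "1corosync" resolves to that node's address on the dedicated network.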
 
I did set up a 1G corosync network and this went away.

logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: svr-01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 1corosync
  }
  node {
    name: svr-02
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 2corosync
  }
  node {
    name: svr-03
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 3corosync
  }
  node {
    name: svr-04
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 4corosync
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: Congress
  config_version: 10
  interface {
    bindnetaddr: 192.168.102.12
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}
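
As a footnote on the "no quorum!" error from earlier in the thread: with four nodes at quorum_votes: 1 each, a node needs to see a majority of the votes to stay quorate. A minimal Python sketch of that majority rule, assuming no special votequorum options are set (the function names are hypothetical, for illustration only):

```python
def quorum_threshold(total_votes):
    """Smallest number of votes that is a strict majority of total_votes."""
    return total_votes // 2 + 1


def has_quorum(votes_present, total_votes):
    """True if the votes currently visible reach the majority threshold."""
    return votes_present >= quorum_threshold(total_votes)
```

With four votes the threshold is three, so if retransmits get bad enough that a node briefly sees only itself and one other, it drops out of quorum. That would fit the cfs lock error showing up alongside the retransmit messages.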
 