corosync constant retransmit

hancok62

Member
Jun 22, 2020
Hello,

I have up to 16 nodes in a Proxmox cluster, and corosync is constantly logging retransmits:
Code:
Jan 05 13:38:15 node2 corosync[18227]:   [TOTEM ] Retransmit List: 66a9
Jan 05 13:38:23 node2 corosync[18227]:   [TOTEM ] Retransmit List: 6708
Jan 05 13:38:30 node2 corosync[18227]:   [TOTEM ] Retransmit List: 67b0
Jan 05 13:38:33 node2 corosync[18227]:   [TOTEM ] Retransmit List: 67b2
Jan 05 13:38:43 node2 corosync[18227]:   [TOTEM ] Retransmit List: 6863
Jan 05 13:38:50 node2 corosync[18227]:   [TOTEM ] Retransmit List: 6919
Jan 05 13:39:03 node2 corosync[18227]:   [TOTEM ] Retransmit List: 69d3
Jan 05 13:39:10 node2 corosync[18227]:   [TOTEM ] Retransmit List: 6b1a
Jan 05 13:39:26 node2 corosync[18227]:   [TOTEM ] Retransmit List: 6bcf
Jan 05 13:39:26 node2 corosync[18227]:   [TOTEM ] Retransmit List: 6bd6
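To put a number on "constantly", a small Python sketch like the one below can pull these entries out of the journal and measure the gaps between them. It assumes the exact line format shown above and treats the hex values as totem message sequence numbers. `parse_retransmits` and `gaps_seconds` are hypothetical helper names, not part of any corosync or Proxmox tooling.

```python
import re
from datetime import datetime

# Matches journal lines such as:
#   Jan 05 13:38:15 node2 corosync[18227]:   [TOTEM ] Retransmit List: 66a9
# (assumption: one or more lowercase hex numbers separated by spaces)
RETRANSMIT_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2}) \S+ corosync\[\d+\]:\s+"
    r"\[TOTEM \] Retransmit List: (?P<seqs>[0-9a-f]+(?: [0-9a-f]+)*)$"
)


def parse_retransmits(lines):
    """Return (timestamp, [sequence numbers]) for each Retransmit List line."""
    events = []
    for line in lines:
        m = RETRANSMIT_RE.match(line.strip())
        if m is None:
            continue  # not a retransmit line, skip it
        # The short journal format has no year; assume a fixed leap year (2000)
        # so Feb 29 still parses. Only the gaps between entries are meaningful.
        ts = datetime.strptime("2000 " + m.group("ts"), "%Y %b %d %H:%M:%S")
        seqs = [int(s, 16) for s in m.group("seqs").split()]
        events.append((ts, seqs))
    return events


def gaps_seconds(events):
    """Seconds between consecutive retransmit entries."""
    return [(b[0] - a[0]).total_seconds() for a, b in zip(events, events[1:])]
```

For example, feeding it `journalctl -u corosync --no-pager` output (as lines from `sys.stdin`) on the excerpt above gives ten entries in 71 seconds, i.e. one every few seconds.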

The whole cluster runs on a local network with 2 network interfaces, and the network itself seems to work:
there are no network drops between the servers.

proxmox-ve: 6.2-1 (running kernel: 5.0.15-1-pve)
pve-manager: 6.2-10 (running version: 6.2-10/a20769ed)
pve-kernel-5.4: 6.2-4
pve-kernel-helper: 6.2-4
pve-kernel-5.0: 6.0-11
pve-kernel-5.4.44-2-pve: 5.4.44-2
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 14.2.9-pve1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.4
libpve-access-control: 6.1-2
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-5
libpve-guest-common-perl: 3.1-1
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.2-5
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-9
pve-cluster: 6.1-8
pve-container: 3.1-11
pve-docs: 6.2-5
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-3
pve-qemu-kvm: 5.0.0-11
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-10
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.4-pve1

The corosync stats for the link to another, randomly chosen node don't show any problem:

corosync-cmapctl -m stats
stats.knet.node1.link0.connected (u8) = 1
stats.knet.node1.link0.down_count (u32) = 1
stats.knet.node1.link0.enabled (u8) = 1
stats.knet.node1.link0.latency_ave (u32) = 255
stats.knet.node1.link0.latency_max (u32) = 799
stats.knet.node1.link0.latency_min (u32) = 255
stats.knet.node1.link0.latency_samples (u32) = 613
stats.knet.node1.link0.mtu (u32) = 1397
stats.knet.node1.link0.rx_data_bytes (u64) = 2586052
stats.knet.node1.link0.rx_data_packets (u64) = 22891
stats.knet.node1.link0.rx_ping_bytes (u64) = 14014
stats.knet.node1.link0.rx_ping_packets (u64) = 539
stats.knet.node1.link0.rx_pmtu_bytes (u64) = 104527
stats.knet.node1.link0.rx_pmtu_packets (u64) = 145
stats.knet.node1.link0.rx_pong_bytes (u64) = 15938
stats.knet.node1.link0.rx_pong_packets (u64) = 613
stats.knet.node1.link0.rx_total_bytes (u64) = 2720531
stats.knet.node1.link0.rx_total_packets (u64) = 24188
stats.knet.node1.link0.rx_total_retries (u64) = 0
stats.knet.node1.link0.tx_data_bytes (u64) = 6814000
stats.knet.node1.link0.tx_data_errors (u32) = 0
stats.knet.node1.link0.tx_data_packets (u64) = 6260
stats.knet.node1.link0.tx_data_retries (u32) = 0
stats.knet.node1.link0.tx_ping_bytes (u64) = 49040
stats.knet.node1.link0.tx_ping_errors (u32) = 0
stats.knet.node1.link0.tx_ping_packets (u64) = 613
stats.knet.node1.link0.tx_ping_retries (u32) = 0
stats.knet.node1.link0.tx_pmtu_bytes (u64) = 105984
stats.knet.node1.link0.tx_pmtu_errors (u32) = 0
stats.knet.node1.link0.tx_pmtu_packets (u64) = 72
stats.knet.node1.link0.tx_pmtu_retries (u32) = 0
stats.knet.node1.link0.tx_pong_bytes (u64) = 43120
stats.knet.node1.link0.tx_pong_errors (u32) = 0
stats.knet.node1.link0.tx_pong_packets (u64) = 539
stats.knet.node1.link0.tx_pong_retries (u32) = 0
stats.knet.node1.link0.tx_total_bytes (u64) = 7012144
stats.knet.node1.link0.tx_total_errors (u64) = 0
stats.knet.node1.link0.tx_total_packets (u64) = 7484
stats.knet.node1.link0.up_count (u32) = 1
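With 16 nodes and 2 links there are a lot of these counters, so a quick way to check all of them at once is to parse the output and flag any non-zero error or retry counter. This is a minimal Python sketch assuming the `stats.knet.<node>.<link>.<counter> (<type>) = <value>` layout shown above; `parse_knet_stats` and `suspicious_counters` are hypothetical names.

```python
import re

# Matches lines like "stats.knet.node1.link0.tx_data_errors (u32) = 0";
# anything else in the corosync-cmapctl output is ignored.
STAT_RE = re.compile(
    r"^stats\.knet\.(?P<node>[^.]+)\.(?P<link>[^.]+)\.(?P<key>\w+)"
    r" \(\w+\) = (?P<value>\d+)$"
)


def parse_knet_stats(text):
    """Return {(node, link): {counter: int}} from `corosync-cmapctl -m stats` output."""
    stats = {}
    for line in text.splitlines():
        m = STAT_RE.match(line.strip())
        if m:
            stats.setdefault((m["node"], m["link"]), {})[m["key"]] = int(m["value"])
    return stats


def suspicious_counters(stats):
    """List (node, link, counter, value) for every non-zero *_errors or *_retries counter."""
    hits = []
    for (node, link), counters in sorted(stats.items()):
        for key, value in sorted(counters.items()):
            if key.endswith(("_errors", "_retries")) and value != 0:
                hits.append((node, link, key, value))
    return hits
```

On the node1/link0 excerpt above it returns an empty list: every `*_errors` and `*_retries` counter there is 0.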

The cluster latency is below 1 ms.
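The knet `latency_*` counters are, as far as I know, reported in microseconds, so the node1/link0 values above work out to an average of 0.255 ms and a maximum of 0.799 ms, which fits the sub-1 ms figure. A hedged Python sketch of that conversion and check follows; the microsecond unit is an assumption, and the 1 ms limit is just the number from this post, not a corosync setting.

```python
# Assumption: knet latency_* counters are in microseconds.
US_PER_MS = 1000.0


def latency_ms(link_stats):
    """Map (node, link) -> (average ms, maximum ms) from knet latency counters."""
    return {
        key: (c["latency_ave"] / US_PER_MS, c["latency_max"] / US_PER_MS)
        for key, c in link_stats.items()
    }


def links_over_limit(link_stats, limit_ms=1.0):
    """(node, link) keys whose maximum observed latency exceeds limit_ms."""
    return sorted(
        key
        for key, (_ave_ms, max_ms) in latency_ms(link_stats).items()
        if max_ms > limit_ms
    )


# Values copied from the node1/link0 stats above.
example = {("node1", "link0"): {"latency_ave": 255, "latency_max": 799}}
```

Checking the maximum rather than the average is deliberate: a link can look fine on average while still having spikes.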

Thanks,
Regards,
 
Please provide your network config (/etc/network/interfaces).
Is there anything else running over the same network as corosync?
 
> Is there anything else running over the same network as corosync?

Hi @mira,

I see there was no reply to this question, so I have a related one:

Can a Proxmox VE cluster work over a WAN, i.e. over the internet? We have a 1 Gbit/s DC and a 200 Mbit/s DC, and I'm trying to get it working, but I'm getting lots of Corosync `Retransmit List` errors. I seem to recall that Corosync needs a dedicated network...


EDIT
Never mind, it seems I have an architectural problem. The following is an AI-generated response, but even if it's only half true, running Corosync over a WAN will be problematic:

[Screenshot of the AI-generated response]
 
Corosync needs stable, low-latency links, so no, that is not expected to work (well).
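One way to put numbers on "stable, low-latency" before trying such a setup is to collect RTT probes between the sites (for example from `ping`) and summarize loss, mean latency and jitter. The Python sketch below is illustrative only: the jitter definition (mean absolute change between consecutive received samples) is a simplification I chose, and the thresholds in the hypothetical `looks_stable` helper are placeholders, not limits documented by corosync or Proxmox.

```python
import statistics


def summarize_rtt(samples_ms):
    """Summarise RTT probes in milliseconds; None marks a lost probe."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    received = [s for s in samples_ms if s is not None]
    loss_pct = 100.0 * (len(samples_ms) - len(received)) / len(samples_ms)
    if not received:
        return {"loss_pct": loss_pct, "mean_ms": None, "max_ms": None, "jitter_ms": None}
    # Simplified jitter: mean absolute change between consecutive received samples.
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]
    return {
        "loss_pct": loss_pct,
        "mean_ms": statistics.fmean(received),
        "max_ms": max(received),
        "jitter_ms": statistics.fmean(diffs) if diffs else 0.0,
    }


def looks_stable(summary, max_mean_ms=2.0, max_jitter_ms=1.0, max_loss_pct=0.0):
    """Placeholder thresholds for illustration only, not corosync limits."""
    if summary["mean_ms"] is None:
        return False  # every probe was lost
    return (
        summary["loss_pct"] <= max_loss_pct
        and summary["mean_ms"] <= max_mean_ms
        and summary["jitter_ms"] <= max_jitter_ms
    )
```

A LAN-like series such as `[0.3, 0.4, 0.35]` passes these placeholder checks, while a single lost probe or tens of milliseconds of WAN latency fails them.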
 
