[SOLVED] Corosync - Cluster retransmit issues | pmxcfs / corosync synchronization problems | Proxmox cluster 3 nodes

no you didn't - the third node is missing. like I wrote, please provide the configs and logs from all three nodes (covering the same boot/time period). please include the network configuration as well!
Hello,

I have attached the files.

Regards,
IDEZ Ugo
 

Also, there are two mentions about 10.100.37.0/27 network but no IP addresses shown that match the subnet. Typos when manually changing the outputs for some reason?
Hello,

I changed the NIC interface of the cluster for some tests with a fresh install.
So this is not the same network.

Regards,
IDEZ Ugo
 
what about "corosync-cfgtool -n" on each node?
 
Node1:
corosync-cfgtool -n

Local node ID 2, transport knet
nodeid: 1 reachable
LINK: 0 udp (10.30.7.151->10.30.7.152) enabled connected mtu: 1397

nodeid: 3 reachable
LINK: 0 udp (10.30.7.151->10.30.7.153) enabled connected mtu: 1397

Node2:
corosync-cfgtool -n

Local node ID 1, transport knet
nodeid: 2 reachable
LINK: 0 udp (10.30.7.152->10.30.7.151) enabled connected mtu: 1397

nodeid: 3 reachable
LINK: 0 udp (10.30.7.152->10.30.7.153) enabled connected mtu: 1397

Node3:
corosync-cfgtool -n

Local node ID 3, transport knet
nodeid: 1 reachable
LINK: 0 udp (10.30.7.153->10.30.7.152) enabled connected mtu: 1397

nodeid: 2 reachable
LINK: 0 udp (10.30.7.153->10.30.7.151) enabled connected mtu: 1397
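
For anyone comparing this output across several nodes, the check can be scripted. Below is a minimal Python sketch, assuming the `corosync-cfgtool -n` output keeps the format pasted above. The function name `parse_cfgtool_links` and the sample variable are hypothetical helpers for illustration, not part of corosync.

```python
import re

# Matches lines such as:
#   LINK: 0 udp (10.30.7.151->10.30.7.152) enabled connected mtu: 1397
LINK_RE = re.compile(
    r"LINK:\s+(?P<link>\d+)\s+(?P<proto>\S+)\s+"
    r"\((?P<src>[^)\s]+?)->(?P<dst>[^)\s]+)\)\s+"
    r"(?P<flags>.*?)\s+mtu:\s+(?P<mtu>\d+)"
)
# Matches lines such as:
#   nodeid: 1 reachable
NODE_RE = re.compile(r"nodeid:\s+(?P<nodeid>\d+)\s+(?P<state>\S+)")


def parse_cfgtool_links(text):
    """Return one dict per LINK line, tagged with the preceding nodeid."""
    links = []
    nodeid, state = None, None
    for line in text.splitlines():
        line = line.strip()
        node = NODE_RE.match(line)
        if node:
            nodeid, state = int(node["nodeid"]), node["state"]
            continue
        link = LINK_RE.match(line)
        if link:
            links.append({
                "nodeid": nodeid,
                "reachable": state == "reachable",
                "link": int(link["link"]),
                "src": link["src"],
                "dst": link["dst"],
                "connected": "connected" in link["flags"].split(),
                "mtu": int(link["mtu"]),
            })
    return links


# Sample copied from the first node's output in this thread.
SAMPLE = """\
Local node ID 2, transport knet
nodeid: 1 reachable
LINK: 0 udp (10.30.7.151->10.30.7.152) enabled connected mtu: 1397

nodeid: 3 reachable
LINK: 0 udp (10.30.7.151->10.30.7.153) enabled connected mtu: 1397
"""

links = parse_cfgtool_links(SAMPLE)
for entry in links:
    print(entry["nodeid"], entry["dst"], entry["connected"], entry["mtu"])
```

Note that, as this thread shows, every link can report "connected" while retransmits are still happening, so a clean result here only rules out a fully broken link.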
 
this is with the retransmit problem still ongoing? have you tried stopping and starting corosync and pve-cluster on all three nodes? if the problem persists, the next step would be to enable debug logging (very verbose!) and dump those logs somewhere..
 
Hello everyone,

The problem was solved, and it was more twisted than one might have imagined.

It turned out that the interfaces on our cluster nodes could only communicate stably for Corosync at 10Gbps or 25Gbps.

What made no sense was that all other traffic (ICMP, DNS, SSH, HTTPS, ...) worked fine, with no apparent problems, at 1Gbps with the interfaces set to auto-negotiation. Only Corosync was affected.

But the outcome is positive: we ordered a switch with 10Gbps interfaces, and now the Proxmox cluster - Corosync - is completely stable and functional :).
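
For readers who want to check the negotiated link speed on their own nodes, here is a minimal Python sketch. It assumes a Linux host where the kernel exposes the speed in Mb/s under `/sys/class/net/<iface>/speed`. The helper names and the 10000 Mb/s default are illustrative assumptions drawn from what worked in this particular setup, not a general Corosync requirement.

```python
from pathlib import Path


def link_speed_mbps(iface, sysfs_root="/sys/class/net"):
    """Return the negotiated speed in Mb/s, or None if it cannot be read.

    Reading the file can fail (for example when the link is down), and
    some drivers report -1 for "unknown"; both cases map to None here.
    """
    try:
        raw = (Path(sysfs_root) / iface / "speed").read_text().strip()
        speed = int(raw)
    except (OSError, ValueError):
        return None
    return speed if speed > 0 else None


def meets_minimum(iface, minimum_mbps=10_000, sysfs_root="/sys/class/net"):
    """True if the interface reports at least minimum_mbps (assumed threshold)."""
    speed = link_speed_mbps(iface, sysfs_root)
    return speed is not None and speed >= minimum_mbps
```

Calling `meets_minimum("eno1")` on each node (the interface name is just an example) shows whether the cluster NIC actually negotiated the expected speed.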

Thank you all for the time you invested in helping me.
Until next time, maybe! (Although I hope not, haha)

Best regards,
IDEZ Ugo