Latest update for Proxmox community edition 8.3.5 broke corosync

lix

Apr 4, 2025
I am running Proxmox 8.3.5. Since the latest update, every node in the cluster logs the following messages in a loop, every 2-5 seconds:

Apr 04 22:17:04 prx-01 corosync[1488]: [KNET ] udp: Received packet on ifindex 1664013570 when expected ifindex 3
Apr 04 22:17:04 prx-01 corosync[1488]: [KNET ] udp: Received packet on ifindex 1664013570 when expected ifindex 3
Apr 04 22:17:04 prx-01 corosync[1488]: [KNET ] udp: Received packet on ifindex -714920235 when expected ifindex 3
Apr 04 22:17:03 prx-01 corosync[1488]: [KNET ] udp: Received packet on ifindex 1664013570 when expected ifindex 3
Apr 04 22:17:03 prx-01 corosync[1488]: [KNET ] udp: Received packet on ifindex 1664013570 when expected ifindex 3
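
To get an overview of which ifindex values are being reported, the messages can be tallied with a short script. This is only a sketch: the name `tally_ifindex` and the `SAMPLE` constant are made up for illustration, and the regex assumes the message text is exactly as in the lines above.

```python
import re
from collections import Counter

# Matches the KNET warning text as it appears in the log lines above.
KNET_RE = re.compile(
    r"Received packet on ifindex (-?\d+) when expected ifindex (-?\d+)"
)


def tally_ifindex(lines):
    """Count (received_ifindex, expected_ifindex) pairs in log lines."""
    counts = Counter()
    for line in lines:
        match = KNET_RE.search(line)
        if match:
            counts[(int(match.group(1)), int(match.group(2)))] += 1
    return counts


# The five log lines quoted in this post.
SAMPLE = """\
Apr 04 22:17:04 prx-01 corosync[1488]: [KNET ] udp: Received packet on ifindex 1664013570 when expected ifindex 3
Apr 04 22:17:04 prx-01 corosync[1488]: [KNET ] udp: Received packet on ifindex 1664013570 when expected ifindex 3
Apr 04 22:17:04 prx-01 corosync[1488]: [KNET ] udp: Received packet on ifindex -714920235 when expected ifindex 3
Apr 04 22:17:03 prx-01 corosync[1488]: [KNET ] udp: Received packet on ifindex 1664013570 when expected ifindex 3
Apr 04 22:17:03 prx-01 corosync[1488]: [KNET ] udp: Received packet on ifindex 1664013570 when expected ifindex 3
"""

if __name__ == "__main__":
    for (received, expected), n in tally_ifindex(SAMPLE.splitlines()).most_common():
        print(f"received={received} expected={expected} count={n}")
```

In the sample, four lines report ifindex 1664013570 and one reports -714920235, all against an expected ifindex of 3.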

The Proxmox cluster has 4 nodes.

Corosync version:
corosync/stable,now 3.1.9-pve1 amd64 [installed,automatic]
cluster engine daemon and utilities

How to solve it or debug and solve it? Could you please help?
 
How to solve it or debug and solve it?

Step zero: restart corosync on all nodes, one after another: systemctl restart corosync.service ; systemctl status corosync.service

Step one: verify that all rings (links) are actually both enabled and connected, similar to this:
Code:
~# corosync-cfgtool -n
Local node ID 3, transport knet
nodeid: 2 reachable
   LINK: 0 udp (10.3.16.6->10.3.16.9) enabled connected mtu: 1397
   LINK: 1 udp (10.11.16.6->10.11.16.9) enabled connected mtu: 1397

nodeid: 4 reachable
   LINK: 0 udp (10.3.16.6->10.3.16.10) enabled connected mtu: 1397
   LINK: 1 udp (10.11.16.6->10.11.16.10) enabled connected mtu: 1397

nodeid: 6 reachable
   LINK: 0 udp (10.3.16.6->10.3.16.7) enabled connected mtu: 1397
   LINK: 1 udp (10.11.16.6->10.11.16.7) enabled connected mtu: 1397
...
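
On a cluster with several nodes, checking that output by eye gets tedious. Below is a minimal sketch that scans text in the format shown above and lists every link that is missing `enabled` or `connected`. It assumes the output layout matches the listing above; `find_bad_links` is a hypothetical helper, not part of corosync, and the second link in `SAMPLE` has been altered on purpose to produce a hit.

```python
import re

# "nodeid: 2 reachable"
NODE_RE = re.compile(r"^nodeid:\s*(\d+)")
# "   LINK: 0 udp (10.3.16.6->10.3.16.9) enabled connected mtu: 1397"
LINK_RE = re.compile(r"^\s*LINK:\s*(\d+)\s+\S+\s+\(([^)]*)\)\s*(.*)$")


def find_bad_links(text):
    """Return (nodeid, link, addresses, flags) for links not enabled+connected."""
    bad = []
    node = None
    for line in text.splitlines():
        node_match = NODE_RE.match(line)
        if node_match:
            node = int(node_match.group(1))
            continue
        link_match = LINK_RE.match(line)
        if link_match and node is not None:
            # Everything between the address pair and "mtu:" is the state flags.
            flags = link_match.group(3).split("mtu:")[0].split()
            if "enabled" not in flags or "connected" not in flags:
                bad.append(
                    (node, int(link_match.group(1)), link_match.group(2), " ".join(flags))
                )
    return bad


# Hypothetical sample: link 1 lacks "connected" to demonstrate a finding.
SAMPLE = """\
Local node ID 3, transport knet
nodeid: 2 reachable
   LINK: 0 udp (10.3.16.6->10.3.16.9) enabled connected mtu: 1397
   LINK: 1 udp (10.11.16.6->10.11.16.9) enabled mtu: 1397
"""

if __name__ == "__main__":
    for nodeid, link, addrs, flags in find_bad_links(SAMPLE):
        print(f"node {nodeid} link {link} ({addrs}): {flags or 'no flags'}")
```

To check a live node, the output of `corosync-cfgtool -n` could be captured (for example with `subprocess.run`) and passed to the function as a string.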

Disclaimer: just random thoughts...
 
I am having similar problems. The error I get differs slightly, but it is caused by the same piece of code - function check_dst_addr_is_valid in libknet/transport_udp.c from the kronosnet package. This code was added last year to kronosnet and only found its way to Proxmox a few days ago.

I am running a 5-node cluster with some mesh networking between the nodes, which is what is triggering the warning (as the comment for the function says, this is indeed "weird routing"...)

At this time, the only workaround I've found is to set the minimum log level for corosync to error (syslog_priority: error in the logging section), but I'm not too fond of that. If I don't do it though, all nodes spam the warning continuously, filling the log partition and saturating my Graylog's index as well.
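
For reference, the workaround would look roughly like the fragment below in the logging section of corosync.conf. Treat it as a sketch: the `debug` and `to_syslog` lines are assumed to be the existing defaults and are shown only for context, and the level is spelled as in this post, so check corosync.conf(5) for the accepted priority names. On Proxmox the file to edit is normally /etc/pve/corosync.conf, and config_version in the totem section has to be incremented for the change to propagate.

```
logging {
  # Assumed existing defaults, shown for context only
  debug: off
  to_syslog: yes
  # Workaround from this post: only log messages at error level or above
  syslog_priority: error
}
```

Note that this hides all corosync warnings from syslog, not just the KNET ifindex message.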
 