Corosync - nsscrypto: Incorrect packet size

Dudeplayz · Nov 10, 2021

fabian said:
and could you post the network config (/etc/network/interfaces) from both nodes? something is mangling packets here, the question is who/what/where.. my next best guess would be to run tcpdump and monitor traffic on both nodes and links

How should I run tcpdump? I simply started it, but at first glance nothing stood out.

Dudeplayz · Nov 10, 2021

Ok here is the next weird info:
I have now restarted both nodes, independently of each other. The syslog spam still occurs on one node while the other is down for its reboot, so it shouldn't be a packet loss problem between these two servers. Or am I wrong?

Dudeplayz · Nov 10, 2021

Ok it has something to do with the qdevice. I stopped the corosync service on the qdevice and the log spam disappeared.

Here is the corosync.conf of the device. Maybe there is a misconfiguration.

Bash:

pi@pve-qdevice1:~ $ cat /etc/corosync/corosync.conf
# Please read the corosync.conf.5 manual page
totem {
        version: 2

        secauth: on

        # Corosync itself works without a cluster name, but DLM needs one.
        # The cluster name is also written into the VG metadata of newly
        # created shared LVM volume groups, if lvmlockd uses DLM locking.
        cluster_name: PVE-Cluster

        # crypto_cipher and crypto_hash: Used for mutual node authentication.
        # If you choose to enable this, then do remember to create a shared
        # secret with "corosync-keygen".
        # enabling crypto_cipher, requires also enabling of crypto_hash.
        # crypto works only with knet transport
        crypto_cipher: none
        crypto_hash: none
}

logging {
        # Log the source file and line where messages are being
        # generated. When in doubt, leave off. Potentially useful for
        # debugging.
        fileline: off
        # Log to standard error. When in doubt, set to yes. Useful when
        # running in the foreground (when invoking "corosync -f")
        to_stderr: yes
        # Log to a log file. When set to "no", the "logfile" option
        # must not be set.
        to_logfile: yes
        logfile: /var/log/corosync/corosync.log
        # Log to the system log daemon. When in doubt, set to yes.
        to_syslog: yes
        # Log debug messages (very verbose). When in doubt, leave off.
        debug: off
        # Log messages with time stamps. When in doubt, set to hires (or on)
        #timestamp: hires
        logger_subsys {
                subsys: QUORUM
                debug: off
        }
}

quorum {
        # Enable and configure quorum subsystem (default: off)
        # see also corosync.conf.5 and votequorum.5
        provider: corosync_votequorum
}

nodelist {
        # Change/uncomment/add node sections to match cluster configuration

        node {
                # Hostname of the node
                name: pve1
                # Cluster membership node identifier
                nodeid: 1
                quorum_votes: 1
                # Address of first link
                ring0_addr: 192.168.41.10
                ring1_addr: 192.168.40.11
                # When knet transport is used it's possible to define up to 8 links
                #ring1_addr: 192.168.1.1
        }
        node {
                name: pve2
                nodeid: 2
                quorum_votes: 1
                ring0_addr: 192.168.41.20
                ring1_addr: 192.168.40.21
        }
        node {
                name: pve-qdevice1
                nodeid: 3
                quorum_votes: 1
                ring0_addr: 192.168.41.30
                ring1_addr: 192.168.40.30
        }
        # ...
}

fabbione · Nov 10, 2021

Hi, my name is Fabio and I am one of the corosync/knet maintainers looking at this issue together with Fabian.

What is odd about this report is that the cluster is forming without any problems, and at the same time we see packets being rejected because they are either too small or too big. The double log you see now with debug enabled is perfectly fine: it's the same error reported by different code paths inside knet.

First, I would love it if you could capture a tcpdump on both nodes on the corosync interface. A few seconds while the problem is happening should be more than enough. Please use verbose options in tcpdump.
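A minimal sketch of such a capture follows. The interface name `vmbr0` and the UDP port range 5405-5412 are assumptions, not taken from this thread, so adjust them to the link corosync actually uses. The script only builds and prints the two commands; run them as root while the log spam is occurring.

```shell
# Assumptions (not from the thread): interface name and port range.
IFACE="vmbr0"        # hypothetical: the interface carrying corosync traffic
PORTS="5405-5412"    # assumed default corosync/knet UDP port range
OUT="/tmp/corosync-$(uname -n).pcap"

# Capture: -nn disables name/port resolution, -c 1000 stops after
# 1000 packets, -w writes the raw packets to a file.
CAPTURE_CMD="tcpdump -i $IFACE -nn -c 1000 -w $OUT udp portrange $PORTS"

# Inspect: -vvv gives the most verbose decoding, -X adds a hex/ASCII
# dump of each packet, -r reads the saved capture back.
READ_CMD="tcpdump -nn -vvv -X -r $OUT"

echo "$CAPTURE_CMD"
echo "$READ_CMD"
```

Running the capture on both nodes at the same time makes it possible to compare what one side sends with what the other side receives.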

Next, I would like you to please try one thing for me. In corosync.conf, in the totem section, please add: crypto_model: openssl

The section would look like:
totem {
....
crypto_model: openssl
}

and restart both nodes, one at a time. If the problem disappears, then we have at least isolated part of the problem to the nss code. Otherwise, tcpdump can help us investigate further.

Thanks
Fabio

fabbione · Nov 10, 2021

Hi again, I just saw your last post about qdevice config. That might be the problem actually. I will let Fabian look into it.

Fabio

t.lamprecht · Nov 10, 2021

Dudeplayz said:
Ok it has something to do with the qdevice. I stopped the corosync service on the qdevice and the log spam disappeared.

Is there corosync running on the QDevice host? That could be an issue yes, or do you mean you only stopped the corosync-qnetd.service?

Dudeplayz · Nov 10, 2021

t.lamprecht said:
Is there corosync running on the QDevice host? That could be an issue yes, or do you mean you only stopped the corosync-qnetd.service?

The QDevice is an RPi. QDevice and QNet are installed on it. I simply stopped the corosync service, nothing else. The setup worked before the migration (I think; I hadn't looked at the syslog much before that). Maybe it was caused by a qdevice update or something similar.

fabian · Nov 11, 2021

on the rpi the corosync service is not supposed to run - the whole point of the qdevice/qnetd feature is to have a tie-breaker vote that is not running the full corosync stack
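A small sketch for checking this on the QDevice host, assuming it uses systemd and the unit names `corosync` and `corosync-qnetd` (the latter is the service name mentioned earlier in the thread). The function name `qdevice_host_status` is made up for this example.

```shell
# Report whether the full corosync stack and the qnetd daemon are running.
# On a QDevice host, "corosync" is expected to be inactive and
# "corosync-qnetd" active.
qdevice_host_status() {
    for unit in corosync corosync-qnetd; do
        # is-active exits non-zero for anything but "active", hence || true
        state=$(systemctl is-active "$unit" 2>/dev/null) || true
        printf '%s: %s\n' "$unit" "${state:-unknown}"
    done
}
```

Call `qdevice_host_status` on the RPi. If corosync shows up as active there, `systemctl disable --now corosync` (as root) stops it and keeps it from starting at boot.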

the config on the rpi probably had the two other nodes, but crypto disabled so it was sending plain traffic to the actual cluster which was expecting encrypted traffic..
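To see this kind of mismatch, the crypto-related totem settings of each host's config can be listed side by side. The helper below is only a sketch; the function name `crypto_settings` is made up for this example, and it simply greps a corosync.conf-style file while skipping commented-out lines.

```shell
# Print the crypto-related totem settings found in a corosync.conf-style
# file. Lines starting with "#" are not matched because the pattern is
# anchored to optional leading whitespace followed by the key name.
crypto_settings() {
    grep -E '^[[:space:]]*(secauth|crypto_(cipher|hash|model))[[:space:]]*:' "$1" \
        | sed -E 's/^[[:space:]]+//'
}
```

Running `crypto_settings /etc/corosync/corosync.conf` on the RPi and on a cluster node, then comparing the output, shows whether both sides agree on cipher and hash. For the RPi config posted above it would list `secauth: on`, `crypto_cipher: none` and `crypto_hash: none`.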

Dudeplayz · Nov 11, 2021

fabian said:
on the rpi the corosync service is not supposed to run - the whole point of the qdevice/qnetd feature is to have a tie-breaker vote that is not running the full corosync stack. the config on the rpi probably had the two other nodes, but crypto disabled so it was sending plain traffic to the actual cluster which was expecting encrypted traffic..

Thank you Fabian. I removed corosync and everything is still working, but now without the log spam. I have also discussed it a little bit more in the GitHub issue.

Kind regards,
Dario


