Corosync 3 - Kronosnet - link: host: x link: 1 is down

TwiX

Hello,

I just built six new Proxmox VE 6 nodes (fully up to date), all on identical hardware.
Each node has two links (two LACP bonds across two Intel X520 NICs):
bond0: 2x10 Gb (management and production VMs, MTU 1500)
bond1: 2x10 Gb (Ceph storage, MTU 9000)

root@dc-prox-23:~# pveversion -v
proxmox-ve: 6.0-2 (running kernel: 5.0.21-3-pve)
pve-manager: 6.0-9 (running version: 6.0-9/508dcee0)
pve-kernel-5.0: 6.0-9
pve-kernel-helper: 6.0-9
pve-kernel-5.0.21-3-pve: 5.0.21-7
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-5
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.0-9
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-65
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-8
pve-cluster: 6.0-7
pve-container: 3.0-7
pve-docs: 6.0-7
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-7
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-3
pve-qemu-kvm: 4.0.1-3
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-9
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve1

bond0 is declared as the primary corosync link (link 0) and bond1 as link 1.

Code:
root@dc-prox-25:~# corosync-cfgtool -s
Printing link status.
Local node ID 1
LINK ID 0
        addr    = 10.192.5.59
        status:
                nodeid  1:      link enabled:1  link connected:1
                nodeid  2:      link enabled:1  link connected:1
                nodeid  3:      link enabled:1  link connected:1
                nodeid  4:      link enabled:1  link connected:1
                nodeid  5:      link enabled:1  link connected:1
                nodeid  6:      link enabled:1  link connected:1
LINK ID 1
        addr    = 10.199.0.59
        status:
                nodeid  1:      link enabled:0  link connected:1
                nodeid  2:      link enabled:1  link connected:1
                nodeid  3:      link enabled:1  link connected:1
                nodeid  4:      link enabled:1  link connected:1
                nodeid  5:      link enabled:1  link connected:1
                nodeid  6:      link enabled:1  link connected:1

On another node:
Code:
root@dc-prox-23:~# corosync-cfgtool -s
Printing link status.
Local node ID 4
LINK ID 0
        addr    = 10.192.5.57
        status:
                nodeid  1:      link enabled:1  link connected:1
                nodeid  2:      link enabled:1  link connected:1
                nodeid  3:      link enabled:1  link connected:1
                nodeid  4:      link enabled:1  link connected:1
                nodeid  5:      link enabled:1  link connected:1
                nodeid  6:      link enabled:1  link connected:1
LINK ID 1
        addr    = 10.199.0.57
        status:
                nodeid  1:      link enabled:1  link connected:1
                nodeid  2:      link enabled:1  link connected:1
                nodeid  3:      link enabled:1  link connected:1
                nodeid  4:      link enabled:0  link connected:1
                nodeid  5:      link enabled:1  link connected:1
                nodeid  6:      link enabled:1  link connected:1
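
When comparing this output across six nodes, it can help to reduce it to a list of peers whose link is not both enabled and connected. The following is only a sketch: the function names and the trimmed sample text are illustrative assumptions, and the regular expressions are written against the output format pasted above, which may differ in other corosync versions. The local node's own entry is skipped because it is reported differently from the peers in the output above.

```python
import re

# Trimmed sample in the same format as the `corosync-cfgtool -s` output above.
SAMPLE = """\
Local node ID 4
LINK ID 0
        addr    = 10.192.5.57
        status:
                nodeid  1:      link enabled:1  link connected:1
                nodeid  4:      link enabled:1  link connected:1
LINK ID 1
        addr    = 10.199.0.57
        status:
                nodeid  1:      link enabled:1  link connected:1
                nodeid  4:      link enabled:0  link connected:1
"""

LOCAL_RE = re.compile(r"Local node ID\s+(\d+)")
LINK_RE = re.compile(r"LINK ID\s+(\d+)")
NODE_RE = re.compile(r"nodeid\s+(\d+):\s+link enabled:(\d)\s+link connected:(\d)")


def parse_link_status(text):
    """Return (local_id, rows); each row is (link, nodeid, enabled, connected)."""
    local_id = None
    link = None
    rows = []
    for line in text.splitlines():
        m = LOCAL_RE.search(line)
        if m:
            local_id = int(m.group(1))
            continue
        m = LINK_RE.search(line)
        if m:
            link = int(m.group(1))
            continue
        m = NODE_RE.search(line)
        if m:
            nodeid, enabled, connected = (int(g) for g in m.groups())
            rows.append((link, nodeid, enabled == 1, connected == 1))
    return local_id, rows


def unhealthy_peers(text):
    """List (link, nodeid) for remote nodes that are not enabled and connected."""
    local_id, rows = parse_link_status(text)
    return [
        (link, nodeid)
        for link, nodeid, enabled, connected in rows
        if nodeid != local_id and not (enabled and connected)
    ]
```

Calling `unhealthy_peers()` on the text captured from each node gives an empty list when every remote peer looks healthy at that moment. Because the flaps described here last only a few seconds, a single snapshot like this will usually miss them.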

Sometimes I see the following lines in syslog:

Nov 4 09:40:03 dc-prox-23 corosync[34237]: [KNET ] link: host: 1 link: 1 is down
Nov 4 09:40:03 dc-prox-23 corosync[34237]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Nov 4 09:40:06 dc-prox-23 corosync[34237]: [KNET ] rx: host: 1 link: 1 is up
Nov 4 09:40:06 dc-prox-23 corosync[34237]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)

There is no VM on this cluster yet.

It seems that something may be wrong with the LAG running at MTU 9000.

Since I have two links I am not too worried, but I don't know whether this could become a problem or where to look next.
I stressed the LAG with iperf for more than 10 minutes, but nothing showed up in syslog.
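
One way to test the MTU suspicion directly is to send full-size, non-fragmentable pings over bond1 to every peer. For an MTU of 9000 the largest ICMP payload that fits is 9000 - 20 (IPv4 header) - 8 (ICMP header) = 8972 bytes. The sketch below only builds and runs the command; the helper names are made up for illustration, and it assumes the Linux iputils `ping`, where `-M do` forbids fragmentation. A clean result would not rule out other causes for the flaps.

```python
import subprocess

IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def icmp_payload_for_mtu(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def jumbo_ping_command(addr, mtu=9000, count=3):
    """Build an iputils ping command that sets the Don't Fragment bit."""
    payload = icmp_payload_for_mtu(mtu)
    return ["ping", "-M", "do", "-s", str(payload), "-c", str(count), addr]


def check_peers(addrs, mtu=9000):
    """Ping each address with full-size frames; map address -> success."""
    results = {}
    for addr in addrs:
        proc = subprocess.run(
            jumbo_ping_command(addr, mtu), capture_output=True, text=True
        )
        results[addr] = proc.returncode == 0
    return results
```

For example, `check_peers(["10.199.0.56", "10.199.0.58"])` would test two of the link 1 addresses of this cluster from the node it runs on. A peer that answers normal pings but fails here would suggest that some hop between the two bonds does not pass 9000-byte frames.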

My corosync.conf:
root@dc-prox-23:~# cat /etc/pve/corosync.conf
logging {
debug: off
to_syslog: yes
}

nodelist {
node {
name: dc-prox-22
nodeid: 3
quorum_votes: 1
ring0_addr: 10.192.5.56
ring1_addr: 10.199.0.56
}
node {
name: dc-prox-23
nodeid: 4
quorum_votes: 1
ring0_addr: 10.192.5.57
ring1_addr: 10.199.0.57
}
node {
name: dc-prox-24
nodeid: 2
quorum_votes: 1
ring0_addr: 10.192.5.58
ring1_addr: 10.199.0.58
}
node {
name: dc-prox-25
nodeid: 1
quorum_votes: 1
ring0_addr: 10.192.5.59
ring1_addr: 10.199.0.59
}
node {
name: dc-prox-26
nodeid: 5
quorum_votes: 1
ring0_addr: 10.192.5.60
ring1_addr: 10.199.0.60
}
node {
name: dc-prox-27
nodeid: 6
quorum_votes: 1
ring0_addr: 10.192.5.61
ring1_addr: 10.199.0.61
}
}

quorum {
provider: corosync_votequorum
}

totem {
cluster_name: CDV6Cluster
config_version: 6
interface {
linknumber: 0
}
interface {
linknumber: 1
}
ip_version: ipv4-6
link_mode: passive
secauth: on
version: 2
}
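
The KNET log lines identify peers by node ID ("host: 1"), not by hostname, so a small lookup built from the nodelist above makes them easier to read. Here "host: 1" is nodeid 1, which is dc-prox-25. The parser below is a minimal sketch that only handles the flat `key: value` layout shown in this file; the function name and the shortened sample are illustrative assumptions, not part of any corosync tooling.

```python
import re

# Shortened copy of the nodelist section above (two of the six nodes).
SAMPLE_CONF = """\
nodelist {
node {
name: dc-prox-23
nodeid: 4
quorum_votes: 1
ring0_addr: 10.192.5.57
ring1_addr: 10.199.0.57
}
node {
name: dc-prox-25
nodeid: 1
quorum_votes: 1
ring0_addr: 10.192.5.59
ring1_addr: 10.199.0.59
}
}
"""

# Matches one "node { ... }" block; "nodelist {" is not matched by \bnode\s*\{.
NODE_BLOCK_RE = re.compile(r"\bnode\s*\{(.*?)\}", re.DOTALL)
KEY_VALUE_RE = re.compile(r"^[ \t]*(\w+)[ \t]*:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def node_table(conf_text):
    """Map nodeid -> dict of that node's settings (name, ring0_addr, ...)."""
    table = {}
    for block in NODE_BLOCK_RE.findall(conf_text):
        fields = dict(KEY_VALUE_RE.findall(block))
        if "nodeid" in fields:
            table[int(fields["nodeid"])] = fields
    return table
```

With the full file, `node_table(open("/etc/pve/corosync.conf").read())[1]["name"]` should give the hostname behind "host: 1", and the `ring1_addr` field gives the address to test on link 1.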

Thanks in advance for your help if you find something to test!

Antoine
 
Sometimes I have the following lines on syslog :

How often is "sometimes" here?

I stressed the LAG with iperf for more than 10 min but nothing happened in syslog.

I mean, iperf probably doesn't care too much if the NIC/bond is "flapping", but knet may detect this and temporarily fail over to the other link.

Does the message show up on all nodes at the same time, or only on one or some of the nodes?
 
Hi,

Some nodes have never logged this message (for example dc-prox-22, dc-prox-24 and dc-prox-26).
On the other nodes, it never seems to show up at the same time, as you can see:

Nov 05 03:41:33 dc-prox-23 corosync[2164]: [KNET ] link: host: 1 link: 1 is down
Nov 05 03:41:33 dc-prox-23 corosync[2164]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Nov 05 03:41:35 dc-prox-23 corosync[2164]: [KNET ] rx: host: 1 link: 1 is up
Nov 05 03:41:35 dc-prox-23 corosync[2164]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Nov 05 05:55:05 dc-prox-23 corosync[2164]: [KNET ] link: host: 1 link: 1 is down
Nov 05 05:55:05 dc-prox-23 corosync[2164]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Nov 05 05:55:07 dc-prox-23 corosync[2164]: [KNET ] rx: host: 1 link: 1 is up
Nov 05 05:55:07 dc-prox-23 corosync[2164]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Nov 05 08:02:08 dc-prox-23 corosync[2164]: [KNET ] link: host: 1 link: 1 is down
Nov 05 08:02:08 dc-prox-23 corosync[2164]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Nov 05 08:02:10 dc-prox-23 corosync[2164]: [KNET ] rx: host: 1 link: 1 is up
Nov 05 08:02:10 dc-prox-23 corosync[2164]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Nov 05 08:27:36 dc-prox-23 corosync[2164]: [KNET ] link: host: 1 link: 1 is down
Nov 05 08:27:36 dc-prox-23 corosync[2164]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Nov 05 08:27:38 dc-prox-23 corosync[2164]: [KNET ] rx: host: 1 link: 1 is up
Nov 05 08:27:38 dc-prox-23 corosync[2164]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Nov 05 09:02:48 dc-prox-23 corosync[2164]: [KNET ] link: host: 1 link: 1 is down
Nov 05 09:02:48 dc-prox-23 corosync[2164]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Nov 05 09:02:50 dc-prox-23 corosync[2164]: [KNET ] rx: host: 1 link: 1 is up
Nov 05 09:02:50 dc-prox-23 corosync[2164]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Nov 05 09:18:08 dc-prox-23 corosync[2164]: [KNET ] link: host: 1 link: 1 is down
Nov 05 09:18:08 dc-prox-23 corosync[2164]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Nov 05 09:18:10 dc-prox-23 corosync[2164]: [KNET ] rx: host: 1 link: 1 is up
Nov 05 09:18:10 dc-prox-23 corosync[2164]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)

Nov 05 04:04:55 dc-prox-25 corosync[2244]: [KNET ] link: host: 6 link: 1 is down
Nov 05 04:04:55 dc-prox-25 corosync[2244]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Nov 05 04:04:57 dc-prox-25 corosync[2244]: [KNET ] rx: host: 6 link: 1 is up
Nov 05 04:04:57 dc-prox-25 corosync[2244]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Nov 05 04:05:31 dc-prox-25 corosync[2244]: [KNET ] link: host: 4 link: 1 is down
Nov 05 04:05:31 dc-prox-25 corosync[2244]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Nov 05 04:05:33 dc-prox-25 corosync[2244]: [KNET ] rx: host: 4 link: 1 is up
Nov 05 04:05:33 dc-prox-25 corosync[2244]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Nov 05 05:32:27 dc-prox-25 corosync[2244]: [KNET ] link: host: 4 link: 1 is down
Nov 05 05:32:27 dc-prox-25 corosync[2244]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Nov 05 05:32:30 dc-prox-25 corosync[2244]: [KNET ] rx: host: 4 link: 1 is up
Nov 05 05:32:30 dc-prox-25 corosync[2244]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Nov 05 06:48:09 dc-prox-25 corosync[2244]: [KNET ] link: host: 6 link: 1 is down
Nov 05 06:48:09 dc-prox-25 corosync[2244]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Nov 05 06:48:11 dc-prox-25 corosync[2244]: [KNET ] rx: host: 6 link: 1 is up
Nov 05 06:48:11 dc-prox-25 corosync[2244]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Nov 05 07:02:50 dc-prox-25 corosync[2244]: [KNET ] link: host: 6 link: 1 is down
Nov 05 07:02:50 dc-prox-25 corosync[2244]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Nov 05 07:02:52 dc-prox-25 corosync[2244]: [KNET ] rx: host: 6 link: 1 is up
Nov 05 07:02:52 dc-prox-25 corosync[2244]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Nov 05 07:19:50 dc-prox-25 corosync[2244]: [KNET ] link: host: 4 link: 1 is down
Nov 05 07:19:50 dc-prox-25 corosync[2244]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Nov 05 07:19:52 dc-prox-25 corosync[2244]: [KNET ] rx: host: 4 link: 1 is up
Nov 05 07:19:52 dc-prox-25 corosync[2244]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Nov 05 08:00:22 dc-prox-25 corosync[2244]: [KNET ] link: host: 4 link: 1 is down
Nov 05 08:00:22 dc-prox-25 corosync[2244]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Nov 05 08:00:24 dc-prox-25 corosync[2244]: [KNET ] rx: host: 4 link: 1 is up
Nov 05 08:00:24 dc-prox-25 corosync[2244]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Nov 05 09:05:58 dc-prox-25 corosync[2244]: [KNET ] link: host: 6 link: 1 is down
Nov 05 09:05:58 dc-prox-25 corosync[2244]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Nov 05 09:06:00 dc-prox-25 corosync[2244]: [KNET ] rx: host: 6 link: 1 is up
Nov 05 09:06:00 dc-prox-25 corosync[2244]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Nov 05 09:10:35 dc-prox-25 corosync[2244]: [KNET ] link: host: 6 link: 1 is down
Nov 05 09:10:35 dc-prox-25 corosync[2244]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Nov 05 09:10:38 dc-prox-25 corosync[2244]: [KNET ] rx: host: 6 link: 1 is up
Nov 05 09:10:38 dc-prox-25 corosync[2244]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Nov 05 09:15:09 dc-prox-25 corosync[2244]: [KNET ] link: host: 4 link: 1 is down
Nov 05 09:15:09 dc-prox-25 corosync[2244]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Nov 05 09:15:10 dc-prox-25 corosync[2244]: [KNET ] rx: host: 4 link: 1 is up
Nov 05 09:15:10 dc-prox-25 corosync[2244]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)

Nov 05 03:42:44 dc-prox-27 corosync[2213]: [KNET ] link: host: 1 link: 1 is down
Nov 05 03:42:44 dc-prox-27 corosync[2213]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Nov 05 03:42:46 dc-prox-27 corosync[2213]: [KNET ] rx: host: 1 link: 1 is up
Nov 05 03:42:46 dc-prox-27 corosync[2213]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Nov 05 03:56:34 dc-prox-27 corosync[2213]: [KNET ] link: host: 1 link: 1 is down
Nov 05 03:56:34 dc-prox-27 corosync[2213]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Nov 05 03:56:36 dc-prox-27 corosync[2213]: [KNET ] rx: host: 1 link: 1 is up
Nov 05 03:56:36 dc-prox-27 corosync[2213]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Nov 05 04:29:25 dc-prox-27 corosync[2213]: [KNET ] link: host: 1 link: 1 is down
Nov 05 04:29:25 dc-prox-27 corosync[2213]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Nov 05 04:29:27 dc-prox-27 corosync[2213]: [KNET ] rx: host: 1 link: 1 is up
Nov 05 04:29:27 dc-prox-27 corosync[2213]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Nov 05 04:53:46 dc-prox-27 corosync[2213]: [KNET ] link: host: 1 link: 1 is down
Nov 05 04:53:46 dc-prox-27 corosync[2213]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Nov 05 04:53:48 dc-prox-27 corosync[2213]: [KNET ] rx: host: 1 link: 1 is up
Nov 05 04:53:48 dc-prox-27 corosync[2213]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Nov 05 04:58:19 dc-prox-27 corosync[2213]: [KNET ] link: host: 1 link: 1 is down
Nov 05 04:58:19 dc-prox-27 corosync[2213]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Nov 05 04:58:21 dc-prox-27 corosync[2213]: [KNET ] rx: host: 1 link: 1 is up
Nov 05 04:58:21 dc-prox-27 corosync[2213]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Nov 05 07:08:37 dc-prox-27 corosync[2213]: [KNET ] link: host: 1 link: 1 is down
Nov 05 07:08:37 dc-prox-27 corosync[2213]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Nov 05 07:08:39 dc-prox-27 corosync[2213]: [KNET ] rx: host: 1 link: 1 is up
Nov 05 07:08:39 dc-prox-27 corosync[2213]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Nov 05 07:54:52 dc-prox-27 corosync[2213]: [KNET ] link: host: 1 link: 1 is down
Nov 05 07:54:52 dc-prox-27 corosync[2213]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Nov 05 07:54:54 dc-prox-27 corosync[2213]: [KNET ] rx: host: 1 link: 1 is up
Nov 05 07:54:54 dc-prox-27 corosync[2213]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Nov 05 08:50:37 dc-prox-27 corosync[2213]: [KNET ] link: host: 1 link: 1 is down
Nov 05 08:50:37 dc-prox-27 corosync[2213]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Nov 05 08:50:39 dc-prox-27 corosync[2213]: [KNET ] rx: host: 1 link: 1 is up
Nov 05 08:50:39 dc-prox-27 corosync[2213]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
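
To answer "how often" and "which nodes" more systematically, the down/up pairs can be grouped by reporting node and peer. In the excerpts above, dc-prox-23 and dc-prox-27 only report host 1 (dc-prox-25 according to the corosync.conf earlier in the thread), while dc-prox-25 reports hosts 4 and 6 (dc-prox-23 and dc-prox-27). The sketch below pairs each "is down" line with the next "is up" line for the same reporter, host and link. The function name is hypothetical, the regular expression is fitted to the lines pasted here, and the year is an assumption because syslog timestamps do not carry one.

```python
import re
from collections import defaultdict
from datetime import datetime

# A few lines copied from the logs above.
SAMPLE_LOG = """\
Nov 05 03:41:33 dc-prox-23 corosync[2164]: [KNET ] link: host: 1 link: 1 is down
Nov 05 03:41:33 dc-prox-23 corosync[2164]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Nov 05 03:41:35 dc-prox-23 corosync[2164]: [KNET ] rx: host: 1 link: 1 is up
Nov 05 03:42:44 dc-prox-27 corosync[2213]: [KNET ] link: host: 1 link: 1 is down
Nov 05 03:42:46 dc-prox-27 corosync[2213]: [KNET ] rx: host: 1 link: 1 is up
Nov 05 05:32:27 dc-prox-25 corosync[2244]: [KNET ] link: host: 4 link: 1 is down
Nov 05 05:32:30 dc-prox-25 corosync[2244]: [KNET ] rx: host: 4 link: 1 is up
"""

EVENT_RE = re.compile(
    r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (\S+) corosync\[\d+\]: \[KNET\s*\] "
    r"(?:link|rx): host: (\d+) link: (\d+) is (down|up)$"
)


def summarize_flaps(log_text, year=2019):
    """Map (reporter, host, link) -> list of outage durations in seconds.

    `year` is an assumed value; it only matters for logs spanning New Year.
    """
    down_since = {}
    flaps = defaultdict(list)
    for line in log_text.splitlines():
        m = EVENT_RE.match(line.strip())
        if not m:
            continue  # e.g. the "best link" lines carry no state change
        stamp, reporter, host, link, state = m.groups()
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        key = (reporter, int(host), int(link))
        if state == "down":
            down_since[key] = when
        elif key in down_since:
            flaps[key].append((when - down_since.pop(key)).total_seconds())
    return dict(flaps)
```

Feeding the complete syslog of every node through `summarize_flaps()` would show whether the flaps always involve the same pair of nodes on link 1, which could help narrow the search to specific switch ports or bond members.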
 
Has this ever been solved?
 
