corosync udp flood

Sascha72036

Hello,

we are running a 42-node Proxmox cluster with Ceph.
Our nodes are connected via Intel X520-DA2 (2x 10G) NICs to two separate Arista 7050QX switches.
Corosync and Ceph are separated into two different VLANs.
The normal VM traffic runs over the onboard NIC.

We have big problems with corosync. For an unexplained reason, several nodes suddenly start a UDP flood (more than 1M pps and >800 MB/s) on the corosync port. As a result, the network cards of some nodes block all traffic.
Our pveversion:
proxmox-ve: 6.3-1 (running kernel: 5.4.98-1-pve)
pve-manager: 6.3-6 (running version: 6.3-6/2184247e)
pve-kernel-5.4: 6.3-7
pve-kernel-helper: 6.3-7
pve-kernel-5.4.103-1-pve: 5.4.103-1
pve-kernel-5.4.98-1-pve: 5.4.98-1
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph: 14.2.16-pve1
ceph-fuse: 14.2.16-pve1
corosync: 3.1.0-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.0.7
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-5
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.10-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-6
pve-cluster: 6.2-1
pve-container: 3.3-4
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.2-2
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.2.0-3
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-8
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.3-pve2

We have also heard from other operators of such a large cluster who have the same problem. They limited it via iptables rules so that the UDP flood from corosync does not escalate to the point where the NICs go offline (a sketch of such a rule is shown below).
In our case, stopping corosync on all nodes and then starting it again node by node is the only thing that resolves it.
The main problem is that the corosync flood also disrupts the Ceph traffic.
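For illustration, such an iptables rate limit could look roughly like this (a sketch only: 5405 is the default corosync/knet UDP port and the threshold is an arbitrary example, not the exact rule those operators use):

Code:
# drop corosync UDP from a single source once it exceeds an (example) rate,
# so a flooding node cannot take the whole NIC offline
iptables -A INPUT -p udp --dport 5405 -m hashlimit \
  --hashlimit-name corosync-flood --hashlimit-mode srcip \
  --hashlimit-above 50000/second -j DROP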

Does anyone have any idea why corosync suddenly sends so much traffic, so that we can find the cause?

Best regards
Sascha
 
Hi,
I know one user reported this on the forum some months ago; at the time I thought it was a pve-cluster bug (as there really was a bug in pmxcfs hanging corosync).

But it seems you can reproduce it with the latest packages. (Are you sure that all your nodes are updated and that corosync has been restarted with the latest version?)

42 nodes is really a huge cluster (personally, I run multiple clusters of fewer than 20 nodes each).
Have you done any special tuning in corosync.conf?

Maybe you can also report your bug on the corosync GitHub:
https://github.com/corosync/corosync/issues

I think the corosync devs could help too.

(A .pcap network trace could also help.)
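Something like this could be used to grab such a trace (the interface name is only an example; 5405 is the default corosync/knet UDP port):

Code:
# capture the corosync/knet traffic on the cluster interface into a pcap file
tcpdump -i eth2 -w corosync.pcap udp port 5405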
 
Hello everyone,

we have seen a similar problem with corosync. Once a cluster member is rebooted and rejoins the cluster, the NICs (ixgbe) on other nodes reset and shut down their links with messages like these:

Code:
[4692542.687464] ixgbe 0000:43:00.0 eth2: Reset adapter
[4692547.806838] ixgbe 0000:43:00.1 eth3: initiating reset due to tx timeout
[4692552.926839] ixgbe 0000:43:00.1 eth3: initiating reset due to tx timeout
[4692557.794835] ixgbe 0000:43:00.1 eth3: initiating reset due to tx timeout
[4692560.550945] bond1: (slave eth2): speed changed to 0 on port 1


The nodes that lose their connection seem to be random, and what helps is to stop all corosync and pve-cluster services on all nodes and then restart them one by one.
Is there any possibility for corosync to interfere with NIC settings or link state?
We have observed that corosync performs path MTU discovery, starting with an MTU of 65000+, and it also claims to set that value globally.

Using other transports such as SCTP for corosync did not solve the issue.
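In case the PMTUD behaviour is related: the totem section of corosync.conf has knobs for it. This is only a sketch of something to try, not a verified fix, and the values are examples (knet_mtu needs corosync >= 3.1 if I read the changelog correctly):

Code:
totem {
    # ... existing cluster settings stay unchanged ...
    # run path MTU discovery less often (default is every 30 seconds)
    knet_pmtud_interval: 300
    # optionally pin the knet MTU instead of relying on PMTUD (example value)
    knet_mtu: 1397
}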
 
Sounds like one node was flooding the other nodes and the network interfaces couldn't handle that many packets.
What are your versions? (pveversion -v)
 
We switched the transport to SCTP. After that, the corosync flood problems in the cluster no longer occurred (see the config sketch below).
A few days ago we upgraded all servers to the latest pveversion. During the upgrade, however, the network cards on random hosts suddenly switched off again with the same error as RobertB's, while corosync was being restarted after the upgrade.
The error also occurs after a cluster member is rebooted and rejoins the cluster.
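For reference, the switch itself is just the knet link protocol in the totem section of /etc/pve/corosync.conf, roughly like this (a sketch only; the rest of the file stays as it is):

Code:
totem {
    # ... cluster_name, config_version, interface entries unchanged ...
    transport: knet
    # run the knet links over SCTP instead of the default UDP
    knet_transport: sctp
}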

Meanwhile, the syslog:

Code:
Apr 22 10:11:09 prox14 corosync[3844190]: [KNET ] pmtud: PMTUD link change for host: 27 link: 0 from 453 to 65397
Apr 22 10:11:09 prox14 corosync[3844190]: [KNET ] pmtud: PMTUD link change for host: 26 link: 0 from 453 to 65397
Apr 22 10:11:09 prox14 corosync[3844190]: [KNET ] pmtud: PMTUD link change for host: 25 link: 0 from 453 to 65397
Apr 22 10:11:09 prox14 corosync[3844190]: [KNET ] pmtud: PMTUD link change for host: 24 link: 0 from 453 to 65397
Apr 22 10:11:09 prox14 corosync[3844190]: [KNET ] pmtud: PMTUD link change for host: 23 link: 0 from 453 to 65397
Apr 22 10:11:09 prox14 corosync[3844190]: [KNET ] pmtud: PMTUD link change for host: 22 link: 0 from 453 to 65397
Apr 22 10:11:09 prox14 corosync[3844190]: [KNET ] pmtud: PMTUD link change for host: 20 link: 0 from 453 to 65397
Apr 22 10:11:09 prox14 corosync[3844190]: [KNET ] pmtud: PMTUD link change for host: 2 link: 0 from 453 to 65397
Apr 22 10:11:09 prox14 corosync[3844190]: [KNET ] pmtud: PMTUD link change for host: 19 link: 0 from 453 to 65397
Apr 22 10:11:09 prox14 corosync[3844190]: [KNET ] pmtud: PMTUD link change for host: 10 link: 0 from 453 to 65397
Apr 22 10:11:09 prox14 corosync[3844190]: [KNET ] pmtud: PMTUD link change for host: 1 link: 0 from 453 to 65397
Apr 22 10:11:09 prox14 corosync[3844190]: [KNET ] pmtud: Global data MTU changed to: 65397


pveversion -v:

proxmox-ve: 6.3-1 (running kernel: 5.4.78-1-pve)
pve-manager: 6.3-6 (running version: 6.3-6/2184247e)
pve-kernel-5.4: 6.3-8
pve-kernel-helper: 6.3-8
pve-kernel-5.4.106-1-pve: 5.4.106-1
pve-kernel-5.4.103-1-pve: 5.4.103-1
pve-kernel-5.4.98-1-pve: 5.4.98-1
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.78-1-pve: 5.4.78-1
ceph: 14.2.19-pve1
ceph-fuse: 14.2.19-pve1
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.0.8
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-5
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-9
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.1-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.5-1
pve-cluster: 6.2-1
pve-container: 3.3-4
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.2-2
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-5
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-10
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.4-pve1
 
@Sascha72036

I don't have much experience with SCTP, but I have found on the corosync GitHub that it's possible to increase some SCTP counters at the kernel level, i.e. the retransmit limits.
Not sure whether it would help in your case.


# allow more retransmits before an SCTP association is declared failed
sysctl -w net.sctp.association_max_retrans=100
# allow more retransmits per path before the path is marked inactive
sysctl -w net.sctp.path_max_retrans=50
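(To keep the settings across reboots, the same values could also be put into a file under /etc/sysctl.d/.)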
 
