48 node pve cluster

Sascha72036 · Apr 29, 2021

Hello everyone,

we're running a 48 node pve cluster with this setup:
AMD EPYC 7402P, 512GB Memory, Intel X520-DA2 or Mellanox Connect X3 NIC, Ceph Pool with only NVMe, 2x 10Gbit/s interfaces (for cluster traffic) + 2x 1G (for public traffic).
As a few others have recently reported in the forum, there are massive problems with larger clusters (>36 nodes).
The main problem is (probably) a bug in corosync. All nodes start flooding each other with UDP traffic on the corosync port.
Changing the transport to sctp in corosync.conf doesn't seem to be the solution. It resolves the UDP flood, of course, but we run into other problems.

Before we split our cluster: does anyone have an idea what else we could do?
Or is splitting the cluster currently the best solution?

Best regards
Sascha

spirit · Apr 29, 2021

Can you post the output of pveversion -v?

Have you tried increasing the token timeout value? Something like 10000, for example (token: 10000).

Do you have a dedicated link for corosync, or is it mixed with other traffic?

Does the flood happen only sometimes? Are all nodes flooding, or is one specific node flooding the others?

Sascha72036 · Apr 29, 2021

We have one dual-port 10G NIC in each node. The 10G NIC is shared between corosync and ceph traffic (separated into two different VLANs).
The floods start for no apparent reason. With sctp instead of knet's default UDP transport we have no more floods, but after we restart corosync, some NICs in our cluster reset due to a tx timeout (maybe a problem with MTU discovery).

Not all nodes are flooding at the same time, and each time it is different nodes whose NICs are blocked by floods from some other nodes.

Thank you very much for your help. I will try increasing token, token_coefficient and send_join. What would be good values here?

pveversion -v:

proxmox-ve: 6.3-1 (running kernel: 5.4.98-1-pve)
pve-manager: 6.3-6 (running version: 6.3-6/2184247e)
pve-kernel-5.4: 6.3-8
pve-kernel-helper: 6.3-8
pve-kernel-5.4.106-1-pve: 5.4.106-1
pve-kernel-5.4.103-1-pve: 5.4.103-1
pve-kernel-5.4.98-1-pve: 5.4.98-1
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.73-1-pve: 5.4.73-1
pve-kernel-5.4.65-1-pve: 5.4.65-1
ceph: 14.2.19-pve1
ceph-fuse: 14.2.19-pve1
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.0.8
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-5
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-9
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.1-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.5-1
pve-cluster: 6.2-1
pve-container: 3.3-4
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.2-2
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-5
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-10
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.4-pve1

corosync.conf:

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: pvecluster
  config_version: 47
  interface {
    knet_transport: sctp
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  token_coefficient: 3000
  version: 2
  netmtu: 1500
}

spirit · Apr 30, 2021

Be careful with token_coefficient, because the real token timeout is computed like this:

https://manpages.debian.org/buster/corosync/corosync.conf.5.en.html

Code:

 real token timeout is then computed as token + (number_of_nodes - 2) * token_coefficient.

The default token value is 3000 since the last corosync update (it was 1000 previously).

But with your token_coefficient: 3000 and 48 nodes, you are already at around 141000 ms (3000 + 46 * 3000), which seems pretty high (not sure about the impact).

The corosync devs recommend changing only the base token value; something between 3000 and 10000 should be enough. (I'm running 20-node clusters with token: 1000 without any problem.)
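
To make the arithmetic easy to check, here is a minimal Python sketch of the formula quoted from the man page. The function name real_token_timeout_ms is made up for illustration and is not part of corosync. The 650 ms default for token_coefficient is what the corosync.conf man page lists, as far as I know, so treat that value as an assumption and check it against your installed version.

```python
def real_token_timeout_ms(token_ms: int, token_coefficient_ms: int, nodes: int) -> int:
    """Effective totem token timeout in milliseconds.

    Implements the formula from corosync.conf(5):
        token + (number_of_nodes - 2) * token_coefficient
    The coefficient only contributes once there are at least 3 nodes,
    so the node term is clamped at zero for smaller clusters.
    """
    extra_nodes = max(nodes - 2, 0)
    return token_ms + extra_nodes * token_coefficient_ms


if __name__ == "__main__":
    # Config from this thread: token left at 3000, token_coefficient 3000, 48 nodes.
    print(real_token_timeout_ms(3000, 3000, 48))   # 141000

    # Raising only the base token to 10000, with the coefficient at an
    # assumed default of 650 ms.
    print(real_token_timeout_ms(10000, 650, 48))   # 39900

    # 20-node cluster with token 1000 and the same assumed coefficient.
    print(real_token_timeout_ms(1000, 650, 20))    # 12700
```

The point of the comparison: on a 48-node cluster the coefficient is multiplied by 46, so changing it moves the effective timeout far more than changing the base token does.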
