Best practices for network config: BCM NIC and Cisco switch

Oct 8, 2024
Hello,
are there any best practices/hints available for an optimal network setup with Ceph (block storage) on servers with the following cards?

BCM 57508: 1 x (2-port 100 Gb), used for Ceph
BCM 57504: 2 x (4-port 100 Gb)


ToR switches: 2 x Cisco Nexus 9364C-GX.

We want to use bonding with LACP (802.3ad), e.g. 2 x 100 Gb ports spread across the two switches.
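For the switch side, the usual counterpart to an 802.3ad bond split across two Nexus switches is a vPC port-channel with LACP. A minimal NX-OS sketch for one member port per switch (VLAN 100, port-channel/vPC ID 10 and the interface name are made-up examples; the vPC domain and peer-link setup are omitted, and jumbo-MTU handling differs per platform):

feature lacp
feature vpc

interface Ethernet1/1
  description ceph-node01 ens3f0np0
  channel-group 10 mode active
  no shutdown

interface port-channel10
  description ceph-node01 bond0
  switchport
  switchport access vlan 100
  mtu 9216
  vpc 10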



As there are a lot of parameters that can be tuned, it would help to have some hints/best practices, since this is not an exotic HW combination (BCM NICs, Cisco switches). Below is what we have configured so far.


=== 1. Set some values in the NIC firmware/BIOS ===

flow offload: enabled; performance profile: RoCE; link FEC: CL91; NIC RDMA mode: enabled
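These firmware settings can be cross-checked from the Linux side once the host is up (interface name as in the config further below; driver support for the FEC query may vary):

ethtool --show-fec ens3f0np0    # configured/active FEC mode
ethtool -a ens3f0np0            # pause / flow-control state
ibv_devinfo                     # RDMA devices visible (requires rdma-core)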

=== 2. Applied the Broadcom Linux driver ===
Installed niccli and set:

setoption -name firmware_link_speed_d0 -value 6 -scope 0
setoption -name firmware_link_speed_d0 -value 6 -scope 1
setoption -name firmware_link_speed_d3 -value 6 -scope 0
setoption -name firmware_link_speed_d3 -value 6 -scope 1
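After a reboot (or driver reload), the negotiated link speed and the running driver/firmware versions can be confirmed with:

ethtool ens3f0np0 | grep Speed
ethtool -i ens3f0np0            # driver, firmware and ROM versions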

=== 3. Interfaces config: set ring buffers, flow control and MTU size ===

auto ens3f0np0
iface ens3f0np0 inet manual
mtu 9216
post-up ethtool -G $IFACE rx 2047 tx 2047
post-up ethtool -A $IFACE rx on tx on

auto ens3f1np1
iface ens3f1np1 inet manual
mtu 9216
post-up ethtool -G $IFACE rx 2047 tx 2047
post-up ethtool -A $IFACE rx on tx on

auto bond0
iface bond0 inet static
address 172.16.10.10/24
bond-slaves ens3f0np0 ens3f1np1
bond-miimon 100
bond-mode 802.3ad
bond-xmit-hash-policy layer3+4
mtu 9216
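To verify that the bond actually negotiates LACP and that jumbo frames pass end-to-end, something like this can be used (the peer address 172.16.10.11 is just an example):

cat /proc/net/bonding/bond0           # LACP state, aggregator, slave status
ip -d link show bond0                 # mode, hash policy, MTU
ethtool -g ens3f0np0                  # ring sizes actually applied
ping -M do -s 9188 -c 3 172.16.10.11  # 9188 + 28 bytes of headers = 9216, must not fragment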


=== 4. Added sysctl settings: /etc/sysctl.d/40-bcm57508.conf ===

# allow TCP with buffers up to 2 GB (max allowed in Linux is 2 GB - 1)
net.core.rmem_max=2147483647
net.core.wmem_max=2147483647

# increase TCP autotuning buffer limits
net.ipv4.tcp_rmem=4096 131072 1073741824
net.ipv4.tcp_wmem=4096 16384 1073741824

# recommended for hosts with jumbo frames enabled
net.ipv4.tcp_mtu_probing=1

# recommended to enable 'fair queueing'
net.core.default_qdisc=fq

# needs to be increased to use MSG_ZEROCOPY
net.core.optmem_max=1048576
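The file can be applied without a reboot and spot-checked afterwards:

sysctl --system                 # reload all files under /etc/sysctl.d
sysctl net.core.rmem_max net.ipv4.tcp_rmem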

The iperf measurements (iperf -P 8 -t 60) between the interfaces still show some variation in the results:


LOG.1:[SUM] 0.0000-60.0001 sec 1.30 TBytes 191 Gbits/sec
LOG.10:[SUM] 0.0000-60.0021 sec 1.32 TBytes 193 Gbits/sec
LOG.2:[SUM] 0.0000-60.0001 sec 1.33 TBytes 195 Gbits/sec
LOG.3:[SUM] 0.0000-60.0045 sec 1.25 TBytes 183 Gbits/sec
LOG.4:[SUM] 0.0000-60.0062 sec 1.26 TBytes 185 Gbits/sec
LOG.5:[SUM] 0.0000-60.0023 sec 1.30 TBytes 191 Gbits/sec
LOG.6:[SUM] 0.0000-60.0026 sec 1.20 TBytes 176 Gbits/sec
LOG.7:[SUM] 0.0000-60.0030 sec 1.30 TBytes 191 Gbits/sec
LOG.8:[SUM] 0.0000-60.0001 sec 1.09 TBytes 159 Gbits/sec
LOG.9:[SUM] 0.0000-60.0000 sec 1.32 TBytes 193 Gbits/sec
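One way to see whether the variation comes from how flows are hashed onto the two bond members is to watch the per-slave counters during a run, and/or to start several iperf3 server/client pairs on different ports so the layer3+4 hash spreads them across both links (ports and peer address below are examples):

ip -s link show ens3f0np0
ip -s link show ens3f1np1

# on the server side
iperf3 -s -p 5201 &
iperf3 -s -p 5202 &

# on the client side
iperf3 -c 172.16.10.11 -p 5201 -t 60 -P 4 &
iperf3 -c 172.16.10.11 -p 5202 -t 60 -P 4 &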




Thanks and best regards