Hello,
are there any best practices/hints available for an optimal network setup with Ceph (block storage)
for servers with the following cards?
1 x BCM 57508 (2-port 100 GbE), used for Ceph
2 x BCM 57504 (4-port 100 GbE)
ToR switches: 2 x Cisco Nexus 9364C-GX
We want to use bonding with LACP (802.3ad), e.g. 2 x 100 GbE ports spread across the two switches.
As there are a lot of parameters that can be tuned, it would help to have some hints/best practices,
since this is not an exotic hardware combination (Broadcom NICs, Cisco switches).
=== 1 Set some values in the NIC BIOS ===
flow offload enabled, performance profile RoCE, link FEC CL91, NIC RDMA mode enabled
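It may be worth verifying from the OS that these firmware settings actually took effect. A minimal sketch, assuming the interface names from the config below and that rdma-core is installed for ibv_devinfo:

# show the negotiated FEC mode (CL91 = Clause 91 RS-FEC), if the driver supports reporting it
ethtool --show-fec ens3f0np0

# check that the Broadcom RoCE driver (bnxt_re) is loaded next to bnxt_en
lsmod | grep bnxt

# list RDMA devices and port state
ibv_devinfo | grep -E 'hca_id|state|active_mtu'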
=== 2 Applied the Broadcom Linux driver ===
Installed niccli and set:
setoption -name firmware_link_speed_d0 -value 6 -scope 0
setoption -name firmware_link_speed_d0 -value 6 -scope 1
setoption -name firmware_link_speed_d3 -value 6 -scope 0
setoption -name firmware_link_speed_d3 -value 6 -scope 1
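For reference, a quick way to confirm after a reboot that the links actually come up at 100 Gbit/s (interface names as in the config below):

ethtool ens3f0np0 | grep -E 'Speed|Duplex|Auto-negotiation|Link detected'
ethtool ens3f1np1 | grep -E 'Speed|Duplex|Auto-negotiation|Link detected'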
=== 3 Interfaces config: set flow control and MTU size ===
auto ens3f0np0
iface ens3f0np0 inet manual
mtu 9216
# set RX/TX ring sizes and enable Ethernet flow control (pause frames)
post-up ethtool -G $IFACE rx 2047 tx 2047
post-up ethtool -A $IFACE rx on tx on
auto ens3f1np1
iface ens3f1np1 inet manual
mtu 9216
post-up ethtool -G $IFACE rx 2047 tx 2047
post-up ethtool -A $IFACE rx on tx on
auto bond0
iface bond0 inet static
address 172.16.10.10/24
bond-slaves ens3f0np0 ens3f1np1
bond-miimon 100
bond-mode 802.3ad
bond-xmit-hash-policy layer3+4
mtu 9216
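To verify the bond and the jumbo-frame path once everything is up (172.16.10.11 is only a placeholder for another node on the same Ceph network):

# LACP negotiation and per-slave state
cat /proc/net/bonding/bond0

# MTU on the bond device
ip -d link show bond0 | grep mtu

# jumbo frames end to end without fragmentation
# (9216 - 20 byte IP header - 8 byte ICMP header = 9188 byte payload)
ping -M do -s 9188 -c 3 172.16.10.11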
=== 4 Added sysctl settings in /etc/sysctl.d/40-bcm57508.conf ===
# allow TCP with buffers up to 2GB (Max allowed in Linux is 2GB-1)
net.core.rmem_max=2147483647
net.core.wmem_max=2147483647
# increase TCP autotuning buffer limits.
net.ipv4.tcp_rmem=4096 131072 1073741824
net.ipv4.tcp_wmem=4096 16384 1073741824
# recommended for hosts with jumbo frames enabled
net.ipv4.tcp_mtu_probing=1
# recommended to enable 'fair queueing'
net.core.default_qdisc = fq
# need to increase this to use MSG_ZEROCOPY
net.core.optmem_max = 1048576
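To apply the file without a reboot and spot-check that the values are active:

# reload everything under /etc/sysctl.d (or: sysctl -p /etc/sysctl.d/40-bcm57508.conf)
sysctl --system

# verify a few of the values
sysctl net.core.rmem_max net.ipv4.tcp_rmem net.core.default_qdisc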
The iperf measurements (iperf -P 8 -t 60) between the interfaces still show some variation in the results (a per-slave distribution check is sketched after the log excerpts):
LOG.1:[SUM] 0.0000-60.0001 sec 1.30 TBytes 191 Gbits/sec
LOG.10:[SUM] 0.0000-60.0021 sec 1.32 TBytes 193 Gbits/sec
LOG.2:[SUM] 0.0000-60.0001 sec 1.33 TBytes 195 Gbits/sec
LOG.3:[SUM] 0.0000-60.0045 sec 1.25 TBytes 183 Gbits/sec
LOG.4:[SUM] 0.0000-60.0062 sec 1.26 TBytes 185 Gbits/sec
LOG.5:[SUM] 0.0000-60.0023 sec 1.30 TBytes 191 Gbits/sec
LOG.6:[SUM] 0.0000-60.0026 sec 1.20 TBytes 176 Gbits/sec
LOG.7:[SUM] 0.0000-60.0030 sec 1.30 TBytes 191 Gbits/sec
LOG.8:[SUM] 0.0000-60.0001 sec 1.09 TBytes 159 Gbits/sec
LOG.9:[SUM] 0.0000-60.0000 sec 1.32 TBytes 193 Gbits/sec
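One common reason for this kind of spread on an LACP bond is that the layer3+4 hash does not always distribute the parallel streams evenly across the two slaves. A rough check while a test is running (just a sketch, not a full methodology):

# snapshot per-slave counters before and after an iperf run and
# compare the TX/RX byte deltas between the two slaves
ip -s link show ens3f0np0
ip -s link show ens3f1np1

If most of the traffic lands on one slave, using more parallel streams (-P) or varying the source/destination ports usually evens out the distribution, since layer3+4 hashes on the port numbers.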
Thanks and best regards