100G Mellanox ConnectX-5 on AMD Epyc 7302P

Feb 4, 2024
Hello All,

I run an LACP bond (layer2 xmit hash policy), 2 x 100GbE, carrying different VLANs. I want to use VLAN 230 with an MTU of 9000 for LINSTOR/DRBD, but we only get a bandwidth of around 21 Gbit/s, no matter whether I use iperf or iperf3 (also with 4 simultaneous threads).

I adjusted the CPU power states as described here:
https://forum.proxmox.com/threads/mellanox-connectx-5-en-100g-running-at-40g.106095/
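
For reference, this boils down to pinning the CPUs to the performance governor and limiting deep C-states. A rough sketch (package and option names may differ on your setup, so treat this as illustrative rather than the exact steps from that thread):

apt install -y linux-cpupower
cpupower frequency-set -g performance   # force the performance governor on all cores
cpupower idle-set -D 0                  # disable idle states with non-zero exit latency (deep C-states)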

I updated the Mellanox firmware to the latest available version following this guide:
https://www.thomas-krenn.com/de/wiki/Mellanox_Firmware_Tools_-_Firmware_Upgrade_unter_Linux
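
Roughly, using the Debian mstflint package (the guide itself uses the official Mellanox Firmware Tools; the PCI address and firmware image name below are placeholders for your own card and download):

apt install -y mstflint
mstflint -d c1:00.0 query                                       # show the currently flashed firmware version
mstflint -d c1:00.0 -i fw-ConnectX5-rel-xx_xx_xxxx.bin burn     # burn the downloaded image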

I also installed these helpers to use RDMA:
apt install -y infiniband-diags opensm ibutils rdma-core rdmacm-utils &&
modprobe ib_umad &&
modprobe ib_ipoib
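
To verify the RDMA stack actually sees the ports afterwards, something like this should do (ibstat comes from infiniband-diags, the rdma tool from iproute2):

ibstat            # mlx5 ports should show State: Active / Physical state: LinkUp
rdma link show    # per-device RoCE link state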

On the switch side there is of course also an LACP bond, with MTU 9000 and PFC enabled for lossless communication.
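
On the host side the flow-control settings can at least be sanity-checked per physical port, e.g.:

ethtool -a enp1s0f0np0    # global RX/TX pause settings
ethtool -a enp1s0f1np1
# per-priority PFC itself would be shown/configured with mlnx_qos (Mellanox tools) or lldptool, if installed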

ethtool bond0
Settings for bond0:
Supported ports: [ ]
Supported link modes: Not reported
Supported pause frame use: No
Supports auto-negotiation: No
Supported FEC modes: Not reported
Advertised link modes: Not reported
Advertised pause frame use: No
Advertised auto-negotiation: No
Advertised FEC modes: Not reported
Speed: 200000Mb/s
Duplex: Full
Auto-negotiation: off
Port: Other
PHYAD: 0
Transceiver: internal
Link detected: yes
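
The bond only reports the aggregate 200000Mb/s; each member can be checked directly to confirm it really negotiated 100G, e.g.:

ethtool enp1s0f0np0 | grep -E 'Speed|Link detected'
cat /proc/net/bonding/bond0     # per-slave speed and LACP state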

I attached a picture of the system load; no single thread is at its limit.

root@pve2:~# iperf --bind 192.168.230.76 -c 192.168.230.72 -p 5240 -t 10
------------------------------------------------------------
Client connecting to 192.168.230.72, TCP port 5240
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 1] local 192.168.230.76 port 42397 connected with 192.168.230.72 port 5240 (icwnd/mss/irtt=87/8948/183)
[ ID] Interval Transfer Bandwidth
[ 1] 0.0000-10.0140 sec 21.0 GBytes 18.0 Gbits/sec


And here is my network configuration (/etc/network/interfaces):

iface enp6s0f1 inet manual

iface enp5s0 inet manual

auto eno1
iface eno1 inet manual
#Quorum

auto eno2
iface eno2 inet manual
#Quorum

iface enxbe3af2b6059f inet manual

auto bond0
iface bond0 inet manual
bond-slaves enp1s0f0np0 enp1s0f1np1
bond-miimon 100
bond-mode 802.3ad
bond-xmit-hash-policy layer2
mtu 9000

auto bond1
iface bond1 inet static
address 10.10.10.76/24
bond-slaves eno1 eno2
bond-miimon 100
bond-mode 802.3ad
bond-xmit-hash-policy layer2+3
#quorum

auto vmbr1
iface vmbr1 inet manual
bridge-ports bond0
bridge-stp off
bridge-fd 0
bridge-vlan-aware yes
bridge-vids 2-4094
mtu 9000
#100G_Bond_Bridge_Vlan

auto vmbr1.160
iface vmbr1.160 inet manual
mtu 1500
#CNC_160

auto vmbr1.170
iface vmbr1.170 inet manual
mtu 1500
#Video_170

auto vmbr1.180
iface vmbr1.180 inet manual
mtu 1500
#DMZ_180

auto vmbr1.199
iface vmbr1.199 inet manual
mtu 1500
#VOIP_199

auto vmbr1.201
iface vmbr1.201 inet static
address 192.168.201.76/24
gateway 192.168.201.254
mtu 1500
#MGMT_201

auto vmbr1.202
iface vmbr1.202 inet manual
mtu 1500
#WLAN_202

auto vmbr1.1
iface vmbr1.1 inet manual
mtu 1500
#IINTERN

auto vmbr1.230
iface vmbr1.230 inet static
address 192.168.230.76/24
mtu 9000
#CEPH_LINBIT_230
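
To confirm jumbo frames actually pass end to end on VLAN 230, a quick check (8972 bytes of payload + 28 bytes of ICMP/IP headers = 9000):

ip -d link show vmbr1.230 | grep mtu
ping -M do -s 8972 192.168.230.72     # -M do sets the don't-fragment flag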


Also, with iperf3 we see a lot of retries:
root@pve2:/etc/pve# iperf3 --bind 192.168.230.76 -c 192.168.230.72 -p 22222 -t 10
Connecting to host 192.168.230.72, port 22222
[ 5] local 192.168.230.76 port 44521 connected to 192.168.230.72 port 22222
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 2.11 GBytes 18.1 Gbits/sec 732 1.08 MBytes
[ 5] 1.00-2.00 sec 1.99 GBytes 17.1 Gbits/sec 874 516 KBytes
[ 5] 2.00-3.00 sec 1.99 GBytes 17.1 Gbits/sec 992 577 KBytes
[ 5] 3.00-4.00 sec 1.93 GBytes 16.6 Gbits/sec 703 481 KBytes
[ 5] 4.00-5.00 sec 1.86 GBytes 16.0 Gbits/sec 986 524 KBytes
[ 5] 5.00-6.00 sec 1.99 GBytes 17.2 Gbits/sec 762 533 KBytes
[ 5] 6.00-7.00 sec 2.01 GBytes 17.3 Gbits/sec 715 516 KBytes
[ 5] 7.00-8.00 sec 2.04 GBytes 17.5 Gbits/sec 1007 498 KBytes
[ 5] 8.00-9.00 sec 2.11 GBytes 18.1 Gbits/sec 917 446 KBytes
[ 5] 9.00-10.00 sec 2.17 GBytes 18.6 Gbits/sec 469 402 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 20.2 GBytes 17.4 Gbits/sec 8157 sender
[ 5] 0.00-10.00 sec 20.2 GBytes 17.4 Gbits/sec receiver



Funnily enough, on the same Mellanox ConnectX-5 card, but with only 2 x 25GbE, I hardly see any retries.
 

Attachments

  • Bildschirmfoto 2024-06-20 um 08.02.41.png (system load screenshot, 280.8 KB)
iperf3 is not multithreaded (that only changed recently, in 3.16), so maybe you are core-limited.

(Try iperf2 with the -P option, or launch multiple iperf3 instances in parallel.)

Also try to increase the window size, to be sure you are not PPS-limited.
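
For example (buffer sizes are just illustrative values):

iperf -c 192.168.230.72 -p 22222 -P 8 -w 4M -t 30
# and/or raise the kernel TCP buffer limits before testing:
sysctl -w net.core.rmem_max=268435456
sysctl -w net.core.wmem_max=268435456
sysctl -w net.ipv4.tcp_rmem="4096 87380 268435456"
sysctl -w net.ipv4.tcp_wmem="4096 65536 268435456"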
 
iperf2 is not available to install anymore, only iperf and iperf3 (which I used with 8 parallel streams).

root@pve2:~# iperf --bind 192.168.230.76 -c 192.168.230.72 -p 22222 -t 60
------------------------------------------------------------
Client connecting to 192.168.230.72, TCP port 22222
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 1] local 192.168.230.76 port 57843 connected with 192.168.230.72 port 22222 (icwnd/mss/irtt=87/8948/168)
[ ID] Interval Transfer Bandwidth
[ 1] 0.0000-60.0074 sec 127 GBytes 18.2 Gbits/sec
root@pve2:~#


root@pve2:~# iperf3 --bind 192.168.230.76 -c 192.168.230.72 -p 22222 -P 8 -t 60
Connecting to host 192.168.230.72, port 22222
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-60.00 sec 21.0 GBytes 3.01 Gbits/sec 9068 sender
[ 5] 0.00-60.00 sec 21.0 GBytes 3.00 Gbits/sec receiver
[ 7] 0.00-60.00 sec 21.4 GBytes 3.06 Gbits/sec 10411 sender
[ 7] 0.00-60.00 sec 21.4 GBytes 3.06 Gbits/sec receiver
[ 9] 0.00-60.00 sec 21.2 GBytes 3.03 Gbits/sec 10753 sender
[ 9] 0.00-60.00 sec 21.2 GBytes 3.03 Gbits/sec receiver
[ 11] 0.00-60.00 sec 19.5 GBytes 2.79 Gbits/sec 8706 sender
[ 11] 0.00-60.00 sec 19.5 GBytes 2.78 Gbits/sec receiver
[ 13] 0.00-60.00 sec 17.7 GBytes 2.53 Gbits/sec 6642 sender
[ 13] 0.00-60.00 sec 17.7 GBytes 2.53 Gbits/sec receiver
[ 15] 0.00-60.00 sec 17.7 GBytes 2.53 Gbits/sec 7015 sender
[ 15] 0.00-60.00 sec 17.7 GBytes 2.53 Gbits/sec receiver
[ 17] 0.00-60.00 sec 19.9 GBytes 2.85 Gbits/sec 8613 sender
[ 17] 0.00-60.00 sec 19.9 GBytes 2.85 Gbits/sec receiver
[ 19] 0.00-60.00 sec 18.9 GBytes 2.70 Gbits/sec 7483 sender
[ 19] 0.00-60.00 sec 18.8 GBytes 2.70 Gbits/sec receiver
[SUM] 0.00-60.00 sec 157 GBytes 22.5 Gbits/sec 68691 sender
[SUM] 0.00-60.00 sec 157 GBytes 22.5 Gbits/sec receiver
 
Hi Spirit, I found the problem: for whatever reason, the Supermicro board we use is degrading the PCI Express link used by the 100G adapter.

root@pve1:~# lspci | grep ConnectX-5
c2:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
c2:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
root@pve1:~# lspci -s c2:00.1 -vvv | grep Speed
LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM not supported
LnkSta: Speed 8GT/s, Width x4 (downgraded)
root@pve1:~# lspci -s c2:00.0 -vvv | grep Speed
LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM not supported
LnkSta: Speed 8GT/s, Width x4 (downgraded)
LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
root@pve1:~#


root@pve4:~# lspci | grep ConnectX-5
c1:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
c1:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
root@pve4:~# lspci -s c1:00.1 -vvv | grep Speed
LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM not supported
LnkSta: Speed 8GT/s, Width x16
root@pve4:~# lspci -s c1:00.0 -vvv | grep Speed
LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM not supported
LnkSta: Speed 8GT/s, Width x16
LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
root@pve4:~#


root@pve2:~# lspci -s c1:00.1 -vvv | grep Speed
LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
LnkSta: Speed 2.5GT/s (downgraded), Width x16
root@pve2:~# lspci -s c1:00.0 -vvv | grep Speed
LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <512ns, L1 <16us
LnkSta: Speed 2.5GT/s (downgraded), Width x16
LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
root@pve2:~#


root@pve3:~# lspci -s c1:00.1 -vvv | grep Speed
LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM not supported
LnkSta: Speed 8GT/s, Width x16
root@pve3:~# lspci -s c1:00.0 -vvv | grep Speed
LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM not supported
LnkSta: Speed 8GT/s, Width x16
LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
root@pve3:~#
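
For context, rough back-of-the-envelope numbers for the two degraded links (line encoding only, before PCIe packet overhead):

pve1: Gen3 x4  = 8 GT/s   x 4 lanes  x 128/130 ≈ 31.5 Gbit/s per direction
pve2: Gen1 x16 = 2.5 GT/s x 16 lanes x 8/10    = 32 Gbit/s per direction

After TLP/header overhead that leaves roughly 25-27 Gbit/s usable, which lines up with the ~20-22 Gbit/s seen in the iperf runs above.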
 
