InfiniBand 40Gbps capped at 12.5Gbps

wahmed
To all my fellow InfiniBand users,
Is there a reason why the link between two 40Gbps InfiniBand hosts is capped at no more than 12.5Gbps? I set up InfiniBand in Proxmox following this wiki: https://pve.proxmox.com/wiki/Infiniband

iperf tests always show between 12.1 and 12.5Gbps.

Am I missing some configuration?
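
For reference, the test itself is just stock iperf between the IPoIB interfaces of two nodes; something like this (the 10.10.10.x address is only a placeholder, not my actual setup):

iperf -s                        # on the receiving node
iperf -c 10.10.10.1 -t 30       # on the sending node, 30 second run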
 
Since InfiniBand is 8b/10b encoded, your total usable bandwidth will be 32Gbps, so you are pretty far from that.

I've only used 10Gbps InfiniBand, but here are my guesses (some quick checks for 1, 3 and 5 are sketched below):
1. The TCP/IP stack can't go any faster. Is a CPU core pegged during the iperf run?
2. Maybe you need to increase the buffers in iperf.
3. Did you set the MTU to the maximum of 65520?
4. Maybe you need to tune the sysctl variables further to get that throughput.
5. Maybe they are not connecting at 40Gbps even though they are 40Gbps cards?
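
Quick checks for points 1, 3 and 5 (assuming the standard infiniband-diags tools and an IPoIB interface called ib0, adjust to your setup):

ibstat                 # "Rate:" should report 40 for a QDR link
ip link show ib0       # MTU should show 65520 in connected mode
top                    # press 1 during the iperf run to see per-core load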
 

1. The nodes I am testing on are blank and have no VMs on them. The CPU is a Xeon E5-2620 with 64GB RAM.
2. Tried increasing the iperf buffer multiple times, no effect. The only difference is that if I run 2 parallel iperf streams with iperf -c x.x.x.x -P 2, the total bandwidth shows 17.6Gbps.
3. MTU is set to 65520.
4. I have increased the sysctl values to 7 times what they were, no effect.
5. I have 7 nodes with identical InfiniBand cards, and all show the same result. They are Supermicro blade motherboards. Confirmed with Supermicro that they are indeed 40Gbps InfiniBand.

I would be happy to even get 25Gbps out of it.
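
For anyone reproducing this, the variations I have been trying are just different stream counts and window sizes, e.g. (values are examples only, not a recommendation):

iperf -c x.x.x.x -P 4 -w 512K -t 30     # 4 parallel streams, 512 KB TCP window
iperf -c x.x.x.x -P 8 -t 30             # 8 parallel streams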
 
Have you studied the following links?
http://pkg-ofed.alioth.debian.org/howto/infiniband-howto.html

From what I have been able to read, you need to get the latest drivers from Mellanox. The newest drivers for Debian 7.5 are dated September 2014:
http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers
The above link also has some tuning reference documents.

Also, 40Gbps seems to require special termination adapters, e.g. you cannot use cables for 10 or 20Gbps adapters if you want to have the full 40Gbps.
 

I did read the howto from pkg-ofed, but had not installed the latest drivers from Mellanox. So I went ahead and downloaded the latest package from the Mellanox site and installed it on one of the nodes. After I rebooted, the node is stuck at the following screen:
[screenshot: console stuck at udev output during boot]

But the node itself seems to be working fine. I can SSH in and migrate/run VMs. Just the console is stuck at this prompt and does not go to the login screen. Any idea? The driver installation went fine without errors. The InfiniBand cards are QDR and I am using FDR-rated cables, which can handle 40Gbps easily.
 
Do you see the same if you boot in fail-safe mode?

It looks as if the newly applied Mellanox driver causes the udev init script to fail. Are you sure that you have blacklisted the default InfiniBand drivers?
 
It does not look like it boots in fail-safe mode either.
And... umm... no, I did not blacklist the default InfiniBand drivers before installing the new Mellanox ones. Before installing, should I have disabled mlx4_ib, ib_umad and ib_ipoib in /etc/modules?
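
For what it's worth, this is what I understand the blacklisting would look like; just a sketch, I have not actually tried it on the node yet:

# remove (or comment out) the entries in /etc/modules so they are not loaded at boot:
#   mlx4_ib
#   ib_umad
#   ib_ipoib
# and, to be safe, blacklist the stock module explicitly:
echo "blacklist mlx4_ib" >> /etc/modprobe.d/blacklist-stock-ib.conf
update-initramfs -u     # so the blacklist is honoured during early boot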
 
It could easily be that udev tries to create devices for both sets of drivers, in which case udev breaks when trying to create the second set of devices, which will clash.
 
I know what I will say is not the solution, but perhaps it can help someone ...

1. The TCP/IP stack can't go any faster. Is a CPU core pegged during the iperf run?

I have tested iperf with an Intel Ethernet X520-DA2 10Gb/s dual-port card with "RR" bonding (NIC-to-NIC); with tuning, the result was 19.9Gb/s of transfer speed, so I don't believe that iperf has this limitation. Maybe the driver is the problem.

On the other hand, soon I will test with an Intel Ethernet X520-QDA1 40Gb/s card with "RR" bonding (NIC-to-NIC). The driver is the same as for the Intel X520-DA2 ("ixgbe", officially updated by PVE), so I guess that I will get 80Gb/s of connection speed. If you want to know my results, I can tell you.
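
The bond itself is nothing special, essentially a plain balance-rr bond in /etc/network/interfaces, roughly along these lines (interface names and the address are only examples, not my exact config):

auto bond0
iface bond0 inet static
        address 10.10.10.1
        netmask 255.255.255.0
        slaves eth2 eth3
        bond_miimon 100
        bond_mode balance-rr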

Best regards
Cesar
 
Ethernet seems to work much more easily than IB. It does not have the configuration and performance overhead that InfiniBand has. But we have invested too much into IB now to turn back to Ethernet.

After trying many things, it seems to me that IP over IB is the root cause of the problem. No matter what I try, IPoIB always has this performance issue. I am leaning towards rebuilding Mellanox IB with the suggested drivers on top of the OS. If I completely remove the IB drivers that came with Proxmox and try to install the Mellanox-provided OFED drivers, would it cause major issues for my Proxmox node?
Can I set up InfiniBand with anything other than IPoIB?
 
Where are you running the subnet manager? On a switch, or as a software daemon on a server? The software subnet manager can cause performance issues because it needs a lot of CPU; once the process hits the 100% limit of a core, it becomes a bottleneck.
 
The subnet manager opensm is running on a server node. As far as I can tell, opensm is not even using half of the Xeon CPU resources available in the node.
 
Is one opensm process running at 100% of one core? The connection slows down when one process hits 100% CPU; more cores don't help here.
I think at QDR a hardware subnet manager is recommended to get the full performance. We use InfiniBand QDR too, with Mellanox cards and Mellanox switches, and they do not have such an issue. We have tried opensm, but it limited the performance.
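
The easiest way to see it is to watch the opensm threads directly instead of the overall CPU usage, for example:

top -H -p $(pidof opensm)    # -H shows individual threads; look for one stuck near 100%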
 
Hmmm, worth trying to set up the hardware subnet manager. Which switch are you using? We have a Mellanox IS5022 8-port.
Are you using IPoIB or RDMA/SDP?
 
We use NFS over RDMA, but IPoIB has nearly the same performance. Take a look here: http://forum.proxmox.com/threads/9628-Anyone-using-NFS-over-Infiniband
We have an IS5035 and use the hardware subnet manager on the switch. Your switch also has a hardware subnet manager: disable opensm, activate the subnet manager agent on the switch and try again.

Is connected mode on?
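
For reference, an NFS over RDMA mount on the client looks roughly like this (server address and export path are placeholders; 20049 is the usual NFS/RDMA port):

modprobe xprtrdma
mount -t nfs -o rdma,port=20049 10.10.10.1:/export/vmstore /mnt/vmstore
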
Probably a silly question, but how do I activate the subnet manager on the InfiniBand switch? The user manual for the IS5022 does not seem to mention it at all, other than detailed instructions on how to update the firmware.
Yes, connected mode is on.
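
This is how I am verifying it on my side (ib0 is the IPoIB interface name on these nodes):

cat /sys/class/net/ib0/mode     # prints "connected" (or "datagram")
ip link show ib0                # MTU shows 65520 when connected mode is active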
 
I figured out part of the problem. The initial InfiniBand card I installed was actually a ConnectX QDR, for which the Mellanox site has no updated driver. I recently got ConnectX-2 dual-port 40Gbps InfiniBand cards, and the results look much more promising. With this card iperf shows me 19Gbps bandwidth. All the sysctl tweaking has already been applied.
Now I am trying to install the recommended Mellanox drivers for the card, starting with a firmware update. The Mellanox firmware burning tool MFT tries to install RPM packages and fails with dependency errors for rpm-build and kernel-devel. After checking further, it appears that the auto-installer of the tools only creates RPM packages. Is there a way to install the tools needed to build RPM packages on Proxmox/Debian and then convert them to DEB using alien?
What other ways can I install the latest firmware and driver for InfiniBand on Proxmox?
 
apt-cache show librpmbuild3
Package: librpmbuild3
Source: rpm
Version: 4.10.0-5+deb7u1
Installed-Size: 1070
Maintainer: Michal Čihař <nijel@debian.org>
Architecture: amd64
Depends: libc6 (>= 2.4), libmagic1, libpopt0 (>= 1.14), librpm3 (>= 4.10.0), librpmio3 (>= 4.10.0)
Description-en: RPM build shared library
The RPM Package Manager (RPM) is a command-line driven package
management system capable of installing, uninstalling, verifying,
querying, and updating computer software packages.
.
This library provides an interface for building RPM packages.
Homepage: http://rpm.org/

The following package has the actual binaries, but the former library is needed by the rpmbuild binary:
apt-cache show rpm
Package: rpm
Version: 4.10.0-5+deb7u1
Installed-Size: 1292
Maintainer: Michal Čihař <nijel@debian.org>
Architecture: amd64
Replaces: manpages-pl (<< 20051017-1)
Depends: libc6 (>= 2.4), libelf1 (>= 0.131), libpopt0 (>= 1.14), librpm3 (>= 4.10.0), librpmbuild3 (>= 4.10.0), librpmio3 (>= 4.10.0), librpmsign1 (>= 4.10.0), perl, rpm2cpio, rpm-common (= 4.10.0-5+deb7u1)
Suggests: alien, elfutils, rpm-i18n
Breaks: man-db (<< 2.5.0-1), manpages-pl (<< 20051017-1)
Description-en: package manager for RPM
The RPM Package Manager (RPM) is a command-line driven package
management system capable of installing, uninstalling, verifying,
querying, and updating computer software packages.
.
On Debian and derived systems it is recommended to use "alien" to
convert RPM packages into .deb format instead of bypassing the Debian
package management system by installing them directly with rpm.
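
So on the Proxmox node the whole round trip would look roughly like this (the .rpm file name is only a placeholder for whatever MFT version you downloaded; I believe the pve-headers package covers the kernel-devel part):

apt-get install rpm alien pve-headers-$(uname -r)
alien --to-deb --scripts your-mft-package.rpm
dpkg -i ./*.deb     # install the .deb that alien just produced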
 

Thank you Mir, as always. I now have updated firmware on all InfiniBand cards. Now a new situation: I have downloaded the latest OFED drivers from Mellanox, but they seem to require Debian 7.5 or older. Whenever I try to install the package, it says this MLNX_OFED_LINUX is intended for Debian 7.5. This node is updated to Proxmox 3.3, which comes with Debian 7.6. Any way to install the latest IB driver?
 
