Proxmox 6.1 working with older Mellanox ConnectX-2 cards in IPoIB and IB modes

ndroftheline

Hey all,

I recently purchased a couple of these bad boys for $13 https://www.ebay.com/itm/375-3696-01-SUN-ORACLE-40GBPS-QDR-DUAL-PORT-INFINIBAND-ADAPTER-MHQH29B-XSR/223856765950?ssPageName=STRK:MEBIDX:IT&_trksid=p2057872.m2749.l2649

Kinda freaked me out to get a dual-qsfp card with 40gb ib and 10gb IPoIB and RDMA for $13. I flashed them with the Mellanox firmware, mostly covered here: https://forums.servethehome.com/ind...2-vpi-dual-port-adapter-49-99-obo.5232/page-4

1) Install WinMFT, latest version worked for me
1.1) Download bin file from random stranger (at your peril - seems to have worked ok for me) http://www.landynamics.com/pool/Applications/Drivers/MHQH29B-XTR/MHQH29B-2.10.720.bin
2) Open a command prompt as an administrator and cd to the directory you copied the 2.10.720 image to. Run the following command:
flint -allow_psid_change -d mt26428_pci_cr0 -i MHQH29B-2.10.720.bin burn
3) Reboot
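
Before and after the burn in step 2 you can sanity-check the firmware with a query; the device name is whatever mst status reports for your card (mt26428_pci_cr0 in my case):

flint -d mt26428_pci_cr0 query    # "FW Version" should show the old firmware before the burn and 2.10.720 after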

other notes:

https://www.mellanox.com/support/firmware/identification has a brief tutorial on using MST and Flint to determine the name of the cards and the firmware version on them.

  1. install MST
  2. in admin powershell window, run mst status
  3. flint -d <devIdentifier> query
    1. e.g., if mst status returned "mt26428_pci_cr0" you would type: flint -d mt26428_pci_cr0 query
    2. in Linux, it would be a device path such as /dev/mst/mt26428_pci_cr0 (see the sketch after this list)
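
Put together, the identification flow on Linux looks roughly like this (the device path is an example; take the real one from the mst status output):

mst start                                  # load the MST modules and create the /dev/mst/ device nodes
mst status                                 # list devices, e.g. /dev/mst/mt26428_pci_cr0
flint -d /dev/mst/mt26428_pci_cr0 query    # shows the PSID and FW Version
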
In Windows, to switch to Ethernet mode, install any driver later than 4.8 (so far tested up to v5.50.50000; anything older doesn't understand the newer firmware we just flashed). Then you can adjust the setting in Windows 10 under Device Manager > System devices > Mellanox card > Port Protocol tab > set to ETH instead of Auto.
NOTE: the card appears under System devices, NOT under Network adapters!

[screenshot: Device Manager > System devices > Mellanox card > Port Protocol tab]

NOTE: for the PCIe cards to work with the updated firmware, you need a driver *later* than 4.8 (I used MLNX_VPI_WinOF-5_50_50000_All_Win10_1803_x64 because 1803 is the Windows 10 build I had running).

To get them on IPoIB in Proxmox, this Reddit thread was a huge help: https://www.reddit.com/r/homelab/comments/cf0ae3/still_rocking_the_connectx2_qdr_and_proxmox_ve_v60/

here are my steps:
cd ~
wget http://content.mellanox.com/ofed/ML..._OFED_LINUX-5.0-1.0.0.0-debian10.0-x86_64.tgz # or whatever the latest version is
tar xvf MLNX_OFED_LINUX-5.0-1.0.0.0-debian10.0-x86_64.tgz
dpkg -x MLNX_OFED_LINUX-5.0-1.0.0.0-debian10.0-x86_64/DEBS/COMMON/mlnx-ofed-kernel-utils_5.0-OFED.5.0.1.0.0.0.1.g34c46d3_amd64.deb ./mlnx-ofed-kernel-utils/
mkdir /etc/infiniband
touch /etc/infiniband/connectx.conf
cp ~/mlnx-ofed-kernel-utils/sbin/connectx_port_config /sbin/
chmod +x /sbin/connectx_port_config
/sbin/connectx_port_config

output looks like this:

ConnectX PCI devices :
|----------------------------|
| 1 0000:05:00.0 |
|----------------------------|

Before port change:
ib
ib

|----------------------------|
| Possible port modes: |
| 1: Infiniband |
| 2: Ethernet |
| 3: AutoSense |
|----------------------------|
Select mode for port 1 (1,2,3): 2
Select mode for port 2 (1,2,3): 2

After port change:
eth
eth

At this point you should be able to set an IP address, bring the link up, and ping other hosts on the network. I can, and I can move files between this new test Proxmox node and a directly attached Windows client at >1 GB/s (!)
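
Concretely, that's something like the following (interface name and addresses are just examples; check ip link for what your ports show up as):

ip link set eth1 up                 # example name for the ConnectX port in eth mode
ip addr add 10.1.1.5/24 dev eth1    # any address on the same subnet as the peer
ping 10.1.1.4                       # the directly attached host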

# pv mp1/864794_001_spp-2016.04.0-SPP2016040.2016_0317.20.iso > ~/mp2/spp.iso
5.74GiB 0:00:05 [1.06GiB/s] [========================================>] 100%

However, it doesn't persist across reboots, which I'm still working out. When rebooting, the network service fails because the interface doesn't exist until I log on and run the port config commands.

My next interest is getting IB working. I know I have to install opensm on both nodes. Will try to report back ASAP.

EDIT: removed an irrelevant comment in my notes
 
Quick update here: I'm trying to understand more about InfiniBand and have gotten this far, but I probably need help.

On both nodes I have connected the first ports together directly via a copper QSFP DAC, and also connected the second ports together. One node is running Proxmox 6.1 and the other Windows 10.

I have used the process I described above to get IPoIB working fine, operating at line speed between the nodes. However, I want to use the InfiniBand protocol, because the adapters I have "only" do 10G on IPoIB but 40G on native IB, and I want to see my 3000 MB/s SSDs maxed out by my network.

On the Windows side I appear to have been able to switch the second port to IB and assign an IP address on a totally different subnet (10.1.1.4). On the Linux side I can't figure out how to assign an IP address.

I'm able to use ibping to ping between the hosts.
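
For reference, this is roughly how I'm running ibping (the LID is an example; read the real one from ibstat on the remote node, and a subnet manager such as opensm has to be running somewhere on the fabric):

ibping -S      # on one node: run the ibping responder
ibping -L 2    # on the other node: ping the remote port by its LID (2 is an example)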

Several issues I'm currently having:
- the link is currently only showing 10G instead of the 40G it was showing before I started playing with it
- I can't figure out how to assign an IP address to the IB interface in Proxmox (is this even necessary?)
- I could use help on how to create any kind of link between the InfiniBand nodes to test performance.
 
another few quick updates here:
  • Figured out how to make the port configuration survive reboots. It's discussed here: https://bugs.centos.org/view.php?id=14419
    • Basically, you have to add an options line to /etc/modprobe.d/mlx4.conf with this:
    • options mlx4_core port_type_array=2,2
    • That sets both ports to option 2, which is Ethernet, when the module loads (see the sketch after this list).
  • Figured out that if either port on either card is set to eth, the other port is limited to 10G. If I switch both to InfiniBand, they go back to 40G link speed, as indicated by ibstat output.
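
For reference, here's a minimal sketch of how to apply that (the update-initramfs step is my assumption, in case mlx4_core gets loaded from the initramfs before the root filesystem is read):

echo "options mlx4_core port_type_array=2,2" > /etc/modprobe.d/mlx4.conf
update-initramfs -u -k all    # assumption: refresh the initramfs so the option also applies at early module load
reboot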

Issues I'm still seeking help with:
- can't figure out how to assign an IP address to the IB interface; trying to understand if this is even necessary
- not sure how to push data over IB instead of IPoIB (see the sketch below for what I'm planning to try); open to any helpful tips
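
In case it helps anyone, the kind of RDMA test I'm planning to try between two Linux hosts is ib_send_bw from the perftest package (device name and address are examples; note that it still needs ordinary IP connectivity between the hosts for its control connection, e.g. over the eth ports):

apt install perftest             # Debian/Proxmox package that provides ib_send_bw, ib_read_bw, etc.
ib_send_bw -d mlx4_0             # on node A: run as server (device name from ibstat)
ib_send_bw -d mlx4_0 10.1.1.5    # on node B: connect to node A's IP for setup; the payload itself goes over RDMA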
 
This is exactly what I was looking for... are you doing full PCI passthrough? I'm planning to do the same setup, and my cards are ConnectX-2 VPI dual port (one 10Gbps & one 40Gbps IB) MHZH29-XTR.
 
Hey Bender, I haven't figured out what I'm going to do exactly, yet. But another user showed me a more elegant way of working with these cards that I think is superior to what I've posted here. Check out https://forum.proxmox.com/threads/40gbs-mellanox-infinityband.57118/page-2#post-301411 and/or https://forum.proxmox.com/threads/u...scsi-or-otherwise-read-lvs.60113/#post-301390

It's simpler and, I think, takes the newer approach of using "real" IPoIB instead of putting the card into an alternate mode.

However I can't seem to get more than 10gbps out of my cards in any configuration, so I'm surely doing something wrong still.
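
For anyone who lands here later, my rough understanding of that approach is an /etc/network/interfaces stanza along these lines (interface name, address, and MTU are examples, not a recipe I've fully tested yet):

auto ib0
iface ib0 inet static
        address 10.1.1.5/24
        pre-up modprobe ib_ipoib
        pre-up echo connected > /sys/class/net/ib0/mode    # connected mode allows the large MTU below
        mtu 65520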
 
Also, I have slightly different cards; my dual ports are both 40Gbps IB that can be switched to 10Gbps Ethernet mode, so YMMV of course: MHQH29B-XSR
 
I'm about to install a fresh Debian Buster and then Proxmox 6.1 via apt-get. I wanted to be sure Debian can detect the regular eth and the ib interfaces first.

I guess after that Proxmox would use them with no problem, but I'm not sure if I can do everything as with a regular eth interface (like bridging or aggregation). Last time I checked, the newest Debian version supported for ConnectX-2 was 9.x.

What are you using between servers? I'm going to use this switch:

https://www.mellanox.com/related-docs/voltaire_ib_switch_systems/Grid-Director-4036-WEB-071510.pdf

Which in theory can support IPoIB or pure RDMA with no problem. I'll keep you posted.
 
I am keen to know how you get on. I'm quite new to IPoIB, so I'm not sure either whether you can do things like bridging or aggregation; keep us posted as you explore.

I'm not using a switch; this is just in my lab. I have cheap, short copper QSFP DACs connecting 3 hosts so far.
 
I have ConnectX (1) cards and 1 or 2 ConnectX-2 cards... they are older, with a connector called CX4.
Anywho, I have them working perfectly on Win10, and they work in ESXi 6.7 and also FreeNAS 11.
ESXi 7 and FreeNAS 12+ do not see or support them.
Got Proxmox installed the other day and lspci sees the card (n00b Linux guy here, mainly Windows).

so proxmox sees it?
lspci -k | grep Mellanox



04:00.0 Network controller: Mellanox Technologies MT25408A0-FCC-GI ConnectX, Dual Port 20Gb/s InfiniBand / 10GigE Adapter IC with PCIe 2.0 x8 5.0GT/s In... (rev a0)
Subsystem: Mellanox Technologies MT25408A0-FCC-GI ConnectX, Dual Port 20Gb/s InfiniBand / 10GigE Adapter IC with PCIe 2.0 x8 5.0GT/s Interface

so i guess i need to sit back and follow your directions for setting to eth???
 
