Unable to see InfiniBand interfaces in Proxmox?

swartz · Oct 30, 2014
I've got a pair of InfiniBand HCAs. They are dual-mode and support both InfiniBand and native 10GigE (not IPoIB, but actual 10GigE).

I tested them and updated the firmware to the latest available on a CentOS 7.1 box, so the HCAs themselves are good. But I am unable to bring them up in Proxmox: neither IPoIB nor 10GigE works.

I can see the card:
Code:
# lspci | grep InfiniBand
0d:00.0 InfiniBand: Mellanox Technologies ....

#dmesg | grep mlx4
mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014)
mlx4_core: Initializing 0000:0d:00.0
mlx4_core 0000:0d:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
mlx4_core 0000:0d:00.0: setting latency timer to 64
mlx4_core 0000:0d:00.0: irq 85 for MSI/MSI-X
mlx4_core 0000:0d:00.0: irq 86 for MSI/MSI-X
mlx4_core 0000:0d:00.0: irq 87 for MSI/MSI-X
mlx4_core 0000:0d:00.0: irq 88 for MSI/MSI-X
mlx4_core 0000:0d:00.0: irq 89 for MSI/MSI-X
mlx4_core 0000:0d:00.0: irq 90 for MSI/MSI-X
mlx4_core 0000:0d:00.0: irq 91 for MSI/MSI-X
mlx4_core 0000:0d:00.0: irq 92 for MSI/MSI-X
mlx4_core 0000:0d:00.0: irq 93 for MSI/MSI-X
mlx4_core 0000:0d:00.0: irq 94 for MSI/MSI-X
mlx4_core 0000:0d:00.0: irq 95 for MSI/MSI-X
mlx4_core 0000:0d:00.0: irq 96 for MSI/MSI-X
mlx4_core 0000:0d:00.0: irq 97 for MSI/MSI-X
mlx4_core 0000:0d:00.0: irq 98 for MSI/MSI-X
mlx4_core 0000:0d:00.0: irq 99 for MSI/MSI-X
mlx4_core 0000:0d:00.0: irq 100 for MSI/MSI-X
mlx4_core 0000:0d:00.0: irq 101 for MSI/MSI-X
mlx4_core 0000:0d:00.0: irq 102 for MSI/MSI-X
mlx4_core 0000:0d:00.0: irq 103 for MSI/MSI-X
mlx4_core 0000:0d:00.0: irq 104 for MSI/MSI-X
mlx4_core 0000:0d:00.0: irq 105 for MSI/MSI-X
mlx4_core 0000:0d:00.0: irq 106 for MSI/MSI-X
mlx4_core 0000:0d:00.0: irq 107 for MSI/MSI-X
mlx4_core 0000:0d:00.0: irq 108 for MSI/MSI-X
mlx4_core 0000:0d:00.0: irq 109 for MSI/MSI-X
mlx4_core 0000:0d:00.0: irq 110 for MSI/MSI-X
mlx4_core 0000:0d:00.0: irq 111 for MSI/MSI-X
mlx4_core 0000:0d:00.0: irq 112 for MSI/MSI-X
mlx4_core 0000:0d:00.0: irq 113 for MSI/MSI-X
mlx4_core 0000:0d:00.0: irq 114 for MSI/MSI-X
mlx4_core 0000:0d:00.0: irq 115 for MSI/MSI-X
mlx4_core 0000:0d:00.0: irq 116 for MSI/MSI-X
mlx4_core 0000:0d:00.0: irq 117 for MSI/MSI-X
mlx4_core 0000:0d:00.0: irq 118 for MSI/MSI-X
mlx4_core 0000:0d:00.0: irq 119 for MSI/MSI-X
mlx4_core 0000:0d:00.0: irq 120 for MSI/MSI-X
mlx4_core 0000:0d:00.0: irq 121 for MSI/MSI-X
mlx4_core 0000:0d:00.0: irq 122 for MSI/MSI-X
mlx4_core 0000:0d:00.0: command 0xc failed: fw status = 0x40
mlx4_core 0000:0d:00.0: command 0xc failed: fw status = 0x40

There are no devices listed for this HCA in either /sys/class/infiniband/ or /sys/class/net/.


Am I missing some extra packages/libraries or config settings? How do I get it to work?
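For reference, this is the quick sanity check I've been running on the host (a throwaway sketch; check_hca is just a name I made up, and the sysfs paths are the standard ones):

```shell
# check_hca: report what the kernel has registered for the HCA.
# Takes an optional sysfs root so it can be pointed at a scratch dir for testing.
check_hca() {
    root="${1:-/sys/class}"
    if [ -n "$(ls -A "$root/infiniband" 2>/dev/null)" ]; then
        echo "IB devices: $(ls "$root/infiniband")"
    else
        echo "no IB devices registered"
    fi
    echo "net interfaces: $(ls "$root/net" 2>/dev/null | tr '\n' ' ')"
}

check_hca    # inspect the real /sys/class
```

On my box this prints "no IB devices registered" and no new net interfaces, which matches the empty /sys/class/infiniband/ above.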
 
Re: Unable to see Infiniband interfaces on Proxmox?

Mine shows:
lspci | grep InfiniBand
01:00.0 InfiniBand: Mellanox Technologies MT25408 [ConnectX VPI - IB SDR / 10GigE] (rev a0)

dmesg |grep mlx4
[ 2.206800] mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014)
[ 2.206989] mlx4_core: Initializing 0000:01:00.0
[ 4.538385] mlx4_core 0000:01:00.0: PCIe link speed is 2.5GT/s, device supports 2.5GT/s
[ 4.538629] mlx4_core 0000:01:00.0: PCIe link width is x8, device supports x8
[ 4.538835] mlx4_core 0000:01:00.0: irq 79 for MSI/MSI-X
[ 4.538844] mlx4_core 0000:01:00.0: irq 80 for MSI/MSI-X
[ 4.538853] mlx4_core 0000:01:00.0: irq 81 for MSI/MSI-X
[ 4.538863] mlx4_core 0000:01:00.0: irq 82 for MSI/MSI-X
[ 4.538872] mlx4_core 0000:01:00.0: irq 83 for MSI/MSI-X
[ 4.538881] mlx4_core 0000:01:00.0: irq 84 for MSI/MSI-X
[ 4.538892] mlx4_core 0000:01:00.0: irq 85 for MSI/MSI-X
[ 4.538900] mlx4_core 0000:01:00.0: irq 86 for MSI/MSI-X
[ 4.538908] mlx4_core 0000:01:00.0: irq 87 for MSI/MSI-X
[ 4.538916] mlx4_core 0000:01:00.0: irq 88 for MSI/MSI-X
[ 4.538925] mlx4_core 0000:01:00.0: irq 89 for MSI/MSI-X
[ 4.538933] mlx4_core 0000:01:00.0: irq 90 for MSI/MSI-X
[ 4.538941] mlx4_core 0000:01:00.0: irq 91 for MSI/MSI-X
[ 4.538950] mlx4_core 0000:01:00.0: irq 92 for MSI/MSI-X
[ 4.578386] mlx4_core 0000:01:00.0: command 0xc failed: fw status = 0x40
[ 4.578696] mlx4_core 0000:01:00.0: command 0xc failed: fw status = 0x40
[ 5.944077] <mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v2.2-1 (Feb 2014)
 

What kernel do you use?

This is a vanilla Proxmox v3.4 install with all updates.

Code:
# pveversion -v
proxmox-ve-2.6.32: 3.4-156 (running kernel: 2.6.32-39-pve)
pve-manager: 3.4-6 (running version: 3.4-6/102d4547)
pve-kernel-2.6.32-39-pve: 2.6.32-156
pve-kernel-2.6.32-37-pve: 2.6.32-150
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-2
pve-cluster: 3.0-17
qemu-server: 3.4-6
pve-firmware: 1.1-4
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-33
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.2-10
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1
 

The latest 2.6.32 kernel I was able to get InfiniBand working with was pve-kernel-2.6.32-34-pve, so you could try that kernel.
 

Mine shows:
[ 5.944077] <mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v2.2-1 (Feb 2014)

I do not have that last line, but it made me think I should check which modules are available. mlx4_core is there and loaded, while mlx4_ib and mlx4_en are also present on the filesystem but not loaded.

After a fresh reboot:

Code:
#  lsmod |egrep '(^mlx|^ib)'
ib_iser                42209  0
ib_cm                  37320  1 rdma_cm
ib_sa                  24369  2 ib_cm,rdma_cm
ib_mad                 39879  2 ib_sa,ib_cm
ib_core                81369  6 ib_mad,ib_sa,ib_cm,iw_cm,rdma_cm,ib_iser
ib_addr                 8662  2 ib_core,rdma_cm
mlx4_core             227806  0

Doing modprobe mlx4_en does not bring up additional ethernet interfaces; nothing new appears in /sys/class/net/.

After doing modprobe mlx4_ib I can see the InfiniBand device as /sys/class/infiniband/mlx4_0/, but there is still nothing in /sys/class/net/.


What are you doing differently so that mlx4_ib gets autoloaded?
 

What does this show for you?

Code:
lsmod | egrep '(^mlx|^ib)'
ib_iser                51894  0
ib_srp                 42571  0
ib_ipoib               91605  0
ib_umad                22166  8
ib_uverbs              42914  4 rdma_ucm
ib_cm                  42732  3 rdma_cm,ib_srp,ib_ipoib
mlx4_ib               146240  1
ib_sa                  33904  6 rdma_cm,ib_cm,mlx4_ib,ib_srp,rdma_ucm,ib_ipoib
ib_mad                 47395  4 ib_cm,ib_sa,mlx4_ib,ib_umad
ib_core                88355 12 rdma_cm,ib_cm,ib_sa,iw_cm,mlx4_ib,ib_mad,ib_srp,ib_iser,ib_umad,ib_uverbs,rdma_ucm,ib_ipoib
ib_addr                18962  3 rdma_cm,ib_core,rdma_ucm
mlx4_core             255794  1 mlx4_ib
 

What does this show for you?

lsmod |egrep '(^mlx|^ib)'
...

I updated my last post above to show you what things look like right after a fresh reboot, where neither mlx4_en nor mlx4_ib is loaded.


Native 10GigE attempt

After doing modprobe mlx4_en I get this:
Code:
ib_iser                42209  0
ib_cm                  37320  1 rdma_cm
ib_sa                  24369  2 ib_cm,rdma_cm
ib_mad                 39879  2 ib_sa,ib_cm
ib_core                81369  6 ib_mad,ib_sa,ib_cm,iw_cm,rdma_cm,ib_iser
ib_addr                 8662  2 ib_core,rdma_cm
mlx4_en               104224  0
mlx4_core             227806  1 mlx4_en

dmesg | grep mlx4 also shows mlx4_en activated:
Code:
mlx4_en: Mellanox ConnectX HCA Ethernet driver v2.2-1 (Feb 2014)
mlx4_en 0000:0d:00.0: UDP RSS is not supported on this device.


But there are no new net interfaces visible anywhere (e.g. /sys/class/net/).

Some Googling reveals I need to configure the ports via the /etc/rdma/mlx4.conf file (at least on RHEL-based systems). There is no such file on Proxmox, and creating it with appropriate values does nothing to enable the 10GigE interfaces.
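For reference, the format those RHEL docs describe is roughly one line of port types per device, e.g. (the PCI address is my card's; this is my reconstruction of the documented format, and nothing on a stock Proxmox 3.x install appears to read it):

```
# /etc/rdma/mlx4.conf -- RHEL convention, one line per mlx4 device:
#   <pci-device> <port1-type> [<port2-type>]   (types: ib, eth, or auto)
0000:0d:00.0 eth eth
```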


InfiniBand (IPoIB) attempt

Fresh reboot again (yeah, I know I could just rmmod). Then modprobe mlx4_ib this time, and I get:
Code:
lsmod |egrep '(^mlx|^ib)'
mlx4_ib               137662  0
ib_iser                42209  0
ib_cm                  37320  1 rdma_cm
ib_sa                  24369  3 ib_cm,rdma_cm,mlx4_ib
ib_mad                 39879  3 ib_sa,ib_cm,mlx4_ib
ib_core                81369  7 ib_mad,ib_sa,ib_cm,iw_cm,rdma_cm,ib_iser,mlx4_ib
ib_addr                 8662  2 ib_core,rdma_cm
mlx4_core             227806  1 mlx4_ib

dmesg | grep mlx4 confirms mlx4_ib is loaded:
Code:
<mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v2.2-1 (Feb 2014)

And I see /sys/class/infiniband/mlx4_0/ as before, but still no additional ib0/ib1 interfaces under /sys/class/net/.
 
OK, looking closer and comparing with your output, mir, I just realized I don't have the ib_ipoib module loaded.

Once I load it, I can see the ib0 and ib1 network interfaces, so I was able to bring up IPoIB. Thank you, mir!
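For anyone following along, the whole manual sequence boils down to three commands. Here it is wrapped in a throwaway helper (ipoib_up is just a name I made up; ib0 is the first IPoIB interface as seen above) that can print or run the steps:

```shell
# ipoib_up: load the IPoIB stack and bring up ib0.
# Pass "echo" as the first argument for a dry run that only prints the steps.
ipoib_up() {
    run="$1"
    $run modprobe mlx4_ib     # HCA-side InfiniBand driver
    $run modprobe ib_ipoib    # IP-over-InfiniBand, creates ib0/ib1
    $run ip link set ib0 up
}

ipoib_up echo    # dry run; drop the argument to actually execute (as root)
```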

Question: how the hell are your mlx4_ib, ib_ipoib, and other ib-related modules autoloaded? CentOS 7.1 did this for me automagically.


What do your /etc/modules and /etc/modprobe.d/* look like?


That still leaves the question how to switch the ports to native 10gigE mode...
 
What do your /etc/modules and /etc/modprobe.d/* look like?
I have this in my /etc/modules file:
# infiniband
# Mellanox ConnectX cards
mlx4_ib

# Protocol modules
# Common modules
rdma_ucm
ib_umad
ib_uverbs
# IP over IB
ib_ipoib
# scsi over IB
ib_srp

That still leaves the question how to switch the ports to native 10gigE mode...
I think you have misunderstood this. There is no such thing as a native 10GigE mode, as in native 10Gb ethernet mode. What it says is that your InfiniBand NIC is capable of providing 10Gb InfiniBand.
 
I have this in my /etc/modules file:
# infiniband
# Mellanox ConnectX cards
mlx4_ib

# Protocol modules
# Common modules
rdma_ucm
ib_umad
ib_uverbs
# IP over IB
ib_ipoib
# scsi over IB
ib_srp
Oh I see. You added it manually. Well, good to know.

I think you have misunderstood this. There is no such thing as a native 10GigE mode, as in native 10Gb ethernet mode. What it says is that your InfiniBand NIC is capable of providing 10Gb InfiniBand.

Negative, comrade. We both have such cards (well, mine is faster, with DDR InfiniBand :)).
Mellanox VPI cards are both InfiniBand and 10GigE in one package.
Later ConnectX-2 and ConnectX-3 cards also support up to 40GigE, and the latest ConnectX-4 goes up to 100GigE.
I'm willing to bet my entire annual salary on that.
 
In fact, I just found out how to enable native 10GigE mode.

Code:
modprobe mlx4_en
echo eth > /sys/bus/pci/devices/0000\:0d\:00.0/mlx4_port2

And I get two native ethernet interfaces in /sys/class/net/.


I have to echo to port2; otherwise I get an error back:
"-bash: echo: write error: Invalid argument"

with this in dmesg:
"mlx4_core 0000:0d:00.0: Only same port types supported on this HCA, aborting."

That tells me my firmware isn't configured to support DPDP. Some board revisions (or rather, firmwares) have DPDP enabled and allow each port to be independently configured as ib or eth.
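To make that failure mode explicit, the sysfs write can be wrapped like this (set_port_type is a made-up name; the sysfs layout is the one shown above):

```shell
# set_port_type: write "ib" or "eth" into an mlx4 port's sysfs file and
# report whether the kernel accepted it. On non-DPDP firmware, writing
# port2 flips both ports together; writing just one port fails as seen above.
set_port_type() {
    dev="$1"    # e.g. /sys/bus/pci/devices/0000:0d:00.0
    port="$2"   # 1 or 2
    type="$3"   # ib or eth
    if echo "$type" > "$dev/mlx4_port$port" 2>/dev/null; then
        echo "port $port -> $type"
    else
        echo "port $port: write rejected (check dmesg)"
    fi
}

# On the host from this thread:
#   modprobe mlx4_en
#   set_port_type /sys/bus/pci/devices/0000:0d:00.0 2 eth
```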


There has to be a way to configure this via config files, but I can't find any docs specifying which files mlx4_en looks at.
Everything I found is for RHEL and mentions /etc/rdma/mlx4.conf, but creating that had no effect.
Any ideas?
 
After doing some reading up, this is how to set the port type at boot time: by specifying the port_type_array parameter (0 = auto, 1 = ib, 2 = eth).

So I've tried adding this line to /etc/modprobe.d/mlx4.conf:
Code:
options mlx4_core port_type_array=2,2

However, the above option gets ignored at boot time, and I can confirm it stays at the default of 0,0.
Code:
cat /sys/module/mlx4_core/parameters/port_type_array
0,0
Huh?

But I can successfully set it from the command line like this:
Code:
rmmod mlx4_en mlx4_core
modprobe mlx4_core port_type_array=2,2
modprobe mlx4_en
And everything works.

What am I missing?
 
Try:
options mlx4_core port_type_array="2,2"

Just tried adding quotes. That didn't work either.


I also tried this in /etc/modprobe.d/mlx4.conf:

Code:
install mlx4_core /sbin/modprobe mlx4_core port_type_array="2,2"
 
Just for "fun" I've tried setting a few other mlx4_core params in /etc/modprobe.d/mlx4.conf: probe_vf, num_vfs, hpn, and debug_level.
The first two are undefined by default; the other two default to 0.

Code:
options mlx4_core port_type_array=2,2 num_vfs=8,0,0 probe_vf=8,0,0 hpn=1 debug_level=6
I tried both with and without quotes for the array values. No change; I'm unable to change any of these at boot either.

It's as though mlx4_core ignores its parameters at boot. Once the system is up, I can modprobe it again and successfully change the params.
What the?
 

Well, I guess I got it to work (edit: not yet). All it took was adding the module and its params to /etc/modules. Somehow I missed the fact that I could specify params there too (edit: nope, the params are still ignored).

Code:
cat /etc/modules
mlx4_core  port_type_array=2,2
mlx4_en


I still don't understand why I couldn't use the /etc/modprobe.d/ facility via
options mlx4_core port_type_array="2,2"

If someone could explain why this tactic works in CentOS but not in Debian (Proxmox), I'd very much like to hear it.


Update:
The above method of inserting params into /etc/modules didn't actually work, but something did make the module load. Yes, I did run update-initramfs -u, but only when changing /etc/modprobe.d/mlx4.conf.
 

Did you remember to update the initrd after you changed /etc/modprobe.d/mlx4.conf?
 

Did you remember to update the initrd after you changed /etc/modprobe.d/mlx4.conf?

I did run that when I changed /etc/modprobe.d, but it appears I made too many changes in too many places and had to retrace my steps.

I'm now confident this method of enabling native 10GigE mode works:

Code:
#cat /etc/modules
mlx4_en

Code:
#cat /etc/modprobe.d/mlx4.conf
options mlx4_core port_type_array="2,2"

Followed by:
Code:
update-initramfs -u

Right after a reboot:
Code:
# ethtool eth4
Settings for eth4:
        Supported ports: [ TP ]
        Supported link modes:   10000baseT/Full
        Supported pause frame use: No
        Supports auto-negotiation: No
        Advertised link modes:  10000baseT/Full

Ta-da!
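For completeness, here is the whole recipe wrapped in one idempotent helper (apply_eth_mode is a made-up name; the optional root argument is only there so it can be exercised against a scratch directory — the real host still needs update-initramfs -u and a reboot afterwards):

```shell
# apply_eth_mode: write the two config entries from the recipe above.
# Takes an optional alternate root (default "") so it can be tested safely.
apply_eth_mode() {
    root="$1"
    # autoload the ethernet driver at boot (skip if already listed)
    grep -qx mlx4_en "$root/etc/modules" 2>/dev/null \
        || echo mlx4_en >> "$root/etc/modules"
    # force both ports into ethernet mode
    printf 'options mlx4_core port_type_array="2,2"\n' \
        > "$root/etc/modprobe.d/mlx4.conf"
}

# On the real host:
#   apply_eth_mode "" && update-initramfs -u && reboot
```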
 