InfiniBand only 10 Gbps group rate warning

bonkersdeluxe

Renowned Member
Jan 20, 2014
Hi @all,

I have an issue and I don't know how to solve it.
I connected a Mellanox 40 Gbit InfiniBand card to an InfiniBand switch using 40 Gbit cables.

Output of ibstatus:
Code:
root@vsrv2:~# ibstatus
Infiniband device 'mlx4_0' port 1 status:
    default gid:     fe80:0000:0000:0000:0002:c903:000a:60e9
    base lid:     0x2
    sm lid:         0x1
    state:         4: ACTIVE
    phys state:     5: LinkUp
    rate:         40 Gb/sec (4X QDR)
    link_layer:     InfiniBand

Infiniband device 'mlx4_0' port 2 status:
    default gid:     fe80:0000:0000:0000:0002:c903:000a:60ea
    base lid:     0x3
    sm lid:         0x1
    state:         4: ACTIVE
    phys state:     5: LinkUp
    rate:         40 Gb/sec (4X QDR)
    link_layer:     InfiniBand

Everything there looks fine.

My /etc/network/interfaces (ib0 and ib1):
Code:
auto ib0
iface ib0 inet static
        address  10.10.15.2
        netmask  255.255.255.0
        pre-up modprobe ib_ipoib
        pre-up echo connected > /sys/class/net/ib0/mode
        mtu 65520

auto ib1
iface ib1 inet static
        address  10.10.15.20
        netmask  255.255.255.0
        pre-up modprobe ib_ipoib
        pre-up echo connected > /sys/class/net/ib1/mode
        mtu 65520
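
For completeness, this is how I would double-check after ifup that connected mode and the 65520 MTU actually took effect (connected mode is what allows an MTU above the roughly 4 KB datagram limit):

Code:
cat /sys/class/net/ib0/mode     # should print "connected"
ip link show ib0                # MTU should show 65520
cat /sys/class/net/ib1/mode
ip link show ib1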


But ibdiagnet shows:

Code:
ibdiagnet
Loading IBDIAGNET from: /usr/lib/x86_64-linux-gnu/ibdiagnet1.5.7
-W- Topology file is not specified.
    Reports regarding cluster links will use direct routes.
Loading IBDM from: /usr/lib/x86_64-linux-gnu/ibdm1.5.7
-W- A few ports of local device are up.
    Since port-num was not specified (-p option), port 1 of device 1 will be
    used as the local port.
-I- Discovering ... 3 nodes (1 Switches & 2 CA-s) discovered.


-I---------------------------------------------------
-I- Bad Guids/LIDs Info
-I---------------------------------------------------
-I- No bad Guids were found

-I---------------------------------------------------
-I- Links With Logical State = INIT
-I---------------------------------------------------
-I- No bad Links (with logical state = INIT) were found

-I---------------------------------------------------
-I- General Device Info
-I---------------------------------------------------

-I---------------------------------------------------
-I- PM Counters Info
-I---------------------------------------------------
-I- No illegal PM counters values were found

-I---------------------------------------------------
-I- Fabric Partitions Report (see ibdiagnet.pkey for a full hosts list)
-I---------------------------------------------------
-I-    PKey:0x7fff Hosts:4 full:4 limited:0

-I---------------------------------------------------
-I- IPoIB Subnets Check
-I---------------------------------------------------
-I- Subnet: IPv4 PKey:0x7fff QKey:0x00000b1b MTU:2048Byte rate:10Gbps SL:0x00
-W- Suboptimal rate for group. Lowest member rate:40Gbps > group-rate:10Gbps

-I---------------------------------------------------
-I- Bad Links Info
-I- No bad link were found
-I---------------------------------------------------
----------------------------------------------------------------
-I- Stages Status Report:
    STAGE                                    Errors Warnings
    Bad GUIDs/LIDs Check                     0      0    
    Link State Active Check                  0      0    
    General Devices Info Report              0      0    
    Performance Counters Report              0      0    
    Partitions Check                         0      0    
    IPoIB Subnets Check                      0      1    

Please see /var/cache/ibutils/ibdiagnet.log for complete log
----------------------------------------------------------------
 
-I- Done. Run time was 1 seconds.

This Warning:
-I- Subnet: IPv4 PKey:0x7fff QKey:0x00000b1b MTU:2048Byte rate:10Gbps SL:0x00
-W- Suboptimal rate for group. Lowest member rate:40Gbps > group-rate:10Gbps

How can I solve it? I think it's the group rate: 10 Gbps instead of 40 Gbps.
My switch is a Voltaire 4036 InfiniBand switch.
The subnet manager on the switch is enabled and is the master.

Output of sm-info show:

Code:
4036-5A04# sm-info show
subnet manager info is:
         sweep_interval:               15
         max_wire_smps:                16
         lmc:                          0
         max_op_vls:                   5
         transaction_timeout:          150
         head_of_queue_lifetime:       16
         leaf_head_of_queue_lifetime:  16
         packet_life_time:             18
         sminfo_polling_timeout:       5000
         polling_retry_number:         12
         reassign_lids:                disable
         babbling_port_policy:         disable
         routing_engine_names:         minhop
         log_flags:                    7
         force_link_speed:             0
         polling_rate:                 30
         mode:                         enable
         state:                        master
         sm_priority:                  15
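
I haven't found a group-rate setting in the 4036's sm-info. For comparison, if OpenSM were running on a host instead of the switch's embedded SM, I understand the IPoIB multicast group rate would be raised in its partitions.conf, roughly like this (a sketch, assuming OpenSM with its default /etc/opensm/partitions.conf; rate=7 encodes 40 Gbps, mtu=5 encodes 4096 bytes):

Code:
# /etc/opensm/partitions.conf (hypothetical, host-based OpenSM only)
# Default partition 0x7fff: IPoIB enabled, group MTU 4096, group rate 40 Gbps
Default=0x7fff, ipoib, mtu=5, rate=7 : ALL=full;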

When I test with iperf, I only get 8.66 Gbits/sec:


Code:
Client connecting to 10.10.15.21, TCP port 5001
TCP window size: 2.50 MByte (default)
------------------------------------------------------------
[  3] local 10.10.15.2 port 41372 connected with 10.10.15.21 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  10.1 GBytes  8.66 Gbits/sec
root@vsrv2:~#
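
I realise QDR is 8b/10b encoded, so the 40 Gbit link carries at most 32 Gbit of data, and IPoIB/TCP overhead costs more on top of that; still, 8.66 Gbits/sec for a single stream seems low, so the next thing I would try is parallel streams to see whether it scales (assuming the same iperf on both ends):

Code:
# server on 10.10.15.21 as before: iperf -s
iperf -c 10.10.15.21 -P 4 -t 30     # 4 parallel TCP streams, 30 seconds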


I hope somebody can help me. I'm stuck at this point and it's giving me a headache.

Thank you!

Sincerely Bonkersdeluxe
 
So I guess I found it.

Code:
lspci | grep Mellanox
05:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev a0)

With ib_ipoib this card's speed is 10 GbE. How can I use the 40 Gbit InfiniBand protocol under Ceph?
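
If I understand the VPI naming, the port mode (IB vs. Ethernet) can be checked and set through the mlx4 driver's sysfs interface; a sketch, with the PCI address taken from the lspci output above (assumption: mlx4_core exposes these files on this kernel):

Code:
# Current port types (values: ib / eth / auto)
cat /sys/bus/pci/devices/0000:05:00.0/mlx4_port1
cat /sys/bus/pci/devices/0000:05:00.0/mlx4_port2

# Force both ports to InfiniBand mode
echo ib > /sys/bus/pci/devices/0000:05:00.0/mlx4_port1
echo ib > /sys/bus/pci/devices/0000:05:00.0/mlx4_port2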
Thank you!

Sincerely Bonkersdeluxe