Proxmox Installer does not see Mellanox ConnectX-3 card at all?

victorhooi

Well-Known Member
Apr 3, 2018
I am trying to install Proxmox 5.1 on a Dell R720XD, with a Mellanox ConnectX-3 card in it.

However, when I boot from the installer, it only sees the four inbuilt Gigabit Ethernet ports - it does not see the Mellanox card at all.

I saw this earlier post, which talks about changing the card between ib (Infiniband?) and Ethernet mode.

Would that explain why the Proxmox installer does not see the card at all?

I already have a current Proxmox installation on the machine - this is a re-install. When I use lspci -k, I see:

Code:
root@gcc-proxmox:~# lspci -k | grep Mell
41:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
    Subsystem: Mellanox Technologies MT27500 Family [ConnectX-3]

So it seems the card is detected by the default kernel at any rate.
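
Note that the grep above only keeps lines containing "Mell", so it drops the `Kernel driver in use:` line that `lspci -k` prints below each device. A minimal sketch for pulling that line out, assuming the usual `lspci -k` layout; `driver_in_use` is a helper name made up for this example:

```shell
# Print the driver bound to a device, given `lspci -k` output on stdin.
# Prints nothing if no driver is bound to the device.
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use:[[:space:]]*//p'
}

# On real hardware (15b3 is the Mellanox PCI vendor ID):
#   lspci -k -d 15b3: | driver_in_use
```

An empty result would suggest the card is visible on the bus but no module has claimed it.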

What's the easiest way of getting these MFT (Mellanox Firmware Tools) so that I can set the right mode permanently, and then reinstall Proxmox?
 
I was able to find the Mellanox management tools from here:

http://www.mellanox.com/page/management_tools

I downloaded the file for Debian x64, untarred it, and ran the install.sh script.

Code:
root@gcc-proxmox:~/mft-4.10.0-104-x86_64-deb# ./install.sh

-I- Removing mft external packages installed on the machine
-I- Installing package: /root/mft-4.10.0-104-x86_64-deb/SDEBS/kernel-mft-dkms_4.10.0-104_all.deb
-I- Installing package: /root/mft-4.10.0-104-x86_64-deb/DEBS/mft-4.10.0-104.amd64.deb
-I- In order to start mst, please run "mst start".
I then started mst and looked for my card:
Code:
root@gcc-proxmox:~/mft-4.10.0-104-x86_64-deb# mst start
Starting MST (Mellanox Software Tools) driver set
Loading MST PCI module - Success
Loading MST PCI configuration module - Success
Create devices
root@gcc-proxmox:~/mft-4.10.0-104-x86_64-deb# mst status
MST modules:
------------
    MST PCI module loaded
    MST PCI configuration module loaded

MST devices:
------------
/dev/mst/mt4099_pciconf0         - PCI configuration cycles access.
                                   domain:bus:dev.fn=0000:41:00.0 addr.reg=88 data.reg=92
                                   Chip revision is: 01
/dev/mst/mt4099_pci_cr0          - PCI direct access.
                                   domain:bus:dev.fn=0000:41:00.0 bar=0xd4f00000 size=0x100000
                                   Chip revision is: 01

There's some information here about changing the connection type:

https://community.mellanox.com/docs/DOC-2755

I tried to read the device configuration, but it's telling me the firmware is too old.
Code:
root@gcc-proxmox:~/mft-4.10.0-104-x86_64-deb# mlxconfig -d /dev/mst/mt4099_pciconf0 q
-E- Failed to open device: /dev/mst/mt4099_pciconf0. Unsupported FW (version 2.31.5000 or above required for CX3/PRO)

Does anybody know how to upgrade the firmware on this card easily?
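
For scripting this check across several machines, the version comparison can be sketched as below. It assumes GNU `sort -V` is available; `version_ge` is a hypothetical helper, and the two version strings are the ones from the error message and from this card.

```shell
# Return success if version $1 is >= version $2 (dotted numeric versions).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# The mlxconfig error above asks for 2.31.5000 or newer on ConnectX-3.
if version_ge "2.10.4290" "2.31.5000"; then
    echo "firmware new enough for mlxconfig"
else
    echo "firmware too old for mlxconfig"
fi
```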
 
Ok, so I was able to download the firmware from here:

http://www.mellanox.com/page/firmware_table_ConnectX3EN

I wasn't sure what the "OPN" field is - however, by checking them all, I was able to find one that matched my PSID.

I got my PSID like so (http://www.mellanox.com/page/firmware_HCA_FW_identification):

Code:
root@gcc-proxmox:~/mft-4.10.0-104-x86_64-deb# flint -d /dev/mst/mt4099_pci_cr0 query
Image type:            FS2
FW Version:            2.10.4290
Rom Info:              type=PXE version=3.4.0 proto=VPI
Device ID:             4099
Description:           Node             Port1            Port2            Sys image
GUIDs:                 0002c90300056aa8 0002c90300056aa9 0002c90300056aaa 0002c90300056aab
MACs:                                       0002c9237720     0002c9237721
VSD:
PSID:                  MT_1170110023
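
Rather than reading the PSID off the screen, the field can be extracted from the query output. This is only a sketch that assumes the `Label:   value` layout shown above; `flint_field` is a name invented for this example.

```shell
# Read `flint ... query` output on stdin and print the value of one field.
# $1 is the field label without the colon, e.g. "PSID" or "FW Version".
flint_field() {
    awk -v key="$1:" 'index($0, key) == 1 {
        sub(/^[^:]*:[ \t]*/, "")   # drop the label and the padding after it
        print
        exit
    }'
}

# On real hardware:
#   flint -d /dev/mst/mt4099_pci_cr0 query | flint_field PSID
```

Running the same helper over `flint -i <image file> query` should give the PSID of a downloaded image, so the two values can be compared before burning.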

I also saved a bunch of information from the card, per this page -
http://mitrocketscience.blogspot.com/2018/05/flashing-rebranded-mellanox-infiniband.html

Code:
root@gcc-proxmox:~# flint -d /dev/mst/mt4099_pci_cr0 query full > flint_query.txt
root@gcc-proxmox:~# flint -d /dev/mst/mt4099_pci_cr0 hw query > flint_hwinfo.txt
root@gcc-proxmox:~# flint -d /dev/mst/mt4099_pci_cr0 ri orig_firmware.bin
root@gcc-proxmox:~# flint -d /dev/mst/mt4099_pci_cr0 dc orig_firmware.ini
root@gcc-proxmox:~# flint -d /dev/mst/mt4099_pci_cr0 rrom orig_rom.bin
root@gcc-proxmox:~# mlxburn -d /dev/mst/mt4099_pciconf0 -vpd > orig_vpd.txt
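
Since the burn that follows is not failsafe, it seems worth confirming that the backups were actually written first. A small sketch, assuming the file names used in the commands above; `backups_ok` is a hypothetical helper:

```shell
# Return success only if every expected backup file exists and is non-empty.
# $1 = directory the backup commands were run in
backups_ok() {
    dir=$1
    for f in flint_query.txt flint_hwinfo.txt orig_firmware.bin \
             orig_firmware.ini orig_rom.bin orig_vpd.txt; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            return 1
        fi
    done
}

# Example: backups_ok /root && echo "backups look complete"
```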

Now we write the new firmware:

Code:
root@gcc-proxmox:~# flint -d /dev/mst/mt4099_pci_cr0 -i fw-ConnectX3-rel-2_42_5000-MCX311A-XCA_Ax-FlexBoot-3.4.752.bin burn

    Current FW version on flash:  2.10.4290
    New FW version:               2.42.5000

Burn process will not be failsafe. No checks will be performed.
ALL flash, including the Invariant Sector will be overwritten.
If this process fails, computer may remain in an inoperable state.

 Do you want to continue ? (y/n) [n] : y
Burning FS2 FW image without signatures - OK
Restoring signature                     - OK

I rebooted the server and tried to check the interface type again.

Unfortunately, it gives me a "Failed to query device current configuration" error:
Code:
root@gcc-proxmox:~# mst start
Starting MST (Mellanox Software Tools) driver set
Loading MST PCI module - Success
Loading MST PCI configuration module - Success
Create devices
root@gcc-proxmox:~# mst status
MST modules:
------------
    MST PCI module loaded
    MST PCI configuration module loaded

MST devices:
------------
/dev/mst/mt4099_pciconf0         - PCI configuration cycles access.
                                   domain:bus:dev.fn=0000:41:00.0 addr.reg=88 data.reg=92
                                   Chip revision is: 01
/dev/mst/mt4099_pci_cr0          - PCI direct access.
                                   domain:bus:dev.fn=0000:41:00.0 bar=0xd4f00000 size=0x100000
                                   Chip revision is: 01
root@gcc-proxmox:~# mlxconfig -d /dev/mst/mt4099_pciconf0 q

Device #1:
----------

Device type:    ConnectX3
Device:         /dev/mst/mt4099_pciconf0

Configurations:                              Next Boot
-E- Failed to query device current configuration

I also tried resetting it, as per https://community.mellanox.com/thread/4160:

Code:
root@gcc-proxmox:~# mlxconfig -d /dev/mst/mt4099_pciconf0 reset

 Reset configuration for device /dev/mst/mt4099_pciconf0?  ? (y/n) [n] : y
Applying... Done!
-I- Please reboot machine to load new configurations.

But I got the same error after a reboot.

Has anybody had any luck with these cards?
 
Hi,

I can only tell you that it works on ConnectX-4 and ConnectX-5 without problems.
Check the logs to see if there is an error.
 
Syslog and dmesg.
If it is a driver problem you should see something.
 
Ok, I might have been barking up the wrong tree.

According to here, that model of ConnectX-3 only does Ethernet anyhow.

ethtool seems to show the card:

Code:
root@gcc-proxmox:~/MLNX_OFED_LINUX-4.4-2.0.7.0-debian9.4-x86_64# ethtool -i enp65s0
driver: mlx4_en
version: 4.0-0
firmware-version: 2.42.5000
expansion-rom-version:
bus-info: 0000:41:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

Also, I can see it in the Proxmox UI:

4mkw3Zc.png

Yet I just booted up the Proxmox 5.1 installer, and it did not see that device - only eno1-4.

Is there a reason the installer isn't picking it up? Could it be a bug in the installer?

As a workaround - what's the easiest way of changing this running install to use the Mellanox as its main interface?

Can I simply edit vmbr0 to change the bridge port from eno1 to enp65s0, or is it a bit more complicated than that?
 
In the PVE 5.2 installer the driver mlx4_en is present and also used.
Please try with PVE 5.2; we do not debug old versions.
 
Hi,

Sorry, I misspoke before - I'm actually using Proxmox 5.2. I checked the
Code:
file on the USB:

Code:
Release Notes:
--------------

16.05.2018: Proxmox Virtual Environment 5.2 (ISO release 1)

  - Kernel 4.15.17

  - Ceph v12.2.5 (stable)

  - ZFS 0.7.8

  - Qemu 2.11.1

  - LXC 3.0.0

  - Misc. bug fixes and improvements

So you're saying that the installer should see the Mellanox card, when it's letting me pick a network device?

If this was a bug - are there any logs I can grab from the installer system somehow, that would help debug it?

Also - if I install it with one of the inbuilt NIC cards, how hard is it to change the network over to the Mellanox?
 
So you're saying that the installer should see the Mellanox card, when it's letting me pick a network device?
As I said, I can't test it, but I don't know why it should not work: the module is known, and it works with mlx5_core.
The only thing I can think of is that mlx4_en depends on another kernel module that can't be satisfied.


Also - if I install it with one of the inbuilt NIC cards, how hard is it to change the network over to the Mellanox?
You only need to change the bridge-ports of vmbr0.
 
As I said, I can't test it, but I don't know why it should not work: the module is known, and it works with mlx5_core.
The only thing I can think of is that mlx4_en depends on another kernel module that can't be satisfied.

I can try to take a screenshot/photo later of the installer, showing the four on-board NICs, but no Mellanox device.

I don't know how helpful that is?

Is there any other diagnostics information I can capture from the installer, that might confirm this, or help find the root cause?
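
One thing that could be captured, assuming the installer offers a shell (for example via a debug boot entry) and the usual tools exist there, is whether the module is loaded at all. A sketch with a made-up helper name, `module_loaded`:

```shell
# Return success if module $1 appears in /proc/modules-style input on stdin.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# From a shell inside the installer environment:
#   module_loaded mlx4_en < /proc/modules || echo "mlx4_en is not loaded"
#   modinfo -F depends mlx4_en    # modules that mlx4_en depends on
#   dmesg | grep -i mlx4          # driver messages, if any
```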

You only need to change the bridge-ports of vmbr0.

I assume it's not safe to change this via the GUI though, right? (As in, the management port runs over that interface, so I assume it will drop my connection once I change it, right?)

Should I be hand-editing the configuration files somewhere?
 
I assume it's not safe to change this via the GUI though, right? (As in, the management port runs over that interface, so I assume it will drop my connection once I change it, right?)
The change requires a reboot, so the connection will be interrupted anyway.

Should I be hand-editing the configuration files somewhere?
You can also edit it directly in /etc/network/interfaces and reboot. The GUI does much the same.
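
For doing the same edit on many machines, here is a rough sketch of the change as a filter. It assumes a plain ifupdown-style file in which the option is spelled either `bridge_ports` or `bridge-ports`; `set_bridge_port` is a hypothetical helper, not a Proxmox tool.

```shell
# Rewrite the bridge port of one bridge in an interfaces(5)-style file.
# Reads the file on stdin and writes the result to stdout.
# $1 = bridge name (e.g. vmbr0), $2 = new port (e.g. enp65s0)
set_bridge_port() {
    awk -v br="$1" -v port="$2" '
        $1 == "iface" { in_br = ($2 == br) }   # track which stanza we are in
        in_br && ($1 == "bridge_ports" || $1 == "bridge-ports") {
            match($0, /^[ \t]*/)               # keep the original indentation
            print substr($0, 1, RLENGTH) $1 " " port
            next
        }
        { print }
    '
}

# Work from a backup copy, then reboot:
#   cp /etc/network/interfaces /etc/network/interfaces.bak
#   set_bridge_port vmbr0 enp65s0 < /etc/network/interfaces.bak > /etc/network/interfaces
```

Only the stanza for the named bridge is touched, so any other bridges keep their ports.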
 
The Mellanox modules are not loaded by default, and you don't need them for installation. You can always add them to your modules-load.conf or install OFED once installation is complete.

You can always edit your network configuration at any time. The only exception to this is once you set up your clusters; then you just need to exercise care to make sure your cluster traffic is visible to the changed interfaces.
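
A sketch of the modules-load route, assuming a systemd-based install that reads `/etc/modules-load.d/*.conf`; the file name `mellanox.conf` and the helper `add_module` are just examples:

```shell
# Append module $1 to file $2 unless it is already listed on its own line.
add_module() {
    touch "$2"
    grep -qxF "$1" "$2" || echo "$1" >> "$2"
}

# On the installed system:
#   add_module mlx4_en /etc/modules-load.d/mellanox.conf
#   modprobe mlx4_en    # load it now without waiting for a reboot
```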
 
The Mellanox modules are not loaded by default, and you don't need them for installation. You can always add them to your modules-load.conf or install OFED once installation is complete.

You can always edit your network configuration at any time. The only exception to this is once you set up your clusters; then you just need to exercise care to make sure your cluster traffic is visible to the changed interfaces.

Ah - thanks for explaining.

So the installer won't detect the Mellanox cards, because the installer doesn't load the drivers by default. (But it will get detected when I boot up afterwards).

Is there any way to load them up in the installer? I'm just thinking it'll be easier to provision our machines correctly from the start, than changing the interface afterwards on every one.

Although yeah, we can change afterwards. If we do it via the GUI - once we hit save in the network GUI, I assume that will break our connectivity to the web interface. Do we then need to reboot the machine?

And yes, we will do all this before connecting all the machines in a cluster.
 
When you make a change to network interfaces in the PVE web interface (https), you will not lose connectivity.

A temporary interfaces file is created, which gets activated on reboot.

So make the changes on the screen, then reboot when you are ready.

Question - do your ConnectX-3 cards use Ethernet or InfiniBand cables?
 
Is there any way to load them up in the installer? I'm just thinking it'll be easier to provision our machines correctly from the start, than changing the interface afterwards on every one.

You probably could, but it doesn't really serve any purpose unless it is the only interface in your system. The installer doesn't have any provisions to configure anything but your "internet-facing" interface; the rest of your network will have to be defined after installation. Generally you would not use your high-speed networks for internet service.
 
I was simply going to put everything through this SFP+ port, as it was 10 Gbit, and I assume better latency than the onboard Gigabit NIC.

However, you're right in that I can just use the onboard Gigabit NIC for Proxmox management.

What is the best way of assigning the Mellanox card to VMs? Do I create a new bridge using that interface, and then just assign that bridge to multiple VMs?
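
For reference, a second bridge on the Mellanox port might look roughly like the stanza printed below. This is only a sketch: the bridge name `vmbr1` and the helper `bridge_stanza` are assumptions, and the bridge carries no IP address of its own.

```shell
# Print an interfaces(5) stanza for an address-less bridge on one port.
# $1 = bridge name, $2 = physical port
bridge_stanza() {
    printf 'auto %s\niface %s inet manual\n' "$1" "$1"
    printf '\tbridge_ports %s\n\tbridge_stp off\n\tbridge_fd 0\n' "$2"
}

# Prints a stanza that could be appended to /etc/network/interfaces.
bridge_stanza vmbr1 enp65s0
```

Each VM would then get a NIC attached to that bridge, for example with something like `qm set 100 -net0 virtio,bridge=vmbr1` (the VM ID 100 is made up).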
 
As I said, I can't test it, but I don't know why it should not work: the module is known, and it works with mlx5_core.
The only thing I can think of is that mlx4_en depends on another kernel module that can't be satisfied.

Side note / question: Could it be that this does require even older modules?
I know, from Windows though, that up to ConnectX-3 one driver package is needed. As of ConnectX-4 it is another.

I have not yet tried to get my cards running in Linux ...
With best regards
Thomas
 
Hi,

I am running into some kernel problems when installing the `mft` tool, see below:

1669538075108.png

Any ideas?
 
Hi,

I am running into some kernel problems when installing the `mft` tool, see below:

View attachment 43785

Any ideas?

You are running an unofficial third-party kernel; so you need to get the corresponding kernel-headers from them.

If you were using the official PVE kernel, you could install the header metapackage with: apt install pve-headers
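
A quick way to check for this before running the MFT installer, sketched under the assumption that installed headers are reachable through `/lib/modules/<release>/build`; `headers_present` is a hypothetical helper:

```shell
# Return success if build headers for kernel release $1 are installed.
# $2 optionally overrides the filesystem root (handy for testing).
headers_present() {
    [ -e "${2:-}/lib/modules/$1/build/Makefile" ]
}

if headers_present "$(uname -r)"; then
    echo "headers found for $(uname -r)"
else
    echo "no headers for $(uname -r); the DKMS build will likely fail"
fi
```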
 
