help(?!) building OFED drivers

Republicus

Well-Known Member
Aug 7, 2017
I am trying to build the OFED drivers from source for my ConnectX-3 Pro cards so that I can run one port as 40/56 Gb InfiniBand and the second port as 40 Gb Ethernet.
As far as I understand, this configuration requires the OFED drivers. As far as I am aware, mlx4_core and mlx4_en won't work simultaneously, and the Mellanox website indicates OFED is necessary.

I am following the instructions from https://insujang.github.io/2020-01-25/building-mellanox-ofed-from-source/ to repackage the latest deb packages.

When I try to install the resulting deb file, I get a couple of errors:

Code:
root@node05:~# dpkg -i mlnx-ofed-kernel-dkms_4.9-OFED.4.9.5.1.0.1_all.deb
(Reading database ... 220872 files and directories currently installed.)
Preparing to unpack mlnx-ofed-kernel-dkms_4.9-OFED.4.9.5.1.0.1_all.deb ...

------------------------------
Deleting module version: 4.9
completely from the DKMS tree.
------------------------------
Done.
Cleaning up '/usr/src/ofa_kernel/' ...
Unpacking mlnx-ofed-kernel-dkms (4.9-OFED.4.9.5.1.0.1) over (4.9-OFED.4.9.5.1.0.1) ...
Setting up mlnx-ofed-kernel-dkms (4.9-OFED.4.9.5.1.0.1) ...
Loading new mlnx-ofed-kernel-4.9 DKMS files...
First Installation: checking all kernels...
Building only for 5.15.35-3-pve
Building for architecture x86_64
Building initial module for 5.15.35-3-pve
Error! Bad return status for module build on kernel: 5.15.35-3-pve (x86_64)
Consult /var/lib/dkms/mlnx-ofed-kernel/4.9/build/make.log for more information.
dpkg: error processing package mlnx-ofed-kernel-dkms (--install):
 installed mlnx-ofed-kernel-dkms package post-installation script subprocess returned error exit status 10
Errors were encountered while processing:
 mlnx-ofed-kernel-dkms

root@node05:~# cat /var/lib/dkms/mlnx-ofed-kernel/4.9/build/make.log

This error appears multiple times:
tac: failed to open '/lib/modules/5.15.35-3-pve/build/include/*/autoconf.h' for reading: No such file or directory

It moves along until it hits these snags:
Code:
configure: error: Run make config in /usr/src/linux-headers-5.15.35-3-pve.

Failed executing ./configure/bin/sh: 1: Syntax error: Unterminated quoted string
/bin/sh: 1: [: -lt: unexpected operator
tac: failed to open '/lib/modules/5.15.35-3-pve/build/include/*/autoconf.h' for reading: No such file or directory
tac: failed to open '/lib/modules/5.15.35-3-pve/build/include/*/autoconf.h' for reading: No such file or directory
tac: failed to open '/lib/modules/5.15.35-3-pve/build/include/*/autoconf.h' for reading: No such file or directory
/bin/sh: 1: Syntax error: Unterminated quoted string
/bin/sh: 1: [: -lt: unexpected operator

...

Code:
make[1]: Entering directory '/usr/src/linux-headers-5.15.35-3-pve'

  ERROR: Kernel configuration is invalid.
         include/generated/autoconf.h or include/config/auto.conf are missing.
         Run 'make oldconfig && make prepare' on kernel src to fix it.

...

Code:
Copying build sources from '/var/lib/dkms/mlnx-ofed-kernel/4.9/build/../build' to '/usr/src/ofa_kernel/5.15.35-3-pve' ...
/bin/cp: cannot stat 'Module*.symvers': No such file or directory
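
Since the same messages repeat many times in make.log, it can help to collapse them into a count per distinct line before reading further. The following is only a minimal sketch: the function name summarize_build_log and the pattern list are my own choices for illustration, not anything shipped with DKMS or OFED.

```shell
# Collapse repeated error-looking lines in a build log into one line each,
# prefixed by how often they occur, most frequent first.
# The pattern list is a guess at what is interesting; adjust as needed.
summarize_build_log() {
    grep -E -i 'error|failed|No such file' "$1" | sort | uniq -c | sort -rn
}

# Example (path taken from the DKMS message above):
#   summarize_build_log /var/lib/dkms/mlnx-ofed-kernel/4.9/build/make.log
```

With the excerpts above, the repeated tac line would show up once with its count, which makes the configure error easier to spot.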


I tried running make oldconfig && make prepare in both /usr/src/ofa_kernel/5.15.35-3-pve

Code:
root@node05:/usr/src/ofa_kernel/5.15.35-3-pve# make oldconfig && make prepare
make: *** No rule to make target 'oldconfig'.  Stop.

and in /usr/src/linux-headers-5.15.35-3-pve

Code:
root@node05:/usr/src/linux-headers-5.15.35-3-pve# make oldconfig && make prepare
lib/Kconfig.debug:2641: can't open file "Documentation/Kconfig"
make[1]: *** [scripts/kconfig/Makefile:77: oldconfig] Error 1
make: *** [Makefile:622: oldconfig] Error 2

I also reinstalled linux-headers.
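
To confirm whether the headers tree is actually in the state the build expects, the two files named in the "Kernel configuration is invalid" message can be checked directly. This is a minimal sketch; check_kernel_headers is a hypothetical helper name, and it only tests for the files mentioned in the log above, nothing more.

```shell
# Report whether a kernel headers tree contains the generated files
# that the build complained about. Returns 0 only if both are present.
check_kernel_headers() {
    dir="$1"
    missing=0
    for f in include/generated/autoconf.h include/config/auto.conf; do
        if [ -f "$dir/$f" ]; then
            echo "ok: $f"
        else
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}

# Example:
#   check_kernel_headers "/lib/modules/$(uname -r)/build"
```

If either file is reported missing after reinstalling the headers, the problem is probably in the headers tree itself rather than in the OFED sources.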

Anyone have some suggestions?
Thanks for reading!
 
Same issue here.

Trying to update these old drivers (mlx4_core in my case) proves to be extremely difficult or even impossible on Proxmox 7.2-11 (kernel 5.15.60-2-pve). I have tried all sorts of tricks and tutorials, but I always have a dependency issue at minimum.

Using the old 4.0.0 driver module has several side effects:
-A Windows VM cannot properly provision and install the corresponding VF driver, even though the VF device is passed through correctly and is seen by Windows Device Manager.
-I see some strange routing limitations between a VM that uses the physical interface through a basic Linux bridge and a VM that uses a VF device based on the same physical interface. I have to run a specific hookscript at launch of the VMs if I want them to be able to communicate on the same subnet internally to the host.
-Because this script only runs when a VM is started from scratch, it fails to allow communication when using HA between several cluster nodes: it is not called when a VM migrates from one node to another. There may be some Proxmox trick to solve this, but I haven't searched yet.
-To be clear, this routing issue may not be fixed in newer versions of the driver module, as it may well be a "feature". But newer versions do seem to improve the Windows VM compatibility issue.

This is quite frustrating. If someone has the magic recipe for upgrading this driver that would be great.
 
I stumbled upon this thread a few weeks ago as I was running into similar issues, figured I'd share with the both of you that I've had some success using NVIDIA's Mellanox repo and version 23.04 of the MLNX_OFED drivers with PVE 7.4 and kernel 5.15.107-2. DKMS builds perfectly for me - a first in my last several weeks of trying. Not super useful if you still run ConnectX-3 or earlier cards, but if you're on ConnectX-4+ you may have some mileage.

https://linux.mellanox.com/public/repo/mlnx_ofed/23.04-0.5.3.3/
 
I briefly attempted to patch the DKMS driver to support higher kernel versions. I made more progress than my original post...
I applied some modifications that were said to work with Proxmox and kernel 5.10.

You can have a look at what I've done at MLNX_OFED 4.9-4.1.7.0 LTS for Debian 11

I could use some help to make it work if anyone has any experience or wants to spin up a VM to help compile a working driver.

Additionally,
I have wondered whether I can make this work with two cards: one with both ports on the inbox InfiniBand driver (mlx4_ib) and the second on the inbox Ethernet driver (mlx4_en). I have no idea if this will work, since the inbox drivers prefer one protocol or the other and OFED is required for VPI (both protocols). Will they both work if they are on different devices?
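
For anyone wanting to see how the inbox driver currently has each port set up, here is a minimal sketch that reads the per-port mlx4_portN attributes the mlx4 driver exposes in sysfs. The function name list_mlx4_port_types and the SYSFS_ROOT override are my own additions for illustration. As far as I know these attributes can also be written with ib or eth to change a port's type, but I have not verified that on ConnectX-3 Pro with this kernel, so treat that part as an assumption.

```shell
# Print "<PCI address> <attribute> <value>" for each port of every device
# bound to mlx4_core. SYSFS_ROOT is overridable only so the function can
# be tried against a mock directory; on a real host leave it unset.
list_mlx4_port_types() {
    root="${SYSFS_ROOT:-/sys}"
    for attr in "$root"/bus/pci/drivers/mlx4_core/*/mlx4_port[0-9]; do
        [ -f "$attr" ] || continue   # glob did not match anything
        dev=$(basename "$(dirname "$attr")")
        printf '%s %s %s\n' "$dev" "$(basename "$attr")" "$(cat "$attr")"
    done
}

# Example:
#   list_mlx4_port_types
```

With two cards installed, each card should appear under its own PCI address, which at least shows whether the two devices are configured independently.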
 
Any luck? I'm also looking for a working deb package but couldn't find any working solution. So far I'm using the kernel's built-in drivers.
 
