[SOLVED] pve 6.3 vs. mellanox ofed

Have you tried to run it?
Maybe the list is just out of date =). It supports Ubuntu and Debian 10.5, so I'd really be surprised if it doesn't work on Debian 10.6.
 
Ok, I have called the install script with these options:
Code:
root@pve-1:/media# ./mlnxofedinstall --skip-distro-check --skip-unsupported-devices-check
which results in the following, which is actually not what we want here:
Code:
Do you want to continue?[y/N]:y


Removing old packages...

Error: One or more packages depends on MLNX_OFED_LINUX.
Those packages should be removed before uninstalling MLNX_OFED_LINUX:

python-rbd glusterfs-common python-rados librados2-perl python-cephfs libiscsi7 libcephfs2 pve-manager pve-container qemu-server ceph-fuse proxmox-ve libradosstriper1 librados2 glusterfs-client ceph-common librbd1 spiceterm libpve-storage-perl pve-ha-manager pve-qemu-kvm libpve-guest-common-perl
 
Good morning, just one more and important update here: in order to get full functionality and the ability to manipulate the unbind, bind, port, policy and node functions (driver level), one has to have the mlnx-ofed-kernel-dkms_*-*-*_all.deb package installed on the PVE machine (a package delivered by Mellanox; I grabbed the last stable version from the Mellanox deb repo and ignored the Proxmox vs. Mellanox version incompatibilities). With the help of dkms, pve-headers-some_numbers-pve and one dependency package, one is able to install the package above and build the modules against the current PVE kernel ... after a reboot one gets all the functions that are needed. I chose only the package above (and not the full OFED stack) to avoid potential incompatibilities in the future (one broken package is easier to troubleshoot than 50+ packages). I don't want to run the full OFED client on the PVE level at the moment, although it would probably be wise to use this network for PVE cluster operations .... hope it helps someone and perhaps saves a bit of time ....
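For reference, the rough command sequence I used looks like this (a minimal sketch, assuming the .deb was downloaded from the Mellanox Debian repo; exact version numbers and the extra dependency package mentioned above are left out):
Code:
# headers for the running PVE kernel plus dkms are needed to build the modules
apt install pve-headers-$(uname -r) dkms
# install the kernel-module package grabbed from the Mellanox deb repo
# (dpkg will tell you about the one additional dependency mentioned above)
dpkg -i mlnx-ofed-kernel-dkms_*_all.deb
# dkms builds the modules against the current kernel; check, then reboot
dkms status
reboot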
 
Last edited:
Good morning, just one more and important update here: in order to get full functionality and the ability to manipulate the unbind, bind, port, policy and node functions (driver level), one has to have the mlnx-ofed-kernel-dkms_*-*-*_all.deb package installed on the PVE machine (a package delivered by Mellanox; I grabbed the last stable version from the Mellanox deb repo and ignored the Proxmox vs. Mellanox version incompatibilities). With the help of dkms, pve-headers-some_numbers-pve and one dependency package, one is able to install the package above and build the modules against the current PVE kernel ... after a reboot one gets all the functions that are needed. I chose only the package above (and not the full OFED stack) to avoid potential incompatibilities in the future (one broken package is easier to troubleshoot than 50+ packages). I don't want to run the full OFED client on the PVE level at the moment, although it would probably be wise to use this network for PVE cluster operations .... hope it helps someone and perhaps saves a bit of time ....
Can you provide the dependencies?
Did you install make?

Kind regards,
Daniel
 
I tried the following:
Add the Mellanox repository
Code:
cd /etc/apt/sources.list.d/
wget https://linux.mellanox.com/public/repo/mlnx_ofed/latest/debian10.5/mellanox_mlnx_ofed.list
wget -qO - https://www.mellanox.com/downloads/ofed/RPM-GPG-KEY-Mellanox | sudo apt-key add -

apt-get remove libipathverbs1 librdmacm1 libibverbs1 libmthca1 libopenmpi-dev openmpi-bin openmpi-common openmpi-doc libmlx4-1 rdmacm-utils ibverbs-utils infiniband-diags ibutils perftest

aptitude install mlnx-ofed-basic

but I had some issues with the kernel headers.
 
Hi Daniel, I have not tried it via the mlnx repo; I just grabbed the latest ISO file, mounted it, cd'ed into the deb dir and dpkg -i'ed it. You will either have to enable the PVE enterprise repo or the 'no subscription' repo and from there install the proper pve-headers package (this is crucial, without that you won't be able to move ahead) ... one of my colleagues documented his procedure step by step here: https://www.hpc.ntnu.no/archives/1333
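If it helps, the repo/headers part could look roughly like this on PVE 6.x (a sketch assuming the 'no subscription' repo; with the enterprise repo you can skip the first two lines):
Code:
# enable the pve-no-subscription repository (PVE 6.x is based on Debian buster)
echo "deb http://download.proxmox.com/debian/pve buster pve-no-subscription" \
    > /etc/apt/sources.list.d/pve-no-subscription.list
apt update
# the crucial part: headers matching the running kernel
apt install pve-headers-$(uname -r)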
 
I'll try it with the Mellanox repo and the pve-headers this week. Thanks for the link. I need the ConnectX-6 cards for Ceph storage, because the 10 Gb NICs limit the throughput of the 24 NVMes.

I'll post my solution.
 
As I promised, here is the solution (works with a ConnectX-6 ECAT card):
Code:
1. check if mellanox is present:
lspci | grep Mellanox

2. install pve-headers:
aptitude install pve-headers

3. reboot system
reboot

4. create mellanox repo:
cd /etc/apt/sources.list.d/
wget https://linux.mellanox.com/public/repo/mlnx_ofed/latest/debian10.5/mellanox_mlnx_ofed.list
wget -qO - https://www.mellanox.com/downloads/ofed/RPM-GPG-KEY-Mellanox | sudo apt-key add -

5. install driver:
aptitude install mlnx-ofed-basic

6. install firmwareupdater
aptitude install mlnx-fw-updater
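A few optional sanity checks after the reboot (my own additions, assuming the OFED userland tools were pulled in by mlnx-ofed-basic):
Code:
ofed_info -s          # prints the installed MLNX_OFED version
lsmod | grep mlx5     # mlx5_core / mlx5_ib should be loaded for ConnectX-4/5/6
ip link show          # the Mellanox ports should now appear as interfaces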
 
As I promised, here is the solution (works with a ConnectX-6 ECAT card):
Code:
1. check if mellanox is present:
lspci | grep Mellanox

2. install pve-headers:
aptitude install pve-headers

3. reboot system
reboot

4. create mellanox repo:
cd /etc/apt/sources.list.d/
wget https://linux.mellanox.com/public/repo/mlnx_ofed/latest/debian10.5/mellanox_mlnx_ofed.list
wget -qO - https://www.mellanox.com/downloads/ofed/RPM-GPG-KEY-Mellanox | sudo apt-key add -

5. install driver:
aptitude install mlnx-ofed-basic

6. install firmwareupdater
aptitude install mlnx-fw-updater
I was able to get to the last step, but when I try to run it I get:

Code:
Initializing...
Attempting to perform Firmware update...

The firmware for this device is not distributed inside Mellanox driver: 01:00.0 (PSID: MT_1170110023)
To obtain firmware for this device, please contact your HW vendor.

Failed to update Firmware.
See /tmp/mlnx_fw_update.log

Not sure if that is just the firmware on the card itself, and whether that is needed, but the card still isn't showing up as an interface option in ip a.
Also note I am using a ConnectX-3 card.
Is there a way to tell if it is installed correctly, or if I just need to manually load the drivers?

I want to use the card as my main network interface for vmbr0.
 
Last edited:
I was able to get to the last step, but when I try to run it I get:

Code:
Initializing...
Attempting to perform Firmware update...

The firmware for this device is not distributed inside Mellanox driver: 01:00.0 (PSID: MT_1170110023)
To obtain firmware for this device, please contact your HW vendor.

Failed to update Firmware.
See /tmp/mlnx_fw_update.log

Not sure if that is just the firmware on the card itself, and whether that is needed, but the card still isn't showing up as an interface option in ip a.
Also note I am using a ConnectX-3 card.
As far as I remember, ConnectX-3 cards are not supported. Can you post the log file /tmp/mlnx_fw_update.log?
 
As I promised, here is the solution (works with a ConnectX-6 ECAT card):
Code:
1. check if mellanox is present:
lspci | grep Mellanox

2. install pve-headers:
aptitude install pve-headers

3. reboot system
reboot

4. create mellanox repo:
cd /etc/apt/sources.list.d/
wget https://linux.mellanox.com/public/repo/mlnx_ofed/latest/debian10.5/mellanox_mlnx_ofed.list
wget -qO - https://www.mellanox.com/downloads/ofed/RPM-GPG-KEY-Mellanox | sudo apt-key add -

5. install driver:
aptitude install mlnx-ofed-basic

6. install firmwareupdater
aptitude install mlnx-fw-updater
Does this also work with the latest version of Proxmox (version 7 and up)?
Also, do you know if RoCEv2 is supported by Proxmox when these drivers are installed using your method?
 
Does this also work with the latest version of Proxmox (version 7 and up)?
Also, do you know if RoCEv2 is supported by Proxmox when these drivers are installed using your method?
Hello,

I used the Proxmox 7 default driver instead of the original Mellanox driver, because the built-in driver performs better in my case (switchless 3-node cluster, Ethernet mode). Latency is nearly the same with both driver versions.

Install the latest firmware from Ubuntu 21.xx on the cards.
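To check which driver and firmware a port is actually running (works the same for the in-kernel and the OFED driver; the interface name below is just an example, replace it with yours):
Code:
ethtool -i enp65s0f0            # shows driver, driver version and firmware-version
lspci -k | grep -A 3 Mellanox   # shows which kernel module is bound to the card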
 
And use a routed network instead of broadcast. I had a lot of trouble with the broadcast setup when a node went down for an update or maintenance: the Ceph storage freezes with slow ops.
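For anyone wondering what routed instead of broadcast means here: a minimal sketch for one node of a 3-node cluster, loosely following the routed variant of the Proxmox full-mesh wiki article; interface names and addresses are examples and differ per node:
Code:
# /etc/network/interfaces fragment on node 1 (10.15.15.50); nodes 2 and 3 mirror this
auto enp65s0f0
iface enp65s0f0 inet static
        address 10.15.15.50/24
        # direct link to node 2
        up ip route add 10.15.15.51/32 dev enp65s0f0
        down ip route del 10.15.15.51/32

auto enp65s0f1
iface enp65s0f1 inet static
        address 10.15.15.50/24
        # direct link to node 3
        up ip route add 10.15.15.52/32 dev enp65s0f1
        down ip route del 10.15.15.52/32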
 
Ok, just for clarity: you are saying RoCEv2 is fully supported on Proxmox using the built-in drivers for the Mellanox ConnectX cards, correct? I saw another user saying RoCEv2 wasn't supported on Proxmox...
 
So I checked the built-in drivers: RoCEv2 is not enabled, but the default driver performs better for my setup than the original Mellanox drivers. Latency is below 0.035 ms and throughput is roughly 96 Gbps. I think that is ok. I have no issues with the virtual machines and Ceph: no freezing, no delay, no slow ops.

And don't forget to run your CPU at maximum performance, without powersaving, C-states, SpeedStep and so on, to get the maximum performance out of the Mellanox cards.
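What that tuning can look like in practice (governor via sysfs; limiting C-states is usually done via the kernel command line, so treat the last lines as an example, not a recommendation):
Code:
# set the CPU frequency governor to performance on all cores
echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# verify
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
# example kernel parameters to cap C-states on Intel boxes
# (add to GRUB_CMDLINE_LINUX_DEFAULT, then run update-grub and reboot):
#   intel_idle.max_cstate=1 processor.max_cstate=1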
 
Last edited:
