[TUTORIAL] Proxmox 8 Mellanox Infiniband and SR-IOV

jamesthetechie

Apr 20, 2024
Here is how I got Proxmox working with InfiniBand and SR-IOV.

The hardware used is a Mellanox switch (SX6036) and a Mellanox Cx-4 100 Gbps EDR dual (or single) port card. Make sure the firmware is up to date.

As far as I'm aware, this will not work with OpenSM and requires a Mellanox switch for SR-IOV. On your switch, enable the SM and virtualization, then restart the SM service by toggling it off and on, or by rebooting the switch.
- ib sm enable
- ib sm virt enable
- configuration write

Once that is configured, you need to set up IOMMU and enable SR-IOV on the hardware. This process varies by hardware, so it is not covered in this tutorial. The Proxmox IOMMU config can be done by following this:
https://pve.proxmox.com/wiki/PCI(e)_Passthrough
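
After following the wiki and rebooting, one quick sanity check is to count the IOMMU groups the kernel has created. This is only a sketch: the function name count_iommu_groups is made up for illustration, and it assumes the standard /sys/kernel/iommu_groups location.

```shell
#!/bin/bash
# Hypothetical helper: count the IOMMU groups under a sysfs-style directory.
# Defaults to the standard kernel path; the optional argument lets you
# point it at another directory.
count_iommu_groups() {
  local dir="${1:-/sys/kernel/iommu_groups}"
  local count=0 entry
  for entry in "$dir"/*; do
    # Each group shows up as a numbered subdirectory.
    [ -d "$entry" ] && count=$((count + 1))
  done
  echo "$count"
}
```

Running count_iommu_groups with no argument should print a number greater than 0 when IOMMU is active. A result of 0 suggests the BIOS setting or kernel command line still needs attention.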

Once that is completed, install the following packages:
apt install -y infiniband-diags ibutils rdma-core rdmacm-utils mstflint
Check for link and nodes, and run diagnostics:
  • ibstat - MAKE NOTE OF THE HCA HERE
  • ibnodes
  • ibdiagnet
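
If you want the HCA name and link state on one line instead of reading the full ibstat output, something like the following may help. It is a rough sketch that assumes the usual ibstat text layout ("CA 'name'", "State:", "Physical state:"). The function name summarize_ibstat is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read ibstat-formatted text on stdin and print
# "<CA name> <port state> <physical state>" once per port.
summarize_ibstat() {
  awk -v q="'" '
    # "CA <quoted name>" header line: remember the name without the quotes.
    /^CA / { gsub(q, "", $2); ca = $2 }
    # Logical port state, e.g. "State: Active".
    /^[ \t]*State:/ { state = $2 }
    # Physical state closes out the port; print the summary line.
    /^[ \t]*Physical state:/ { print ca, state, $3 }
  '
}
```

Usage would be: ibstat | summarize_ibstat. The first field is the HCA name needed later for the service file.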

Identify the Mellanox card's bus ID and query it:
lspci | grep -i mellanox
mstflint -d <bus id here> q
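
The bus ID is the first column of the lspci output. As a small sketch (the function name mellanox_bus_ids is made up here), this filter pulls out just that column so it can be pasted into the mstflint and mstconfig commands:

```shell
#!/bin/bash
# Hypothetical helper: read lspci output on stdin and print only the
# bus IDs (first column) of lines that mention Mellanox.
mellanox_bus_ids() {
  grep -i mellanox | awk '{ print $1 }'
}
```

Usage would be: lspci | mellanox_bus_ids. A dual-port card normally shows up as two functions (for example .0 and .1), so expect one line per port.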

Enable SR-IOV and 4 VFs (or however many you want). The new firmware settings take effect after a reboot.
mstconfig -d <bus id here> set SRIOV_EN=1 NUM_OF_VFS=4
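
Once the host is back up, the kernel's sriov_totalvfs attribute should show how many VFs the card now allows. The check below is a sketch under that assumption. The function name vf_capacity_ok is hypothetical, and the directory you pass is the same device directory the service file writes sriov_numvfs into.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the device directory ($1) advertises at
# least the requested number of VFs ($2) in its sriov_totalvfs attribute.
vf_capacity_ok() {
  local device_dir="$1" wanted="$2" total
  # Fail if the attribute is missing (SR-IOV not enabled or wrong path).
  total=$(cat "$device_dir/sriov_totalvfs" 2>/dev/null) || return 1
  [ "$wanted" -le "$total" ]
}
```

For example: vf_capacity_ok /sys/class/infiniband/<HCA HERE>/device 4 && echo "ok". If it fails, the firmware change probably has not been applied yet.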

Next steps are courtesy of Jose-d:
vim /etc/systemd/system/mellanox_initvf.service

Paste in the following, MAKING SURE TO REPLACE <HCA HERE> WITH THE HCA NAME NOTED FROM ibstat:
Code:
[Unit]
After=network.target

[Service]
Type=oneshot
# note: change according to your hardware:
ExecStart=/bin/bash -c "/usr/bin/echo 4 > /sys/class/infiniband/<HCA HERE>/device/sriov_numvfs"
ExecStart=/usr/local/bin/initIbGuids.sh
StandardOutput=journal
TimeoutStartSec=60
RestartSec=60

[Install]
WantedBy=multi-user.target

Now enable the service:
systemctl enable mellanox_initvf.service

Next we will create the script:
vim /usr/local/bin/initIbGuids.sh

Paste in the following:
Code:
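#!/bin/bash
# Assign a port GUID and node GUID to each VF of the first InfiniBand device.

# Name of the first HCA reported by ibstat.
first_dev=$(ibstat --list_of_cas | head -n 1)

# GUIDs as bare hex strings (the leading "0x" is stripped).
node_guid=$(ibstat "${first_dev}" | grep "Node GUID" | cut -d ':' -f 2 | xargs | cut -d 'x' -f 2)
port_guid=$(ibstat "${first_dev}" | grep "Port GUID" | cut -d ':' -f 2 | xargs | cut -d 'x' -f 2)

echo "first dev: $first_dev"
echo "node guid: $node_guid"
echo "port_guid: $port_guid"

# Only continue if a network interface with that name exists.
if ip link show "$first_dev" &> /dev/null ; then
  # VF indices 0..3 match the 4 VFs created by the service; adjust for a different count.
  for vf in {0..3}; do
    # Drop the last 5 hex digits of the port GUID, append "cafe" plus the
    # VF number, then insert a colon after every byte.
    vf_guid=$(echo "${port_guid::-5}cafe$((vf+1))" | sed 's/..\B/&:/g')
    echo "vf_guid for vf $vf is $vf_guid"
    ip link set dev "${first_dev}" vf "$vf" port_guid "${vf_guid}"
    ip link set dev "${first_dev}" vf "$vf" node_guid "${vf_guid}"
    ip link set dev "${first_dev}" vf "$vf" state auto
  done
fi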
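
To make the GUID derivation in the script easier to follow, here is the same expression pulled out into a standalone function. This is only an illustration: the name derive_vf_guid and the sample port GUID are made up, and ${port_guid%?????} is used as an equivalent of ${port_guid::-5} (both drop the last 5 characters).

```shell
#!/bin/bash
# Hypothetical helper: build the GUID for VF number $2 (0-based) from a
# 16-hex-digit port GUID $1 that has no "0x" prefix.
derive_vf_guid() {
  local port_guid="$1" vf="$2"
  # Keep the first 11 digits, append "cafe" and the 1-based VF number,
  # then insert a colon after every pair of digits except the last.
  echo "${port_guid%?????}cafe$((vf + 1))" | sed 's/..\B/&:/g'
}
```

With a sample port GUID of 248a07030091f1a8, derive_vf_guid 248a07030091f1a8 0 gives 24:8a:07:03:00:9c:af:e1. Note that the scheme only yields a valid 8-byte GUID while the VF number stays a single digit, so it covers up to 9 VFs as written.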

Make sure the file is executable (755 is enough; there is no need to make a script that runs as root world-writable):
chmod 755 /usr/local/bin/initIbGuids.sh

Now SR-IOV is configured and the VFs are ready to attach to the VM.
1714775432382.png
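
To find the PCI address of each VF for passthrough, the virtfnN symlinks in the card's sysfs device directory can be read. A minimal sketch, assuming the standard kernel SR-IOV sysfs layout (the function name list_vf_addresses is hypothetical):

```shell
#!/bin/bash
# Hypothetical helper: print the PCI address of every VF under a PF's
# sysfs device directory. Each VF appears there as a "virtfnN" symlink
# pointing at its own PCI device.
list_vf_addresses() {
  local device_dir="$1" link
  for link in "$device_dir"/virtfn*; do
    # Skip the unexpanded glob when no VFs exist.
    [ -L "$link" ] || continue
    basename "$(readlink "$link")"
  done
}
```

For example: list_vf_addresses /sys/class/infiniband/<HCA HERE>/device. One of the printed addresses can then be added to a VM as a PCI device in the GUI, or with something like qm set <vmid> -hostpci0 <address>.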

From the VM we can now see full link on the SR-IOV device and it is in an active state:
1714775570023.png
 
Thank you, this is working for me with a Cx6 200 Gbps card. Two comments:

- It works with opensm; the service only needs to be restarted.
- mstconfig -d <bus id here> set SRIOV_EN=1 NUM_OF_VFS=4 (SRIOV_EN=1 is not accepted).

Why are you creating this service, by the way? Wouldn't it be enough to add the echo to the ibs4 up section in interfaces?
 
Glad the opensm service is working. I didn't test it since I already have a switch, and I couldn't find any confirmation online of whether it worked.

I checked and this is working:
- mstconfig -d <bus id here> set SRIOV_EN=1 NUM_OF_VFS=4
1716868240537.png

I'm using a service rather than running it when the interface comes up because of stability issues I had doing it that way: about 1 in 4 reboots it just did not work, but since I made it a service it has worked every time without issue.
 
Scratch that, it must have been a typo. SRIOV_EN=1 is fine.