I have a new HP DL380 with Mellanox ConnectX-6 cards for Ceph, but they don't work. They aren't recognized by Proxmox at startup. I can find them on the PCI bus:
"
root@aratua:/mnt/cdrom# lspci -v | grep Mellanox
26:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
Subsystem: Mellanox Technologies MT28908 Family [ConnectX-6]
26:00.1 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
Subsystem: Mellanox Technologies MT28908 Family [ConnectX-6]
root@aratua:/mnt/cdrom# dmesg | grep 26:00
[ 2.162388] pci 0000:26:00.0: [15b3:101b] type 00 class 0x020700 PCIe Endpoint
[ 2.162497] pci 0000:26:00.0: BAR 0 [mem 0x20bffc000000-0x20bffdffffff 64bit pref]
[ 2.162728] pci 0000:26:00.0: ROM [mem 0x00000000-0x000fffff pref]
[ 2.163326] pci 0000:26:00.0: PME# supported from D3cold
[ 2.164291] pci 0000:26:00.1: [15b3:101b] type 00 class 0x020700 PCIe Endpoint
[ 2.164401] pci 0000:26:00.1: BAR 0 [mem 0x20bffa000000-0x20bffbffffff 64bit pref]
[ 2.164631] pci 0000:26:00.1: ROM [mem 0x00000000-0x000fffff pref]
[ 2.165092] pci 0000:26:00.1: PME# supported from D3cold
[ 2.211995] pci 0000:26:00.0: ROM [mem 0xa7800000-0xa78fffff pref]: assigned
[ 2.211996] pci 0000:26:00.1: ROM [mem 0xa7900000-0xa79fffff pref]: assigned
[ 2.222839] pci 0000:26:00.0: Adding to iommu group 29
[ 2.222905] pci 0000:26:00.1: Adding to iommu group 30
[ 3.516254] mlx5_core 0000:26:00.0: firmware version: 20.42.1000
[ 3.516278] mlx5_core 0000:26:00.0: 252.048 Gb/s available PCIe bandwidth (16.0 GT/s PCIe x16 link)
[ 3.778516] mlx5_core 0000:26:00.0: Port module event: module 0, Cable plugged
[ 3.778972] mlx5_core 0000:26:00.0: mlx5_pcie_event:292:(pid 11): PCIe slot power capability was not advertised.
[ 3.784185] mlx5_core 0000:26:00.0: is_dpll_supported:213:(pid 407): Missing SyncE capability
[ 3.787922] mlx5_core 0000:26:00.1: firmware version: 20.42.1000
[ 3.787946] mlx5_core 0000:26:00.1: 252.048 Gb/s available PCIe bandwidth (16.0 GT/s PCIe x16 link)
[ 4.053897] mlx5_core 0000:26:00.1: Port module event: module 1, Cable plugged
[ 4.054112] mlx5_core 0000:26:00.1: mlx5_pcie_event:292:(pid 11): PCIe slot power capability was not advertised.
[ 4.057004] mlx5_core 0000:26:00.1: is_dpll_supported:213:(pid 407): Missing SyncE capability
"
But they don't show up as network devices. I suppose it is necessary to install the Nvidia driver. I tried both the Debian and the Ubuntu driver packages, following the Nvidia manual, but I get errors during installation. I tried several different install commands:
"
root@aratua:/mnt/cdrom# ./mlnxofedinstall --without-dkms --add-kernel-support --without-fw-update --force
Error: The current MLNX_OFED_LINUX is intended for ubuntu24.04
root@aratua:/mnt/cdrom# ./mlnxofedinstall --without-dkms --add-kernel-support --kernel proxmox-kernel-6.8.12-4-pve-signed --without-fw-update --force
Provide path to the kernel sources for proxmox-kernel-6.8.12-4-pve-signed kernel.
root@aratua:/mnt/cdrom# ./mlnxofedinstall --skip-distro-check --without-depcheck --force
Logs dir: /tmp/MLNX_OFED_LINUX.234446.logs
General log file: /tmp/MLNX_OFED_LINUX.234446.logs/general.log
Below is the list of MLNX_OFED_LINUX packages that you have chosen
(some may have been added by the installer due to package dependencies):
ofed-scripts
mlnx-tools
mlnx-ofed-kernel-utils
mlnx-ofed-kernel-dkms
iser-dkms
isert-dkms
srp-dkms
rdma-core
libibverbs1
ibverbs-utils
ibverbs-providers
libibverbs-dev
libibverbs1-dbg
libibumad3
libibumad-dev
ibacm
librdmacm1
rdmacm-utils
librdmacm-dev
ibdump
libibmad5
libibmad-dev
libopensm
opensm
opensm-doc
libopensm-devel
libibnetdisc5
infiniband-diags
mft
kernel-mft-dkms
perftest
ibutils2
ibsim
ibsim-doc
ucx
sharp
hcoll
knem-dkms
knem
openmpi
mpitests
xpmem-dkms
xpmem
libxpmem0
libxpmem-dev
dpcp
srptools
mlnx-ethtool
mlnx-iproute2
rshim
ibarr
This program will install the MLNX_OFED_LINUX package on your machine.
Note that all other Mellanox, OEM, OFED, RDMA or Distribution IB packages will be removed.
Those packages are removed due to conflicts with MLNX_OFED_LINUX, do not reinstall them.
Removing old packages...
Failed command: apt-get remove -y librdmacm1 libibverbs1 proxmox-ve pve-manager spiceterm qemu-server pve-ha-manager pve-container pve-qemu-kvm libpve-guest-common-perl libpve-storage-perl ceph-common python3-rgw python3-cephfs python3-rados librgw2 python3-rbd librbd1 libradosstriper1 librados2-perl libcephfs2 ceph-fuse librados2 libiscsi7
See /tmp/MLNX_OFED_LINUX.234446.logs/general.log
root@aratua:/mnt/cdrom#
"
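Since dmesg shows mlx5_core already binding to both functions, one way to double-check whether the kernel exposes any interface for them is to look under each PCI device in sysfs. This is only a minimal sketch: the function name list_pci_netdevs and the SYSFS_ROOT override are made up for illustration, and it assumes the usual /sys/bus/pci/devices/<address>/net/ layout.

```shell
#!/bin/sh
# Hypothetical helper: print the network interface names that the kernel
# has registered for one PCI function, by listing its "net" sysfs directory.
# SYSFS_ROOT is an illustrative override so the function can be pointed at
# a different tree; it defaults to the real /sys.
list_pci_netdevs() {
    pci_addr="$1"
    sysfs_root="${SYSFS_ROOT:-/sys}"
    for dev in "$sysfs_root"/bus/pci/devices/"$pci_addr"/net/*; do
        # If the glob matched nothing, the literal pattern is skipped here.
        [ -e "$dev" ] || continue
        basename "$dev"
    done
}
```

Calling list_pci_netdevs 0000:26:00.0 and list_pci_netdevs 0000:26:00.1 would print one name per line if interfaces exist, and nothing otherwise.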
Has anyone gotten a Mellanox ConnectX-6 working? How did you do it?
Thanks.
Xavier.