ConnectX-3 card not showing up in network devices

beraval

New Member
Jun 28, 2022
I recently purchased two Mellanox CX354A cards to install in my two-node Proxmox cluster. I installed them, and the first one worked fine: the port showed up in network devices after I attached it to my switch. However, the second card, in the other server, is not appearing in network devices. I tried installing the MST tools and manually switching the port to Ethernet mode, but mlxconfig cannot query the card, as shown below.
root@pve1:~# mlxconfig -d /dev/mst/mt4099_pciconf0 q
-E- Failed to open device: /dev/mst/mt4099_pciconf0. Cannot perform operation, Driver might be down.

I did some digging, and I believe the issue is that the card does not have a kernel driver bound to it, as seen below.
root@pve1:~# lspci -k
04:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
        Subsystem: Mellanox Technologies MT27500 Family [ConnectX-3]
        Kernel modules: mlx4_core
05:00.0 VGA compatible controller:
This is the output for the same command on the server with the working card.
root@pve0:~# lspci -k
04:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
        Subsystem: Mellanox Technologies MT27500 Family [ConnectX-3]
        Kernel driver in use: mlx4_core
        Kernel modules: mlx4_core
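
The difference between the two outputs is the "Kernel driver in use:" line. A minimal sketch of checking for it automatically might look like the following. It assumes the usual `lspci -k` layout (an unindented device line followed by indented detail lines), and the function name `unbound_mellanox` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: reads `lspci -k` output on stdin and prints the PCI
# address of every Mellanox device that has no "Kernel driver in use:" line.
unbound_mellanox() {
    awk '
        # Print the previous device if it was Mellanox and had no driver bound.
        function report() { if (addr != "" && !bound) print addr }

        # An unindented line starts a new device stanza.
        /^[^ \t]/ {
            report()
            addr = ($0 ~ /Mellanox/) ? $1 : ""
            bound = 0
            next
        }

        # Indented detail line showing that a driver is attached.
        /Kernel driver in use:/ { bound = 1 }

        END { report() }
    '
}
```

On a node, this could be run as `lspci -k | unbound_mellanox`. An empty result means every Mellanox device has a driver attached.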

This is the output of mlxup, which is identical for the working and non-working cards.
root@pve1:~# ./mlxup
Querying Mellanox devices firmware ...

Device #1:
----------

  Device Type:      ConnectX3
  Part Number:      MCX354A-FCB_A2-A5
  Description:      ConnectX-3 VPI adapter card; dual-port QSFP; FDR IB (56Gb/s) and 40GigE; PCIe3.0 x8 8GT/s; RoHS R6
  PSID:             MT_1090120019
  PCI Device Name:  0000:04:00.0
  Port1 MAC:        0010e088fdb5
  Port2 MAC:        0010e088fdb6
  Versions:         Current        Available
     FW             2.42.5000      2.42.5000
     PXE            3.4.0752       3.4.0752

  Status:           Up to date

Does anyone know whether the missing kernel driver is the cause of my problem, and how to fix it?
 
Proxmox VE ships the mlx4 driver out of the box. Perhaps there is a hardware issue. Take a look at the output of dmesg | grep mlx

Check with lsmod | grep mlx that the mlx4_core, mlx4_en, and mlx4_ib modules are loaded.
You can try unloading the driver with modprobe -r mlx4_core and loading it again with modprobe -v mlx4_core.
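
The module check above can be sketched as a small shell function. This is only an illustration: the function name `missing_mlx4_modules` is hypothetical, and it assumes `lsmod` output, with the module name in the first column, is piped in on stdin.

```shell
#!/bin/sh
# Hypothetical helper: reads `lsmod` output on stdin and prints each of the
# three mlx4 modules that is NOT currently loaded, one per line.
missing_mlx4_modules() {
    # Keep only the first column (module names); the "Module" header line
    # is harmless because it never matches a module name exactly.
    loaded=$(awk '{ print $1 }')
    for mod in mlx4_core mlx4_en mlx4_ib; do
        # -x requires a whole-line match, so mlx4_core does not match mlx4_core_foo.
        printf '%s\n' "$loaded" | grep -qx "$mod" || printf '%s\n' "$mod"
    done
}
```

Run it as `lsmod | missing_mlx4_modules`. Any module it prints can then be loaded by hand with modprobe -v, as described above.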
 
So I ran the lsmod command, and only mlx4_core was loaded, not mlx4_en or mlx4_ib. I noticed the kernel version was ahead on the non-working server, so I booted into the same version that was running on the working server, and it worked. Simply removing mlx4_core and adding it back, along with mlx4_en and mlx4_ib, did not work on its own.
 
