Proxmox - ConnectX5 - Only one port (network device) showing

btyeung

Hi,

I have the following card (ConnectX-5 Ex VPI adapter card). It has two ports, but only one shows up as active, and the mlx5_core driver is bound to only one of the two PCI functions (both functions are visible in lspci). Below is more detail from running:
mlxconfig -d /dev/mst/mt4121_pciconf0 q

Code:
Device type:    ConnectX5
Name:           7359059_MCX556A-EDAS_C14_XD_Ax
Description:    ConnectX-5 Ex VPI adapter card; EDR IB (100Gb/s) and 100GbE; dual-port QSFP28; PCIe4.0 x16; tall bracket; ROHS R6
Device:         /dev/mst/mt4121_pciconf0

Question:
Is there a configuration file I'm missing or some additional setup necessary so that BOTH network ports have the mlx5_core kernel driver loaded? Thanks in advance!!

See below for output from lspci -nnk

Code:
46:00.0 Ethernet controller [0200]: Mellanox Technologies MT28800 Family [ConnectX-5 Ex] [15b3:1019]
    Subsystem: Mellanox Technologies MT28800 Family [ConnectX-5 Ex] [15b3:0016]
    Kernel modules: mlx5_core
46:00.1 Ethernet controller [0200]: Mellanox Technologies MT28800 Family [ConnectX-5 Ex] [15b3:1019]
    Subsystem: Mellanox Technologies MT28800 Family [ConnectX-5 Ex] [15b3:0016]
    Kernel driver in use: mlx5_core
    Kernel modules: mlx5_core

Output from mst status:

Code:
MST modules:
------------
    MST PCI module is not loaded
    MST PCI configuration module loaded


MST devices:
------------
/dev/mst/mt4121_pciconf0         - PCI configuration cycles access.
                                   domain:bus:dev.fn=0000:46:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
                                   Chip revision is: 00
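
One detail worth noting in the lspci output above: the .0 function only lists "Kernel modules: mlx5_core" but has no "Kernel driver in use" line, i.e. the module is known for that function but not actually bound to it. As a generic thing to try (a sketch only, using the PCI address from the output above, run as root), the driver can be bound by hand via sysfs:

Code:
# try to bind the unclaimed function to mlx5_core via sysfs
echo 0000:46:00.0 > /sys/bus/pci/drivers/mlx5_core/bind
# re-check whether both functions now show "Kernel driver in use"
lspci -nnk -s 46:00

If the bind fails, dmesg usually shows why the probe was rejected.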
 
What's the output of ip addr? And the obligatory question when it comes to Mellanox NICs, do you have the latest firmware installed? :)
 

Code:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
3: enp70s0f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 9000 qdisc mq state DOWN group default qlen 1000
    link/ether 50:6b:4b:cb:bf:87 brd ff:ff:ff:ff:ff:ff
    inet 15.15.15.5/24 scope global enp70s0f1
       valid_lft forever preferred_lft forever
4: enp66s0f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether d0:50:99:db:89:2f brd ff:ff:ff:ff:ff:ff
5: enp66s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vmbr0 state UP group default qlen 1000
    link/ether d0:50:99:db:89:30 brd ff:ff:ff:ff:ff:ff
8: tap100i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0v111 state UNKNOWN group default qlen 1000
    link/ether 82:6f:1d:93:f7:9b brd ff:ff:ff:ff:ff:ff
9: vmbr0v111: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether d0:50:99:db:89:30 brd ff:ff:ff:ff:ff:ff
10: enp66s0f1.111@enp66s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0v111 state UP group default qlen 1000
    link/ether d0:50:99:db:89:30 brd ff:ff:ff:ff:ff:ff
11: tap101i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0v111 state UNKNOWN group default qlen 1000
    link/ether f2:41:4b:87:ed:3b brd ff:ff:ff:ff:ff:ff
12: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether d0:50:99:db:89:30 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.74/24 scope global vmbr0
       valid_lft forever preferred_lft forever

In the output above, enp70s0f1 is the ConnectX-5 adapter... I was expecting two ports to come online, similar to the built-in 10G NIC I have. As for firmware, I'm not sure whether I have the latest (I didn't know if it's critical for operation here).
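
As a side note, the firmware version currently running on the card can be read without any extra tools via ethtool, e.g.:

Code:
# driver name, driver version and firmware version for the port that did come up
ethtool -i enp70s0f1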

Thanks in advance!
 
Some background: I installed the basic Debian 10.5 drivers from the Mellanox site (via aptitude). The tools appear to talk to the device properly, but one port is still missing.
 
46:00.0 Ethernet controller -> 0x46 is 70 in decimal. That's how you can easily determine which NIC is which PCI device, as long as the interfaces follow the enpXXsY naming scheme.
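A quick way to double-check that mapping from the shell, for example:

Code:
# the bus part of the enpXXsYfZ name is the PCI bus number in decimal
printf '%d\n' 0x46      # prints 70, so 46:00.0 / 46:00.1 become enp70s0f0 / enp70s0f1
# show which PCI device each network interface belongs to
ls -l /sys/class/net/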
But yeah, there should be an enp70s0f0 as well. Is there anything in the output of dmesg that might point to the problem?
Some background: I installed the basic Debian 10.5 drivers from the Mellanox site (via aptitude). The tools appear to talk to the device properly, but one port is still missing.
Any particular reason for that? What PVE version and kernel are you using?

It's possible that the kernel and the extra driver don't work well together. As a fun fact, I just installed a fresh PVE 7 on machines with ConnectX-4 cards and both ports show up nicely.
 
You got me inspired to install PVE 7. What is the right way to clear out the installed drivers and then upgrade to PVE 7? Or will the new built-in kernel drivers just supersede anything that's there?

Let me also check dmesg.

Thanks and appreciate the help!
BY
 
@aaron Here is the dmesg output. I tried to blacklist the fm10k driver, but it still shows up for some reason (I blacklisted both uio and fm10k). Let me know if this is the right approach.

Code:
[    2.394015] pci 0000:01:00.0: [15b3:1019] type 00 class 0x020000
[    2.394252] pci 0000:01:00.0: reg 0x10: [mem 0x38000000000-0x3807fffffff 64bit pref]
[    2.394458] pci 0000:01:00.0: reg 0x30: [mem 0xfd300000-0xfd3fffff pref]
[    2.395209] pci 0000:01:00.0: PME# supported from D3cold
[    2.395463] pci 0000:01:00.0: reg 0x1a4: [mem 0x00000000-0x7fffffff 64bit pref]
[    2.395464] pci 0000:01:00.0: VF(n) BAR0 space: [mem 0x00000000-0xbffffffff 64bit pref] (contains BAR0 for 24 VFs)
[    2.396737] pci 0000:01:00.0: 126.024 Gb/s available PCIe bandwidth, limited by 16 GT/s x8 link at 0000:00:01.1 (capable of 252.048 Gb/s with 16 GT/s x16 link)
[    2.396857] pci 0000:01:00.1: [15b3:1019] type 00 class 0x020000
[    2.397058] pci 0000:01:00.1: reg 0x10: [mem 0x37f80000000-0x37fffffffff 64bit pref]
[    2.397261] pci 0000:01:00.1: reg 0x30: [mem 0xfd200000-0xfd2fffff pref]
[    2.397878] pci 0000:01:00.1: PME# supported from D3cold
[    2.398112] pci 0000:01:00.1: reg 0x1a4: [mem 0x00000000-0x7fffffff 64bit pref]
[    2.398113] pci 0000:01:00.1: VF(n) BAR0 space: [mem 0x00000000-0xbffffffff 64bit pref] (contains BAR0 for 24 VFs)
[    2.418647] pnp 00:00: disabling [mem 0xe0000000-0xefffffff] because it overlaps 0000:01:00.0 BAR 7 [mem 0x00000000-0xbffffffff 64bit pref]
[    2.418648] pnp 00:00: disabling [mem 0xe0000000-0xefffffff disabled] because it overlaps 0000:01:00.1 BAR 7 [mem 0x00000000-0xbffffffff 64bit pref]
[    2.419963] pnp 00:05: disabling [mem 0xfec00000-0xfec00fff] because it overlaps 0000:01:00.0 BAR 7 [mem 0x00000000-0xbffffffff 64bit pref]
[    2.419965] pnp 00:05: disabling [mem 0xfedc0000-0xfedc0fff] because it overlaps 0000:01:00.0 BAR 7 [mem 0x00000000-0xbffffffff 64bit pref]
[    2.419965] pnp 00:05: disabling [mem 0xfee00000-0xfee00fff] because it overlaps 0000:01:00.0 BAR 7 [mem 0x00000000-0xbffffffff 64bit pref]
[    2.419966] pnp 00:05: disabling [mem 0xfed80000-0xfed814ff] because it overlaps 0000:01:00.0 BAR 7 [mem 0x00000000-0xbffffffff 64bit pref]
[    2.419967] pnp 00:05: disabling [mem 0xfed81900-0xfed8ffff] because it overlaps 0000:01:00.0 BAR 7 [mem 0x00000000-0xbffffffff 64bit pref]
[    2.419968] pnp 00:05: disabling [mem 0xfec10000-0xfec10fff] because it overlaps 0000:01:00.0 BAR 7 [mem 0x00000000-0xbffffffff 64bit pref]
[    2.419969] pnp 00:05: disabling [mem 0xff000000-0xffffffff] because it overlaps 0000:01:00.0 BAR 7 [mem 0x00000000-0xbffffffff 64bit pref]
[    2.419971] pnp 00:05: disabling [mem 0xfec00000-0xfec00fff disabled] because it overlaps 0000:01:00.1 BAR 7 [mem 0x00000000-0xbffffffff 64bit pref]
[    2.419972] pnp 00:05: disabling [mem 0xfedc0000-0xfedc0fff disabled] because it overlaps 0000:01:00.1 BAR 7 [mem 0x00000000-0xbffffffff 64bit pref]
[    2.419973] pnp 00:05: disabling [mem 0xfee00000-0xfee00fff disabled] because it overlaps 0000:01:00.1 BAR 7 [mem 0x00000000-0xbffffffff 64bit pref]
[    2.419974] pnp 00:05: disabling [mem 0xfed80000-0xfed814ff disabled] because it overlaps 0000:01:00.1 BAR 7 [mem 0x00000000-0xbffffffff 64bit pref]
[    2.419975] pnp 00:05: disabling [mem 0xfed81900-0xfed8ffff disabled] because it overlaps 0000:01:00.1 BAR 7 [mem 0x00000000-0xbffffffff 64bit pref]
[    2.419976] pnp 00:05: disabling [mem 0xfec10000-0xfec10fff disabled] because it overlaps 0000:01:00.1 BAR 7 [mem 0x00000000-0xbffffffff 64bit pref]
[    2.419977] pnp 00:05: disabling [mem 0xff000000-0xffffffff disabled] because it overlaps 0000:01:00.1 BAR 7 [mem 0x00000000-0xbffffffff 64bit pref]
[    2.426309] pci 0000:01:00.0: BAR 7: no space for [mem size 0xc00000000 64bit pref]
[    2.426310] pci 0000:01:00.0: BAR 7: failed to assign [mem size 0xc00000000 64bit pref]
[    2.426311] pci 0000:01:00.1: BAR 7: no space for [mem size 0xc00000000 64bit pref]
[    2.426312] pci 0000:01:00.1: BAR 7: failed to assign [mem size 0xc00000000 64bit pref]
[    2.426376] pci 0000:01:00.0: BAR 0: assigned [mem 0x28100000000-0x2817fffffff 64bit pref]
[    2.426437] pci 0000:01:00.0: BAR 7: assigned [mem 0x28180000000-0x28d7fffffff 64bit pref]
[    2.426470] pci 0000:01:00.1: BAR 0: assigned [mem 0x28d80000000-0x28dffffffff 64bit pref]
[    2.426532] pci 0000:01:00.1: BAR 7: assigned [mem 0x28e00000000-0x299ffffffff 64bit pref]
[    3.069313] pci 0000:01:00.0: Adding to iommu group 78
[    3.069654] pci 0000:01:00.1: Adding to iommu group 79
[    3.682910] mlx5_core 0000:01:00.0: enabling device (0000 -> 0002)
[    3.683639] mlx5_core 0000:01:00.0: firmware version: 16.27.1900
[    3.683678] mlx5_core 0000:01:00.0: 126.024 Gb/s available PCIe bandwidth, limited by 16 GT/s x8 link at 0000:00:01.1 (capable of 252.048 Gb/s with 16 GT/s x16 link)
[    3.978545] mlx5_core 0000:01:00.0: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[    3.979652] mlx5_core 0000:01:00.0: E-Switch: total vports 26, per vport: max uc(1024) max mc(16384)
[    3.982026] mlx5_core 0000:01:00.0: Port module event: module 0, Cable unplugged
[    3.982286] mlx5_core 0000:01:00.0: mlx5_pcie_event:306:(pid 462): PCIe slot advertised sufficient power (75W).
[    3.989629] mlx5_core 0000:01:00.0: mlx5_fw_tracer_start:817:(pid 393): FWTracer: Ownership granted and active
[    3.994568] mlx5_core 0000:01:00.1: enabling device (0000 -> 0002)
[    3.995505] mlx5_core 0000:01:00.1: firmware version: 16.27.1900
[    3.995564] mlx5_core 0000:01:00.1: 126.024 Gb/s available PCIe bandwidth, limited by 16 GT/s x8 link at 0000:00:01.1 (capable of 252.048 Gb/s with 16 GT/s x16 link)
[    4.262548] mlx5_core 0000:01:00.1: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[    4.263710] mlx5_core 0000:01:00.1: E-Switch: total vports 26, per vport: max uc(1024) max mc(16384)
[    4.266326] mlx5_core 0000:01:00.1: Port module event: module 1, Cable unplugged
[    4.266506] mlx5_core 0000:01:00.1: mlx5_pcie_event:306:(pid 440): PCIe slot advertised sufficient power (75W).
[    4.280869] mlx5_core 0000:01:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
[    4.451582] mlx5_core 0000:01:00.0: Supported tc offload range - chains: 4294967294, prios: 4294967295
[    4.460925] mlx5_core 0000:01:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
[    4.639179] mlx5_core 0000:01:00.1: Supported tc offload range - chains: 4294967294, prios: 4294967295
[    4.648965] mlx5_core 0000:01:00.0 enp1s0f0np0: renamed from eth0
[    4.667872] mlx5_core 0000:01:00.1 enp1s0f1np1: renamed from eth1
[   17.949242] ipmi_si IPI0001:00: ipmi_platform: probing via ACPI
[   17.949423] ipmi_si IPI0001:00: ipmi_platform: [io  0x0ca2] regsize 1 spacing 1 irq 0
[   18.045344] mlx5_core 0000:01:00.0: alloc_uars_page:131:(pid 2247): mlx5_cmd_alloc_uar() failed, -5
[   18.109515] mlx5_core 0000:01:00.0 enp1s0f0: renamed from enp1s0f0np0
[   18.164088] mlx5_core 0000:01:00.1 enp1s0f1: renamed from enp1s0f1np1
[   18.345242] ipmi_si IPI0001:00: Error clearing flags: ff
[   18.364375] ipmi_si IPI0001:00: IPMI message handler: The GUID response from the BMC was too short, it was 15 but should have been 17.  Assuming GUID is not available.
[   18.375680] ipmi_si IPI0001:00: IPMI message handler: Found new BMC (man_id: 0x00c1d6, prod_id: 0x0202, dev_id: 0x20)
[   18.461240] ipmi_si IPI0001:00: IPMI kcs interface initialized
[   18.619808] mlx5_core 0000:01:00.0: mlx5_wait_for_pages:774:(pid 2304): Skipping wait for vf pages stage
[   18.713515] mlx5_core 0000:01:00.1: mlx5_fw_tracer_start:817:(pid 520): FWTracer: Ownership granted and active
[   20.219971] mlx5_core 0000:01:00.0: E-Switch: cleanup
[   20.314739] fm10k 0000:01:00.0: reset_hw failed: -7
[   20.315370] fm10k: probe of 0000:01:00.0 failed with error -7
[   31.349824] mlx5_core 0000:01:00.1 enp1s0f1: Link down
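
Regarding the blacklist attempt mentioned above: the usual approach on Debian/Proxmox is a modprobe.d entry plus an initramfs rebuild, roughly like this (the file name is just an example):

Code:
# /etc/modprobe.d/blacklist-fm10k.conf
blacklist fm10k
blacklist uio

# rebuild the initramfs so the blacklist also applies at early boot, then reboot
update-initramfs -u -k all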
 
You got me inspired to install PVE 7. What is the right way to clear out the installed drivers and then upgrade to PVE 7? Or will the new built-in kernel drivers just supersede anything that's there?
Removing the drivers is done by removing the package you installed, with apt remove <package> and, for good measure, apt purge <package>.

You can also install a more recent 5.11 kernel on PVE 6 by installing the pve-kernel-5.11 package. The upgrade guide to PVE 7 is located in the PVE wiki: https://pve.proxmox.com/wiki/Upgrade_from_6.x_to_7.0
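
Roughly, the whole sequence could look like this (the exact driver package name depends on what was installed from the Mellanox repository; dpkg -l will show it):

Code:
# find the vendor driver package(s) that were installed
dpkg -l | grep -i -e mlnx -e mellanox
# remove them, substituting the real package name
apt remove <package>
apt purge <package>
# optional on PVE 6.4: switch to the newer opt-in kernel before the upgrade
apt install pve-kernel-5.11
reboot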


Did the Mellanox card move to PCI address 01? That is the only one I see in that dmesg output; it's not at 46 anymore.
You could also try to run ip link set dev enp1s0f1 up

Other than that, I would also download the mlxup utility from Mellanox to update the NICs to the latest firmware. It may or may not help with this particular problem, but having current firmware installed is definitely a good thing.
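
For reference, mlxup can query and flash the firmware in one step; something along these lines (exact options may differ between versions):

Code:
# download mlxup from the NVIDIA/Mellanox firmware tools page, then:
chmod +x mlxup
./mlxup --query     # show installed vs. available firmware for each device
./mlxup             # offer to update any device with newer firmware available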
 
