Proxmox VE 8 and mlx5_core: link is always down but works on Ubuntu 22.04

Kdml Ctrl

New Member
Aug 16, 2023
We have multiple Mellanox MT27710 ConnectX-4 Lx cards for the Proxmox nodes and the DB nodes (Ubuntu/DRBD).

Now we are trying to connect everything: same cards, same SFP modules, same switch on the other end.

On PVE the cards are detected and their configuration is readable and writable, but the links stay permanently down and nothing helps.
At the same time, the same hardware works flawlessly on Ubuntu (kernel 5.15.0-78).

We tried the following (rough commands are sketched below the list):
* Used cables and SFPs from the Ubuntu servers. No luck.
* Compared the configuration with mlxconfig. Identical.
* Turned on debug output in the modules. Too much noise, nothing usable.
* Tried to install the Mellanox EN (mlnx-en) driver. It is outdated and only targets Debian 11.3, which no longer applies.
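For reference, this is roughly what the config comparison and module debug steps looked like (the PCI address is taken from the lspci output below; mlxconfig from the mstflint/MFT tools is assumed to be installed, and debugfs must be mounted):

Code:
# dump the firmware configuration on each box, then diff the files
mlxconfig -d 5e:00.0 query > /tmp/mlx-config.txt

# enable verbose mlx5_core debug messages and follow the kernel log
echo 'module mlx5_core +p' > /sys/kernel/debug/dynamic_debug/control
dmesg -w | grep mlx5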

The driver registers the card and then marks the link down (see dmesg below). This happens on every PVE node, so it is hardly a hardware fault.

Any directions to investigate would be appreciated; we are stuck otherwise.

The only thing that bugs me is the ethtool output. On PVE the module reports essentially zero receive power. Ubuntu works with the same hardware, but the diagnostic output is turned off there, so I can't see the values for comparison:

Code:
# ethtool -m ens22f0np0
    Identifier                                : 0x03 (SFP)
    Extended identifier                       : 0x04 (GBIC/SFP defined by 2-wire interface ID)
    Connector                                 : 0x07 (LC)
    Transceiver codes                         : 0x10 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
    Transceiver type                          : 10G Ethernet: 10G Base-SR
    Encoding                                  : 0x06 (64B/66B)
    BR, Nominal                               : 10300MBd
    Rate identifier                           : 0x00 (unspecified)
    Length (SMF,km)                           : 0km
    Length (SMF)                              : 0m
    Length (50um)                             : 80m
    Length (62.5um)                           : 20m
    Length (Copper)                           : 0m
    Length (OM3)                              : 300m
    Laser wavelength                          : 850nm
    Vendor name                               : LR-LINK
    Vendor OUI                                : 00:02:c9
    Vendor PN                                 : LRXP8510-X3ATL
    Vendor rev                                : B4
    Option values                             : 0x00 0x1a
    Option                                    : RX_LOS implemented
    Option                                    : TX_FAULT implemented
    Option                                    : TX_DISABLE implemented
    BR margin, max                            : 0%
    BR margin, min                            : 0%
    Vendor SN                                 : L224400296
    Date code                                 : 221027
    Optical diagnostics support               : Yes
    Laser bias current                        : 5.976 mA
    Laser output power                        : 0.5475 mW / -2.62 dBm
    Receiver signal average optical power     : 0.0001 mW / -40.00 dBm
    Module temperature                        : 29.52 degrees C / 85.14 degrees F
    Module voltage                            : 3.2883 V
    Alarm/warning flags implemented           : Yes
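A receive power of -40.00 dBm essentially means no light is reaching the module. A quick way to watch just that value while reseating cables and optics is something like this (same interface name assumed):

Code:
# poll the DOM receive power once per second
watch -n1 "ethtool -m ens22f0np0 | grep -i 'Receiver signal'"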


Here are the mandatory listings:

Code:
# lspci | grep Mel
5e:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
5e:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

Code:
# dmesg | grep mlx
[    6.391171] mlx5_core 0000:5e:00.0: firmware version: 14.32.1010
[    6.391201] mlx5_core 0000:5e:00.0: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
[    6.682464] mlx5_core 0000:5e:00.0: E-Switch: Total vports 10, per vport: max uc(128) max mc(2048)
[    6.687294] mlx5_core 0000:5e:00.0: Port module event: module 0, Cable plugged
[    7.057387] mlx5_core 0000:5e:00.0: Supported tc offload range - chains: 4294967294, prios: 4294967295
[    7.066697] mlx5_core 0000:5e:00.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(256) RxCqeCmprss(0 basic)
[    7.068276] mlx5_core 0000:5e:00.1: firmware version: 14.32.1010
[    7.068330] mlx5_core 0000:5e:00.1: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
[    7.387117] mlx5_core 0000:5e:00.1: E-Switch: Total vports 10, per vport: max uc(128) max mc(2048)
[    7.393190] mlx5_core 0000:5e:00.1: Port module event: module 1, Cable plugged
[    7.731082] mlx5_core 0000:5e:00.1: Supported tc offload range - chains: 4294967294, prios: 4294967295
[    7.744337] mlx5_core 0000:5e:00.1: MLX5E: StrdRq(0) RqSz(1024) StrdSz(256) RxCqeCmprss(0 basic)
[    7.747024] mlx5_core 0000:5e:00.1 ens22f1np1: renamed from eth1
[    7.772209] mlx5_core 0000:5e:00.0 ens22f0np0: renamed from eth0
[   12.638481] mlx5_core 0000:5e:00.0 ens22f0np0: Link down

Code:
# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens1f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether a8:a1:59:c2:a6:b1 brd ff:ff:ff:ff:ff:ff
    altname enp28s0f0
3: ens1f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether a8:a1:59:c2:a6:b2 brd ff:ff:ff:ff:ff:ff
    altname enp28s0f1
4: ens1f2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether a8:a1:59:c2:a6:b3 brd ff:ff:ff:ff:ff:ff
    altname enp28s0f2
5: ens1f3: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether a8:a1:59:c2:a6:b3 brd ff:ff:ff:ff:ff:ff permaddr a8:a1:59:c2:a6:b4
    altname enp28s0f3
6: ens22f0np0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether e8:eb:d3:5e:12:70 brd ff:ff:ff:ff:ff:ff
    altname enp94s0f0np0
7: ens22f1np1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether e8:eb:d3:5e:12:71 brd ff:ff:ff:ff:ff:ff
    altname enp94s0f1np1
8: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0 state UP mode DEFAULT group default qlen 1000
    link/ether a8:a1:59:c2:a6:b3 brd ff:ff:ff:ff:ff:ff
9: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether a8:a1:59:c2:a6:b3 brd ff:ff:ff:ff:ff:ff
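The ens22f0np0 port is administratively up but shows NO-CARRIER. One thing still on the list is to check the advertised link modes and try pinning the speed manually, in case autonegotiation is the culprit (just a guess, not verified to help):

Code:
# show supported/advertised link modes and the detected link state
ethtool ens22f0np0

# pin the port to 10G with autonegotiation off (revert with 'autoneg on')
ethtool -s ens22f0np0 autoneg off speed 10000 duplex full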

Code:
# pveversion -v
proxmox-ve: 8.0.2 (running kernel: 6.2.16-8-pve)
pve-manager: 8.0.4 (running version: 8.0.4/d258a813cfa6b390)
proxmox-kernel-helper: 8.0.3
pve-kernel-5.15: 7.4-4
proxmox-kernel-6.2.16-8-pve: 6.2.16-8
proxmox-kernel-6.2: 6.2.16-8
proxmox-kernel-6.2.16-6-pve: 6.2.16-7
pve-kernel-5.15.108-1-pve: 5.15.108-2
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.25-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.4
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.7
libpve-guest-common-perl: 5.0.4
libpve-http-server-perl: 5.0.4
libpve-rs-perl: 0.8.5
libpve-storage-perl: 8.0.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
openvswitch-switch: 3.1.0-2
proxmox-backup-client: 3.0.2-1
proxmox-backup-file-restore: 3.0.2-1
proxmox-kernel-helper: 8.0.3
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.0.6
pve-cluster: 8.0.3
pve-container: 5.0.4
pve-docs: 8.0.4
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.3
pve-firmware: 3.7-1
pve-ha-manager: 4.0.2
pve-i18n: 3.0.5
pve-qemu-kvm: 8.0.2-4
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.6
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.12-pve1
 
Sorted it out. When the hardware says there is no carrier, there really is no carrier. We had to reconnect everything from scratch and found a patching (cabling) mistake.

Please delete the thread so it does not misguide anyone searching for a solution to a real problem.
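In case it is useful to anyone anyway: after re-patching, the carrier and receive power should come back, which is easy to confirm with something like this (same interface name assumed):

Code:
# carrier state at a glance
ip -br link show ens22f0np0

# receive power should read well above -40 dBm once light actually arrives
ethtool -m ens22f0np0 | grep -i 'Receiver signal'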
 
