We have multiple Mellanox MT27710 ConnectX-4 Lx cards in our Proxmox nodes and DB nodes (Ubuntu/DRBD).
We are now trying to connect everything: same cards, same SFP modules, same switch on the other side.
On PVE the cards are detected and their config is readable and writable, but the links are permanently down and nothing helps.
At the same time, the same setup works flawlessly on Ubuntu (5.15.0-78).
We tried:
* Swapped in cables and SFPs from the Ubuntu servers. No luck.
* Compared the config with mlxconfig. Identical.
* Turned on debug on the modules. Too much noise, nothing usable.
* Installed the Mellanox EN driver module. It is outdated and only available for Debian 11.3, which no longer matches what we run.
The driver registers the card and then marks the link down (see dmesg below). This happens on every PVE node, so it is hardly a hardware fault.
Any pointers on where to investigate would be appreciated; we are stuck otherwise.
The only thing that bugs me is the ethtool output. On PVE it shows essentially zero received optical power (0.0001 mW / -40.00 dBm). Ubuntu works with the same hardware, but debug is turned off there, so I can't see the values for comparison:
Code:
# ethtool -m ens22f0np0
Identifier : 0x03 (SFP)
Extended identifier : 0x04 (GBIC/SFP defined by 2-wire interface ID)
Connector : 0x07 (LC)
Transceiver codes : 0x10 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Transceiver type : 10G Ethernet: 10G Base-SR
Encoding : 0x06 (64B/66B)
BR, Nominal : 10300MBd
Rate identifier : 0x00 (unspecified)
Length (SMF,km) : 0km
Length (SMF) : 0m
Length (50um) : 80m
Length (62.5um) : 20m
Length (Copper) : 0m
Length (OM3) : 300m
Laser wavelength : 850nm
Vendor name : LR-LINK
Vendor OUI : 00:02:c9
Vendor PN : LRXP8510-X3ATL
Vendor rev : B4
Option values : 0x00 0x1a
Option : RX_LOS implemented
Option : TX_FAULT implemented
Option : TX_DISABLE implemented
BR margin, max : 0%
BR margin, min : 0%
Vendor SN : L224400296
Date code : 221027
Optical diagnostics support : Yes
Laser bias current : 5.976 mA
Laser output power : 0.5475 mW / -2.62 dBm
Receiver signal average optical power : 0.0001 mW / -40.00 dBm
Module temperature : 29.52 degrees C / 85.14 degrees F
Module voltage : 3.2883 V
Alarm/warning flags implemented : Yes
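To compare the optical readings between nodes without eyeballing the whole dump, something like the following sketch could be used. It only parses text in the format shown above; the sample lines are copied from that listing, and the -30 dBm cutoff (`RX_DARK_DBM`) is my own illustrative assumption, not a value from the module datasheet. A reading near -40 dBm usually suggests the receiver sees little or no light, but that interpretation should be confirmed against the switch side.

```python
import re

# Sample lines copied from the `ethtool -m ens22f0np0` listing above.
SAMPLE = """\
Laser output power : 0.5475 mW / -2.62 dBm
Receiver signal average optical power : 0.0001 mW / -40.00 dBm
Module temperature : 29.52 degrees C / 85.14 degrees F
"""

# Assumption: illustrative cutoff only, not taken from any datasheet.
RX_DARK_DBM = -30.0

# Matches "<field name> : <anything> <number> dBm" at the end of a line.
DBM_LINE = re.compile(r"\s*(.+?)\s*:\s*.*?(-?\d+(?:\.\d+)?)\s*dBm\s*$")


def parse_power_dbm(text):
    """Return {field name: dBm value} for every line that ends in a dBm reading."""
    readings = {}
    for line in text.splitlines():
        match = DBM_LINE.match(line)
        if match:
            readings[match.group(1)] = float(match.group(2))
    return readings


def rx_looks_dark(readings, threshold=RX_DARK_DBM):
    """True if the receive power field is present and below the threshold."""
    rx = readings.get("Receiver signal average optical power")
    return rx is not None and rx < threshold


if __name__ == "__main__":
    values = parse_power_dbm(SAMPLE)
    for name, dbm in values.items():
        print(f"{name}: {dbm} dBm")
    print("RX looks dark:", rx_looks_dark(values))
```

In practice the text would come from running `ethtool -m <interface>` on each node and passing its output to `parse_power_dbm`, so the PVE and Ubuntu values can be put side by side.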
Here are the mandatory listings:
Code:
# lspci | grep Mel
5e:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
5e:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
Code:
# dmesg | grep mlx
[ 6.391171] mlx5_core 0000:5e:00.0: firmware version: 14.32.1010
[ 6.391201] mlx5_core 0000:5e:00.0: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
[ 6.682464] mlx5_core 0000:5e:00.0: E-Switch: Total vports 10, per vport: max uc(128) max mc(2048)
[ 6.687294] mlx5_core 0000:5e:00.0: Port module event: module 0, Cable plugged
[ 7.057387] mlx5_core 0000:5e:00.0: Supported tc offload range - chains: 4294967294, prios: 4294967295
[ 7.066697] mlx5_core 0000:5e:00.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(256) RxCqeCmprss(0 basic)
[ 7.068276] mlx5_core 0000:5e:00.1: firmware version: 14.32.1010
[ 7.068330] mlx5_core 0000:5e:00.1: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
[ 7.387117] mlx5_core 0000:5e:00.1: E-Switch: Total vports 10, per vport: max uc(128) max mc(2048)
[ 7.393190] mlx5_core 0000:5e:00.1: Port module event: module 1, Cable plugged
[ 7.731082] mlx5_core 0000:5e:00.1: Supported tc offload range - chains: 4294967294, prios: 4294967295
[ 7.744337] mlx5_core 0000:5e:00.1: MLX5E: StrdRq(0) RqSz(1024) StrdSz(256) RxCqeCmprss(0 basic)
[ 7.747024] mlx5_core 0000:5e:00.1 ens22f1np1: renamed from eth1
[ 7.772209] mlx5_core 0000:5e:00.0 ens22f0np0: renamed from eth0
[ 12.638481] mlx5_core 0000:5e:00.0 ens22f0np0: Link down
Code:
# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens1f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether a8:a1:59:c2:a6:b1 brd ff:ff:ff:ff:ff:ff
altname enp28s0f0
3: ens1f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether a8:a1:59:c2:a6:b2 brd ff:ff:ff:ff:ff:ff
altname enp28s0f1
4: ens1f2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
link/ether a8:a1:59:c2:a6:b3 brd ff:ff:ff:ff:ff:ff
altname enp28s0f2
5: ens1f3: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
link/ether a8:a1:59:c2:a6:b3 brd ff:ff:ff:ff:ff:ff permaddr a8:a1:59:c2:a6:b4
altname enp28s0f3
6: ens22f0np0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
link/ether e8:eb:d3:5e:12:70 brd ff:ff:ff:ff:ff:ff
altname enp94s0f0np0
7: ens22f1np1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether e8:eb:d3:5e:12:71 brd ff:ff:ff:ff:ff:ff
altname enp94s0f1np1
8: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0 state UP mode DEFAULT group default qlen 1000
link/ether a8:a1:59:c2:a6:b3 brd ff:ff:ff:ff:ff:ff
9: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether a8:a1:59:c2:a6:b3 brd ff:ff:ff:ff:ff:ff
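Since the problem shows up on every PVE node, a quick way to list the affected ports per node might help when testing changes. The sketch below parses `ip link` text in the format shown above and reports interfaces that are administratively UP but have no carrier (no LOWER_UP flag). The sample lines are copied from that listing; the function name `no_carrier_interfaces` is just something I made up for illustration.

```python
import re

# Header lines copied from the `ip link` listing above.
SAMPLE = """\
3: ens1f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether a8:a1:59:c2:a6:b2 brd ff:ff:ff:ff:ff:ff
6: ens22f0np0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether e8:eb:d3:5e:12:70 brd ff:ff:ff:ff:ff:ff
7: ens22f1np1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether e8:eb:d3:5e:12:71 brd ff:ff:ff:ff:ff:ff
"""

# "<index>: <name>[@peer]: <FLAG,FLAG,...>"
HEADER = re.compile(r"^\d+:\s+([^:@\s]+)(?:@\S+)?:\s+<([^>]*)>")


def no_carrier_interfaces(ip_link_text):
    """Return names of interfaces that are admin UP but lack LOWER_UP."""
    names = []
    for line in ip_link_text.splitlines():
        match = HEADER.match(line)
        if not match:
            continue  # skip link/ether and altname continuation lines
        flags = set(match.group(2).split(","))
        if "UP" in flags and "LOWER_UP" not in flags:
            names.append(match.group(1))
    return names


if __name__ == "__main__":
    print(no_carrier_interfaces(SAMPLE))
```

Ports that are simply not brought up (like ens22f1np1 here) are skipped on purpose, since they say nothing about the link itself.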
Code:
# pveversion -v
proxmox-ve: 8.0.2 (running kernel: 6.2.16-8-pve)
pve-manager: 8.0.4 (running version: 8.0.4/d258a813cfa6b390)
proxmox-kernel-helper: 8.0.3
pve-kernel-5.15: 7.4-4
proxmox-kernel-6.2.16-8-pve: 6.2.16-8
proxmox-kernel-6.2: 6.2.16-8
proxmox-kernel-6.2.16-6-pve: 6.2.16-7
pve-kernel-5.15.108-1-pve: 5.15.108-2
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.25-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.4
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.7
libpve-guest-common-perl: 5.0.4
libpve-http-server-perl: 5.0.4
libpve-rs-perl: 0.8.5
libpve-storage-perl: 8.0.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
openvswitch-switch: 3.1.0-2
proxmox-backup-client: 3.0.2-1
proxmox-backup-file-restore: 3.0.2-1
proxmox-kernel-helper: 8.0.3
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.0.6
pve-cluster: 8.0.3
pve-container: 5.0.4
pve-docs: 8.0.4
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.3
pve-firmware: 3.7-1
pve-ha-manager: 4.0.2
pve-i18n: 3.0.5
pve-qemu-kvm: 8.0.2-4
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.6
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.12-pve1