Mellanox ConnectX-3 Pro: link goes down and never comes up again

vinicius_trev

Feb 2, 2024
I am facing a strange issue with my OPNsense + Proxmox setup. I am using an ODI DFP-34X-2C3 XPON ONU in an MCX312B-XCC_Ax, and quite frequently, when I reboot the OPNsense VM or the entire Proxmox host, the port attached to the OPNsense WAN bridge goes down (the kernel logs "mlx4_en: enp3s0d1: Link Down") and never comes up again. Sometimes the link does come back on its own, but only after more than 5 minutes. Has anyone faced similar issues?
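To see exactly when the port drops and whether it ever recovers, the kernel log can be filtered down to the mlx4_en link messages. This is only a rough sketch: the helper name link_events is made up for illustration, and it assumes the driver logs "Link Up" in the same form as the "Link Down" message quoted above.

```shell
# link_events (hypothetical helper): read kernel log text on stdin and keep
# only the mlx4_en "Link Up" / "Link Down" lines for the given interface.
link_events() {
    iface="$1"
    grep -E "mlx4_en: ${iface}: Link (Up|Down)"
}

# Possible usage on the Proxmox host (current boot, human-readable timestamps):
#   journalctl -k -b --no-pager | link_events enp3s0d1
#   dmesg -T | link_events enp3s0d1
```

Comparing the timestamps of a "Link Down" line and the next "Link Up" line (if there is one) would show how long the port actually stayed down after a reboot.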

mlxfwmanager
Code:
Querying Mellanox devices firmware ...

Device #1:
----------

  Device Type:      ConnectX3Pro
  Part Number:      MCX312B-XCC_Ax
  Description:      ConnectX-3 Pro EN network interface card; 10GigE; dual-port SFP+; PCIe3.0 x8 8GT/s; RoHS R6
  PSID:             MT_1200111023
  PCI Device Name:  /dev/mst/mt4103_pci_cr0
  Port1 MAC:        248axxxxxxxx
  Port2 MAC:        248axxxxxxxx
  Versions:         Current        Available     
     FW             2.42.5000      N/A           
     PXE            3.4.0752       N/A           

  Status:           No matching image found

pveversion
Code:
pve-manager/8.3.4/65224a0f9cd294a3 (running kernel: 6.8.12-8-pve)

ethtool -m enp3s0d1
Code:
        Identifier                                : 0x03 (SFP)
        Extended identifier                       : 0x04 (GBIC/SFP defined by 2-wire interface ID)
        Connector                                 : 0x01 (SC)
        Transceiver codes                         : 0x00 0x00 0x00 0x02 0x22 0x00 0x01 0x00 0x00
        Transceiver type                          : Ethernet: 1000BASE-LX
        Transceiver type                          : FC: intermediate distance (I)
        Transceiver type                          : FC: Longwave laser (LC)
        Transceiver type                          : FC: Single Mode (SM)
        Encoding                                  : 0x01 (8B/10B)
        BR, Nominal                               : 1300MBd
        Rate identifier                           : 0x00 (unspecified)
        Length (SMF,km)                           : 20km
        Length (SMF)                              : 20000m
        Length (50um)                             : 0m
        Length (62.5um)                           : 0m
        Length (Copper)                           : 0m
        Length (OM3)                              : 0m
        Laser wavelength                          : 1310nm
        Vendor name                               : ODI
        Vendor OUI                                : 00:00:00
        Vendor PN                                 : DFP-34X-2C3
        Vendor rev                                :
        Option values                             : 0x00 0x1a
        Option                                    : RX_LOS implemented
        Option                                    : TX_FAULT implemented
        Option                                    : TX_DISABLE implemented
        BR margin, max                            : 0%
        BR margin, min                            : 0%
        Vendor SN                                 : XPON24110587
        Date code                                 : 241120

ethtool enp3s0d1 (when link is up)
Code:
Settings for enp3s0d1:
        Supported ports: [ FIBRE ]
        Supported link modes:   1000baseX/Full
                                10000baseCR/Full
                                10000baseSR/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: No
        Supported FEC modes: Not reported
        Advertised link modes:  1000baseX/Full
                                10000baseCR/Full
                                10000baseSR/Full
        Advertised pause frame use: Symmetric
        Advertised auto-negotiation: No
        Advertised FEC modes: Not reported
        Speed: 1000Mb/s
        Duplex: Full
        Auto-negotiation: off
        Port: FIBRE
        PHYAD: 0
        Transceiver: internal
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x00000014 (20)
                               link ifdown
        Link detected: yes

ethtool enp3s0d1 (when link is down after rebooting OPNsense VM for example)
Code:
Settings for enp3s0d1:
        Supported ports: [ FIBRE ]
        Supported link modes:   1000baseX/Full
                                10000baseCR/Full
                                10000baseSR/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: No
        Supported FEC modes: Not reported
        Advertised link modes:  1000baseX/Full
                                10000baseCR/Full
                                10000baseSR/Full
        Advertised pause frame use: Symmetric
        Advertised auto-negotiation: No
        Advertised FEC modes: Not reported
        Speed: Unknown!
        Duplex: Unknown! (255)
        Auto-negotiation: off
        Port: FIBRE
        PHYAD: 0
        Transceiver: internal
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x00000014 (20)
                               link ifdown
        Link detected: no
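The only fields that differ between the two outputs above are Speed, Duplex and Link detected, so a small helper that reads just the "Link detected" line is enough to record the state over time. A minimal sketch, assuming the ethtool output format shown above; the function name link_state and the 5-second polling interval are arbitrary choices for illustration.

```shell
# link_state (hypothetical helper): read `ethtool <iface>` output on stdin
# and print "up", "down", or "unknown" if no "Link detected:" line is found.
link_state() {
    awk -F': *' '
        /Link detected:/ { state = ($2 ~ /^yes/) ? "up" : "down" }
        END { print (state == "" ? "unknown" : state) }
    '
}

# Example polling loop, left commented out so the helper can be sourced safely:
# while sleep 5; do
#     printf '%s %s\n' "$(date '+%F %T')" "$(ethtool enp3s0d1 | link_state)"
# done
```

Running the loop during a reboot of the OPNsense VM would give a timestamped record of when the link went down and how long it took to return, if it returns at all.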

/etc/network/interfaces
Code:
auto lo
iface lo inet loopback

iface eno1 inet manual
#Motherboard

iface enp3s0 inet manual
#Glass Panel SFP+

iface enp3s0d1 inet manual

auto vmbr2
iface vmbr2 inet static
        address 10.10.0.3/24
        gateway 10.10.0.1
        bridge-ports eno1 enp3s0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 10 20 30 40 50
#OPNSense LAN

auto vmbr1
iface vmbr1 inet manual
        bridge-ports enp3s0d1
        bridge-stp off
        bridge-fd 0
        up ethtool -s enp3s0d1 speed 1000 duplex full
#OPNSense WAN

I ran some tests, rebooting both the VM and the entire Proxmox host with that transceiver connected to a switch (vmbr2, enp3s0), and had no problems with the OPNsense PPPoE connection. However, since I use it for WAN, I don't want to mix it with the switch.