SFP Optic not working anymore

RaV001

Hi

After a Proxmox update to the newest versions on Wednesday, I am unable to bring up an SFP+ interface with an SFP optic in it.

I think it has something to do with the new kernel that was installed with this update: 4.15.15-1-pve #1 SMP PVE 4.15.15-6 (Mon, 9 Apr 2018 12:24:42 +0200) x86_64 GNU/Linux

The optic is correctly seen by ethtool:

ethtool -m eno7
Identifier : 0x03 (SFP)
Extended identifier : 0x04 (GBIC/SFP defined by 2-wire interface ID)
Connector : 0x07 (LC)
Transceiver codes : 0x00 0x00 0x00 0x02 0x12 0x00 0x01 0x01
Transceiver type : Ethernet: 1000BASE-LX
Transceiver type : FC: long distance (L)
Transceiver type : FC: Longwave laser (LC)
Transceiver type : FC: Single Mode (SM)
Transceiver type : FC: 100 MBytes/sec
Encoding : 0x01 (8B/10B)
BR, Nominal : 1300MBd
Rate identifier : 0x00 (unspecified)
Length (SMF,km) : 10km
Length (SMF) : 10000m
Length (50um) : 0m
Length (62.5um) : 0m
Length (Copper) : 0m
Length (OM3) : 0m
Laser wavelength : 1310nm
Vendor name : FLEXOPTIX
Vendor OUI : 38:86:02
Vendor PN : S.1312.10.D
Vendor rev : A
Option values : 0x00 0x1a
Option : RX_LOS implemented
Option : TX_FAULT implemented
Option : TX_DISABLE implemented
BR margin, max : 0%
BR margin, min : 0%
Vendor SN : F7854J0
Date code : 140909
Optical diagnostics support : Yes
Laser bias current : 22.912 mA
Laser output power : 0.1812 mW / -7.42 dBm
Receiver signal average optical power : 0.2641 mW / -5.78 dBm
Module temperature : 56.19 degrees C / 133.14 degrees F
Module voltage : 3.2656 V
Alarm/warning flags implemented : Yes
Laser bias current high alarm : Off
Laser bias current low alarm : Off
Laser bias current high warning : Off
Laser bias current low warning : Off
Laser output power high alarm : Off
Laser output power low alarm : Off
Laser output power high warning : Off
Laser output power low warning : Off
Module temperature high alarm : Off
Module temperature low alarm : Off
Module temperature high warning : Off
Module temperature low warning : Off
Module voltage high alarm : Off
Module voltage low alarm : Off
Module voltage high warning : Off
Module voltage low warning : Off
Laser rx power high alarm : Off
Laser rx power low alarm : Off
Laser rx power high warning : Off
Laser rx power low warning : Off
Laser bias current high alarm threshold : 80.000 mA
Laser bias current low alarm threshold : 2.000 mA
Laser bias current high warning threshold : 70.000 mA
Laser bias current low warning threshold : 3.000 mA
Laser output power high alarm threshold : 0.7943 mW / -1.00 dBm
Laser output power low alarm threshold : 0.1000 mW / -10.00 dBm
Laser output power high warning threshold : 0.6310 mW / -2.00 dBm
Laser output power low warning threshold : 0.1259 mW / -9.00 dBm
Module temperature high alarm threshold : 110.00 degrees C / 230.00 degrees F
Module temperature low alarm threshold : -45.00 degrees C / -49.00 degrees F
Module temperature high warning threshold : 95.00 degrees C / 203.00 degrees F
Module temperature low warning threshold : -42.00 degrees C / -43.60 degrees F
Module voltage high alarm threshold : 3.6000 V
Module voltage low alarm threshold : 3.0000 V
Module voltage high warning threshold : 3.5000 V
Module voltage low warning threshold : 3.0500 V
Laser rx power high alarm threshold : 0.6310 mW / -2.00 dBm
Laser rx power low alarm threshold : 0.0050 mW / -23.01 dBm
Laser rx power high warning threshold : 0.5012 mW / -3.00 dBm
Laser rx power low warning threshold : 0.0063 mW / -22.01 dBm
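
In the readings above, both the transmit power (-7.42 dBm) and the receive power (-5.78 dBm) sit between their low and high alarm thresholds, and every flag reads Off, so the optic itself looks healthy. A quick way to repeat that check is to filter the output for any flag reported as On. This is only a sketch: the function name dom_flags_raised is made up for illustration, and it assumes the ethtool -m layout shown above.

```shell
# Print every DOM alarm/warning flag that is raised ("On") in `ethtool -m`
# output read from stdin. Threshold lines are skipped because their value
# is a number, not On/Off. Returns non-zero when no flag is raised.
dom_flags_raised() {
    grep -E '(alarm|warning)[[:space:]]*:[[:space:]]*On[[:space:]]*$'
}

# Usage (interface name taken from the post above):
#   ethtool -m eno7 | dom_flags_raised || echo "no DOM alarms or warnings raised"
```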

The port/optic was working until I restarted the system to apply the updates.

Any help is appreciated.
 
It is definitely an issue with the 4.15 kernel. Booting with the 4.13 kernel gets the port back online.
 
Does the 4.15.17-1-pve kernel also exhibit this issue?
 
What kind of NIC is the SFP+ module plugged into, and which driver does it use? (lspci -vv should tell you which driver the NIC uses.)
The ixgbe module has a module parameter, allow_unsupported_sfp, which might be needed. You can set it in:
Code:
/etc/modprobe.d/ixgbe.conf
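
A minimal sketch of what that file could contain and how it might be put in place, assuming the in-tree ixgbe driver. The helper name write_ixgbe_conf is hypothetical; the single line it writes is the actual modprobe option.

```shell
# Write the modprobe option file for ixgbe.
# $1 is the target path; the default is the file named above.
write_ixgbe_conf() {
    conf="${1:-/etc/modprobe.d/ixgbe.conf}"
    printf 'options ixgbe allow_unsupported_sfp=1\n' > "$conf"
}

# Usage (as root):
#   write_ixgbe_conf
#   update-initramfs -u -k all   # ixgbe is often loaded from the initramfs
#   reboot
```

The option is only read when the module loads, so it has no effect until the driver is reloaded or the system is rebooted.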
 
Sorry for my late reply, it was a busy day. Thank you for your help.

Yes, with 4.15.17-1-pve the issue is still there.

Here is the output from lspci -vv:

04:00.0 Ethernet controller: Intel Corporation Ethernet Connection X552 10 GbE SFP+
Subsystem: Super Micro Computer Inc Ethernet Connection X552 10 GbE SFP+
Physical Slot: 0-3
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 32 bytes
Interrupt: pin A routed to IRQ 16
NUMA node: 0
Region 0: Memory at fbc00000 (64-bit, prefetchable) [size=2M]
Region 4: Memory at fbe04000 (64-bit, prefetchable) [size=16K]
Expansion ROM at 90100000 [disabled] [size=512K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
Address: 0000000000000000 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [70] MSI-X: Enable+ Count=64 Masked-
Vector table: BAR=4 offset=00000000
PBA: BAR=4 offset=00002000
Capabilities: [a0] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
MaxPayload 256 bytes, MaxReadReq 4096 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [140 v1] Device Serial Number 00-00-c9-ff-ff-00-00-00
Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 1
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
IOVCap: Migration-, Interrupt Message Number: 000
IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+
IOVSta: Migration-
Initial VFs: 64, Total VFs: 64, Number of VFs: 0, Function Dependency Link: 00
VF offset: 128, stride: 2, Device ID: 15a8
Supported Page Size: 00000553, System Page Size: 00000001
Region 0: Memory at 0000000090200000 (64-bit, non-prefetchable)
Region 3: Memory at 0000000090300000 (64-bit, non-prefetchable)
VF Migration: offset: 00000000, BIR: 0
Capabilities: [1b0 v1] Access Control Services
ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
Capabilities: [1c0 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Kernel driver in use: ixgbe
Kernel modules: ixgbe

The module is programmed for Intel compatibility and worked until the update of my system. I tried the allow_unsupported_sfp parameter, but it does not help.
 
I have manually added version 5.3.7 of the ixgbe driver, which fixes my issue. It looks like something is wrong with the driver in the new kernel versions. How can this be fixed for newer kernels?
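
When switching between the in-tree and the out-of-tree driver, it can help to confirm which ixgbe build is actually bound to the interface. The sketch below reads ethtool -i output; the function name driver_and_version is made up for illustration, and the assumption is that the kernel in use still reports a driver version there (in-tree builds of this era usually report a version ending in -k).

```shell
# Print "<driver> <version>" from `ethtool -i` output read on stdin.
driver_and_version() {
    awk -F': ' '
        $1 == "driver"  { d = $2 }
        $1 == "version" { v = $2 }
        END             { print d, v }
    '
}

# Usage:
#   ethtool -i eno7 | driver_and_version
#   modinfo -F version ixgbe   # version of the module modprobe would load
```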
 
We'd need to compare the in-tree module with the out-of-tree one provided by Intel. Unfortunately this is quite cumbersome, as the out-of-tree ones are usually released as tarballs without any meaningful changelog.

Do you get any errors in the logs, or when attempting to bring up the interface with the non-working in-tree module?
 
I only saw in dmesg that the interface came up and then went down after a few seconds. Which log do you want to see? I should be able to look into it once you tell me which one it should be.
 
Yes, the dmesg output would potentially be helpful.
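
One way to collect just the relevant part of the kernel log is to filter it for the driver and interface names. This is a rough sketch; the function name nic_log_lines is hypothetical, and eno7 is the interface name from the first post.

```shell
# Keep only kernel log lines that mention the driver or the interface.
# Reads log text on stdin; $1 is the interface name (defaults to eno7).
nic_log_lines() {
    grep -E -i "ixgbe|${1:-eno7}"
}

# Usage:
#   dmesg -T | nic_log_lines eno7
#   journalctl -k -b | nic_log_lines eno7
```

Capturing this once under the working 4.13 kernel and once under 4.15 makes the two logs easy to compare.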
 