Hello, everyone. This thread is a crosspost from another forum, since my thread there seems to have died.
I’ve spent many, many days trying to solve this issue to no avail, which is why I’ve resorted to making this post. There’s a lot to unpack, so please bear with me.
I’m running Proxmox VE 9 on Debian trixie, where I’m trying to pass through my Mellanox CX-3 (MCX312A-XCBT) NIC to a specific guest. The problem I’m facing is that the card turns off MSI-X and falls back to legacy interrupts whenever it’s bound to vfio-pci, or not bound at all. No matter what I do, I cannot get MSI-X enabled unless mlx4_core binds to it.
Enabling MSI-X is important because I cannot pass the card through otherwise: other devices share its IRQ line. Furthermore, it simply shouldn’t be using legacy interrupts unless it has a “good” reason to, and as far as I can tell, it doesn’t. The error in question is the following:
genirq: Flags mismatch irq 16. 00200000 (vfio-intx(...)) vs. 00000080 (i801_smbus)
Following is a list of actions I’ve tried to no avail:
- setpci on the MSI-X enable byte
- Blacklisting mlx4_core, mlx4_ib and mlx4_en
- Binding vfio-pci early through kernel parameters
- pci=realloc=(on|off), vfio_pci.nointxmasking=(0|1), vfio_pci.disable_idle_d3=(0|1) and other kernel/module parameters
- Updating NIC firmware
- Unbinding, binding, rescanning in various orders
- Playing around with relevant BIOS settings
- Trying other distros
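For anyone retracing the setpci attempt: MSI-X Enable is bit 15 of the Message Control word, which sits 2 bytes into the MSI-X capability (capability offset 0x9c in the lspci output at the bottom, so 0x9e). Bit 14 is Function Mask, and bits 10:0 hold the table size minus one (127 here, matching Count=128). Below is a minimal Python sketch that walks the capability list of a raw config-space dump and decodes those fields, as a cross-check against lspci. It assumes root (sysfs returns only the first 64 bytes of config space to unprivileged readers) and the address 0000:05:00.0, where the domain prefix is my assumption. As far as I understand, the Enable bit is set by whichever driver is using the function, so reading Enable- while the card is unbound or idle is not by itself proof that MSI-X is unavailable.

```python
import struct
import sys
from typing import Optional

# Offsets and IDs from the PCI Local Bus specification.
PCI_STATUS = 0x06
PCI_STATUS_CAP_LIST = 0x10      # Status bit 4: capability list present
PCI_CAPABILITY_LIST = 0x34      # pointer to the first capability
PCI_CAP_ID_MSIX = 0x11

MSIX_ENABLE = 0x8000            # Message Control bit 15
MSIX_FUNC_MASK = 0x4000         # Message Control bit 14
MSIX_TABLE_SIZE = 0x07FF        # bits 10:0, encoded as N - 1

# Assumed sysfs path for the NIC at 05:00.0 in PCI domain 0000.
DEFAULT_CONFIG = "/sys/bus/pci/devices/0000:05:00.0/config"


def find_cap(cfg: bytes, cap_id: int) -> Optional[int]:
    """Walk the standard capability list and return the offset of cap_id."""
    (status,) = struct.unpack_from("<H", cfg, PCI_STATUS)
    if not status & PCI_STATUS_CAP_LIST:
        return None
    ptr = cfg[PCI_CAPABILITY_LIST] & 0xFC
    seen = set()
    while ptr and ptr not in seen:  # 'seen' guards against a looping list
        seen.add(ptr)
        if cfg[ptr] == cap_id:
            return ptr
        ptr = cfg[ptr + 1] & 0xFC
    return None


def decode_msix(cfg: bytes) -> Optional[dict]:
    """Decode the MSI-X capability from a 256-byte config space dump."""
    if len(cfg) < 256:
        raise ValueError("config space truncated (reading past 64 bytes needs root)")
    cap = find_cap(cfg, PCI_CAP_ID_MSIX)
    if cap is None:
        return None
    # Message Control (16 bit), Table Offset/BIR (32 bit), PBA Offset/BIR (32 bit)
    ctrl, table, pba = struct.unpack_from("<HII", cfg, cap + 2)
    return {
        "offset": cap,
        "enabled": bool(ctrl & MSIX_ENABLE),
        "masked": bool(ctrl & MSIX_FUNC_MASK),
        "count": (ctrl & MSIX_TABLE_SIZE) + 1,
        "table_bar": table & 0x7,
        "table_offset": table & ~0x7,
        "pba_bar": pba & 0x7,
        "pba_offset": pba & ~0x7,
    }


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_CONFIG
    try:
        with open(path, "rb") as f:
            print(decode_msix(f.read()))
    except (OSError, ValueError) as exc:
        print(f"could not decode {path}: {exc}")
```

Passing another config path as the first argument decodes a different device the same way.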
Additional notes:
- I’m not interested in doing SR-IOV. I want a full passthrough of the NIC
- I’m not interested in detaching the SMBus and letting the card continue using legacy interrupts
- The firmware config doesn’t contain anything pertaining to MSI-X
- The NIC lives in its own IOMMU group
- I’m running on the Gigabyte MS03-CE0
- lspci output is at the bottom
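Regarding the genirq error quoted earlier: when a line already has a handler, genirq only lets a second one in if both were requested with IRQF_SHARED (0x00000080 in include/linux/interrupt.h). In the message, i801_smbus has that bit and the vfio-intx request (0x00200000) does not, so the request is refused. As far as I can tell from drivers/vfio/pci/vfio_pci_intrs.c, vfio-pci only asks for a shared INTx line when the device supports PCI 2.3 INTx masking, and the module parameter that disables that support is spelled nointxmask (vfio_pci.nointxmask=1). The Python below is a minimal sketch of the sharing rule, simplified as noted in its docstring, plus a hypothetical devices_on_irq helper that lists PCI functions whose sysfs irq attribute matches a number, to see what else sits on IRQ 16. It assumes the standard /sys/bus/pci/devices layout.

```python
from pathlib import Path
from typing import List

# From include/linux/interrupt.h. Only this bit is decoded; the other bits
# in the error message (e.g. 0x00200000) are deliberately left uninterpreted.
IRQF_SHARED = 0x00000080


def can_share(new_flags: int, old_flags: int) -> bool:
    """Simplified version of the genirq rule: a new handler may join a line
    that already has one only if BOTH requests set IRQF_SHARED. The real
    check in __setup_irq() (kernel/irq/manage.c) also compares the trigger
    type, IRQF_ONESHOT and IRQF_PERCPU."""
    return bool(new_flags & old_flags & IRQF_SHARED)


def devices_on_irq(irq: int, root: str = "/sys/bus/pci/devices") -> List[str]:
    """Return PCI addresses whose sysfs 'irq' attribute equals `irq`.

    Hypothetical helper for this post. For a function currently using MSI
    (not MSI-X), the attribute shows its first MSI vector instead of INTx."""
    base = Path(root)
    if not base.is_dir():
        return []
    hits = []
    for dev in sorted(base.iterdir()):
        try:
            value = int((dev / "irq").read_text().strip())
        except (OSError, ValueError):
            continue  # attribute missing or unreadable
        if value == irq:
            hits.append(dev.name)
    return hits


if __name__ == "__main__":
    # Flag words copied verbatim from the error message in this post.
    print(can_share(0x00200000, 0x00000080))  # False: vfio-intx lacks IRQF_SHARED
    print(devices_on_irq(16))
```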
At one point, on a previous install of the same environment, I managed to get MSI-X enabled in an unbound state. That setup even worked when starting the guest. Through trial and error, I thought I had identified the fix (blacklisting mlx4_core and mlx4_en), so I reinstalled to start from a clean slate. Unfortunately, my theory was very much wrong, as the difficulties I’m facing now prove. I won’t be repeating that mistake anytime soon, at least.
I assume the problem boils down to the CX-3 itself. From what I’ve gathered from various forum threads, people seem to have lots of issues with NIC passthrough in general, and with this card in particular.
At the time of writing, I’m looking into the differences between how mlx4_core and vfio-pci set up MSI-X. This will probably lead me nowhere, but I’ve got nothing left to lose (except buying a new NIC and calling it a day, but I ain’t a quitter).
I appreciate any and all responses that can help me figure out this issue. I’ve spent countless hours trying to fix this, and I’m not giving up until I get it working. If I’ve missed any details, I’m more than happy to provide them.
lspci output:
05:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
Subsystem: Mellanox Technologies Device 0049
Physical Slot: 2
Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ 16
NUMA node: 0
IOMMU group: 30
Region 0: Memory at d1600000 (64-bit, non-prefetchable) [size=1M]
Region 2: Memory at 203fff000000 (64-bit, prefetchable) [size=8M]
Expansion ROM at d1500000 [disabled] [size=1M]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [48] Vital Product Data
Product Name: CX312A - ConnectX-3 SFP+
Read-only fields:
[PN] Part number: MCX312A-XCBT
[EC] Engineering changes: A9
[SN] Serial number: XXXXXXXXXXXX (redacted)
[V0] Vendor specific: PCIe Gen3 x8
[RV] Reserved: checksum good, 0 byte(s) reserved
Read/write fields:
[V1] Vendor specific: N/A
[YA] Asset tag: N/A
[RW] Read-write area: 105 byte(s) free
[RW] Read-write area: 253 byte(s) free (repeated 14x)
End
Capabilities: [9c] MSI-X: Enable- Count=128 Masked-
Vector table: BAR=0 offset=0007c000
PBA: BAR=0 offset=0007d000
Capabilities: [60] Express (v2) Endpoint, IntMsgNum 0
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 116W TEE-IO-
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
MaxPayload 512 bytes, MaxReadReq 4096 bytes
DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
LnkCap: Port #8, Speed 8GT/s, Width x8, ASPM L0s, Exit Latency L0s unlimited
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, LnkDisable- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s, Width x8
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR-
10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 260ms to 900ms, TimeoutDis-
AtomicOpsCtl: ReqEn-
IDOReq- IDOCompl- LTR- EmergencyPowerReductionReq-
10BitTagReq- OBFF Disabled, EETLPPrefixBlk-
LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [c0] Vendor Specific Information: Len=18 <?>
Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [148 v1] Device Serial Number ff-ff-ff-ff-ff-ff-ff-ff (redacted)
Capabilities: [154 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP-
ECRC- UnsupReq- ACSViol- UncorrIntErr- BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP-
ECRC- UnsupReq- ACSViol- UncorrIntErr- BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+
ECRC- UnsupReq- ACSViol- UncorrIntErr- BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ CorrIntErr- HeaderOF-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ CorrIntErr- HeaderOF-
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [18c v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
LaneErrStat: 0
Capabilities: [108 v1] Single Root I/O Virtualization (SR-IOV)
IOVCap: Migration- 10BitTagReq- IntMsgNum 0
IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+ 10BitTagReq-
IOVSta: Migration-
Initial VFs: 8, Total VFs: 8, Number of VFs: 0, Function Dependency Link: 00
VF offset: 1, stride: 1, Device ID: 1004
Supported Page Size: 000007ff, System Page Size: 00000001
Region 2: Memory at 0000203ffb000000 (64-bit, prefetchable)
VF Migration: offset: 00000000, BIR: 0
Kernel modules: mlx4_core