Mellanox ConnectX-3 VFIO passthrough issues

automatic

Nov 20, 2025
Hello, everyone. This thread is a crosspost from a different forum, seeing as how my thread there seems to have died.

I’ve spent many, many days trying to solve this issue to no avail, which is why I’ve resorted to making this post. There’s a lot to unpack, so please bear with me.

I’m running Proxmox VE 9 on trixie, where I’m trying to pass my Mellanox CX-3 (MCX312A-XCBT) NIC through to a single guest. The problem I’m facing is that the card turns off MSI-X and falls back to legacy interrupts whenever it’s bound to vfio-pci or not bound at all. No matter what I do, I cannot get MSI-X to turn on unless mlx4_core binds to the card.

Turning on MSI-X matters because I cannot pass the card through otherwise: other devices live on the same IRQ line. Furthermore, the card simply shouldn’t be using legacy interrupts unless it has a “good” reason to, and as far as I can tell it doesn’t. The error in question is the following: genirq: Flags mismatch irq 16. 00200000 (vfio-intx(...)) vs. 00000080 (i801_smbus)
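
For anyone who wants to decode that message: as far as I understand it, the kernel only lets two handlers share an IRQ line if both of them requested it with IRQF_SHARED (0x00000080 in include/linux/interrupt.h). Below is a minimal Python sketch that parses a line in the format above and reports which requester is missing that bit. The function name explain_mismatch and the regex are my own illustration, not taken from any tool, and I deliberately don't try to decode any flag other than IRQF_SHARED.

```python
import re

IRQF_SHARED = 0x00000080  # bit value from include/linux/interrupt.h

# Matches lines shaped like:
# genirq: Flags mismatch irq 16. 00200000 (vfio-intx(...)) vs. 00000080 (i801_smbus)
_MISMATCH = re.compile(
    r"genirq: Flags mismatch irq (?P<irq>\d+)\. "
    r"(?P<flags_a>[0-9a-fA-F]{8}) \((?P<name_a>.+)\) vs\. "
    r"(?P<flags_b>[0-9a-fA-F]{8}) \((?P<name_b>.+)\)"
)


def explain_mismatch(line):
    """Parse a 'Flags mismatch' line; return None if it does not match."""
    m = _MISMATCH.search(line)
    if m is None:
        return None
    requesters = []
    for flags_hex, name in ((m["flags_a"], m["name_a"]),
                            (m["flags_b"], m["name_b"])):
        flags = int(flags_hex, 16)
        requesters.append({
            "name": name,
            "flags": flags,
            # True only if this requester asked for a shareable line.
            "shared": bool(flags & IRQF_SHARED),
        })
    return {"irq": int(m["irq"]), "requesters": requesters}


if __name__ == "__main__":
    msg = ("genirq: Flags mismatch irq 16. 00200000 (vfio-intx(...)) "
           "vs. 00000080 (i801_smbus)")
    info = explain_mismatch(msg)
    for r in info["requesters"]:
        print(r["name"], hex(r["flags"]), "shared" if r["shared"] else "exclusive")
```

Run against my error, this flags vfio-intx as the side requesting the line without IRQF_SHARED, while i801_smbus has it set.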

Here is a list of what I’ve tried so far, to no avail:
  • setpci on the MSI-X enable byte
  • Blacklisting mlx4_core, mlx4_ib and mlx4_en
  • Binding vfio-pci early through kernel parameters
  • pci=realloc=(on|off), vfio_pci.nointxmasking=(0|1), vfio_pci.disable_idle_d3=(0|1) and other kernel/module parameters
  • Updating NIC firmware
  • Unbinding, binding, rescanning in various orders
  • Playing around with relevant BIOS settings
  • Trying other distros
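
To make the setpci item concrete, here is a rough Python sketch of how the MSI-X enable bit can be located: walk the PCI capability list in the raw config space, find capability ID 0x11 (MSI-X), and decode the 16-bit Message Control word at capability offset + 2. Bit 15 is Enable, bit 14 is the function mask, and bits 10:0 hold the table size minus one. With the capability at 0x9c, as in the lspci output at the bottom, the enable bit would be bit 7 of the byte at 0x9f. The helper names find_msix and read_config are mine, and the 0000: domain prefix in the usage line is an assumption about my system.

```python
import struct

PCI_STATUS = 0x06            # 16-bit status register
PCI_STATUS_CAP_LIST = 0x10   # status bit: capability list present
PCI_CAPABILITY_LIST = 0x34   # pointer to first capability
PCI_CAP_ID_MSIX = 0x11
MSIX_FLAGS_ENABLE = 0x8000   # Message Control bit 15
MSIX_FLAGS_MASKALL = 0x4000  # Message Control bit 14
MSIX_FLAGS_QSIZE = 0x07FF    # Message Control bits 10:0 (table size - 1)


def find_msix(config):
    """Decode MSI-X state from raw config-space bytes, or return None."""
    if len(config) < 64:
        return None
    (status,) = struct.unpack_from("<H", config, PCI_STATUS)
    if not status & PCI_STATUS_CAP_LIST:
        return None
    ptr = config[PCI_CAPABILITY_LIST] & 0xFC
    seen = set()  # guards against a looping capability chain
    while ptr and ptr not in seen and ptr + 4 <= len(config):
        seen.add(ptr)
        if config[ptr] == PCI_CAP_ID_MSIX:
            (ctrl,) = struct.unpack_from("<H", config, ptr + 2)
            return {
                "offset": ptr,
                "enabled": bool(ctrl & MSIX_FLAGS_ENABLE),
                "function_masked": bool(ctrl & MSIX_FLAGS_MASKALL),
                "vectors": (ctrl & MSIX_FLAGS_QSIZE) + 1,
            }
        ptr = config[ptr + 1] & 0xFC  # next-capability pointer
    return None


def read_config(bdf, sysfs_root="/sys"):
    """Read config space from sysfs. Without root the kernel only returns
    the first 64 bytes, in which case find_msix() reports None."""
    with open(f"{sysfs_root}/bus/pci/devices/{bdf}/config", "rb") as f:
        return f.read()
```

Usage would be something like find_msix(read_config("0000:05:00.0")), run as root. It only reads state; it doesn't change anything on the device.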

Additional notes:
  • I’m not interested in doing SR-IOV. I want a full passthrough of the NIC
  • I’m not interested in detaching the SMBus and letting the card continue using legacy interrupts
  • The firmware config doesn’t contain anything pertaining to MSI-X
  • The NIC lives in its own IOMMU group
  • I’m running on the Gigabyte MS03-CE0
  • lspci output is at the bottom
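
For reference, this is roughly how the IOMMU group claim can be checked: each PCI device in sysfs has an iommu_group symlink, and the group's devices directory lists every member. A minimal Python sketch follows; the function name iommu_group_members and the sysfs_root parameter are my own, and the 0000: domain prefix is an assumption about my system.

```python
from pathlib import Path


def iommu_group_members(bdf, sysfs_root="/sys"):
    """Return (group_number, sorted member addresses) for a PCI device."""
    dev = Path(sysfs_root) / "bus/pci/devices" / bdf
    # iommu_group is a symlink to .../kernel/iommu_groups/<n>
    group = (dev / "iommu_group").resolve()
    members = sorted(p.name for p in (group / "devices").iterdir())
    return group.name, members


if __name__ == "__main__":
    group, members = iommu_group_members("0000:05:00.0")
    print(f"IOMMU group {group}: {', '.join(members)}")
```

If the list contains only the NIC itself, the card is alone in its group.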

At one point, on a previous install of the same environment, I managed to get MSI-X enabled while the card was unbound, and that setup even worked when starting the guest. After some trial and error I thought I had identified the fix (blacklisting mlx4_core and mlx4_en), so I decided to reinstall and start from a clean slate. Unfortunately, my theory was very much wrong, as proven by the difficulties I’m facing now. I won’t be repeating that mistake anytime soon, at least.

I assume the problem boils down to the CX-3 itself. From what I’ve gathered from various forum threads, people seem to have tons of issues with passing through NICs in general, and with this card in particular.

At the time of writing, I’m looking into the differences between how mlx4_core and vfio-pci implement MSI-X setup. This will probably lead me nowhere, but I’ve got nothing left to lose (except for buying a new NIC and calling it a day, but I ain’t a quitter).

I appreciate any and all responses that can help me figure out this issue. I’ve spent countless hours trying to fix this, and I’m not giving up until I get it working. If there are any details that I’ve missed, I’m more than happy to provide them.

Code:
05:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
        Subsystem: Mellanox Technologies Device 0049
        Physical Slot: 2
        Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 16
        NUMA node: 0
        IOMMU group: 30
        Region 0: Memory at d1600000 (64-bit, non-prefetchable) [size=1M]
        Region 2: Memory at 203fff000000 (64-bit, prefetchable) [size=8M]
        Expansion ROM at d1500000 [disabled] [size=1M]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [48] Vital Product Data
                Product Name: CX312A - ConnectX-3 SFP+
                Read-only fields:
                        [PN] Part number: MCX312A-XCBT         
                        [EC] Engineering changes: A9
                        [SN] Serial number: XXXXXXXXXXXX (redacted)
                        [V0] Vendor specific: PCIe Gen3 x8   
                        [RV] Reserved: checksum good, 0 byte(s) reserved
                Read/write fields:
                        [V1] Vendor specific: N/A   
                        [YA] Asset tag: N/A                     
                        [RW] Read-write area: 105 byte(s) free
                        [RW] Read-write area: 253 byte(s) free (repeated 14x)
                End
        Capabilities: [9c] MSI-X: Enable- Count=128 Masked-
                Vector table: BAR=0 offset=0007c000
                PBA: BAR=0 offset=0007d000
        Capabilities: [60] Express (v2) Endpoint, IntMsgNum 0
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 116W TEE-IO-
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
                        MaxPayload 512 bytes, MaxReadReq 4096 bytes
                DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
                LnkCap: Port #8, Speed 8GT/s, Width x8, ASPM L0s, Exit Latency L0s unlimited
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes, LnkDisable- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s, Width x8
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR-
                         10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS- TPHComp- ExtTPHComp-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 260ms to 900ms, TimeoutDis-
                         AtomicOpsCtl: ReqEn-
                         IDOReq- IDOCompl- LTR- EmergencyPowerReductionReq-
                         10BitTagReq- OBFF Disabled, EETLPPrefixBlk-
                LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
                         EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [c0] Vendor Specific Information: Len=18 <?>
        Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
                ARICap: MFVC- ACS-, Next Function: 0
                ARICtl: MFVC- ACS-, Function Group: 0
        Capabilities: [148 v1] Device Serial Number ff-ff-ff-ff-ff-ff-ff-ff (redacted)
        Capabilities: [154 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP-
                        ECRC- UnsupReq- ACSViol- UncorrIntErr- BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
                        PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP-
                        ECRC- UnsupReq- ACSViol- UncorrIntErr- BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
                        PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+
                        ECRC- UnsupReq- ACSViol- UncorrIntErr- BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
                        PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ CorrIntErr- HeaderOF-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ CorrIntErr- HeaderOF-
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [18c v1] Secondary PCI Express
                LnkCtl3: LnkEquIntrruptEn- PerformEqu-
                LaneErrStat: 0
        Capabilities: [108 v1] Single Root I/O Virtualization (SR-IOV)
                IOVCap: Migration- 10BitTagReq- IntMsgNum 0
                IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+ 10BitTagReq-
                IOVSta: Migration-
                Initial VFs: 8, Total VFs: 8, Number of VFs: 0, Function Dependency Link: 00
                VF offset: 1, stride: 1, Device ID: 1004
                Supported Page Size: 000007ff, System Page Size: 00000001
                Region 2: Memory at 0000203ffb000000 (64-bit, prefetchable)
                VF Migration: offset: 00000000, BIR: 0
        Kernel modules: mlx4_core