Hello, I have a Foxconn SDX55 5G card and am trying to attach it to an Ubuntu guest using PCI passthrough, but I am running into some issues.
The SDX55 needs the mhi-pci-generic driver to be loaded, so I set up an Ubuntu 21.04 guest with Linux kernel 5.13 installed from mainline. Below is the guest config:
Code:
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 1
efidisk0: vmstore-pcie3:vm-106-disk-1,size=4M
hostpci0: 06:00,pcie=1
ide2: none,media=cdrom
machine: q35
memory: 4096
name: UbuntuTest
net0: virtio=9A:F1:61:31:49:8B,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: vmstore-pcie3:vm-106-disk-0,size=32G
scsihw: virtio-scsi-pci
smbios1: uuid=3be364bb-8439-4a74-bdb0-e46528b6e495
sockets: 1
tablet: 0
On the Proxmox host, the SDX55 is detected and bound to the vfio-pci driver:
Code:
root@pve:~# lspci -vvk -s 06:00
06:00.0 Wireless controller [0d40]: Foxconn International, Inc. Device e0ab
Subsystem: Foxconn International, Inc. Device e0ab
Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Region 0: Memory at fc301000 (64-bit, non-prefetchable) [size=4K]
Region 2: Memory at fc300000 (64-bit, non-prefetchable) [size=4K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable- Count=1/32 Maskable+ 64bit+
Address: 0000000000000000 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [70] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 26.000W
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
LnkCap: Port #0, Speed unknown, Width x2, ASPM L0s L1, Exit Latency L0s <1us, L1 <64us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr+ BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [148 v1] #19
Capabilities: [168 v1] #26
Capabilities: [18c v1] #27
Capabilities: [19c v1] Transaction Processing Hints
No steering table available
Capabilities: [228 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Capabilities: [230 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
PortCommonModeRestoreTime=70us PortTPowerOnTime=0us
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
T_CommonMode=0us LTR1.2_Threshold=0ns
L1SubCtl2: T_PwrOn=10us
Capabilities: [240 v1] #25
Kernel driver in use: vfio-pci
In the Ubuntu guest VM, the PCI device is detected as well and uses the mhi-pci-generic driver, though certain PCI capabilities seem to be hidden by Proxmox:
Code:
ahasbini@ubuntu-test:~$ sudo lspci -vvk -s 01:00
01:00.0 Wireless controller [0d40]: Foxconn International, Inc. Device e0ab
Subsystem: Foxconn International, Inc. Device e0ab
Physical Slot: 0
Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Region 0: Memory at c2000000 (64-bit, non-prefetchable) [size=4K]
Region 2: Memory at c2001000 (64-bit, non-prefetchable) [size=4K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable- Count=1/32 Maskable+ 64bit+
Address: 0000000000000000 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [70] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 26.000W
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
LnkCap: Port #0, Speed 16GT/s, Width x2, ASPM L0s L1, Exit Latency L0s <1us, L1 <64us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s (downgraded), Width x1 (downgraded)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp+ 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp+ ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: Upstream Port
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr+ BadTLP- BadDLLP+ Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [19c v1] Transaction Processing Hints
No steering table available
Capabilities: [228 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Kernel modules: mhi_pci_generic
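The difference between the host and guest listings can also be extracted mechanically instead of by eye. This is only a minimal Python sketch: it assumes the capability offsets are identical on host and guest (which holds for the two listings above), and the function names and the shortened HOST/GUEST samples are made up for illustration.

```python
import re

# Matches the offset inside the brackets of lines such as
# "Capabilities: [148 v1] #19" in `lspci -vv` output.
CAP_RE = re.compile(r"Capabilities: \[([0-9a-fA-F]+)")


def cap_offsets(lspci_text):
    """Return the set of capability offsets found in lspci -vv output."""
    return {int(m.group(1), 16) for m in CAP_RE.finditer(lspci_text)}


def missing_in_guest(host_text, guest_text):
    """Offsets listed on the host but not in the guest, sorted ascending."""
    return sorted(cap_offsets(host_text) - cap_offsets(guest_text))


# Shortened excerpts of the two listings, for illustration only.
HOST = """\
Capabilities: [100 v2] Advanced Error Reporting
Capabilities: [148 v1] #19
Capabilities: [19c v1] Transaction Processing Hints
Capabilities: [230 v1] L1 PM Substates
"""
GUEST = """\
Capabilities: [100 v2] Advanced Error Reporting
Capabilities: [19c v1] Transaction Processing Hints
"""

if __name__ == "__main__":
    print([hex(offset) for offset in missing_in_guest(HOST, GUEST)])
```

Feeding it the full output of `lspci -vvk` from the host and from the guest should list every capability offset that is not exposed to the guest.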
The dmesg output on the Proxmox host shows the hidden PCI capabilities:
Code:
[ 492.476173] device tap106i0 entered promiscuous mode
[ 492.490355] fwbr106i0: port 1(fwln106i0) entered blocking state
[ 492.490356] fwbr106i0: port 1(fwln106i0) entered disabled state
[ 492.490395] device fwln106i0 entered promiscuous mode
[ 492.490421] fwbr106i0: port 1(fwln106i0) entered blocking state
[ 492.490422] fwbr106i0: port 1(fwln106i0) entered forwarding state
[ 492.492192] vmbr0: port 5(fwpr106p0) entered blocking state
[ 492.492193] vmbr0: port 5(fwpr106p0) entered disabled state
[ 492.492233] device fwpr106p0 entered promiscuous mode
[ 492.492250] vmbr0: port 5(fwpr106p0) entered blocking state
[ 492.492251] vmbr0: port 5(fwpr106p0) entered forwarding state
[ 492.493886] fwbr106i0: port 2(tap106i0) entered blocking state
[ 492.493887] fwbr106i0: port 2(tap106i0) entered disabled state
[ 492.493939] fwbr106i0: port 2(tap106i0) entered blocking state
[ 492.493939] fwbr106i0: port 2(tap106i0) entered forwarding state
[ 492.923343] vfio-pci 0000:06:00.0: enabling device (0000 -> 0002)
[ 493.950115] vfio-pci 0000:06:00.0: vfio_ecap_init: hiding ecap 0x19@0x148
[ 493.950118] vfio-pci 0000:06:00.0: vfio_ecap_init: hiding ecap 0x26@0x168
[ 493.950120] vfio-pci 0000:06:00.0: vfio_ecap_init: hiding ecap 0x27@0x18c
[ 493.950135] vfio-pci 0000:06:00.0: vfio_ecap_init: hiding ecap 0x1e@0x230
[ 493.950137] vfio-pci 0000:06:00.0: vfio_ecap_init: hiding ecap 0x25@0x240
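Note that the "hiding ecap" messages are printed by the vfio-pci kernel driver itself (vfio_ecap_init). To make the IDs easier to read, here is a small Python sketch that decodes them. The name table covers only the five IDs seen in this log and follows the extended capability IDs as I understand them from the PCIe spec and the Linux pci_regs.h header; the function name and the sample text are hypothetical.

```python
import re

# Extended capability IDs that appear in the log above.
# Names follow the PCIe extended capability list; other IDs are not covered.
ECAP_NAMES = {
    0x19: "Secondary PCI Express",
    0x1E: "L1 PM Substates",
    0x25: "Data Link Feature",
    0x26: "Physical Layer 16.0 GT/s",
    0x27: "Lane Margining at the Receiver",
}

HIDE_RE = re.compile(
    r"vfio_ecap_init: hiding ecap (0x[0-9a-fA-F]+)@(0x[0-9a-fA-F]+)"
)


def hidden_ecaps(dmesg_text):
    """Return (cap_id, offset, name) for every 'hiding ecap' line."""
    found = []
    for match in HIDE_RE.finditer(dmesg_text):
        cap_id = int(match.group(1), 16)
        offset = int(match.group(2), 16)
        found.append((cap_id, offset, ECAP_NAMES.get(cap_id, "unknown")))
    return found


# Two lines copied from the dmesg output above.
SAMPLE = """\
[ 493.950115] vfio-pci 0000:06:00.0: vfio_ecap_init: hiding ecap 0x19@0x148
[ 493.950135] vfio-pci 0000:06:00.0: vfio_ecap_init: hiding ecap 0x1e@0x230
"""

if __name__ == "__main__":
    for cap_id, offset, name in hidden_ecaps(SAMPLE):
        print(f"ecap 0x{cap_id:02x} at 0x{offset:03x}: {name}")
```

The offsets line up with the host lspci listing, where the same capabilities show up as #19, #26, #27, L1 PM Substates and #25.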
At first the SDX55 successfully establishes internet connectivity, and the Ubuntu guest can send traffic to the public internet. As soon as the link is put under load, for example by running a speed test, the SDX55 suddenly fails and becomes unresponsive in the Ubuntu guest. Both the Proxmox host and the Ubuntu guest still see the SDX55 PCI device, even though it is completely unresponsive. I am wondering whether the issue is caused by vfio-pci in any way. I plan to test the card on an Ubuntu host soon and would appreciate any thoughts on this. Thanks.