Proxmox and Coral TPU M.2 Passthrough Broken on Newer Platform - PCI_NUM_PINS' failed

Seed

I got some new hardware: a Z890-chipset board, and I'm having trouble passing through the Coral TPU. I've had no issues doing this with various servers (EPYC Milan, Intel Q670, a Lenovo P360 on 8.2.7). I have both an M.2 card and an A+E-key card that I experiment with. After I set up Proxmox 8.2.7 (kernel 6.8.12-3-pve) and get the apex driver installed, everything looks fine from the Proxmox host's perspective. The device is isolated in its own IOMMU group, but dmesg does show one issue:

Code:
[    0.103529] DMAR: IOMMU enabled
[    0.217422] DMAR-IR: IOAPIC id 2 under DRHD base  0xfc810000 IOMMU 1
[    0.399458] pci 0000:00:02.0: DMAR: Skip IOMMU disabling for graphics
[    0.453737] DMAR: IOMMU feature sc_support inconsistent

Code:
lspci -nn | grep 089a
01:00.0 System peripheral [0880]: Global Unichip Corp. Coral Edge TPU [1ac1:089a]

Code:
ls /dev/apex_0
/dev/apex_0
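As a sanity check, I also confirm the TPU is alone in its IOMMU group (standard sysfs layout; the group number, 35 here, comes from the lspci output below):

Code:
# list every device sharing the TPU's IOMMU group;
# ideally only the TPU itself shows up
ls /sys/kernel/iommu_groups/35/devices/

Full lspci -vvv output for the device: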

Code:
89:00.0 System peripheral: Global Unichip Corp. Coral Edge TPU (prog-if ff)
    Subsystem: Global Unichip Corp. Coral Edge TPU
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 16
    IOMMU group: 35
    Region 0: Memory at 4000400000 (64-bit, prefetchable) [size=16K]
    Region 2: Memory at 4000300000 (64-bit, prefetchable) [size=1M]
    Capabilities: [80] Express (v2) Endpoint, MSI 00
        DevCap:    MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 25W
        DevCtl:    CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 256 bytes, MaxReadReq 512 bytes
        DevSta:    CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
        LnkCap:    Port #1, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
            ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl:    ASPM Disabled; RCB 64 bytes, Disabled- CommClk-
            ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
        LnkSta:    Speed 5GT/s, Width x1
            TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
             10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix-
             EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
             FRS- TPHComp- ExtTPHComp-
             AtomicOpsCap: 32bit- 64bit- 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ 10BitTagReq- OBFF Disabled,
             AtomicOpsCtl: ReqEn-
        LnkCap2: Supported Link Speeds: 2.5-5GT/s, Crosslink- Retimer- 2Retimers- DRS-
        LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
             Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
             EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
             Retimer- 2Retimers- CrosslinkRes: unsupported
    Capabilities: [d0] MSI-X: Enable+ Count=128 Masked-
        Vector table: BAR=2 offset=00046800
        PBA: BAR=2 offset=00046068
    Capabilities: [e0] MSI: Enable- Count=1/32 Maskable- 64bit+
        Address: 0000000000000000  Data: 0000
    Capabilities: [f8] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [100 v1] Vendor Specific Information: ID=1556 Rev=1 Len=008 <?>
    Capabilities: [108 v1] Latency Tolerance Reporting
        Max snoop latency: 15728640ns
        Max no snoop latency: 15728640ns
    Capabilities: [110 v1] L1 PM Substates
        L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
              PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
        L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
               T_CommonMode=0us LTR1.2_Threshold=30720ns
        L1SubCtl2: T_PwrOn=10us
    Capabilities: [200 v2] Advanced Error Reporting
        UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt:    DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
        CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
        AERCap:    First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
            MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
        HeaderLog: 00000000 00000000 00000000 00000000
    Kernel driver in use: apex
    Kernel modules: apex

When I add the PCI device and attempt to start the VM, I get the following error:

Code:
kvm: ../hw/pci/pci.c:1633: pci_irq_handler: Assertion `0 <= irq_num && irq_num < PCI_NUM_PINS' failed.
TASK ERROR: start failed: QEMU exited with code 1
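For what it's worth, PCI_NUM_PINS in QEMU is just the four legacy INTx pins (A-D), so the assertion means QEMU read an out-of-range interrupt-pin value from the device, which is exactly what you'd see if config space reads back as all 0xFF. A quick check of what the pin register returns (standard setpci; the Interrupt Pin register sits at config offset 0x3D):

Code:
# read the one-byte Interrupt Pin register at offset 0x3D;
# 01 = INTA (as in the healthy lspci above), ff = device not responding
setpci -s 01:00.0 3d.b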

dmesg shows:

Code:
[  287.110888] vfio-pci 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[  287.112455] vfio-pci 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[  287.112548] vfio-pci 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[  288.281835] vfio-pci 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[  288.283603] vfio-pci 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[  289.314424] vfio-pci 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[  319.169012] vfio-pci 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[  319.169114] vfio-pci 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
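One knob I still want to try is the standard d3cold_allowed sysfs attribute, which keeps the kernel from ever putting the device into D3cold (a generic workaround for this class of error, untested on this board):

Code:
# forbid D3cold for the TPU before starting the VM
echo 0 > /sys/bus/pci/devices/0000:01:00.0/d3cold_allowed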

And now the device is basically gone until I reboot:

Code:
ls /dev/apex_0
ls: cannot access '/dev/apex_0': No such file or directory

Code:
89:00.0 System peripheral: Global Unichip Corp. Coral Edge TPU (prog-if ff)
    Subsystem: Global Unichip Corp. Coral Edge TPU
    !!! Unknown header type 7f
    Interrupt: pin ? routed to IRQ 16
    IOMMU group: 35
    Region 0: Memory at 4000400000 (64-bit, prefetchable) [size=16K]
    Region 2: Memory at 4000300000 (64-bit, prefetchable) [size=1M]
    Kernel driver in use: vfio-pci
    Kernel modules: apex
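Before rebooting, a remove/rescan cycle is worth a shot (standard PCI sysfs operations; if the device is truly stuck in D3cold it may not come back this way):

Code:
# drop the dead device, then rescan the bus to rediscover it
echo 1 > /sys/bus/pci/devices/0000:01:00.0/remove
echo 1 > /sys/bus/pci/rescan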


I've tried the following kernels (I'd expect the older ones to fail anyway):

Code:
6.11.0-1-pve
6.8.12-3-pve
6.8.4-2-pve
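I switch between them with proxmox-boot-tool (standard Proxmox tooling, assuming proxmox-boot-tool manages your boot entries):

Code:
# show installed kernels, then pin a specific one for the next boots
proxmox-boot-tool kernel list
proxmox-boot-tool kernel pin 6.8.12-3-pve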

The kernel driver in use has changed as well. I'm not sure what to make of this. I know this is a newer chipset, and the M.2 slot is on the CPU lanes. Maybe it's a bug? If anyone has tips on how to debug this further, I'm all ears.
 
If you upgraded to these packages, this is why:

Code:
libpve-common-perl/stable 8.2.6 all [upgradable from: 8.2.5]
pve-firmware/stable,stable 3.14-1 all [upgradable from: 3.13-3]
qemu-server/stable 8.2.5 amd64 [upgradable from: 8.2.4]

Downgrade to the version listed after the string "upgradable from".

I ran into the same PCI passthrough issue, and as soon as I downgraded it worked fine.
 
Code:
apt-get install libpve-common-perl=8.2.5
apt-get install pve-firmware=3.13-3
apt-get install qemu-server=8.2.4

I don't know exactly which one fixed it, but after running these commands everything works again.
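If you want to keep apt from pulling the newer versions straight back in on the next upgrade, holding the packages should do it (standard apt-mark usage):

Code:
apt-mark hold libpve-common-perl pve-firmware qemu-server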
 
Alright, I'll give it a go. However, 8.2.7 worked fine on a different build with the same M.2 card.
 
