Dual Edge TPU passthrough crashes host

joshtwc

May 22, 2024
Long story short: I am setting up a Home Assistant/Frigate VM and need to pass the Dual Edge TPU through to Frigate. I have come very close: the TPU appears in Home Assistant and in Frigate, but after some time it crashes the host (an HP ProLiant DL380 G10 running Proxmox 8.2) with the following error messages in iLO:

Code:
Uncorrectable Machine Check Exception (Processor 1, APIC ID 0x00000000, Bank 0x00000006, Status 0xBB800000'00000E0B, Address 0x00000000'00000000, Misc 0x00000000'36000000).
Uncorrectable PCI Express Error Detected. Slot 2 (Segment 0x0, Bus 0x36, Device 0x0, Function 0x0). Uncorrectable Error Status: 0x4000

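For anyone reading the iLO message: the "Uncorrectable Error Status" value can be decoded bit by bit. This is only a sketch, and it assumes iLO is reporting the standard PCIe AER Uncorrectable Error Status register (the bit names below come from the PCI Express Base Specification, not from HP documentation). The function name `decode_aer_uncorrectable` is made up for illustration.

```python
# Bit positions of the PCIe AER Uncorrectable Error Status register.
# Assumption: the value iLO prints uses this standard layout.
AER_UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP Received",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request Error",
    21: "ACS Violation",
}


def decode_aer_uncorrectable(status: int) -> list[str]:
    """Return the names of all error bits set in a 32-bit status value."""
    names = []
    for bit in range(32):
        if status & (1 << bit):
            names.append(AER_UNCORRECTABLE_BITS.get(bit, f"unknown/reserved bit {bit}"))
    return names


# Value taken from the iLO log above.
print(decode_aer_uncorrectable(0x4000))  # prints ['Completion Timeout']
```

If that assumption holds, 0x4000 is bit 14, a Completion Timeout: something upstream of the device at bus 0x36 sent a request and never got an answer back. That narrows down where to look but does not by itself say why the TPU or the switch stopped responding.
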
Here is the lspci information:
Code:
37:00.0 PCI bridge [0604]: ASMedia Technology Inc. ASM1182e 2-Port PCIe x1 Gen2 Packet Switch [1b21:1182]
        Subsystem: ASMedia Technology Inc. ASM1182e 2-Port PCIe x1 Gen2 Packet Switch [1b21:118f]
        Kernel driver in use: pcieport
38:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM1182e 2-Port PCIe x1 Gen2 Packet Switch [1b21:1182]
        Subsystem: ASMedia Technology Inc. ASM1182e 2-Port PCIe x1 Gen2 Packet Switch [1b21:118f]
        Kernel driver in use: pcieport
38:07.0 PCI bridge [0604]: ASMedia Technology Inc. ASM1182e 2-Port PCIe x1 Gen2 Packet Switch [1b21:1182]
        Subsystem: ASMedia Technology Inc. ASM1182e 2-Port PCIe x1 Gen2 Packet Switch [1b21:118f]
        Kernel driver in use: pcieport
39:00.0 System peripheral [0880]: Global Unichip Corp. Coral Edge TPU [1ac1:089a]
        Subsystem: Global Unichip Corp. Coral Edge TPU [1ac1:089a]
        Kernel driver in use: vfio-pci
3a:00.0 System peripheral [0880]: Global Unichip Corp. Coral Edge TPU [1ac1:089a]
        Subsystem: Global Unichip Corp. Coral Edge TPU [1ac1:089a]
        Kernel driver in use: vfio-pci

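Since both TPUs sit behind the ASMedia switch, it may be worth checking which IOMMU groups they land in on the host. The sketch below just reads the standard sysfs layout under /sys/kernel/iommu_groups; the helper names `iommu_groups` and `group_of` are hypothetical, and the two addresses are the ones from the lspci output above.

```python
from pathlib import Path


def iommu_groups(root: str = "/sys/kernel/iommu_groups") -> dict[int, list[str]]:
    """Map each IOMMU group number to the PCI addresses it contains."""
    groups: dict[int, list[str]] = {}
    base = Path(root)
    if not base.is_dir():
        # IOMMU disabled, or not running on a Linux host.
        return groups
    for group_dir in base.iterdir():
        if not group_dir.name.isdigit():
            continue
        devices_dir = group_dir / "devices"
        if devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(p.name for p in devices_dir.iterdir())
    return groups


def group_of(address: str, groups: dict[int, list[str]]):
    """Return the group number that holds the given PCI address, or None."""
    for number, devices in groups.items():
        if address in devices:
            return number
    return None


# Run on the Proxmox host; prints "None" for the group if nothing is found.
found = iommu_groups()
for addr in ("0000:39:00.0", "0000:3a:00.0"):
    number = group_of(addr, found)
    print(addr, "-> group", number, found.get(number, []))
```

If a TPU shares its group with one of the pcieport bridges or with the other TPU, that is worth knowing before changing anything else, although it would not necessarily explain a host-level machine check.
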
My VM config:
Code:
agent: 1
bios: ovmf
boot: order=scsi0
cores: 12
cpu: host
efidisk0: local-lvm:vm-100-disk-0,efitype=4m,size=4M
hostpci0: 0000:3a:00
hostpci1: 0000:39:00
localtime: 1
memory: 65536
meta: creation-qemu=8.1.5,ctime=1712586677
name: #########
numa: 0
ostype: l26
protection: 1
scsi0: local-lvm:vm-100-disk-1,cache=writethrough,discard=on,size=32G,ssd=1
scsihw: virtio-scsi-pci
sockets: 2
tablet: 0
tags:

I am using the Dual Edge TPU adapter from magic-blue-smoke.
The host has a dual Intel Xeon motherboard, and the adapter is plugged into a riser card at the back of the unit.
I have tried the following:
- Disabling SR-IOV in the BIOS
- Changing the PCIe configuration to Gen 1 (BIOS)
- Updating the GRUB cmdline for IOMMU (intel_iommu=on, iommu=pt, etc.)
- Changing which PCIe slot it is plugged into

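One thing that is easy to miss with the GRUB step: the edited cmdline only takes effect after the bootloader config is regenerated and the host is rebooted, so it can help to confirm what the running kernel actually booted with. A minimal sketch follows, reading /proc/cmdline; the function names are hypothetical and the parser deliberately ignores quoted values containing spaces.

```python
from pathlib import Path


def parse_cmdline(cmdline: str) -> dict[str, str]:
    """Split a kernel command line into key/value pairs (flags get '')."""
    params: dict[str, str] = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        params[key] = value
    return params


def iommu_settings(cmdline: str) -> dict:
    """Return the IOMMU-related parameters, or None where one is absent."""
    params = parse_cmdline(cmdline)
    return {
        "intel_iommu": params.get("intel_iommu"),
        "iommu": params.get("iommu"),
    }


# On the Proxmox host this prints what the kernel was really started with.
proc = Path("/proc/cmdline")
if proc.exists():
    print(iommu_settings(proc.read_text()))
```

If either value comes back as None after a reboot, the change never reached the running kernel, and any passthrough testing done so far was without it.
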
It's strange that it only crashes after Frigate is started: it runs stable for a bit, then crashes suddenly with no useful logs other than those from HP Integrated Lights-Out (iLO).
 
Same here with a single M.2 Edge TPU: just trying to pass it through to Home Assistant, which crashes the host instantly without logs.

Have you found a workaround or fix?
 
Nope, I ended up just using a USB 3 PCIe card. Had to get 4 Coral USBs for it. Much simpler than the M.2.
 
