Hailo 8 AI M.2 card crashes server when using passthrough

scyto · 2025-05-19T20:27:46+0200

Hi all, I have an obscure scenario and I have hit a dead end with my Google-fu.

I have a Hailo 8 M.2 AI co-processor.
It is on an M.2 bifurcation card with 3 NVMe drives; the NVMe drives work fine.
The co-processor is in its own IOMMU group.
It works perfectly on the host (of course this is the last thing I tried, lol).
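
For anyone who wants to check the IOMMU grouping the same way, here is a minimal sketch that lists every group and its devices by reading the standard Linux sysfs tree at /sys/kernel/iommu_groups. The function name `iommu_groups` and the "shared group" marker are just illustrative choices, not part of any tool.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    groups = {}
    base = Path(root)
    if not base.is_dir():  # IOMMU disabled, or not a Linux host
        return groups
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        if group_dir.name.isdigit() and devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(p.name for p in devices_dir.iterdir())
    return groups


if __name__ == "__main__":
    for number, devices in sorted(iommu_groups().items()):
        # A group holding more than one device cannot be split for passthrough.
        marker = "" if len(devices) == 1 else "  <- shared group"
        print(f"group {number:3d}: {' '.join(devices)}{marker}")
```

A group that contains only the co-processor's address is what "in its own IOMMU group" means here.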

When I pass the card through to a VM, the whole physical server instantly resets as soon as the driver loads in the VM. In the BMC I see a PCI SERR entry.
I don't see any signs this is a graceful shutdown; it is a hard reset of the server.

Code:

  ID  |      TimeStamp      |    Sensor Name   |             Sensor Type            |                          Description                           
======|=====================|==================|====================================|================================================================
 736  | 05/19/2025 10:29:23 | BIOS             | critical_interrupt                 | PCIe SEL Log - Asserted
      |                     |                  |                                    | Data1: PCI SERR
      |                     |                  |                                    | Data2: PCI bus number for failed device: 0x00
      |                     |                  |                                    | Data3: PCI device number: 0x01 PCI function number: 0x01
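
The SEL entry names the failed device by bus, device and function number. A small sketch to turn those three fields into the address format that lspci and sysfs use, assuming PCI domain 0 (the SEL does not state a domain); `sel_to_bdf` is a hypothetical helper name:

```python
def sel_to_bdf(bus, device, function, domain=0):
    """Format SEL bus/device/function numbers as a Linux PCI address."""
    if not (0 <= bus <= 0xFF and 0 <= device <= 0x1F and 0 <= function <= 0x7):
        raise ValueError("bus/device/function out of PCI range")
    return f"{domain:04x}:{bus:02x}:{device:02x}.{function:x}"


if __name__ == "__main__":
    # Values taken from Data2 and Data3 of the SEL entry above.
    print(sel_to_bdf(0x00, 0x01, 0x01))  # 0000:00:01.1
```

Comparing that address against `lspci` output on the host would show whether the SERR is being reported against the Hailo card itself or against some other device on bus 0; the log alone does not say which.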

Things I have tried:

echoing null to reset_method on the device (this was to suppress FLR errors)
adding a few vfio options to modprobe - namely
passing the device as PCI instead of PCIe
changing the PCIe speed of the x16 slot in the BIOS (this was at the suggestion of ASRock Rack, who make the mobo)
adding viommu=intel to the machine type
adding this to a modprobe.d file: options vfio-pci disable_vfio_pci_flr=1
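
To confirm that a reset-method change actually took effect, the device's `reset_method` attribute in sysfs can be read back. This is a minimal sketch, assuming a kernel recent enough to expose that attribute; the helper name `reset_methods` and the address 0000:03:00.0 are placeholders, not values from this system.

```python
from pathlib import Path


def reset_methods(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Return the enabled reset methods for a PCI device, in kernel order.

    Returns [] when no reset method is enabled and None when the
    attribute (or the device) does not exist.
    """
    attr = Path(sysfs_root) / bdf / "reset_method"
    try:
        return attr.read_text().split()
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    HAILO_BDF = "0000:03:00.0"  # hypothetical address; substitute the real one from lspci
    print(reset_methods(HAILO_BDF))
```

An empty list would mean the kernel will not attempt any reset on that device, FLR included.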

Due to the sudden nature of the issue, there is nothing that seems useful in journalctl on the host or on the guest. Below is the host.

Code:

May 19 10:27:53 pve-nas1 pvedaemon[2680]: start VM 101: UPID:pve-nas1:00000A78:00004CBB:682B6A19:qmstart:101:root@pam:
May 19 10:27:53 pve-nas1 pvedaemon[2008]: <root@pam> starting task UPID:pve-nas1:00000A78:00004CBB:682B6A19:qmstart:101:root@pam:
May 19 10:27:54 pve-nas1 chronyd[1807]: Selected source 73.65.80.137 (2.debian.pool.ntp.org)

That's the last line; it literally is a super hard reset.
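
As a rough cross-check, the gap between the last journal line and the BMC's SERR timestamp can be worked out from the two logs. This sketch assumes the BMC and host clocks agree and share a timezone, which is not confirmed; `seconds_between` is a hypothetical helper, and the year is supplied by hand because the journal stamp omits it.

```python
from datetime import datetime


def seconds_between(journal_stamp, sel_stamp, year=2025):
    """Seconds from a journal timestamp to a BMC SEL timestamp."""
    # Journal lines look like "May 19 10:27:54" and carry no year.
    journal_time = datetime.strptime(f"{year} {journal_stamp}", "%Y %b %d %H:%M:%S")
    # SEL entries look like "05/19/2025 10:29:23".
    sel_time = datetime.strptime(sel_stamp, "%m/%d/%Y %H:%M:%S")
    return (sel_time - journal_time).total_seconds()


if __name__ == "__main__":
    print(seconds_between("May 19 10:27:54", "05/19/2025 10:29:23"))  # 89.0
```

If the clocks do match, the SERR lands about a minute and a half after the VM start, which would fit the guest booting and then loading the driver rather than a crash at VM start itself.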

Is there any magic someone has, or is this just one of those things where it doesn't work and I live with it?
