Hailo 8 ai m.2 card crashes server when using passthrough.

scyto

Well-Known Member
Aug 8, 2023
539
113
48
Hi all, I have on obscure sceanrio and i have hit a dead end with my google-fu.

  • I have a Hailo 8 M.2 AI co-processor.
  • It is on an M.2 bifurcation card with 3 NVMes, the NVMEs work fine.
  • The co-processor is in its own IOMMU gtoup.
  • It works perfectly on the host (of course this is the last thing i tried, lol).

When i pass through the card to a VM and when the driver loads in the VM the whole physical server instantly resets. in the BMC i see a PCI SERR entry.
I dont seen any signs this is a graveful shutdown, it is a hard reset of the server.

Code:
  ID  |      TimeStamp      |    Sensor Name   |             Sensor Type            |                          Description                           
======|=====================|==================|====================================|================================================================
 736  | 05/19/2025 10:29:23 | BIOS             | critical_interrupt                 | PCIe SEL Log - Asserted
      |                     |                  |                                    | Data1: PCI SERR
      |                     |                  |                                    | Data2: PCI bus number for failed device: 0x00
      |                     |                  |                                    | Data3: PCI device number: 0x01 PCI function number: 0x01

things i have tried:
  • echo null to the reset_methods on the device (this was to supress FLR errors)
  • add a few vfio option to modprobe - namely
  • passing the device as pci instead of pcie
  • changing the PCIE speed in the BIOS of the x16 slot (this was at suggestion of asrock rack who make the mobo)
  • addng viommu=intel to the machine type
  • adding this to modprobe.d file options vfio-pci disable_vfio_pci_flr=1
Due to the sudden nature of the issue there is nothing that seems useful in jornactl on the host or on the guest below is the host.

Code:
May 19 10:27:53 pve-nas1 pvedaemon[2680]: start VM 101: UPID:pve-nas1:00000A78:00004CBB:682B6A19:qmstart:101:root@pam:
May 19 10:27:53 pve-nas1 pvedaemon[2008]: <root@pam> starting task UPID:pve-nas1:00000A78:00004CBB:682B6A19:qmstart:101:root@pam:
May 19 10:27:54 pve-nas1 chronyd[1807]: Selected source 73.65.80.137 (2.debian.pool.ntp.org)

thats the last line, it literally is a super hard reset

is there any magic someone has, or is this just one of those things where it doesn't work and i live with it?