[SOLVED] PCI Passthrough Device causes hotplug event on host and disappears from VM

scyto

Well-Known Member
Aug 8, 2023
I have a Hailo AI card.

I have to keep PCIe hotplug enabled in the server's BIOS; otherwise the card crashes the server when it resets during initialization.

When it initializes, the driver loads firmware from a .bin file inside the VM.
This appears to reset the device, which the host sees as a hotplug event on the PCIe bus (it is an M.2 card plugged into a PCIe adapter).

Q: Is there a way to enable PCI hotplug so that the VM sees the device come back?

logs: Host

Code:
[Wed May 21 12:19:16 2025] pcieport 0000:c0:01.1: pciehp: Slot(21): Link Down
[Wed May 21 12:19:16 2025] pcieport 0000:c0:01.1: pciehp: Slot(21): Card not present
[Wed May 21 12:19:16 2025] pcieport 0000:c0:01.1: pciehp: Slot(21): Card present
[Wed May 21 12:19:16 2025] pcieport 0000:c0:01.1: pciehp: Slot(21): Link Up
[Wed May 21 12:19:16 2025] pcieport 0000:c0:01.1: ASPM: current common clock configuration is inconsistent, reconfiguring
[Wed May 21 12:19:16 2025] pcieport 0000:c0:01.1: PCI bridge to [bus c1-c2]
[Wed May 21 12:19:16 2025] pcieport 0000:c0:01.1:   bridge window [io  0xd000-0xdfff]
[Wed May 21 12:19:16 2025] pcieport 0000:c0:01.1:   bridge window [mem 0xd8000000-0xd80fffff]
[Wed May 21 12:19:16 2025] pcieport 0000:c0:01.1:   bridge window [mem 0x10befff00000-0x10beffffffff 64bit pref]
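
To catch the same sequence on another host, something like the sketch below can filter the kernel log down to just the slot events. pciehp_events is a helper name I made up for illustration, and the pattern only matches the four message types seen in the log above.

```shell
# Hypothetical helper: keep only pciehp slot events from kernel log text on stdin.
# Matches the four messages seen above: Link Down/Up and Card (not) present.
pciehp_events() {
    grep -E 'pciehp: Slot\([0-9]+\): (Link (Down|Up)|Card (not )?present)'
}
```

Usage on the host would be `dmesg -T | pciehp_events`, or `dmesg -Tw | pciehp_events` to follow the log while the VM starts.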

logs: VM
Code:
root@debian-dev-628:~# dmesg -T | grep hailo
[Wed May 21 12:19:13 2025] hailo_pci: loading out-of-tree module taints kernel.
[Wed May 21 12:19:13 2025] hailo_pci: module verification failed: signature and/or required key missing - tainting kernel
[Wed May 21 12:19:13 2025] hailo: Init module. driver version 4.19.0
[Wed May 21 12:19:13 2025] hailo 0000:01:00.0: Probing on: 1e60:2864...
[Wed May 21 12:19:13 2025] hailo 0000:01:00.0: Probing: Allocate memory for device extension, 11632
[Wed May 21 12:19:13 2025] hailo 0000:01:00.0: Probing: Device enabled
[Wed May 21 12:19:13 2025] hailo 0000:01:00.0: Probing: mapped bar 0 - (____ptrval____) 16384
[Wed May 21 12:19:13 2025] hailo 0000:01:00.0: Probing: mapped bar 2 - (____ptrval____) 4096
[Wed May 21 12:19:13 2025] hailo 0000:01:00.0: Probing: mapped bar 4 - (____ptrval____) 16384
[Wed May 21 12:19:13 2025] hailo 0000:01:00.0: Probing: Setting max_desc_page_size to 4096, (page_size=4096)
[Wed May 21 12:19:13 2025] hailo 0000:01:00.0: Probing: Enabled 64 bit dma
[Wed May 21 12:19:13 2025] hailo 0000:01:00.0: Probing: Using userspace allocated vdma buffers
[Wed May 21 12:19:13 2025] hailo 0000:01:00.0: Disabling ASPM L0s
[Wed May 21 12:19:13 2025] hailo 0000:01:00.0: Successfully disabled ASPM L0s
[Wed May 21 12:19:13 2025] hailo 0000:01:00.0: Writing file hailo/hailo8_fw.bin
[Wed May 21 12:19:13 2025] hailo 0000:01:00.0: firmware: direct-loading firmware hailo/hailo8_fw.bin
[Wed May 21 12:19:13 2025] hailo 0000:01:00.0: File hailo/hailo8_fw.bin written successfully
[Wed May 21 12:19:13 2025] hailo 0000:01:00.0: Writing file hailo/hailo8_board_cfg.bin
[Wed May 21 12:19:13 2025] hailo 0000:01:00.0: firmware: failed to load hailo/hailo8_board_cfg.bin (-2)
[Wed May 21 12:19:13 2025] hailo 0000:01:00.0: firmware: failed to load hailo/hailo8_board_cfg.bin (-2)
[Wed May 21 12:19:13 2025] Failed to write file hailo/hailo8_board_cfg.bin
[Wed May 21 12:19:13 2025] hailo 0000:01:00.0: File hailo/hailo8_board_cfg.bin written successfully
[Wed May 21 12:19:13 2025] hailo 0000:01:00.0: Writing file hailo/hailo8_fw_cfg.bin
[Wed May 21 12:19:13 2025] hailo 0000:01:00.0: firmware: failed to load hailo/hailo8_fw_cfg.bin (-2)
[Wed May 21 12:19:13 2025] hailo 0000:01:00.0: firmware: failed to load hailo/hailo8_fw_cfg.bin (-2)
[Wed May 21 12:19:13 2025] Failed to write file hailo/hailo8_fw_cfg.bin
[Wed May 21 12:19:13 2025] hailo 0000:01:00.0: File hailo/hailo8_fw_cfg.bin written successfully
[Wed May 21 12:19:14 2025] hailo 0000:01:00.0: Firmware loaded successfully
[Wed May 21 12:19:14 2025] hailo 0000:01:00.0: Probing: Added board 1e60-2864, /dev/hailo0
[Wed May 21 12:19:16 2025] hailo 0000:01:00.0: Remove: Releasing board
[Wed May 21 12:19:16 2025] hailo 0000:01:00.0: Remove: Freed board, /dev/hailo0
 
Fixed by doing two things:

The first was to create a file in /etc/modprobe.d/ that contained:
options vfio-pci ids=1e60:2864 disable_vfio_pci_flr=1
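
A minimal sketch of that step as a shell function, in case it helps. The file name hailo-vfio.conf and the function name are my own choices, not anything required; modprobe only cares that the file ends in .conf. The option line is copied as-is from above.

```shell
# Sketch: write the vfio-pci option line from this post into a modprobe.d file.
# write_vfio_conf and hailo-vfio.conf are hypothetical names chosen for illustration.
write_vfio_conf() {
    conf_dir="${1:-/etc/modprobe.d}"          # target directory; defaults to the usual location
    conf_file="$conf_dir/hailo-vfio.conf"
    # Option line exactly as given in the post
    printf '%s\n' 'options vfio-pci ids=1e60:2864 disable_vfio_pci_flr=1' > "$conf_file"
    echo "$conf_file"                          # print the path that was written
}
```

Run `write_vfio_conf` as root on the host. If vfio-pci is loaded from the initramfs on your setup (an assumption, check yours), you would likely also need `update-initramfs -u -k all` and a host reboot for the options to take effect.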

The second was to remove the device's PCI mapping in the GUI and instead add this args line to the VM's config file, <vmid>.conf (the hotplug=off was key):

args: -device pcie-root-port,id=pcie_hailo,slot=10,bus=pcie.0,chassis=10,hotplug=off -device vfio-pci,host=c1:00.0,bus=pcie_hailo,addr=0x0
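
If your card sits at a different host address, the line can be generated instead of hand-edited. This is only a sketch: hailo_args is a hypothetical helper, the id pcie_hailo and slot/chassis 10 are simply the values used above, and the pcie.0 bus assumes the VM uses the q35 machine type.

```shell
# Sketch: build the args line shown above for a given host PCI address.
# hailo_args is a hypothetical helper; slot and chassis share one number, as in the post.
hailo_args() {
    host_addr="$1"        # host PCI address of the card, e.g. c1:00.0 (see lspci on the host)
    slot="${2:-10}"       # slot/chassis number for the extra root port; 10 as in the post
    printf 'args: -device pcie-root-port,id=pcie_hailo,slot=%s,bus=pcie.0,chassis=%s,hotplug=off -device vfio-pci,host=%s,bus=pcie_hailo,addr=0x0\n' \
        "$slot" "$slot" "$host_addr"
}
```

`hailo_args c1:00.0` prints the same line as above; paste it into /etc/pve/qemu-server/<vmid>.conf, and `qm showcmd <vmid>` should then show the extra -device options in the generated QEMU command.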