Hi all, I have an obscure scenario and I have hit a dead end with my google-fu.
- I have a Hailo 8 M.2 AI co-processor.
- It is on an M.2 bifurcation card with 3 NVMe drives; the NVMe drives work fine.
- The co-processor is in its own IOMMU group.
- It works perfectly on the host (of course this is the last thing I tried, lol).
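For anyone wanting to check the IOMMU grouping the same way, here is a minimal Python sketch. It assumes the standard Linux sysfs layout, where each PCI device has an `iommu_group/devices` directory. The function name and the address `0000:05:00.0` are placeholders for illustration, not the actual address of the Hailo card.

```python
import os


def iommu_group_members(bdf, sysfs_root="/sys"):
    """Return the sorted PCI addresses that share an IOMMU group with `bdf`.

    `bdf` is a full PCI address such as "0000:05:00.0" (placeholder value).
    `sysfs_root` is only a parameter so the function can be pointed at a
    fake directory tree for testing; on a real host leave it as "/sys".
    """
    group_dir = os.path.join(
        sysfs_root, "bus", "pci", "devices", bdf, "iommu_group", "devices"
    )
    # Each entry in this directory is named after one device in the group.
    return sorted(os.listdir(group_dir))


# Example call on a real host (address is hypothetical):
#   iommu_group_members("0000:05:00.0")
# A result containing only the device itself means it is alone in its group.
```

If the list contains more than the one device, the other members would have to be passed through together with it.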
When I pass the card through to a VM and the driver loads in the VM, the whole physical server instantly resets. In the BMC I see a PCI SERR entry.
I don't see any signs that this is a graceful shutdown; it is a hard reset of the server.
Code:
ID  | TimeStamp           | Sensor Name | Sensor Type        | Description
====|=====================|=============|====================|==========================================================
736 | 05/19/2025 10:29:23 | BIOS        | critical_interrupt | PCIe SEL Log - Asserted
    |                     |             |                    | Data1: PCI SERR
    |                     |             |                    | Data2: PCI bus number for failed device: 0x00
    |                     |             |                    | Data3: PCI device number: 0x01 PCI function number: 0x01
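
The bus, device and function numbers in that SEL entry can be turned into the address format that `lspci` uses. A small Python sketch follows; the helper name is made up for illustration, and PCI domain 0 is an assumption, since the SEL entry does not report a domain.

```python
def sel_to_bdf(bus, device, function, domain=0):
    """Format SEL bus/device/function numbers as a PCI address string.

    The domain defaults to 0, which is an assumption; the SEL entry
    only reports bus, device and function.
    """
    return f"{domain:04x}:{bus:02x}:{device:02x}.{function:x}"


# Values taken from the SEL entry above.
print(sel_to_bdf(0x00, 0x01, 0x01))  # prints 0000:00:01.1
```

The result can be fed to `lspci -s 00:01.1 -vv` on the host. Since bus 0x00 is the root bus, the reported device is presumably a bridge or root port upstream of the card rather than the Hailo itself, but that is a guess until checked against the `lspci` output.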
Things I have tried:
- echoing null to the reset_method of the device (this was to suppress FLR errors)
- adding a few vfio options to modprobe - namely
- passing the device as PCI instead of PCIe
- changing the PCIe speed of the x16 slot in the BIOS (this was at the suggestion of ASRock Rack, who make the mobo)
- adding viommu=intel to the machine type
- adding this to a modprobe.d file
options vfio-pci disable_vfio_pci_flr=1
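
To confirm whether the reset-method change actually took effect, the device's current setting can be read back from sysfs. This is a rough Python sketch, assuming the kernel exposes a per-device `reset_method` attribute containing a space-separated list of enabled methods. The function name and the address `0000:05:00.0` are placeholders.

```python
import os


def read_reset_methods(bdf, sysfs_root="/sys"):
    """Return the enabled reset methods for a PCI device as a list.

    Assumes the sysfs attribute <sysfs_root>/bus/pci/devices/<bdf>/reset_method
    exists and holds a space-separated list such as "flr bus".
    `sysfs_root` is only a parameter so the function can be tested against
    a fake tree; on a real host leave it as "/sys".
    """
    path = os.path.join(sysfs_root, "bus", "pci", "devices", bdf, "reset_method")
    with open(path) as f:
        # An empty file yields an empty list, i.e. no reset method enabled.
        return f.read().split()


# Example call on a real host (address is hypothetical):
#   read_reset_methods("0000:05:00.0")
```

An empty list after the echo would suggest that the kernel will no longer attempt FLR or any other reset on the device.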
Code:
May 19 10:27:53 pve-nas1 pvedaemon[2680]: start VM 101: UPID:pve-nas1:00000A78:00004CBB:682B6A19:qmstart:101:root@pam:
May 19 10:27:53 pve-nas1 pvedaemon[2008]: <root@pam> starting task UPID:pve-nas1:00000A78:00004CBB:682B6A19:qmstart:101:root@pam:
May 19 10:27:54 pve-nas1 chronyd[1807]: Selected source 73.65.80.137 (2.debian.pool.ntp.org)
That's the last line; it literally is a super hard reset.
Is there any magic someone has, or is this just one of those things where it doesn't work and I live with it?