[SOLVED] HBA Passthrough not working after motherboard swap

Rawb

New Member
Jun 19, 2024
Hello,

I have a Proxmox machine with a TrueNAS CORE VM, to which I was passing through (raw device) an HBA card with all of my hard drives attached. Everything was working perfectly fine: the VM saw my drives and ZFS pools.

I just swapped my motherboard from an Asus H510M-A Prime to a Gigabyte B560M DS3H V3, and I can't get the device passthrough to work again (Proxmox and all of my other LXC containers are working fine).
  • VT-d is enabled in the new BIOS
  • I changed the passthrough device ID to match the new ID assigned on the new motherboard/PCIe slot
When I try to start the VM with passthrough enabled, it hangs indefinitely; it starts just fine without it.
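
For anyone wanting to double-check the device ID and IOMMU grouping after a board swap, here is a minimal sketch in Python. It assumes a Linux host that exposes groups under /sys/kernel/iommu_groups; the helper names `iommu_groups` and `group_of` are made up for illustration, and 0000:02:00.0 is simply the HBA address that appears in the logs in this post.

```python
import os


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: [pci_address, ...]} read from a sysfs-style tree."""
    groups = {}
    if not os.path.isdir(root):
        # No such directory: IOMMU is probably disabled, or this is not Linux.
        return groups
    for name in os.listdir(root):
        if not name.isdigit():
            continue
        devices_dir = os.path.join(root, name, "devices")
        if os.path.isdir(devices_dir):
            groups[int(name)] = sorted(os.listdir(devices_dir))
    return groups


def group_of(address, groups):
    """Return the group number containing the PCI address, or None."""
    for number, devices in groups.items():
        if address in devices:
            return number
    return None


if __name__ == "__main__":
    groups = iommu_groups()
    for number in sorted(groups):
        print(f"Group {number}: {', '.join(groups[number])}")
    # Address taken from the logs in this post; adjust for your own system.
    print("HBA group:", group_of("0000:02:00.0", groups))
```

If the HBA shows up alone in its group at the address configured for the VM, the ID side of the setup is probably fine and the problem is more likely elsewhere.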

dmesg output
Code:
[ 3961.638350] pcieport 0000:00:01.0: PCIe Bus Error: severity=Uncorrectable (Non-Fatal), type=Transaction Layer, (Receiver ID)
[ 3961.639195] pcieport 0000:00:01.0:   device [8086:4c01] error status/mask=00200000/00010000
[ 3961.640028] pcieport 0000:00:01.0:    [21] ACSViol                (First)
[ 3962.964161] pcieport 0000:00:01.0: broken device, retraining non-functional downstream link at 2.5GT/s
[ 3963.966168] pcieport 0000:00:01.0: retraining failed
[ 3965.141182] pcieport 0000:00:01.0: broken device, retraining non-functional downstream link at 2.5GT/s
[ 3966.143179] pcieport 0000:00:01.0: retraining failed
[ 3966.144005] vfio-pci 0000:02:00.0: not ready 1023ms after resume; waiting
[ 3967.189192] vfio-pci 0000:02:00.0: not ready 2047ms after resume; waiting
[ 3969.301191] vfio-pci 0000:02:00.0: not ready 4095ms after resume; waiting
[ 3973.909218] vfio-pci 0000:02:00.0: not ready 8191ms after resume; waiting
[ 3982.613247] vfio-pci 0000:02:00.0: not ready 16383ms after resume; waiting
[ 3999.509319] vfio-pci 0000:02:00.0: not ready 32767ms after resume; waiting
[ 4035.349495] vfio-pci 0000:02:00.0: not ready 65535ms after resume; giving up
[ 4035.350340] vfio-pci 0000:02:00.0: Unable to change power state from D3cold to D0, device inaccessible
[ 4037.653485] vfio-pci 0000:02:00.0: not ready 1023ms after DPC; waiting
[ 4038.741363] vfio-pci 0000:02:00.0: not ready 2047ms after DPC; waiting
[ 4040.853375] vfio-pci 0000:02:00.0: not ready 4095ms after DPC; waiting
[ 4045.077517] vfio-pci 0000:02:00.0: not ready 8191ms after DPC; waiting
[ 4053.781494] vfio-pci 0000:02:00.0: not ready 16383ms after DPC; waiting
[ 4070.677628] vfio-pci 0000:02:00.0: not ready 32767ms after DPC; waiting
[ 4104.981770] vfio-pci 0000:02:00.0: not ready 65535ms after DPC; giving up
[ 4104.982576] pcieport 0000:00:01.0: AER: subordinate device reset failed
[ 4104.983401] pcieport 0000:00:01.0: AER: device recovery failed
[ 4104.984287] vfio-pci 0000:02:00.0: Unable to change power state from D3cold to D0, device inaccessible
[ 4104.997337] pcieport 0000:00:01.0: DPC: containment event, status:0x1f11 source:0x0000
[ 4104.998180] pcieport 0000:00:01.0: DPC: unmasked uncorrectable error detected
[ 4104.999015] pcieport 0000:00:01.0: PCIe Bus Error: severity=Uncorrectable (Non-Fatal), type=Transaction Layer, (Receiver ID)
[ 4104.999890] pcieport 0000:00:01.0:   device [8086:4c01] error status/mask=00200000/00010000
[ 4105.000744] pcieport 0000:00:01.0:    [21] ACSViol                (First)
[ 4106.324652] pcieport 0000:00:01.0: broken device, retraining non-functional downstream link at 2.5GT/s
[ 4107.326775] pcieport 0000:00:01.0: retraining failed
[ 4108.501787] pcieport 0000:00:01.0: broken device, retraining non-functional downstream link at 2.5GT/s
[ 4109.504791] pcieport 0000:00:01.0: retraining failed
[ 4109.505618] vfio-pci 0000:02:00.0: not ready 1023ms after resume; waiting
[ 4110.549799] vfio-pci 0000:02:00.0: not ready 2047ms after resume; waiting
[ 4112.661803] vfio-pci 0000:02:00.0: not ready 4095ms after resume; waiting
[ 4117.269708] vfio-pci 0000:02:00.0: not ready 8191ms after resume; waiting
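
Since the log above repeats a lot, a rough way to condense it is to count messages per device and pull out the named AER error bits. This is only a sketch: the regular expressions are guesses fitted to the line format shown in this post, and `summarize` is a hypothetical helper, not part of any tool.

```python
import re
from collections import Counter

# Matches lines like: "[ 3961.640028] pcieport 0000:00:01.0:    [21] ACSViol  (First)"
LINE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<driver>\S+)\s+"
    r"(?P<addr>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+(?P<msg>.*)$"
)
# Matches an AER status-bit line such as "[21] ACSViol (First)"
AER_BIT = re.compile(r"^\[\s*(\d+)\]\s+(\w+)")


def summarize(log_text):
    """Return (message count per (driver, address), set of (address, bit, name))."""
    per_device = Counter()
    aer_errors = set()
    for line in log_text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue
        per_device[(m["driver"], m["addr"])] += 1
        bit = AER_BIT.match(m["msg"])
        if bit:
            aer_errors.add((m["addr"], int(bit.group(1)), bit.group(2)))
    return per_device, aer_errors


if __name__ == "__main__":
    sample = """\
[ 3961.638350] pcieport 0000:00:01.0: PCIe Bus Error: severity=Uncorrectable (Non-Fatal), type=Transaction Layer, (Receiver ID)
[ 3961.640028] pcieport 0000:00:01.0:    [21] ACSViol                (First)
[ 3966.144005] vfio-pci 0000:02:00.0: not ready 1023ms after resume; waiting
[ 4035.350340] vfio-pci 0000:02:00.0: Unable to change power state from D3cold to D0, device inaccessible
"""
    counts, errors = summarize(sample)
    for (driver, addr), n in sorted(counts.items()):
        print(f"{driver} {addr}: {n} message(s)")
    for addr, bit, name in sorted(errors):
        print(f"AER bit {bit} ({name}) reported by {addr}")
```

Run against the full output, this would show that the only named error bit is ACSViol on the root port 00:01.0, while 02:00.0 (the HBA) never becomes ready again.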

IOMMU device list
Code:
Group 0:        [8086:4c8b] [R] 00:02.0  VGA compatible controller                RocketLake-S GT1 [UHD Graphics 730]
Group 1:        [8086:4c53]     00:00.0  PCI bridge                               Device 4c53
Group 2:        [8086:4c01] [R] 00:01.0  PCI bridge                               Device 4c01
Group 3:        [8086:06f9]     00:12.0  Signal processing controller             Comet Lake PCH Thermal Controller
Group 4:        [8086:06ed]     00:14.0  USB controller                           Comet Lake USB 3.1 xHCI Host Controller
USB:            [048d:5702]              Bus 001 Device 004                       Integrated Technology Express, Inc. RGB LED Controller
USB:            [1d6b:0002]              Bus 001 Device 001                       Linux Foundation 2.0 root hub
USB:            [1d6b:0003]              Bus 002 Device 001                       Linux Foundation 3.0 root hub
                [8086:06ef]     00:14.2  RAM memory                               Comet Lake PCH Shared SRAM
Group 5:        [8086:06e0]     00:16.0  Communication controller                 Comet Lake HECI Controller
Group 6:        [8086:06d2]     00:17.0  SATA controller                          Comet Lake SATA AHCI Controller
Group 7:        [8086:06c2] [R] 00:1b.0  PCI bridge                               Device 06c2
Group 8:        [8086:06ac] [R] 00:1b.4  PCI bridge                               Comet Lake PCI Express Root Port #21
Group 9:        [8086:06bc] [R] 00:1c.0  PCI bridge                               Device 06bc
Group 10:       [8086:06b0] [R] 00:1d.0  PCI bridge                               Comet Lake PCI Express Root Port #9
                [8086:06b4] [R] 00:1d.4  PCI bridge                               Device 06b4
Group 11:       [8086:0684]     00:1f.0  ISA bridge                               H470 Chipset LPC/eSPI Controller
                [8086:f1c8]     00:1f.3  Audio device                             Device f1c8
                [8086:06a3]     00:1f.4  SMBus                                    Comet Lake PCH SMBus Controller
                [8086:06a4]     00:1f.5  Serial bus controller                    Comet Lake PCH SPI Controller
                [8086:0d4d]     00:1f.6  Ethernet controller                      Ethernet Connection (11) I219-V
Group 12:       [1000:0072] [R] 02:00.0  RAID bus controller                      SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon]
Group 13:       [10ec:8125] [R] 03:00.0  Ethernet controller                      RTL8125 2.5GbE Controller
Group 14:       [144d:a80a] [R] 04:00.0  Non-Volatile memory controller           NVMe SSD Controller PM9A1/PM9A3/980PRO
Group 15:       [144d:a80a] [R] 05:00.0  Non-Volatile memory controller           NVMe SSD Controller PM9A1/PM9A3/980PRO



EDIT: After moving the HBA card from the top PCIe x16 slot to the bottom one and removing my two other PCIe devices, I got it working just fine. I'm going to investigate further and close this thread for now.
 