[SOLVED] I broke HBA passthrough to TrueNAS VM by adding 2nd HBA

tmath250

New Member
Jan 20, 2025
First real post to this forum, so please be gentle.

The Hardware: SuperMicro 846 36-bay HDD chassis with dual 1250W power supplies, Gigabyte MZ72-HB2 dual-socket AMD motherboard with 256GB ECC RAM, 2x EPYC 7F52 16-core processors, LSI SAS2308 HBA #1, and 4x Samsung PM9A1 512GB SSDs as the boot pool in a ZFS striped-mirror (RAID 10) layout.

Built my first ever bare-metal server using Proxmox; it was also my first time using Linux, so it's been a steep learning curve. Booting UEFI. Passed through HBA #1 to a TrueNAS VM; I also have a Windows 11 VM. Got it all working (thrilled) after months of tinkering, and stored 6+ TB in the cloud for backup (whew)!

I wanted to copy a 1 TB OneDrive dataset to the NAS for archiving, but due to the 1 TB boot drive size limitation I was unable to do so, so I got to thinking (I know, dangerous for me ;)). All the HDDs were in the front 24 bays, and since I wasn't using the back 12 bays yet, I figured I could just throw in another SAS HBA, route the cable from the rear backplane to the 2nd HBA, and have those drives available to Proxmox. The plan was to put a 16 TB data drive back there.
The only problem was that I originally had only one CPU installed, and the only available slot with x8 PCIe lanes was tied to the 2nd CPU socket, so I added the 2nd CPU at the same time as the 2nd HBA.

That's when I broke the passthrough of HBA #1: TrueNAS could no longer see any vdevs. Looking at the IOMMU groups, I discovered that the output of pvesh get /nodes/www/hardware/pci --pci-class-blacklist "" showed the ID had changed for HBA #1.

ORIGINAL:

│ class    │ device │ id           │ iommugroup │ vendor │ device_name                          │ subsystem_device │ subsystem_device_name │
│ 0x010700 │ 0x0087 │ 0000:41:00.0 │ 30         │ 0x1000 │ SAS2308 PCI-Express Fusion-MPT SAS-2 │ 0x3050           │ SAS9217-8i            │

NOW:

│ 0x010700 │ 0x0087 │ 0000:21:00.0 │ 30 │ 0x1000 │ SAS2308 PCI-Express Fusion-MPT SAS-2 (HBA #1)
│ 0x010700 │ 0x0072 │ 0000:a1:00.0 │ 92 │ 0x1000 │ SAS2008 P… (2nd HBA)


It appears the "id" for HBA #1 has changed from 0000:41:00.0 to 0000:21:00.0, and I believe that's where the passthrough broke, but I don't know how to fix it. Is it possible that adding the 2nd CPU changed the PCI bus numbering?
 
Hey :)

Changing PCIe cards, adding CPUs, etc. can always change the PCI ID of a card.
This is heavily dependent on the motherboard/BIOS and how that vendor handles such things.

Have you tried simply replacing the raw device in the Hardware configuration of your VM?
Under Add you can also add your second HBA.
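The same change can be made from the node's shell with qm set. This is a sketch using the addresses from this thread; the VMID 100 is a placeholder for your TrueNAS VM's actual ID, and pcie=1 assumes a q35 machine type:

```shell
# Repoint the existing passthrough entry at the HBA's new address
# (hostpci0 assumed to be the entry that held 0000:41:00.0 before):
qm set 100 --hostpci0 0000:21:00.0,pcie=1

# Optionally pass the second HBA through to the same VM as well:
qm set 100 --hostpci1 0000:a1:00.0,pcie=1
```

This writes the hostpciN lines into /etc/pve/qemu-server/100.conf; the VM needs a full stop/start (not a reboot from inside the guest) for the change to take effect.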
 
I very much recommend using a resource mapping instead of a raw device. It's supposed to make changing IOMMU groups safer.
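A mapping can be created under Datacenter → Resource Mappings in the GUI, or via pvesh. A sketch, assuming the node name "www" and the IDs from this thread; the mapping name "truenas-hba" and VMID 100 are made up for illustration:

```shell
# Create a PCI resource mapping that points at the HBA's current address:
pvesh create /cluster/mapping/pci --id truenas-hba \
  --map node=www,path=0000:21:00.0,id=1000:0087,subsystem-id=1000:3050,iommugroup=30

# Reference the mapping instead of a raw address in the VM config:
qm set 100 --hostpci0 mapping=truenas-hba,pcie=1
```

The VM config then names the mapping rather than a bus address, so after a hardware change only the mapping entry has to be corrected, not every VM that uses the device.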
But one would still need to manually adjust the PCI IDs in the resource mapping if the hardware changes, right? It does not look for the new device automatically.
Especially if you have multiple identical devices, it would not know which is the right one for each mapping (as I understand it, it only differentiates by vendor/device (+ subsystem) IDs).
It would just prevent starting a VM with a wrong mapping (only for HA?).
Other than that, I do not see a benefit for a single-node setup.
 
Yep. The reason I recommend it is that it should be able to prevent node boot failures and other issues like that.
 