How to add new device PCI ids to IOMMU driver?

socvyr

New Member
Oct 5, 2022
I am working on development hardware, so I may not be able to share many details. I have been trying to get Proxmox GPU passthrough working on it, but I get stuck at:
Code:
TASK ERROR: Cannot open iommu_group: No such file or directory

Under the Add PCIe Hardware option, I do see a number of other PCI devices with their IOMMU groups. The AMD GPU is listed too, but it has not been assigned an IOMMU group, so the error is understandable.

The kernel command line I am using is:
Code:
amd_iommu=on vfio_iommu_type1.allow_unsafe_interrupts=1 pcie_acs_override=downstream,multifunction

Since this is mostly untouched hardware, I believe I will have to add IOMMU support for this GPU myself, even though its Mesa drivers have been compiled. Can someone please point me to a guide or reference to get started? Maybe some instructions for adding the PCI IDs to the kernel's IOMMU driver, for example. Sorry if it's a stupid proposal. Any other help is also highly appreciated. Thanks in advance.
 
Please show the output of cat /proc/cmdline to check which parameters are actually used. Maybe you can tell us the make and model of your motherboard, so we can advise on which slot to use for your device.
amd_iommu=on does nothing because the AMD IOMMU is on by default, unless it is disabled in the BIOS. Please don't use pcie_acs_override=downstream,multifunction because the IOMMU group information will be bogus.
Can you show us your IOMMU groups with this command for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done? Feel free to redact the devices you are working on, but please keep all groups and devices.
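The command above only walks /sys/kernel/iommu_groups, so a device that has no group simply does not show up in its output. A minimal sketch that goes the other way and lists PCI devices without an iommu_group link could look like this. The function name list_ungrouped and the optional sysfs-root argument are made up here for illustration; on a real host you would call it without arguments.

```shell
#!/bin/sh
# Print the PCI address of every device that has no iommu_group link.
# The optional first argument (sysfs root, default /sys) is an assumption
# added so the function can be tried against a test directory tree.
list_ungrouped() {
    root="${1:-/sys}"
    for dev in "$root"/bus/pci/devices/*; do
        # Skip the unexpanded glob if the directory is missing or empty.
        [ -e "$dev" ] || continue
        # A grouped device carries an iommu_group symlink in its directory.
        if [ ! -e "$dev/iommu_group" ] && [ ! -L "$dev/iommu_group" ]; then
            printf '%s\n' "${dev##*/}"
        fi
    done
}
```

Any address it prints (for example the GPU at 00:01.0 in this thread) is a device that vfio cannot open yet.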
 
Thanks a lot for your suggestions. This hardware doesn't have UEFI/BIOS access. So, I am using kexec to boot Proxmox.

As for the
Code:
cat /proc/cmdline
output, here it is:-
Code:
panic=0 clocksource=tsc amdgpu.dpm=0 console=tty0 console=ttyS0,115200n8 console=uart8250,mmio32,0xd0340000 video=HDMI-A-1:1920x1080-24@60 consoleblank=0 net.ifnames=0 drm.debug=0 amdgpu.ppfeaturemask=0xffffffff vfio_iommu_type1.allow_unsafe_interrupts=1

As you can see, I took your suggestion and removed the ACS overrides. So now, the output of
Code:
for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done
is:-
Code:
IOMMU Group 0 00:14.0 System peripheral [0880]: Redacted ACPI [209d:20d6]
IOMMU Group 0 00:14.1 System peripheral [0880]: Redacted Ethernet Controller [209d:20d7]
IOMMU Group 0 00:14.2 System peripheral [0880]: Redacted SATA AHCI Controller [209d:20d8]
IOMMU Group 0 00:14.3 System peripheral [0880]: Redacted SD/MMC Host Controller [209d:20da]
IOMMU Group 0 00:14.4 System peripheral [0880]: Redacted PCI Express Glue and Miscellaneous Devices [209d:20db]
IOMMU Group 0 00:14.5 System peripheral [0880]: Redacted DMA Controller [209d:20dc]
IOMMU Group 0 00:14.6 System peripheral [0880]: Redacted Memory (DDR3/SPM) [209d:20dd]
IOMMU Group 0 00:14.7 System peripheral [0880]: Redacted USB 3.0 xHCI Host Controller [209d:20de]

I had to change the original PCI IDs and redact the company name. But, as you can see, without the ACS override everything is grouped under Group 0, and the listing is also missing the GPU at 00:01.0 and the Audio/HDMI controller at 00:01.1.
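To see at a glance how the devices are spread over groups, a rough sketch like the following counts the devices in each IOMMU group. The name group_sizes and the optional sysfs-root argument are hypothetical and only there for illustration.

```shell
#!/bin/sh
# Print one line per IOMMU group with the number of devices in it.
# The optional first argument (sysfs root, default /sys) is an assumption
# added so the function can be tried against a test directory tree.
group_sizes() {
    root="${1:-/sys}"
    for g in "$root"/kernel/iommu_groups/*; do
        [ -d "$g/devices" ] || continue
        count=0
        for d in "$g"/devices/*; do
            # Entries are normally symlinks to the device directories.
            [ -e "$d" ] || [ -L "$d" ] || continue
            count=$((count + 1))
        done
        printf 'group %s: %s device(s)\n' "${g##*/}" "$count"
    done
}
```

A single group holding every device, as in the listing above, means the whole set would have to be handed to the guest together.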
 
just fyi, the IOMMU groups are determined by the BIOS/UEFI and this is not really something the kernel can choose/modify. the 'ACS' patch is more or less a hack, so that each device 'appears' to be in its own group, but it forgoes the isolation that normally exists. if you really want a supported way of passthrough, you'd have to contact the mainboard/platform vendor and check there how to do it
 
Thanks for that. I found out that this was a firmware issue. Fortunately, one of the devs had the proper firmware. After loading that, I do have the GPU and the audio controller in IOMMU Group 0. Unfortunately, there is now a new error. After adding the GPU, when I launch the VM, I get this:
Code:
kvm: -device vfio-pci,host=0000:00:01.0,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0: vfio 0000:00:01.0: Failed to set up TRIGGER eventfd signaling for interrupt INTX-0: VFIO_DEVICE_SET_IRQS failure: No such device

I should add that with the new firmware the device loses display output, probably because the IOMMU is claiming the GPU. But after setting the network to connect automatically, I am able to access Proxmox's admin page and SSH.

The output of dmesg | grep vfio is
Code:
vfio-pci 0000:00:01.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
vfio_pci: add [xxxx:yyyy[ffffffff:ffffffff]] class 0x000000/00000000
vfio_pci: add [xxxx:yyyy[ffffffff:ffffffff]] class 0x000000/00000000
vfio-pci 0000:00:01.0: can't find IRQ for PCI INT A; please try using pci=biosirq
vfio-pci 0000:00:01.0: vfio_ecap_init: hiding ecap 0x19@0x270
vfio-pci 0000:00:01.0: can't find IRQ for PCI INT A; please try using pci=biosirq
vfio-pci 0000:00:01.0: vfio_ecap_init: hiding ecap 0x19@0x270
vfio-pci 0000:00:01.0: can't find IRQ for PCI INT A; please try using pci=biosirq
vfio-pci 0000:00:01.0: vfio_ecap_init: hiding ecap 0x19@0x270
vfio-pci 0000:00:01.0: can't find IRQ for PCI INT A; please try using pci=biosirq
vfio-pci 0000:00:01.0: vfio_ecap_init: hiding ecap 0x19@0x270
vfio-pci 0000:00:01.0: can't find IRQ for PCI INT A; please try using pci=biosirq
vfio-pci 0000:00:01.0: vfio_ecap_init: hiding ecap 0x19@0x270

The kernel parameters specific to IOMMU that I am using are
Code:
amd_iommu_dump=1 vfio_iommu_type1 allow_unsafe_interrupts=1 kvm.ignore_msrs=1 vfio-pci.ids=xxxx:yyyy,xxxx:yyyy
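One thing that may be worth checking: the list above has a space between vfio_iommu_type1 and allow_unsafe_interrupts=1, while the earlier command line used a dot. Module parameters on the kernel command line take the module.param=value form, so if the space really is on the command line (and is not just a paste slip), the option may not reach the module. A rough sketch for reading what a loaded module actually sees through /sys/module follows. The helper name module_param and the optional sysfs-root argument are assumptions for illustration.

```shell
#!/bin/sh
# Print the current value of a module parameter as exposed in sysfs.
# Usage: module_param MODULE PARAM [SYSFS_ROOT]
# SYSFS_ROOT (default /sys) is an assumption added for testing only.
module_param() {
    module="$1"
    param="$2"
    root="${3:-/sys}"
    f="$root/module/$module/parameters/$param"
    if [ -r "$f" ]; then
        # Boolean parameters are reported as Y or N.
        printf '%s.%s=%s\n' "$module" "$param" "$(cat "$f")"
    else
        # Module not loaded, or the parameter is not exported in sysfs.
        printf '%s.%s: not available\n' "$module" "$param"
        return 1
    fi
}
```

Calling module_param vfio_iommu_type1 allow_unsafe_interrupts on the host should then show whether the setting took effect.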

Please note I had to mask the PCI IDs of the devices, but they do show up properly.

Also note that the GPU is integrated. Any ideas to fix this?
 
can you post the output of 'lspci -vvk' ? is the device bound to any driver before starting the vm ?
also what about doing what the log tells you :

vfio-pci 0000:00:01.0: can't find IRQ for PCI INT A; please try using pci=biosirq
maybe putting 'pci=biosirq' to the kernel commandline helps?
 
Thanks for the reply.

As for the lspci output of the GPU and soundcard, here it is:-
Code:
00:01.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] GPU_NAME_REDACTED (prog-if 00 [VGA controller])
    Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] GPU_NAME_REDACTED
    Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Interrupt: pin A routed to IRQ 0
    IOMMU group: 0
    Region 0: Memory at e0000000 (64-bit, prefetchable) [size=64M]
    Region 2: Memory at e4000000 (64-bit, prefetchable) [size=8M]
    Region 4: I/O ports at 6000 [size=256]
    Region 5: Memory at e4800000 (32-bit, non-prefetchable) [size=256K]
    Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
    Capabilities: [50] Power Management version 3
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold-)
        Status: D3 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [58] Express (v2) Root Complex Integrated Endpoint, MSI 00
        DevCap:    MaxPayload 256 bytes, PhantFunc 0
            ExtTag+ RBE+ FLReset-
        DevCtl:    CorrErr- NonFatalErr- FatalErr- UnsupReq-
            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 256 bytes, MaxReadReq 512 bytes
        DevSta:    CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
        DevCap2: Completion Timeout: Not Supported, TimeoutDis- NROPrPrP- LTR-
             10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
             EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
             FRS-
             AtomicOpsCap: 32bit- 64bit- 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
             AtomicOpsCtl: ReqEn-
    Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
        Address: 0000000000000000  Data: 0000
    Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
    Capabilities: [270 v1] Secondary PCI Express
        LnkCtl3: LnkEquIntrruptEn- PerformEqu-
        LaneErrStat: 0
    Kernel driver in use: vfio-pci

00:01.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] SND_CARD_REDACTED HDMI/DP Audio Controller
    Subsystem: nCipher Security SND_CARD_REDACTED HDMI/DP Audio Controller
    Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Interrupt: pin B routed to IRQ 0
    IOMMU group: 0
    Region 0: Memory at e4840000 (64-bit, non-prefetchable) [size=16K]
    Capabilities: [50] Power Management version 3
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [58] Express (v2) Root Complex Integrated Endpoint, MSI 00
        DevCap:    MaxPayload 256 bytes, PhantFunc 0
            ExtTag+ RBE+ FLReset-
        DevCtl:    CorrErr- NonFatalErr- FatalErr- UnsupReq-
            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 256 bytes, MaxReadReq 512 bytes
        DevSta:    CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
        DevCap2: Completion Timeout: Not Supported, TimeoutDis- NROPrPrP- LTR-
             10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
             EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
             FRS-
             AtomicOpsCap: 32bit- 64bit- 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
             AtomicOpsCtl: ReqEn-
    Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
        Address: 0000000000000000  Data: 0000
    Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
    Kernel driver in use: vfio-pci

I tried adding pci=biosirq but the same error persists.

I also found that the function that prints this error is pirq_enable_irq in arch/x86/pci/irq.c. Not sure if it will help, but I am posting it here anyway. Please let me know if that's not allowed:
Code:
static int pirq_enable_irq(struct pci_dev *dev)
{
    u8 pin = 0;

    pci_read_config_byte(dev, PCI_INTERRUPT_PIN, &pin);
    if (pin && !pcibios_lookup_irq(dev, 1)) {
        char *msg = "";

        if (!io_apic_assign_pci_irqs && dev->irq)
            return 0;

        if (io_apic_assign_pci_irqs) {
#ifdef CONFIG_X86_IO_APIC
            struct pci_dev *temp_dev;
            int irq;

            if (dev->irq_managed && dev->irq > 0)
                return 0;

            irq = IO_APIC_get_PCI_irq_vector(dev->bus->number,
                        PCI_SLOT(dev->devfn), pin - 1);
            /*
             * Busses behind bridges are typically not listed in the MP-table.
             * In this case we have to look up the IRQ based on the parent bus,
             * parent slot, and pin number. The SMP code detects such bridged
             * busses itself so we should get into this branch reliably.
             */
            temp_dev = dev;
            while (irq < 0 && dev->bus->parent) { /* go back to the bridge */
                struct pci_dev *bridge = dev->bus->self;

                pin = pci_swizzle_interrupt_pin(dev, pin);
                irq = IO_APIC_get_PCI_irq_vector(bridge->bus->number,
                        PCI_SLOT(bridge->devfn),
                        pin - 1);
                if (irq >= 0)
                    dev_warn(&dev->dev, "using bridge %s "
                         "INT %c to get IRQ %d\n",
                         pci_name(bridge), 'A' + pin - 1,
                         irq);
                dev = bridge;
            }
            dev = temp_dev;
            if (irq >= 0) {
                dev->irq_managed = 1;
                dev->irq = irq;
                dev_info(&dev->dev, "PCI->APIC IRQ transform: "
                     "INT %c -> IRQ %d\n", 'A' + pin - 1, irq);
                return 0;
            } else
                msg = "; probably buggy MP table";
#endif
        } else if (pci_probe & PCI_BIOS_IRQ_SCAN)
            msg = "";
        else
            msg = "; please try using pci=biosirq";

        /*
         * With IDE legacy devices the IRQ lookup failure is not
         * a problem..
         */
        if (dev->class >> 8 == PCI_CLASS_STORAGE_IDE &&
                !(dev->class & 0x5))
            return 0;

        dev_warn(&dev->dev, "can't find IRQ for PCI INT %c%s\n",
             'A' + pin - 1, msg);
    }
    return 0;
}
 
it seems the devices don't get the correct interrupts (both show irq 0)... but i'm afraid i don't know how to fix that.. it appears that this platform is not really suited to do passthrough at all..
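For checking this across all devices rather than reading lspci by eye, here is a minimal sketch that lists PCI devices which declare an interrupt pin but have IRQ 0 in sysfs. It assumes the usual sysfs layout (an irq file and a config file per device) and reads the Interrupt Pin register at config offset 0x3d. The name list_unrouted_intx and the optional sysfs-root argument are made up for illustration.

```shell
#!/bin/sh
# List PCI devices that declare an INTx pin but have IRQ 0 in sysfs.
# The optional first argument (sysfs root, default /sys) is an assumption
# added so the function can be tried against a test directory tree.
list_unrouted_intx() {
    root="${1:-/sys}"
    for dev in "$root"/bus/pci/devices/*; do
        [ -r "$dev/irq" ] && [ -r "$dev/config" ] || continue
        irq=$(cat "$dev/irq")
        # Byte 0x3d (61) of config space is the Interrupt Pin register:
        # 0 means no INTx pin, 1 to 4 mean INTA to INTD.
        pin=$(od -An -tu1 -j61 -N1 "$dev/config" 2>/dev/null | tr -d ' \n')
        if [ "$irq" = "0" ] && [ -n "$pin" ] && [ "$pin" != "0" ]; then
            printf '%s pin=%s irq=%s\n' "${dev##*/}" "$pin" "$irq"
        fi
    done
}
```

Devices without an interrupt pin are skipped on purpose, since IRQ 0 is expected for them.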
 
