Problems with GPU Passthrough since 8.2

I was really happy because NVIDIA announced that GRID v16.9 is compatible with the 6.8 kernel, so I updated to be able to use my P4s, and now I see that AMD and Supermicro broke PCI passthrough of my T1000 on my M11SDV-8C+-LN4F on that kernel... Instead of flashing an old BIOS or dealing with motherboard firmware, I recommend just pinning the latest 6.5 kernel, which is a lot more stable and doesn't have that buggy PCI passthrough behavior.

proxmox-boot-tool kernel pin 6.5.13-6-pve
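
If you would rather not hard-code the version, here is a minimal sketch that picks the newest kernel in a series before pinning it. The helper name `newest_in_series` is hypothetical, and it assumes the version strings arrive one per line on stdin (for example from `proxmox-boot-tool kernel list`, whose exact output layout may differ between releases). Lines that do not start with the series prefix are ignored.

```shell
#!/bin/sh
# Hypothetical helper: print the highest version in a kernel series.
# stdin: one version string per line; $1: series prefix, e.g. 6.5
newest_in_series() {
    # keep only lines beginning with "<series>.", then version-sort them
    awk -v p="$1." 'index($0, p) == 1' | sort -V | tail -n 1
}

# Example use on a Proxmox host (commented out because it changes boot config):
# ver=$(proxmox-boot-tool kernel list | newest_in_series 6.5)
# [ -n "$ver" ] && proxmox-boot-tool kernel pin "$ver"
```

`proxmox-boot-tool kernel unpin` should remove the pin again once a fixed kernel or BIOS is available.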
 
I have a Dell T340 running PVE 8.3.3, kernel 6.8.12-8, an Intel CPU, and an H330 RAID card set to IT mode. When I try to pass through the H330 I get this error: "vfio-pci 0000:01:00.0: Firmware has requested this device have a 1:1 IOMMU mapping, rejecting configuring the device without a 1:1 mapping. Contact your platform vendor." When I run dmesg | grep -e DMAR -e IOMMU -e AMD-Vi, it reports a firmware bug with the RMRR entry for the device that is having issues. What I don't know is whether the bug is in the H330 card firmware or in the motherboard firmware (BIOS). When I list the IOMMU groups, it looks like everything connected directly to the CPU ends up in group 1. When I try to create a mapped device, PVE tells me "A selected device is not in a separated IOMMU group, Make sure this is intended". I have an HBA330 on order, but I have a feeling it won't make a difference in the end, since this thread makes it sound like an issue with the BIOS... I have tried relax_rmrr, but that has not helped. I'm not sure I am brave enough to build my own kernel. Does anyone have any ideas on something else I can try?

Code:
dmesg | grep -e DMAR -e IOMMU -e AMD-Vi
[    0.005935] ACPI: DMAR 0x000000006FFD2000 000090 (v01 DELL   PE_SC3   00000002 DELL 00000001)
[    0.005950] ACPI: Reserving DMAR table memory at [mem 0x6ffd2000-0x6ffd208f]
[    0.169706] DMAR: IOMMU enabled
[    0.169706] DMAR: Intel-IOMMU: assuming all RMRRs are relaxable. This can lead to instability or data loss
[    0.424046] DMAR: Host address width 39
[    0.424047] DMAR: DRHD base: 0x000000fed91000 flags: 0x1
[    0.424053] DMAR: dmar0: reg_base_addr fed91000 ver 1:0 cap d2008c40660462 ecap f050da
[    0.424057] DMAR: RMRR base: 0x000000507f6000 end: 0x000000587fdfff
[    0.424061] DMAR: RMRR base: 0x0000006b6ca000 end: 0x0000006b6e9fff
[    0.424064] DMAR-IR: IOAPIC id 2 under DRHD base  0xfed91000 IOMMU 0
[    0.424066] DMAR-IR: HPET id 0 under DRHD base 0xfed91000
[    0.424067] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[    0.427218] DMAR-IR: Enabled IRQ remapping in x2apic mode
[    0.641430] DMAR: [Firmware Bug]: RMRR entry for device 02:00.0 is broken - applying workaround
[    0.641434] DMAR: No ATSR found
[    0.641435] DMAR: No SATC found
[    0.641436] DMAR: dmar0: Using Queued invalidation
[    0.641717] DMAR: Intel(R) Virtualization Technology for Directed I/O
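
To pull just the affected device out of that output, a small filter along these lines may help. This is only a sketch: the function name `rmrr_broken_devices` is hypothetical, and it assumes the kernel message keeps the wording shown in the log above.

```shell
#!/bin/sh
# Hypothetical helper: print the PCI address named in each
# "[Firmware Bug]: RMRR entry for device ... is broken" line read from stdin.
rmrr_broken_devices() {
    sed -n 's/.*\[Firmware Bug\]: RMRR entry for device \([0-9a-fA-F:.]*\) is broken.*/\1/p'
}

# Example use on the host:
# dmesg | rmrr_broken_devices
# lspci -nns "$(dmesg | rmrr_broken_devices | head -n 1)"
```

For the log above this would print 02:00.0, which can then be looked up with lspci to see which card the firmware's RMRR entry refers to.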


Code:
for d in $(find /sys/kernel/iommu_groups/ -type l | sort -n -k5 -t/); do
    n=${d#*/iommu_groups/*}; n=${n%%/*}
    printf 'IOMMU Group %s ' "$n"
    lspci -nns "${d##*/}"
done;
IOMMU Group 0 00:00.0 Host bridge [0600]: Intel Corporation Device [8086:3e31] (rev 0d)
IOMMU Group 1 00:01.0 PCI bridge [0604]: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 0d)
IOMMU Group 1 00:01.1 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x8) [8086:1905] (rev 0d)
IOMMU Group 1 01:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 [8086:1528] (rev 01)
IOMMU Group 1 01:00.1 Ethernet controller [0200]: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 [8086:1528] (rev 01)
IOMMU Group 1 02:00.0 RAID bus controller [0104]: Broadcom / LSI MegaRAID SAS-3 3008 [Fury] [1000:005f] (rev 02)
IOMMU Group 2 00:08.0 System peripheral [0880]: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model [8086:1911]
IOMMU Group 3 00:12.0 Signal processing controller [1180]: Intel Corporation Cannon Lake PCH Thermal Controller [8086:a379] (rev 10)
IOMMU Group 4 00:14.0 USB controller [0c03]: Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller [8086:a36d] (rev 10)
IOMMU Group 4 00:14.2 RAM memory [0500]: Intel Corporation Cannon Lake PCH Shared SRAM [8086:a36f] (rev 10)
IOMMU Group 5 00:16.0 Communication controller [0780]: Intel Corporation Cannon Lake PCH HECI Controller [8086:a360] (rev 10)
IOMMU Group 5 00:16.4 Communication controller [0780]: Intel Corporation Cannon Lake PCH HECI Controller #2 [8086:a364] (rev 10)
IOMMU Group 6 00:17.0 SATA controller [0106]: Intel Corporation Cannon Lake PCH SATA AHCI Controller [8086:a352] (rev 10)
IOMMU Group 7 00:1c.0 PCI bridge [0604]: Intel Corporation Cannon Lake PCH PCI Express Root Port #1 [8086:a338] (rev f0)
IOMMU Group 8 00:1c.1 PCI bridge [0604]: Intel Corporation Cannon Lake PCH PCI Express Root Port #2 [8086:a339] (rev f0)
IOMMU Group 9 00:1e.0 Communication controller [0780]: Intel Corporation Cannon Lake PCH Serial IO UART Host Controller [8086:a328] (rev 10)
IOMMU Group 10 00:1f.0 ISA bridge [0601]: Intel Corporation Cannon Point-LP LPC Controller [8086:a309] (rev 10)
IOMMU Group 10 00:1f.4 SMBus [0c05]: Intel Corporation Cannon Lake PCH SMBus Controller [8086:a323] (rev 10)
IOMMU Group 10 00:1f.5 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH SPI Controller [8086:a324] (rev 10)
IOMMU Group 11 03:00.0 PCI bridge [0604]: PLDA PCI Express Bridge [1556:be00] (rev 02)
IOMMU Group 11 04:00.0 VGA compatible controller [0300]: Matrox Electronics Systems Ltd. Integrated Matrox G200eW3 Graphics Controller [102b:0536] (rev 04)
IOMMU Group 12 05:00.0 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe [14e4:165f]
IOMMU Group 12 05:00.1 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe [14e4:165f]
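
To see at a glance what shares a group with the card you want to pass through, the listing can be filtered by PCI address. A rough sketch, assuming the "IOMMU Group N address ..." line format printed by the loop above; the function name `group_members` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: given "IOMMU Group N <addr> ..." lines on stdin and a
# PCI address in $1, print the address of every device in the same group.
group_members() {
    awk -v dev="$1" '
        $1 == "IOMMU" && $2 == "Group" {
            group[$4] = $3                      # address -> group number
            members[$3] = members[$3] " " $4    # group number -> address list
        }
        END {
            if (!(dev in group)) exit 1         # address not found in the listing
            n = split(members[group[dev]], a, " ")
            for (i = 1; i <= n; i++) print a[i]
        }'
}

# Example use (save the listing above to a file first):
# group_members 02:00.0 < iommu_groups.txt
# The same information straight from sysfs:
# ls /sys/bus/pci/devices/0000:02:00.0/iommu_group/devices
```

With the listing above, 02:00.0 comes back together with the two CPU PCIe bridges (00:01.0, 00:01.1) and both X540 ports (01:00.0, 01:00.1), which is why PVE warns that the device is not in a separate group.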
 