Z490i and iommu (will it hurt badly?)

zztoper

New Member
Mar 16, 2022
Hello guys,

I wanted to ask you how badly I am fucked with my setup.
I am trying to pass through a Mellanox ConnectX-3.

Code:
Group 2:
[8086:1901]        00:01.0  PCI bridge                                       6th-10th Gen Core Processor PCIe Controller (x16)
[15b3:1003] [R]  01:00.0  Ethernet controller                      MT27500 Family [ConnectX-3]

PVE 7.4
kernel 6.1.15
latest BIOS, VT-d enabled (there is no separate IOMMU option in the BIOS)
ACS enabled with multifunction (that makes no difference to the IOMMU groups)
 
I'm not sure what you are asking exactly. If that's the actual and whole IOMMU group, then you can pass through 01:00.0 (as PCI bridges don't count). However, if you used pcie_acs_override, then it is impossible to predict. What is the output of cat /proc/cmdline; for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done?
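
For readability, the one-liner above can be unrolled into a small function. This is only a sketch: the function name list_iommu_groups and its optional root-directory argument are made up here (the argument exists so the loop can be tried against a fake directory tree), and it prints just the group number and device address instead of calling lspci.

```shell
#!/bin/sh
# Print "<group> <device address>" for every device in every IOMMU group.
# list_iommu_groups and its optional argument are illustrative names,
# not part of any Proxmox tool.
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for d in "$root"/*/devices/*; do
        [ -e "$d" ] || continue      # the glob matched nothing
        group="${d%/devices/*}"      # strip "/devices/<address>"
        group="${group##*/}"         # keep only the group number
        printf '%s %s\n' "$group" "${d##*/}"
    done | sort -n                   # numeric order: 0, 1, 2, ... 10
}
```

To get the same descriptions as the original command, each address can be handed to lspci: list_iommu_groups | while read -r g a; do printf 'IOMMU group %s ' "$g"; lspci -nns "$a"; done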
 
There must be an issue with the ConnectX-3: a quad-port Intel 1 Gb LAN card works with those settings. With the ConnectX-3 I get IRQ errors:
 
Code:
kvm: -device vfio-pci,host=0000:01:00.0,id=hostpci0,bus=pci.0,addr=0x10: vfio 0000:01:00.0: Failed to set up TRIGGER eventfd signaling for interrupt INTX-0: VFIO_DEVICE_SET_IRQS failure: Device or resource busy
TASK ERROR: start failed: QEMU exited with code 1

and dmesg output on error
Code:
genirq: Flags mismatch irq 16. 00000000 (vfio-intx(0000:01:00.0)) vs. 00000080 (i801_smbus)

my iommu groups look like this

Code:
Group 0:        [8086:9bc8] [R] 00:02.0  VGA compatible controller                CometLake-S GT2 [UHD Graphics 630]
Group 1:        [8086:9b63]     00:00.0  Host bridge                              10th Gen Core Processor Host Bridge/DRAM Registers
Group 2:        [8086:1901]     00:01.0  PCI bridge                               6th-10th Gen Core Processor PCIe Controller (x16)
                [15b3:1003] [R] 01:00.0  Ethernet controller                      MT27500 Family [ConnectX-3]
Group 3:        [8086:06f9]     00:12.0  Signal processing controller             Comet Lake PCH Thermal Controller
Group 4:        [8086:06ed]     00:14.0  USB controller                           Comet Lake USB 3.1 xHCI Host Controller
USB:            [413c:2107]              Bus 001 Device 003                       Dell Computer Corp. KB212-B Quiet Key Keyboard
USB:            [05e3:0608]              Bus 001 Device 002                       Genesys Logic, Inc. Hub
USB:            [8087:0026]              Bus 001 Device 005                       Intel Corp.
USB:            [048d:5702]              Bus 001 Device 004                       Integrated Technology Express, Inc. ITE Device
USB:            [1d6b:0002]              Bus 001 Device 001                       Linux Foundation 2.0 root hub
USB:            [1d6b:0003]              Bus 002 Device 001                       Linux Foundation 3.0 root hub
                [8086:06ef]     00:14.2  RAM memory                               Comet Lake PCH Shared SRAM
Group 5:        [8086:06f0] [R] 00:14.3  Network controller                       Comet Lake PCH CNVi WiFi
Group 6:        [8086:06e0]     00:16.0  Communication controller                 Comet Lake HECI Controller
Group 7:        [8086:06d2]     00:17.0  SATA controller                          Device 06d2
Group 8:        [8086:06c0] [R] 00:1b.0  PCI bridge                               Comet Lake PCI Express Root Port #17
Group 9:        [8086:06ac] [R] 00:1b.4  PCI bridge                               Comet Lake PCI Express Root Port #21
Group 10:       [8086:06b8] [R] 00:1c.0  PCI bridge                               Device 06b8
Group 11:       [8086:06bc] [R] 00:1c.4  PCI bridge                               Device 06bc
Group 12:       [8086:06b0] [R] 00:1d.0  PCI bridge                               Comet Lake PCI Express Root Port #9
Group 13:       [8086:06b4] [R] 00:1d.4  PCI bridge                               Device 06b4
Group 14:       [8086:0685]     00:1f.0  ISA bridge                               Device 0685
                [8086:06c8]     00:1f.3  Audio device                             Comet Lake PCH cAVS
                [8086:06a3]     00:1f.4  SMBus                                    Comet Lake PCH SMBus Controller
                [8086:06a4]     00:1f.5  Serial bus controller [0c80]             Comet Lake PCH SPI Controller
Group 15:       [8086:2522] [R] 02:00.0  Non-Volatile memory controller           NVMe Optane Memory Series
Group 16:       [8086:15f3] [R] 05:00.0  Ethernet controller                      Ethernet Controller I225-V
Group 17:       [144d:a809] [R] 06:00.0  Non-Volatile memory controller           Device a809
 
What is the exact error message? What is the output of the console command cat /proc/cmdline?
BOOT_IMAGE=/boot/vmlinuz-6.1.15-1-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on video=efifb:off vfio-pci.ids=8086:9bc8
 
BOOT_IMAGE=/boot/vmlinuz-6.1.15-1-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on video=efifb:off vfio-pci.ids=8086:9bc8
video=efifb:off has not worked since Proxmox 7.2. Use this work-around instead.
Code:
kvm: -device vfio-pci,host=0000:01:00.0,id=hostpci0,bus=pci.0,addr=0x10: vfio 0000:01:00.0: Failed to set up TRIGGER eventfd signaling for interrupt INTX-0: VFIO_DEVICE_SET_IRQS failure: Device or resource busy
TASK ERROR: start failed: QEMU exited with code 1

and dmesg output on error
Code:
genirq: Flags mismatch irq 16. 00000000 (vfio-intx(0000:01:00.0)) vs. 00000080 (i801_smbus)
I don't know how to fix IRQ issues, sorry. Maybe it's something that can be influenced by your motherboard BIOS settings (or a BIOS update)?
my iommu groups look like this
I don't know what I'm looking at in your reply. Maybe you can run the command I asked for? Then again, it does not matter: you need to find out how to fix the IRQ issues.
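
The genirq line quoted above says that IRQ 16 is already held by i801_smbus when vfio tries to claim it for the card's legacy INTx interrupt. The flags (00000000 vs. 00000080) appear to mean that vfio asked for the line exclusively while i801_smbus registered it as shared. One way to see what sits on that line is to read /proc/interrupts. A minimal sketch, where the helper name irq_users and its optional file argument are made up for illustration:

```shell
#!/bin/sh
# Print the /proc/interrupts row for one IRQ number, e.g. 16.
# irq_users is a hypothetical helper name; the second argument only
# exists so the function can be pointed at a saved copy of the file.
irq_users() {
    irq="$1"
    file="${2:-/proc/interrupts}"
    # The first field of each row is the IRQ number followed by a colon.
    awk -v want="$irq:" '$1 == want' "$file"
}
```

Running irq_users 16 on the host prints the matching row; the last columns of that row name the drivers registered on the line.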
 
That's the output of the command you requested.

Code:
BOOT_IMAGE=/boot/vmlinuz-6.1.15-1-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on i915.enable_gvt=1
IOMMU group 0 00:02.0 VGA compatible controller [0300]: Intel Corporation CometLake-S GT2 [UHD Graphics 630] [8086:9bc8] (rev 03)
IOMMU group 10 00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:06b8] (rev f0)
IOMMU group 11 00:1c.4 PCI bridge [0604]: Intel Corporation Device [8086:06bc] (rev f0)
IOMMU group 12 00:1d.0 PCI bridge [0604]: Intel Corporation Comet Lake PCI Express Root Port #9 [8086:06b0] (rev f0)
IOMMU group 13 00:1d.4 PCI bridge [0604]: Intel Corporation Device [8086:06b4] (rev f0)
IOMMU group 14 00:1f.0 ISA bridge [0601]: Intel Corporation Device [8086:0685]
IOMMU group 14 00:1f.3 Audio device [0403]: Intel Corporation Comet Lake PCH cAVS [8086:06c8]
IOMMU group 14 00:1f.4 SMBus [0c05]: Intel Corporation Comet Lake PCH SMBus Controller [8086:06a3]
IOMMU group 14 00:1f.5 Serial bus controller [0c80]: Intel Corporation Comet Lake PCH SPI Controller [8086:06a4]
IOMMU group 15 02:00.0 Non-Volatile memory controller [0108]: Intel Corporation NVMe Optane Memory Series [8086:2522]
IOMMU group 16 05:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller I225-V [8086:15f3] (rev 02)
IOMMU group 17 06:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd Device [144d:a809]
IOMMU group 18 lspci: -s: Invalid slot number
IOMMU group 1 00:00.0 Host bridge [0600]: Intel Corporation 10th Gen Core Processor Host Bridge/DRAM Registers [8086:9b63] (rev 03)
IOMMU group 2 00:01.0 PCI bridge [0604]: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 03)
IOMMU group 2 01:00.0 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3] [15b3:1003]
IOMMU group 3 00:12.0 Signal processing controller [1180]: Intel Corporation Comet Lake PCH Thermal Controller [8086:06f9]
IOMMU group 4 00:14.0 USB controller [0c03]: Intel Corporation Comet Lake USB 3.1 xHCI Host Controller [8086:06ed]
IOMMU group 4 00:14.2 RAM memory [0500]: Intel Corporation Comet Lake PCH Shared SRAM [8086:06ef]
IOMMU group 5 00:14.3 Network controller [0280]: Intel Corporation Comet Lake PCH CNVi WiFi [8086:06f0]
IOMMU group 6 00:16.0 Communication controller [0780]: Intel Corporation Comet Lake HECI Controller [8086:06e0]
IOMMU group 7 00:17.0 SATA controller [0106]: Intel Corporation Device [8086:06d2]
IOMMU group 8 00:1b.0 PCI bridge [0604]: Intel Corporation Comet Lake PCI Express Root Port #17 [8086:06c0] (rev f0)
IOMMU group 9 00:1b.4 PCI bridge [0604]: Intel Corporation Comet Lake PCI Express Root Port #21 [8086:06ac] (rev f0)

I have an HP card with 2x SFP+ sitting in my TrueNAS host.
I'll try to swap the two cards soon to check whether it's a card issue.
 
Well, the Intel X520-DA2 works fine, so it must be purely a ConnectX-3 problem in my setup.
 
