[SOLVED] Mellanox ConnectX-3 VFs stuck in same IOMMU group

Mr.Goodcat

Feb 8, 2020
Hi,

I'm trying to get SR-IOV working on Proxmox (I followed the SR-IOV section of the guide "PCI(e) Passthrough - Proxmox VE"), but all virtual functions (configured both in the firmware of the ConnectX-3 card and in /etc/modprobe.d/) always end up in the same IOMMU group. Switching PCIe slots didn't help either.

Code:
dmesg | grep -e DMAR -e IOMMU -e AMD-Vi
[    3.460271] pci 0000:c0:00.2: AMD-Vi: IOMMU performance counters supported
[    3.460326] pci 0000:80:00.2: AMD-Vi: IOMMU performance counters supported
[    3.460361] pci 0000:40:00.2: AMD-Vi: IOMMU performance counters supported
[    3.460384] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[    3.555920] pci 0000:c0:00.2: AMD-Vi: Found IOMMU cap 0x40
[    3.555922] pci 0000:c0:00.2: AMD-Vi: Extended features (0x58f77ef22294ade):
[    3.555927] pci 0000:80:00.2: AMD-Vi: Found IOMMU cap 0x40
[    3.555929] pci 0000:80:00.2: AMD-Vi: Extended features (0x58f77ef22294ade):
[    3.555933] pci 0000:40:00.2: AMD-Vi: Found IOMMU cap 0x40
[    3.555934] pci 0000:40:00.2: AMD-Vi: Extended features (0x58f77ef22294ade):
[    3.555938] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[    3.555939] pci 0000:00:00.2: AMD-Vi: Extended features (0x58f77ef22294ade):
[    3.555943] AMD-Vi: Interrupt remapping enabled
[    3.555944] AMD-Vi: Virtual APIC enabled
[    3.555945] AMD-Vi: X2APIC enabled
[    3.556605] AMD-Vi: Lazy IO/TLB flushing enabled
[    3.561685] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
[    3.561765] perf/amd_iommu: Detected AMD IOMMU #1 (2 banks, 4 counters/bank).
[    3.561850] perf/amd_iommu: Detected AMD IOMMU #2 (2 banks, 4 counters/bank).
[    3.561932] perf/amd_iommu: Detected AMD IOMMU #3 (2 banks, 4 counters/bank).

Code:
mlxconfig -d /dev/mst/mt4099_pci_cr0 q

Device #1:
----------

Device type:    ConnectX3       
Device:         /dev/mst/mt4099_pci_cr0

Configurations:                              Next Boot
         SRIOV_EN                            True(1)         
         NUM_OF_VFS                          24             
         LINK_TYPE_P1                        ETH(2)         
         LINK_TYPE_P2                        ETH(2)         
         LOG_BAR_SIZE                        3               
         BOOT_PKEY_P1                        0               
         BOOT_PKEY_P2                        0               
         BOOT_OPTION_ROM_EN_P1               False(0)       
         BOOT_VLAN_EN_P1                     False(0)       
         BOOT_RETRY_CNT_P1                   0               
         LEGACY_BOOT_PROTOCOL_P1             None(0)         
         BOOT_VLAN_P1                        1               
         BOOT_OPTION_ROM_EN_P2               False(0)       
         BOOT_VLAN_EN_P2                     False(0)       
         BOOT_RETRY_CNT_P2                   0               
         LEGACY_BOOT_PROTOCOL_P2             None(0)         
         BOOT_VLAN_P2                        1               
         IP_VER_P1                           IPv4(0)         
         IP_VER_P2                           IPv4(0)         
         CQ_TIMESTAMP                        True(1)

Code:
41:00.0 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3] [15b3:1003]
        Subsystem: Mellanox Technologies MT27500 Family [ConnectX-3] [15b3:0050]
        Flags: bus master, fast devsel, latency 0, IRQ 122, NUMA node 0
        Memory at b0400000 (64-bit, non-prefetchable) [size=1M]
        Memory at 2807f800000 (64-bit, prefetchable) [size=8M]
        Expansion ROM at b0300000 [disabled] [size=1M]
        Capabilities: [40] Power Management version 3
        Capabilities: [48] Vital Product Data
        Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
        Capabilities: [60] Express Endpoint, MSI 00
        Capabilities: [c0] Vendor Specific Information: Len=18 <?>
        Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [148] Device Serial Number 00-02-c9-03-00-40-c8-f0
        Capabilities: [154] Advanced Error Reporting
        Capabilities: [18c] #19
        Capabilities: [108] Single Root I/O Virtualization (SR-IOV)
        Kernel driver in use: mlx4_core
        Kernel modules: mlx4_core

41:00.1 Ethernet controller [0200]: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] [15b3:1004]
        Subsystem: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] [15b3:61b0]
        Flags: fast devsel, NUMA node 0
        [virtual] Memory at 28073800000 (64-bit, prefetchable) [size=8M]
        Capabilities: [60] Express Endpoint, MSI 00
        Capabilities: [9c] MSI-X: Enable- Count=36 Masked-
        Capabilities: [40] Power Management version 0
        Kernel modules: mlx4_core

Trying to start a VM with one of the VFs attached fails, resulting in these entries in dmesg:
Code:
mlx4_en 0000:41:00.0: removed PHC
mlx4_core 0000:41:00.0: Disabling SR-IOV
pci 0000:41:00.1: Removing from iommu group 48
pci 0000:41:00.2: Removing from iommu group 48
pci 0000:41:00.3: Removing from iommu group 48
pci 0000:41:00.4: Removing from iommu group 48
pci 0000:41:00.5: Removing from iommu group 48
pci 0000:41:00.6: Removing from iommu group 48
pci 0000:41:00.7: Removing from iommu group 48
pci 0000:41:01.0: Removing from iommu group 48
pci 0000:41:01.1: Removing from iommu group 48
pci 0000:41:01.2: Removing from iommu group 48
pci 0000:41:01.3: Removing from iommu group 48
pci 0000:41:01.4: Removing from iommu group 48
pci 0000:41:01.5: Removing from iommu group 48
pci 0000:41:01.6: Removing from iommu group 48
pci 0000:41:01.7: Removing from iommu group 48
pci 0000:41:02.0: Removing from iommu group 48
pci 0000:41:02.1: Removing from iommu group 48
pci 0000:41:02.2: Removing from iommu group 48
pci 0000:41:02.3: Removing from iommu group 48
pci 0000:41:02.4: Removing from iommu group 48
pci 0000:41:02.5: Removing from iommu group 48
pci 0000:41:02.6: Removing from iommu group 48
pci 0000:41:02.7: Removing from iommu group 48
pci 0000:41:03.0: Removing from iommu group 48

Overall it's quite similar to: https://forum.proxmox.com/threads/pve-5-0-beta-2-mellanox-connectx-3-and-sr-iov.35103/
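
For anyone checking the same symptom: the grouping can be read straight from sysfs rather than from dmesg. A minimal sketch, assuming the standard /sys/kernel/iommu_groups layout; the function name list_iommu_groups and its optional root-directory argument are made up for illustration:

```shell
#!/bin/sh
# Print one "group N: <pci address>" line per device by walking sysfs.
# The root directory is a parameter only so the function can be pointed
# at a fake tree; on a real host the default is the one to use.
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue      # glob matched nothing: no IOMMU groups
        group="${dev#"$root"/}"        # strip the root prefix
        group="${group%%/*}"           # keep only the group number
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done
}

# Sort numerically by group number for readability.
list_iommu_groups | sort -k2,2n
```

With working isolation, each VF (0000:41:00.1, 0000:41:00.2, ...) would be expected under its own group number; in the failing case shown above they are all listed under group 48.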

My setup
  • Proxmox 6.1-5 (Kernel: 5.3.13-2)
  • Supermicro H11SSL
  • AMD Epyc 7502
  • Mellanox ConnectX-3
Help would be greatly appreciated =D
 
After A LOT of experimenting, this seems to come down to the CPU/microcode, as the CPU is an engineering sample. This thread can be closed!
 
Thanks for coming back with the solution - this will potentially help others!

You can always mark your threads as 'SOLVED' (just edit the thread (top menu) and set the prefix to 'SOLVED')
I'll go ahead and mark this one as solved.
 
Hi, I am on a Supermicro H12 motherboard with an AMD EPYC 7262, and I'm encountering the same issue. I'm not on an engineering sample; can you be more specific regarding the microcode?
 
Hi, I sold the CPU and don't have any further specifics. The issue disappeared after switching to a retail 7452. You should try updating the bios just to be sure.
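
To compare setups, the running microcode revision and BIOS version can be read without extra tools. A rough sketch, assuming an x86 Linux host where /proc/cpuinfo carries a "microcode" line and the DMI data is exposed under /sys/class/dmi/id; the helper name microcode_rev is hypothetical:

```shell
#!/bin/sh
# Print the microcode revision of the first CPU listed in a cpuinfo-style
# file (defaults to /proc/cpuinfo). Prints nothing if no such line exists.
microcode_rev() {
    awk -F': *' '/^microcode/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}

# Report what is available and skip quietly where a file is missing.
if [ -r /proc/cpuinfo ]; then
    echo "microcode: $(microcode_rev)"
fi
if [ -r /sys/class/dmi/id/bios_version ]; then
    echo "BIOS: $(cat /sys/class/dmi/id/bios_version)"
fi
```

Posting both values alongside the board and CPU model makes it easier to tell whether two reports of this problem actually share a firmware level.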
 
Hi, I see the same issue on an H12SSL-i motherboard with an EPYC 7302 CPU and the VFs of an MCX5 NIC. Judging from several similar cases, the cause may be ACS support. The H12SSL-i user manual documents an ACS enable option (BIOS -> Advanced -> NB Configuration -> ACS Enable), but I couldn't find it. Maybe the EPYC 7302 CPU does not support ACS. So, does your H12SSL motherboard BIOS contain this option?
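
Whether ACS is exposed can also be checked from the running system instead of the BIOS menu. A small sketch, assuming lspci from pciutils is installed and run as root (extended capabilities are hidden from unprivileged users); the helper name acs_devices is made up here:

```shell
#!/bin/sh
# Read `lspci -vvv` style text on stdin and print the PCI address of every
# device whose capability list mentions Access Control Services (ACS).
acs_devices() {
    awk '
        # Device header lines start in column 0 with the bus address.
        /^[0-9a-fA-F]/ { dev = $1 }
        # Capability lines are indented and belong to the last header seen.
        /Access Control Services/ && dev != "" { print dev }
    '
}

# Only query the real system if lspci is actually available.
if command -v lspci >/dev/null 2>&1; then
    lspci -vvv 2>/dev/null | acs_devices
fi
```

If the root port or bridge above the NIC does not appear in the output, the kernel generally cannot isolate the functions behind it from each other, which would be consistent with all VFs landing in one IOMMU group.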
 
