Mellanox 4 LX - PCIe

As you know, passthrough can be finicky and is heavily BIOS-dependent. Which BIOS version do you have on that board? (Use dmidecode | less to check.)
It appears the latest BIOS is version 7C77v1D from 2023-08-30, as per this site.
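To pull just the BIOS fields out instead of paging through everything, here is a minimal sketch. The helper name bios_summary is made up for illustration; it only filters text on stdin, so it assumes you feed it the output of dmidecode -t bios (run as root).

```shell
# Print "<version> <release date>" from `dmidecode -t bios` output on stdin.
# bios_summary is a hypothetical helper name, not part of dmidecode.
bios_summary() {
    awk -F': *' '
        /^[ \t]*Version:/      && v == "" { v = $2 }   # first Version: line only
        /^[ \t]*Release Date:/ && d == "" { d = $2 }   # first Release Date: line only
        END { print v, d }
    '
}
```

Usage would be dmidecode -t bios | bios_summary. If you only need one value, dmidecode -s bios-version and dmidecode -s bios-release-date print them directly.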
 
Here it is:

Code:
# dmidecode 3.4
Getting SMBIOS data from sysfs.
SMBIOS 3.2.0 present.
Table at 0x73A5F000.

Handle 0x0000, DMI type 0, 26 bytes
BIOS Information
        Vendor: American Megatrends Inc.
        Version: 1.D0
        Release Date: 08/23/2023
        Address: 0xF0000
        Runtime Size: 64 kB
        ROM Size: 32 MB
        Characteristics:
                PCI is supported
                BIOS is upgradeable
                BIOS shadowing is allowed
                Boot from CD is supported
                Selectable boot is supported
                BIOS ROM is socketed
                EDD is supported
                5.25"/1.2 MB floppy services are supported (int 13h)
                3.5"/720 kB floppy services are supported (int 13h)
                3.5"/2.88 MB floppy services are supported (int 13h)
                Print screen service is supported (int 5h)
                8042 keyboard services are supported (int 9h)
                Serial services are supported (int 14h)
                Printer services are supported (int 17h)
                USB legacy is supported
                BIOS boot specification is supported
                Targeted content distribution is supported
                UEFI is supported
        BIOS Revision: 5.17

Handle 0x0001, DMI type 1, 27 bytes
System Information
        Manufacturer: Micro-Star International Co., Ltd.
        Product Name: MS-7C77
        Version: 1.0
        Serial Number: Default string
        UUID: 8847732e-cfae-4219-ab80-2cf05d7bf62d
        Wake-up Type: Power Switch
        SKU Number: Default string
        Family: Default string

Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
        Manufacturer: Micro-Star International Co., Ltd.
        Product Name: MEG Z490I UNIFY (MS-7C77)
        Version: 1.0
        Serial Number: K717347478
        Asset Tag: Default string
        Features:
                Board is a hosting board
                Board is replaceable
        Location In Chassis: Default string
        Chassis Handle: 0x0003
        Type: Motherboard
        Contained Object Handles: 0

Handle 0x0003, DMI type 3, 22 bytes
Chassis Information
        Manufacturer: Micro-Star International Co., Ltd.
        Type: Desktop
        Lock: Not Present
        Version: 1.0
        Serial Number: Default string
        Asset Tag: Default string
        Boot-up State: Safe
        Power Supply State: Safe
        Thermal State: Safe
        Security Status: None
        OEM Information: 0x00000000
        Height: Unspecified
        Number Of Power Cords: 1
        Contained Elements: 0
        SKU Number: Default string
 
I found the same issue reported for Oracle OCI. The solution they gave is to move from VFIO to Virtualized and wait until new firmware is rolled out to the hypervisors.

I'm wondering what they mean by firmware and how that would compare with Proxmox...


1721647413974.png
 
x86-64-v2-AES: checked, same issue.
vIOMMU: tried both Intel and VirtIO, same issue.
q35: to be honest, I only tried version 8 and latest, same result.

I understand the card supports both Physical and Virtual Functions, like a virtual switch... I'm wondering if I'm missing something with the VFs that is breaking the passthrough to the VM.
 
While reviewing your problem - I noticed something interesting in the Datasheet of your motherboard. They describe the (single) PCI-E slot as a "Graphics Interface". This leads me to think that possibly you are suffering from what is described in the PVE wiki as:
Some motherboards can't pass through GPUs on the first PCI(e) slot by default, because its vBIOS is shadowed during boot up. You need to capture its vBIOS when it is working "normally" (i.e. installed in a different slot), then you can move the card to slot 1 and start the vm using the dumped vBIOS.

The only way we would have of testing my theory would be to insert a different PCI device (maybe you have another spare graphics card lying around?) & put it in the PCI-E slot & see if you can pass that through to a VM.

Maybe another test: I understand that you tried the above with a Debian VM. Did it boot in the end? I understand that you were unable to use the Ethernet card in that VM, but the question is whether it shows up at all in the VM with lspci.
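A quick way to answer the lspci question from inside the guest, as a rough sketch. It relies on Mellanox's PCI vendor ID being 15b3; the helper name has_mellanox is made up for illustration and just scans lspci -nn output on stdin.

```shell
# Succeed if `lspci -nn` output on stdin lists any device with PCI vendor ID 15b3 (Mellanox).
# has_mellanox is a hypothetical helper name.
has_mellanox() {
    grep -qi '\[15b3:[0-9a-f]\{4\}\]'
}
```

Inside the VM that would be lspci -nn | has_mellanox && echo "NIC is visible". Running lspci -nnk -d 15b3: additionally shows which kernel driver, if any, has claimed the device.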
 
That's a good point. Unfortunately I don't have any other PCIe device I could use for testing.

What I tried with Debian was installing it with the card attached, and the installer hangs forever at "Detecting Network Devices". Later today I will try again without the PCIe device attached during installation, and add it once Debian is installed, to see more specifically why Debian can't detect it.
 
I see on the wiki:

Code:
For working PCI passthrough, you need a dedicated IOMMU group for all PCI devices you want to assign to a VM.

I can see that each port of the NIC is in a different IOMMU group, and neither group is shared with anything else. Would that be a problem?

1721657969490.png
 
Code:
~# dmesg | grep 'remapping'
[    0.137489] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[    0.139720] DMAR-IR: Enabled IRQ remapping in x2apic mode
 
My card also runs quite well in an x4 slot, so it's worth a try, I would say. I'd also suggest disabling SR-IOV, or enabling the virtual NICs (VFs) and using those instead. Also remove the NIC from any vmbr bridge and reboot without it in a vmbr before attempting passthrough.
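To check whether a port is still enslaved to a bridge before rebooting, something like the following could help. This is only a sketch: bridges_using is a made-up helper name, it assumes the usual ifupdown-style /etc/network/interfaces layout with bridge-ports lines, and the interface name in the usage line is a placeholder.

```shell
# Print every bridge in an ifupdown config whose bridge-ports list contains NIC.
# bridges_using is a hypothetical helper; the config path defaults to /etc/network/interfaces.
bridges_using() {
    nic="$1"
    conf="${2:-/etc/network/interfaces}"
    awk -v nic="$nic" '
        $1 == "iface" { br = $2 }                        # remember the current stanza
        $1 == "bridge-ports" || $1 == "bridge_ports" {
            for (i = 2; i <= NF; i++) if ($i == nic) print br
        }
    ' "$conf"
}
```

For example, bridges_using enp1s0f0np0 prints nothing if that (placeholder) interface is in no bridge, and the bridge name otherwise.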
 
Would that be a problem?
I believe not. The important point is that the device you are trying to pass through must not share an IOMMU group with another device.
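The group membership can be read straight from sysfs. A minimal sketch, assuming the standard /sys/kernel/iommu_groups layout; list_iommu_groups is a made-up helper name, and the optional argument only exists so the function can be pointed at another directory.

```shell
# Print one "group <id>: <pci address>" line per device found under the IOMMU groups tree.
# list_iommu_groups is a hypothetical helper name.
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue          # glob matched nothing: IOMMU disabled or no groups
        group="${dev#"$root"/}"            # strip the root prefix
        group="${group%%/*}"               # keep only the group number
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done
}
```

If a group number appears on more than one line, those devices share a group and can only be passed through together.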

I have one last thing you could try. Try changing the BIOS type for the VM from OVMF to SeaBIOS. You may have problems with the specific OS you are trying (IDK) but with a regular Debian VM there is no problem. I remember seeing passthrough issues being VM BIOS dependent.
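Before switching, it may help to see what the VM is currently set to. A small sketch that filters qm config output down to the lines relevant to passthrough; vm_passthrough_settings is a made-up helper name and the VMID 100 in the usage note is a placeholder.

```shell
# Keep only the firmware, machine type, CPU and hostpci lines from `qm config` output on stdin.
# vm_passthrough_settings is a hypothetical helper name.
vm_passthrough_settings() {
    grep -E '^(bios|machine|cpu|hostpci[0-9]+):'
}
```

Usage would be qm config 100 | vm_passthrough_settings. As far as I know a missing bios line means the VM is on the default (SeaBIOS). The switch itself is qm set 100 --bios seabios, and qm set 100 --bios ovmf changes it back.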
 
In another season of this show, I followed these instructions, enabled one VF per port, and I can pass them through without any issue.

https://pve.proxmox.com/wiki/PCI(e)_Passthrough#_sr_iov

Code:
# echo 1 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs
# echo 1 > /sys/bus/pci/devices/0000:01:00.1/sriov_numvfs

I still need to figure out how to permanently assign a MAC address, because my ISP doesn't like me changing the MAC on every reboot :) If I can manage that, I will probably keep it like this. I have given up hope of being able to pass the physical ports through to the VM.
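One possible approach to the fixed MAC, sketched under assumptions: iproute2 can set a VF's MAC from the host with ip link set dev <PF> vf <N> mac <address>, and running that after the VF is created on every boot should keep the address stable. The helper below only prints the commands (a dry run) so they can be reviewed first; vf_setup_commands is a made-up name, and the interface name and MAC in the usage note are placeholders, not values from this thread.

```shell
# Print (do not run) the commands that create one VF on a PF and pin its MAC address.
# vf_setup_commands is a hypothetical helper; arguments: PCI address, PF interface name, MAC.
vf_setup_commands() {
    pci="$1"; pf="$2"; mac="$3"
    printf 'echo 1 > /sys/bus/pci/devices/%s/sriov_numvfs\n' "$pci"
    printf 'ip link set dev %s vf 0 mac %s\n' "$pf" "$mac"
}
```

For example, vf_setup_commands 0000:01:00.0 enp1s0f0np0 02:00:00:00:00:01 prints the two lines for the first port. To make it survive reboots, the printed commands would need to run at boot, for instance from a systemd unit or a post-up line in /etc/network/interfaces; which of those fits best here is an open question.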
 
Great that it works. I think SR-IOV is fantastic with VFs.

But if you prefer to go without:
Did you try disabling SR-IOV in the NIC firmware, rebooting, and then passing through?
 
I gave up. Yesterday I went to bed at around 4am. I disabled SR-IOV in the BIOS and tried, but got the same error... I would have been happy to at least use a VF, but netmap made that impossible: with either the emulated or the native drivers, Zenarmor blocked everything, so I went back to a software bridge. I'm very frustrated.
 
Hi, yes for sure.
I meant disabling SR-IOV on the NIC itself with the NVIDIA MST tools, not in the motherboard BIOS.
 
Code:
~# mlxconfig -d /dev/mst/mt4117_pciconf0 s SRIOV_EN=FALSE

Device #1:
----------

Device type:        ConnectX4LX
Name:               MCX4121A-ACA_Ax
Description:        ConnectX-4 Lx EN network interface card; 25GbE dual-port SFP28; PCIe3.0 x8; ROHS R6
Device:             /dev/mst/mt4117_pciconf0

Configurations:                                          Next Boot       New
        SRIOV_EN                                    True(1)              False(0)

 Apply new Configuration? (y/n) [n] : y
Applying... Done!
-I- Please reboot machine to load new configurations.


~# mlxconfig -d /dev/mst/mt4117_pciconf0 s NUM_OF_VFS=0

Device #1:
----------

Device type:        ConnectX4LX
Name:               MCX4121A-ACA_Ax
Description:        ConnectX-4 Lx EN network interface card; 25GbE dual-port SFP28; PCIe3.0 x8; ROHS R6
Device:             /dev/mst/mt4117_pciconf0

Configurations:                                          Next Boot       New
        NUM_OF_VFS                                  8                    0

 Apply new Configuration? (y/n) [n] : y
Applying... Done!
-I- Please reboot machine to load new configurations.

Rebooted, tried again, but same error on booting:
Code:
mlx5_core0: WARN: wait_fw_init:733:(pid 0): Waiting for FW initialization, timeout abort in 100 s
 
By the way, I've also reinstalled Proxmox from scratch. I lost all my VMs and started from zero, but the issue remains.
 
Possibly, to really reset the NIC, you must actually cut the power going to it so that the NIC itself restarts. So maybe pulling the power plug is required. IDK, but maybe check the number of VFs showing after a restart.
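To check what the firmware actually reports after the restart, a rough sketch: mlx_sriov_state is a made-up helper name that filters the output of an mlxconfig query (the q command) down to the two settings changed above. It assumes the query prints one "NAME value" pair per line.

```shell
# Reduce `mlxconfig -d <device> q` output on stdin to NAME=value for the two SR-IOV settings.
# mlx_sriov_state is a hypothetical helper name.
mlx_sriov_state() {
    awk '$1 == "SRIOV_EN" || $1 == "NUM_OF_VFS" { print $1 "=" $2 }'
}
```

Usage would be mlxconfig -d /dev/mst/mt4117_pciconf0 q | mlx_sriov_state. The number of VFs the kernel currently has enabled can be read separately with cat /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs.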
 
