[SOLVED] Mellanox MT27500 [ConnectX-3] SR-IOV causes kernel panic in PVE7.2

recoco

Member
Dec 4, 2021
The ConnectX-3 worked fine with the mlx4 driver before PVE 7.2. Details below:
06:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
06:00.1 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
06:00.2 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
06:00.3 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
06:00.4 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
06:00.5 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
06:00.6 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
06:00.7 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
06:01.0 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

# lsmod | grep mlx
mlx4_ib               241664  0
mlx4_en               147456  0
ib_uverbs             163840  1 mlx4_ib
ib_core               380928  6 rdma_cm,mlx4_ib,iw_cm,ib_iser,ib_uverbs,ib_cm
mlx4_core             389120  2 mlx4_ib,mlx4_en
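
For anyone comparing their own setup against the listing above, here is a minimal sketch that pulls the PCI addresses of the virtual functions out of lspci output. The helper name list_vfs is made up for illustration, and it assumes the VF lines contain the text "Virtual Function" as they do in the output above.

```shell
#!/usr/bin/env bash
# list_vfs (hypothetical helper): read lspci output on stdin and print the
# PCI address (first field) of every line that mentions "Virtual Function".
list_vfs() {
    awk '/Virtual Function/ { print $1 }'
}
```

On the host it would be used as `lspci | list_vfs`, and `lspci | list_vfs | wc -l` gives the number of VFs the kernel currently exposes.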

When I configure a VM to use a VF, the host hits a kernel NULL pointer dereference, and I have to restart PVE.
[ 7497.588084] vfio-pci 0000:06:00.7: enabling device (0000 -> 0002)
[ 7499.962753] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 7499.963388] #PF: supervisor read access in kernel mode
[ 7499.963972] #PF: error_code(0x0000) - not-present page
[ 7499.964539] PGD 0 P4D 0
[ 7499.965109] Oops: 0000 [#1] SMP NOPTI
[ 7499.965681] CPU: 0 PID: 22178 Comm: kvm Tainted: P O 5.15.35-2-pve #1 ...

After some googling, I believe this is a bug in the kernel mlx module.

Update:
The same happens with the new kernel 5.15.35-3-pve.
Linux pve3 5.15.35-3-pve #1 SMP PVE 5.15.35-6 (Fri, 17 Jun 2022 13:42:35 +0200) x86_64 GNU/Linux
 

Could you try booting the latest 5.13 kernel to see if it works?
If it does work, you can pin it for the time being [0].


[0] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#sysboot (3.12.7)
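
A minimal sketch of how the pinning step could look. The helper latest_in_series is hypothetical (not part of PVE); it only picks the newest version of a given series from a list of version strings. The proxmox-boot-tool commands in the comments are how I understand the pinning described in [0] to work, so check them against the guide for your installed version before relying on them.

```shell
#!/usr/bin/env bash
# latest_in_series SERIES (hypothetical helper): read kernel version strings
# on stdin, one per line, and print the newest one that starts with "SERIES.".
latest_in_series() {
    local series="$1"
    # index(...) == 1 keeps only lines that begin with e.g. "5.13."
    awk -v p="${series}." 'index($0, p) == 1' | sort -V | tail -n 1
}

# Assumed usage on a PVE host (not executed here):
#   ls /lib/modules | latest_in_series 5.13    # newest installed 5.13 kernel
#   proxmox-boot-tool kernel list              # show kernels known to the boot tool
#   proxmox-boot-tool kernel pin <version>     # boot this version by default
#   proxmox-boot-tool kernel unpin             # revert once 5.15 is fixed
```

The version passed to pin is whatever the first command prints on your system; no specific version is implied here.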
The same NIC works fine with the 5.11 and 5.13 kernels.

BTW, in https://forums.developer.nvidia.com...-connectx-3-pro-for-debian-11-bullseye/213505 NVIDIA support staff said that the upcoming MLNX_OFED 4.9 will support Debian 11.2 and that it is targeted for June. I also noticed that a new MLNX_OFED was released yesterday, June 30.
There is a bug in the open-source mlx driver that causes a Code 43 error when using a Mellanox ConnectX-3/Pro in an MS Windows guest OS on PVE. The bug only occurs in Windows with the VF driver. Using the PF in Windows works fine, and using a VF in a Linux guest OS also works fine.
This bug can be resolved by using the Mellanox OFED driver package, but Mellanox OFED does not support Debian 11. The good news is that MLNX_OFED 4.9, released on June 30, is expected to support Debian 11.
 
Actually, what you want is support for Ubuntu 22.04. The kernel used in PVE 7.2 is based on the one from Ubuntu 22.04.
It contains a few additional patches on top of the Ubuntu one.

So any out-of-tree driver has to be compatible with that, rather than the Debian 5.10 kernel.

If Ubuntu 22.04 is not yet supported, you could always just pin the 5.13 kernel for now, until the issues are resolved upstream or by the out-of-tree driver.
 
Got it. It's a pity that NVIDIA OFED still doesn't support Ubuntu 22.04. Currently it supports Ubuntu 20.04.
 
The problem occurs on a Dell XPS 8940 PC, which I set up as my homelab server. The same PVE version, the same kernel, and the same ConnectX-3 card work fine on my other server, an HP Z230 workstation.
 
