ConnectX-3 worked fine with the mlx4 driver before PVE 7.2. Detailed information is below:
06:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
06:00.1 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
06:00.2 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
06:00.3 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
06:00.4 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
06:00.5 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
06:00.6 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
06:00.7 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
06:01.0 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
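A quick way to confirm how many VFs the host actually exposes is to count the matching lines in the `lspci` output. This is only a minimal sketch: `count_mlx_vfs` is a hypothetical helper name, and the pattern assumes the device description contains "Virtual Function" as in the listing above.

```shell
# Count Mellanox Virtual Function lines in lspci-style output read from stdin.
# count_mlx_vfs is a hypothetical helper name, not part of any existing tool.
count_mlx_vfs() {
    # grep -c prints the number of matching lines
    grep -c 'Mellanox.*Virtual Function'
}
```

On the host it would be used as `lspci | count_mlx_vfs`.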
# lsmod | grep mlx
mlx4_ib 241664 0
mlx4_en 147456 0
ib_uverbs 163840 1 mlx4_ib
ib_core 380928 6 rdma_cm,mlx4_ib,iw_cm,ib_iser,ib_uverbs,ib_cm
mlx4_core 389120 2 mlx4_ib,mlx4_en
Configuring a VM to use a VF causes a kernel NULL pointer dereference, and I have to restart PVE.
[ 7497.588084] vfio-pci 0000:06:00.7: enabling device (0000 -> 0002)
[ 7499.962753] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 7499.963388] #PF: supervisor read access in kernel mode
[ 7499.963972] #PF: error_code(0x0000) - not-present page
[ 7499.964539] PGD 0 P4D 0
[ 7499.965109] Oops: 0000 [#1] SMP NOPTI
[ 7499.965681] CPU: 0 PID: 22178 Comm: kvm Tainted: P O 5.15.35-2-pve #1
...
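The trace above is truncated. To pull the complete oops out of the kernel log for a bug report, something like the following sketch can be used. `extract_oops` is a hypothetical helper name, and the default of 30 trailing lines is an arbitrary assumption that may need raising to catch the whole call trace.

```shell
# Print the NULL-dereference line plus the N lines that follow it
# from a kernel log read on stdin (N defaults to 30, an arbitrary choice).
# extract_oops is a hypothetical helper name.
extract_oops() {
    grep -A "${1:-30}" 'BUG: kernel NULL pointer dereference'
}
```

On the running host this would be `dmesg | extract_oops 40`. After a forced reboot, `journalctl -k -b -1 | extract_oops 40` reads the previous boot instead, assuming the journal is persistent.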
After some googling, I found that this appears to be a bug in the kernel mlx module.
Update:
The same happens on the new kernel 5.15.35-3-pve.
Linux pve3 5.15.35-3-pve #1 SMP PVE 5.15.35-6 (Fri, 17 Jun 2022 13:42:35 +0200) x86_64 GNU/Linux