PVE 5.0 beta 2, Mellanox ConnectX-3, and SR-IOV

arjones85

New Member
Jun 9, 2017
Hi all,

I have gotten as far as getting the iommu groups populated in /sys/kernel/iommu_groups, and 4 virtual cards showing up in lspci, with the cards shown as using the mlx4_core kernel module:

root@vmprox01:~# lspci | grep Mellanox
08:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
08:00.1 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
08:00.2 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
08:00.3 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
08:00.4 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

08:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
Subsystem: Mellanox Technologies MT27500 Family [ConnectX-3]
Kernel driver in use: mlx4_core
Kernel modules: mlx4_core


root@vmprox01:~# find /sys/kernel/iommu_groups/ -type l | wc -l
97

root@vmprox01:~# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.10.11-1-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on iommu=pt



I then configured my test VM to pass through the first virtual function by its PCI address:

root@vmprox01:/etc/pve/qemu-server# cat 100.conf
bootdisk: scsi0
cores: 2
ide2: local:iso/CentOS-7-x86_64-Everything.iso,media=cdrom
memory: 4096
name: TestVM
net0: virtio=1A:41:1C:CE:A2:89,bridge=vmbr1
net1: virtio=FE:8C:59:9E:A2:D1,bridge=vmbr0
numa: 0
ostype: l26
scsi0: local-lvm:vm-100-disk-1,size=250G
scsihw: virtio-scsi-pci
smbios1: uuid=718070d7-ea5b-49a0-8f33-c93b958d41ce
sockets: 1
hostpci0: 08:00.1



However, starting the VM ends in an error:

Task viewer: VM 100 - Start

TASK ERROR: can't reset pci device '08:00.1'

In addition, none of the virtual functions show up in lspci any more, and the kernel driver in use for the physical card has changed from mlx4_core to vfio-pci:


08:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
Subsystem: Mellanox Technologies MT27500 Family [ConnectX-3]
Kernel driver in use: vfio-pci
Kernel modules: mlx4_core


From the dmesg log when I start the VM:
[ 394.061120] mlx4_core 0000:08:00.0: Disabling SR-IOV
[ 394.080959] iommu: Removing device 0000:08:00.1 from group 19
[ 394.100943] iommu: Removing device 0000:08:00.2 from group 19
[ 394.120944] iommu: Removing device 0000:08:00.3 from group 19
[ 394.144945] iommu: Removing device 0000:08:00.4 from group 19



I'm not sure what the next step is here, as I am not sure what is going wrong in the above chain of events.

Any and all advice is welcome, thank you!
 
One thing that's concerning is that it looks like all of the virtual functions are in the same IOMMU group as the card (group 19 from your dmesg snippet). If everything is working right, they should be in different IOMMU groups. Can you print the output of:
Code:
find /sys/kernel/iommu_groups/ -type l | grep 08
This assumes that your card is still at 08:00; change the number as appropriate.

Here's what I get with mine (ConnectX-3 card at 03:00):
Code:
/sys/kernel/iommu_groups/21/devices/0000:03:00.2
/sys/kernel/iommu_groups/14/devices/0000:03:00.0
/sys/kernel/iommu_groups/22/devices/0000:03:00.3
/sys/kernel/iommu_groups/20/devices/0000:03:00.1

Each is in a different group, and so they can all be assigned independently. I was able to pass through a VF to a VM running Ubuntu 17.04 and successfully accessed it in the VM (ping+iperf to host machine address and another machine on the switch).
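To make the "one group per function" check easier to read than the raw find output, here is a minimal sketch. The function names list_groups and shared_groups and the IOMMU_ROOT override are made up for this post, not part of any tool; it only assumes the usual /sys/kernel/iommu_groups/<group>/devices/<address> layout shown above.
```shell
# Print "<device> <group>" for every function under one PCI bus:device address.
# IOMMU_ROOT is a hypothetical override so the function can be tried on a copy of the tree.
list_groups() {
    slot="$1"                                   # e.g. 0000:08:00
    root="${IOMMU_ROOT:-/sys/kernel/iommu_groups}"
    for link in "$root"/*/devices/"$slot".*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$link" ] || [ -L "$link" ] || continue
        group=${link#"$root"/}                  # drop the leading root path
        group=${group%%/*}                      # keep only the group number
        printf '%s %s\n' "${link##*/}" "$group"
    done | sort
}

# Read "<device> <group>" lines and print every group that holds more than one device.
shared_groups() {
    awk '{ count[$2]++ } END { for (g in count) if (count[g] > 1) print g }' | sort -n
}
```
Running `list_groups 0000:08:00 | shared_groups` should print nothing when every function has its own group; any group number it prints is shared by two or more functions of that card.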

My system is a Xeon E3-1240 v5 in a Supermicro X11 motherboard, with the card plugged into a PCIe slot served by the chipset PCIe lanes (not the CPU PCIe lanes; that's important on the E3s and the Core i-series). What kind of hardware (CPU/motherboard) are you using, and where is your card plugged in? If you're using a Xeon E5, it should really just work. If you're using anything else, care is required.
 
Looks like they are in the same iommu group:


root@vmprox01:~# find /sys/kernel/iommu_groups/ -type l | grep 08
/sys/kernel/iommu_groups/20/devices/0000:07:08.0
/sys/kernel/iommu_groups/19/devices/0000:08:00.3
/sys/kernel/iommu_groups/19/devices/0000:08:00.1
/sys/kernel/iommu_groups/19/devices/0000:08:00.4
/sys/kernel/iommu_groups/19/devices/0000:08:00.2
/sys/kernel/iommu_groups/19/devices/0000:08:00.0


This is a Supermicro X8DTU-6+, and yes it's a Xeon E5620. It's plugged into the only spot the board offers, a PCIe slot on a riser card.

I am unsure how to proceed to correct the above group assignments.
 
Unfortunately, I'm not sure that you can change the group assignments. Your CPU or chipset may not support the Access Control Services (ACS) that are necessary for virtual functions and PCI passthrough. ACS is a separate feature from VT-x and VT-d. I know the wiki says 'all Xeons', but your Xeon E5620 may be too old.
Although the wiki page links to it as additional reading, the vfio blogspot pages should be required reading for doing PCI passthrough or VF.
http://vfio.blogspot.com/2015/10/intel-processors-with-acs-support.html
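If you want to see which devices on the box actually advertise ACS, something like the following may help. It is only a sketch: acs_devices is a name I made up, and it assumes the verbose lspci listing prints a "Capabilities: [...] Access Control Services" line for ACS-capable devices, which typically requires running lspci as root.
```shell
# Read `lspci -vvv` output on stdin and print the address of every device
# whose listing contains an "Access Control Services" capability line.
acs_devices() {
    awk '/^[0-9a-f]/ { dev = $1 }                      # device header line starts at column 0
         /Access Control Services/ { print dev }'      # capability line belongs to the last header
}

# Typical use (as root):
#   lspci -vvv | acs_devices
```
If the root ports or bridges above the card do not appear in that list, the shared group is probably coming from the platform rather than from the Mellanox card.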

Note that there might be another way for you. There's a kernel patch that basically overrides the need for ACS (but use at your own risk). It's unclear to me whether it ever got integrated into the kernel. It's mentioned here:
http://vfio.blogspot.com/2014/08/vfiovga-faq.html
As I understand it, the patch allows you to pass through functions in the same IOMMU group (or it artificially splits the IOMMU groups), but note that this weakens the isolation between VMs. In theory, the VMs can see each other's I/O traffic on the affected device because they're not truly isolated.
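For reference, on kernels that carry that patch it is usually switched on with a boot parameter. The fragment below is a sketch under that assumption: the pcie_acs_override parameter comes from the patch itself, not from this thread, and I have not verified that the PVE 4.10 kernel includes it. The other options are copied from the /proc/cmdline shown earlier.
```shell
# /etc/default/grub (fragment)
# Assumption: the running kernel was built with the ACS override patch;
# a kernel without the patch does not act on this parameter.
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction"
```
After editing, run update-grub and reboot, then re-check the IOMMU groups. The same isolation caveat applies: the groups look separate, but the hardware is not enforcing it.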
Good luck.
 
