Hi all,
I have gotten as far as getting the IOMMU groups populated in /sys/kernel/iommu_groups, with 4 virtual cards showing up in lspci and the cards shown as using the mlx4_core kernel module:
root@vmprox01:~# lspci | grep Mellanox
08:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
08:00.1 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
08:00.2 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
08:00.3 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
08:00.4 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
08:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
Subsystem: Mellanox Technologies MT27500 Family [ConnectX-3]
Kernel driver in use: mlx4_core
Kernel modules: mlx4_core
root@vmprox01:~# find /sys/kernel/iommu_groups/ -type l | wc -l
97
root@vmprox01:~# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.10.11-1-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on iommu=pt
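The link count above only shows that groups exist, not which devices share a group. A minimal sketch for listing group membership follows; the function name list_iommu_groups and its optional root argument are my own additions for illustration (the argument only exists so the function can be pointed at a test directory), not anything provided by Proxmox or the kernel.

```shell
# Print one line per device in the form "group <N>: <PCI address>".
# $1 (optional): directory to scan, defaults to /sys/kernel/iommu_groups.
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        # Skip the literal pattern when the glob matches nothing.
        [ -e "$dev" ] || [ -L "$dev" ] || continue
        group="${dev%/devices/*}"   # strip "/devices/<address>"
        group="${group##*/}"        # keep only the group number
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done
}
```

Running `list_iommu_groups | grep 0000:08:00` should then show whether the physical function and the virtual functions sit in the same group.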
I then configured my test VM to pass through the first virtual card by its ID:
root@vmprox01:/etc/pve/qemu-server# cat 100.conf
bootdisk: scsi0
cores: 2
ide2: local:iso/CentOS-7-x86_64-Everything.iso,media=cdrom
memory: 4096
name: TestVM
net0: virtio=1A:41:1C:CE:A2:89,bridge=vmbr1
net1: virtio=FE:8C:59:9E:A21,bridge=vmbr0
numa: 0
ostype: l26
scsi0: local-lvm:vm-100-disk-1,size=250G
scsihw: virtio-scsi-pci
smbios1: uuid=718070d7-ea5b-49a0-8f33-c93b958d41ce
sockets: 1
hostpci0: 08:00.1
However, starting the VM ends in an error:
Task viewer: VM 100 - Start
TASK ERROR: can't reset pci device '08:00.1'
In addition, all of the virtual cards no longer show up in lspci, and the kernel driver in use for the physical card has changed from mlx4_core to vfio-pci:
08:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
Subsystem: Mellanox Technologies MT27500 Family [ConnectX-3]
Kernel driver in use: vfio-pci
Kernel modules: mlx4_core
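To compare driver bindings before and after starting the VM without reading through the full lspci output, something like the sketch below could be used. It only reads the driver symlink under /sys/bus/pci/devices; the function name show_pci_drivers and its second argument (an alternative root directory for testing) are hypothetical helpers of mine.

```shell
# Print "<PCI address> <driver>" for every PCI function whose address
# starts with the given prefix, or "(none)" if no driver is bound.
# $1: address prefix, e.g. 0000:08:00
# $2 (optional): directory to scan, defaults to /sys/bus/pci/devices.
show_pci_drivers() {
    prefix="$1"
    root="${2:-/sys/bus/pci/devices}"
    for dev in "$root"/"$prefix"*; do
        [ -e "$dev" ] || continue
        if [ -L "$dev/driver" ]; then
            # The link target ends in the driver name, e.g. .../drivers/vfio-pci
            drv=$(basename "$(readlink "$dev/driver")")
        else
            drv="(none)"
        fi
        printf '%s %s\n' "${dev##*/}" "$drv"
    done
}
```

Calling `show_pci_drivers 0000:08:00` once before and once after the failed start would show which functions are still present and what each one is bound to.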
From dmesg log when I start the VM:
[ 394.061120] mlx4_core 0000:08:00.0: Disabling SR-IOV
[ 394.080959] iommu: Removing device 0000:08:00.1 from group 19
[ 394.100943] iommu: Removing device 0000:08:00.2 from group 19
[ 394.120944] iommu: Removing device 0000:08:00.3 from group 19
[ 394.144945] iommu: Removing device 0000:08:00.4 from group 19
I'm not sure what is going wrong in the above chain of events, or what the next step should be.
Any and all advice is welcome, thank you!