foerkede

New Member
Nov 25, 2019
Hi,
I'm trying to pass the virtual functions (VFs) of my Intel I350 through to a router VM. I got SR-IOV working, basically following this guide on reddit.
I also blacklisted their driver module so the VFs don't get claimed by the host (tried both a blacklist entry and adding them to vfio-pci).
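To double-check a setup like the one described above, it can help to list each VF of a physical function together with the driver currently bound to it. The following is only a minimal sketch: the function name `list_vfs` is made up for illustration, and it assumes the standard Linux sysfs layout in which a PF directory contains `virtfnN` symlinks and a bound device has a `driver` symlink. The second argument exists only so the function can be pointed at a different directory tree.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Proxmox): print every VF of a PF
# together with the driver bound to it, or "none" if nothing is bound.
# Usage: list_vfs <pf-address> [pci-devices-root]
list_vfs() {
    local pf="$1"
    local root="${2:-/sys/bus/pci/devices}"
    local link vf driver
    for link in "$root/$pf"/virtfn*; do
        # Skip the unexpanded glob pattern when the PF has no VFs.
        [ -e "$link" ] || [ -L "$link" ] || continue
        # virtfnN is a symlink such as ../0000:02:10.0
        vf="$(basename "$(readlink "$link")")"
        if [ -L "$root/$vf/driver" ]; then
            driver="$(basename "$(readlink "$root/$vf/driver")")"
        else
            driver="none"
        fi
        printf '%s %s\n' "$vf" "$driver"
    done
}
```

On the host from this thread it would be called as `list_vfs 0000:01:00.0`. A VF whose module is blacklisted and that is not bound to vfio-pci would show up as `none`.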

Everything should be ready, but when I add one VF to my VM via the GUI or a qemu command, with or without the PCIe option enabled, I get this error:
Code:
TASK ERROR: can't reset pci device '0000:02:10.0'

After that, the whole card seems to crash: both ports stop working and the VFs disappear from lspci until reboot.
I found a few older forum posts about this. Back then it was fixed by hot-adding the PCI device, which now fails:
Code:
qm> device_add pci-assign,host=02:10.0
Error: 'pci-assign' is not a valid device model name
qm> device_add vfio-pci,host=02:10.0
Error: Bus 'pcie.0' does not support hotplugging

Since that doesn't work anymore, I'm out of ideas. Maybe someone has an idea what to try.

Some more info:
Error log
Code:
Use of uninitialized value $name in concatenation (.) or string at /usr/share/perl5/PVE/SysFSTools.pm line 253.
Use of uninitialized value $name in concatenation (.) or string at /usr/share/perl5/PVE/SysFSTools.pm line 253.
Use of uninitialized value $name in concatenation (.) or string at /usr/share/perl5/PVE/SysFSTools.pm line 253.
Use of uninitialized value $name in concatenation (.) or string at /usr/share/perl5/PVE/SysFSTools.pm line 253.
Use of uninitialized value $name in concatenation (.) or string at /usr/share/perl5/PVE/SysFSTools.pm line 253.
TASK ERROR: can't reset pci device '0000:02:10.0'

dmesg after error
Code:
[  269.895280] igb 0000:01:00.0: removed PHC on enp1s0f0
[  269.895387] pci 0000:02:10.0: Removing from iommu group 1
[  269.895743] pci 0000:02:10.4: Removing from iommu group 1
[  269.895846] pci 0000:02:11.0: Removing from iommu group 1
[  269.896104] pci 0000:02:11.4: Removing from iommu group 1
[  269.896143] pci 0000:02:12.0: Removing from iommu group 1
[  269.896180] pci 0000:02:12.4: Removing from iommu group 1
[  269.896239] pci 0000:02:13.0: Removing from iommu group 1
[  269.896245] pci_bus 0000:02: busn_res: [bus 02] is released
[  271.530749] igb 0000:01:00.0: IOV Disabled
[  271.620503] igb 0000:01:00.1: removed PHC on enp1s0f1

lspci
Code:
01:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
01:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
02:10.0 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)
02:10.4 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)
02:11.0 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)
02:11.4 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)
02:12.0 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)
02:12.4 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)
02:13.0 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)

VM config
Code:
bios: ovmf
bootdisk: scsi0
cores: 2
efidisk0: local-lvm:vm-100-disk-1,size=128K
hostpci0: 02:10,pcie=1
ide2: local:iso/pfSense-CE-2.4.4-RELEASE-p3-amd64.iso,media=cdrom
machine: q35
memory: 1024
name: pfSense
numa: 0
ostype: other
scsi0: local-lvm:vm-100-disk-0,size=8G
scsihw: virtio-scsi-pci
smbios1: uuid=87e8aaff-831c-4b41-9b4e-e55e82ccfe98
sockets: 1
vmgenid: 074fab00-b25e-469e-a9c7-3a157c75dd82
 
can you post your pveversion -v ?
 
Sure. I just did an upgrade; the problem remains.
Code:
pveversion -v
proxmox-ve: 6.1-2 (running kernel: 5.3.18-1-pve)
pve-manager: 6.1-7 (running version: 6.1-7/13e58d5e)
pve-kernel-5.3: 6.1-4
pve-kernel-helper: 6.1-4
pve-kernel-5.3.18-1-pve: 5.3.18-1
pve-kernel-5.3.10-1-pve: 5.3.10-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.14-pve1
libpve-access-control: 6.0-6
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.0-12
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-4
libpve-storage-perl: 6.1-4
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-3
pve-cluster: 6.1-4
pve-container: 3.0-19
pve-docs: 6.1-4
pve-edk2-firmware: 2.20191127-1
pve-firewall: 4.0-10
pve-firmware: 3.0-5
pve-ha-manager: 3.0-8
pve-i18n: 2.0-4
pve-qemu-kvm: 4.1.1-2
pve-xtermjs: 4.3.0-1
qemu-server: 6.1-5
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1
 
can you post the output of 'lspci -nnk' and your iommu groups?
 

Code:
root@pve2:~# lspci -nnk
00:00.0 Host bridge [0600]: Intel Corporation Skylake Host Bridge/DRAM Registers [8086:190f] (rev 07)
        Subsystem: ASRock Incorporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers [1849:190f]
        Kernel driver in use: skl_uncore
00:01.0 PCI bridge [0604]: Intel Corporation Skylake PCIe Controller (x16) [8086:1901] (rev 07)
        Kernel driver in use: pcieport
00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 510 [8086:1902] (rev 06)
        Subsystem: ASRock Incorporation HD Graphics 510 [1849:1902]
        Kernel driver in use: i915
        Kernel modules: i915
00:14.0 USB controller [0c03]: Intel Corporation Sunrise Point-H USB 3.0 xHCI Controller [8086:a12f] (rev 31)
        Subsystem: ASRock Incorporation 100 Series/C230 Series Chipset Family USB 3.0 xHCI Controller [1849:a12f]
        Kernel driver in use: xhci_hcd
00:14.2 Signal processing controller [1180]: Intel Corporation Sunrise Point-H Thermal subsystem [8086:a131] (rev 31)
        Subsystem: ASRock Incorporation 100 Series/C230 Series Chipset Family Thermal Subsystem [1849:a131]
        Kernel driver in use: intel_pch_thermal
        Kernel modules: intel_pch_thermal
00:16.0 Communication controller [0780]: Intel Corporation Sunrise Point-H CSME HECI #1 [8086:a13a] (rev 31)
        Subsystem: ASRock Incorporation 100 Series/C230 Series Chipset Family MEI Controller [1849:a13a]
        Kernel driver in use: mei_me
        Kernel modules: mei_me
00:17.0 SATA controller [0106]: Intel Corporation Sunrise Point-H SATA controller [AHCI mode] [8086:a102] (rev 31)
        Subsystem: ASRock Incorporation Q170/Q150/B150/H170/H110/Z170/CM236 Chipset SATA Controller [AHCI Mode] [1849:a102]
        Kernel driver in use: ahci
        Kernel modules: ahci
00:1c.0 PCI bridge [0604]: Intel Corporation Sunrise Point-H PCI Express Root Port #5 [8086:a114] (rev f1)
        Kernel driver in use: pcieport
00:1c.7 PCI bridge [0604]: Intel Corporation Sunrise Point-H PCI Express Root Port #8 [8086:a117] (rev f1)
        Kernel driver in use: pcieport
00:1f.0 ISA bridge [0601]: Intel Corporation Sunrise Point-H LPC Controller [8086:a148] (rev 31)
        Subsystem: ASRock Incorporation B150 Chipset LPC/eSPI Controller [1849:a148]
00:1f.2 Memory controller [0580]: Intel Corporation Sunrise Point-H PMC [8086:a121] (rev 31)
        Subsystem: ASRock Incorporation 100 Series/C230 Series Chipset Family Power Management Controller [1849:a121]
00:1f.4 SMBus [0c05]: Intel Corporation Sunrise Point-H SMBus [8086:a123] (rev 31)
        Subsystem: ASRock Incorporation 100 Series/C230 Series Chipset Family SMBus [1849:a123]
        Kernel driver in use: i801_smbus
        Kernel modules: i2c_i801
01:00.0 Ethernet controller [0200]: Intel Corporation I350 Gigabit Network Connection [8086:1521] (rev 01)
        Subsystem: Intel Corporation Ethernet Server Adapter I350-T2 [8086:5002]
        Kernel driver in use: igb
        Kernel modules: igb
01:00.1 Ethernet controller [0200]: Intel Corporation I350 Gigabit Network Connection [8086:1521] (rev 01)
        Subsystem: Intel Corporation Ethernet Server Adapter I350-T2 [8086:5002]
        Kernel driver in use: igb
        Kernel modules: igb
02:10.0 Ethernet controller [0200]: Intel Corporation I350 Ethernet Controller Virtual Function [8086:1520] (rev 01)
        Subsystem: Intel Corporation I350 Ethernet Controller Virtual Function [8086:5002]
        Kernel modules: igbvf
02:10.4 Ethernet controller [0200]: Intel Corporation I350 Ethernet Controller Virtual Function [8086:1520] (rev 01)
        Subsystem: Intel Corporation I350 Ethernet Controller Virtual Function [8086:5002]
        Kernel modules: igbvf
02:11.0 Ethernet controller [0200]: Intel Corporation I350 Ethernet Controller Virtual Function [8086:1520] (rev 01)
        Subsystem: Intel Corporation I350 Ethernet Controller Virtual Function [8086:5002]
        Kernel modules: igbvf
02:11.4 Ethernet controller [0200]: Intel Corporation I350 Ethernet Controller Virtual Function [8086:1520] (rev 01)
        Subsystem: Intel Corporation I350 Ethernet Controller Virtual Function [8086:5002]
        Kernel modules: igbvf
02:12.0 Ethernet controller [0200]: Intel Corporation I350 Ethernet Controller Virtual Function [8086:1520] (rev 01)
        Subsystem: Intel Corporation I350 Ethernet Controller Virtual Function [8086:5002]
        Kernel modules: igbvf
02:12.4 Ethernet controller [0200]: Intel Corporation I350 Ethernet Controller Virtual Function [8086:1520] (rev 01)
        Subsystem: Intel Corporation I350 Ethernet Controller Virtual Function [8086:5002]
        Kernel modules: igbvf
02:13.0 Ethernet controller [0200]: Intel Corporation I350 Ethernet Controller Virtual Function [8086:1520] (rev 01)
        Subsystem: Intel Corporation I350 Ethernet Controller Virtual Function [8086:5002]
        Kernel modules: igbvf
04:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 06)
        Subsystem: ASRock Incorporation Motherboard (one of many) [1849:8168]
        Kernel driver in use: r8169
        Kernel modules: r8169

Code:
root@pve2:~# find /sys/kernel/iommu_groups/ -type l
/sys/kernel/iommu_groups/7/devices/0000:00:1c.7
/sys/kernel/iommu_groups/5/devices/0000:00:17.0
/sys/kernel/iommu_groups/3/devices/0000:00:14.2
/sys/kernel/iommu_groups/3/devices/0000:00:14.0
/sys/kernel/iommu_groups/1/devices/0000:02:12.4
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/1/devices/0000:02:12.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.0
/sys/kernel/iommu_groups/1/devices/0000:02:11.4
/sys/kernel/iommu_groups/1/devices/0000:02:11.0
/sys/kernel/iommu_groups/1/devices/0000:02:10.4
/sys/kernel/iommu_groups/1/devices/0000:01:00.1
/sys/kernel/iommu_groups/1/devices/0000:02:10.0
/sys/kernel/iommu_groups/1/devices/0000:02:13.0
/sys/kernel/iommu_groups/8/devices/0000:00:1f.2
/sys/kernel/iommu_groups/8/devices/0000:00:1f.0
/sys/kernel/iommu_groups/8/devices/0000:00:1f.4
/sys/kernel/iommu_groups/6/devices/0000:00:1c.0
/sys/kernel/iommu_groups/4/devices/0000:00:16.0
/sys/kernel/iommu_groups/2/devices/0000:00:02.0
/sys/kernel/iommu_groups/0/devices/0000:00:00.0
/sys/kernel/iommu_groups/9/devices/0000:04:00.0

Now I see that the VFs and the physical card are in the same IOMMU group. Could that be the problem?
 
Yes, because in that case we try to reset all devices in the IOMMU group. That resets the NIC, which removes the virtual functions, so the error messages make sense.

Maybe there is a BIOS option you can turn on, such as "ARI" or "SR-IOV", but how the IOMMU groups are split is determined by the IOMMU on the motherboard (and the BIOS).
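A shared group like this is easier to spot when the devices are printed per group rather than as a flat list of symlinks. The following is a rough sketch only: `print_iommu_groups` is a hypothetical name, and it assumes the usual `/sys/kernel/iommu_groups/<n>/devices/` layout seen in the `find` output earlier in the thread. The optional argument only allows pointing it at another directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one line per IOMMU group with all member devices.
# Usage: print_iommu_groups [iommu-groups-root]
print_iommu_groups() {
    local root="${1:-/sys/kernel/iommu_groups}"
    local group dev line
    # Group directories are named by number, so sort them numerically.
    for group in $(ls "$root" | sort -n); do
        line="group $group:"
        for dev in "$root/$group"/devices/*; do
            # Skip the unexpanded glob pattern for an empty group.
            [ -e "$dev" ] || [ -L "$dev" ] || continue
            line="$line ${dev##*/}"
        done
        printf '%s\n' "$line"
    done
}
```

If the physical functions (here 0000:01:00.0 and 0000:01:00.1) appear on the same line as the VFs, the group-wide reset described above will also hit the physical card.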
 
OK, that's disappointing. I already enabled everything possible in the BIOS, even the SR-IOV options in the card's own BIOS menu. Sadly, I can't use another PCIe slot; the rest are only x1.

If there isn't another way to make this work (for instance, somehow not resetting the card?), I'll probably have to give up on this. Thank you for your help, though!
 
That would not really help, as the host could not use the physical card anymore...

What would work is to pass the whole card through to the VM instead of the virtual functions.
 
Yes, that would probably work. I just hoped I could reuse the card for other VMs and also use its internal switch. The software switch creates too much overhead: if I route gigabit traffic through the VM, the CPU goes to 100%.

Just in case I get other hardware in the future: the VFs and PFs need to be in separate IOMMU groups, but does each VF need its own group, or can they all be in the same one?
 
It depends on how you want to use them.
In general, you cannot assign two devices from the same IOMMU group to different VMs (or use some on the host and some in a guest).
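That rule can be checked for a given pair of devices before assigning them. The sketch below is an illustration under assumptions, not a Proxmox tool: the names `iommu_group_of` and `same_iommu_group` are invented here, and it relies on the `iommu_group` symlink that Linux exposes in each PCI device's sysfs directory when the IOMMU is enabled.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing the IOMMU groups of two PCI devices.

# Print the IOMMU group number of a device; fail if it has none.
# Usage: iommu_group_of <pci-address> [pci-devices-root]
iommu_group_of() {
    local addr="$1"
    local root="${2:-/sys/bus/pci/devices}"
    local link="$root/$addr/iommu_group"
    [ -L "$link" ] || return 1
    # The symlink points at .../kernel/iommu_groups/<n>
    basename "$(readlink "$link")"
}

# Return 0 if both devices share a group, 1 if not, 2 if a lookup fails.
# Usage: same_iommu_group <addr-a> <addr-b> [pci-devices-root]
same_iommu_group() {
    local root="${3:-/sys/bus/pci/devices}"
    local a b
    a="$(iommu_group_of "$1" "$root")" || return 2
    b="$(iommu_group_of "$2" "$root")" || return 2
    [ "$a" = "$b" ]
}
```

For example, `same_iommu_group 0000:02:10.0 0000:02:10.4 && echo "shared group"` would report whether two VFs could be handed to different VMs under the rule above.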
 
Hello @foerkede & @dcsapak, I stumbled on this thread because I had exactly the same issue: PVE crashed because all VFs were in the same IOMMU group.
I'm not sure what mainboard manufacturer you have, but what really helped me was "RTFM" (read the friendly manual ;-) ).

There are many different switches and options to set in the BIOS; only after configuring them did I get separate IOMMU groups for the VFs.
 
