NIC card Passthrough and essential things to think about

ieronymous

Well-Known Member
Apr 1, 2019
Hello again (twice today, but I have been preparing this post for a couple of days now).

Besides the very well-known GPU passthrough (for which there is a very good official guide), there is also a need to pass through other PCIe devices, and you will soon see why I have many doubts about whether it is supposed to be done the same way as for a GPU. If this post covers that, I believe it covers 85+% of PCI passthrough needs, since what people typically want to pass through is a NIC (for instance for pfSense), an HBA (Host Bus Adapter, probably for virtualizing FreeNAS/TrueNAS), or a USB/SATA controller for VMs. These PCIe devices cover the vast majority of things someone might want to pass through.

Enough with the long intro; let's get to the point. I want to pass through an Intel I350 4-port NIC to a VM, and below are my thoughts and questions after trying to follow the official guide and my own notes. My hardware and BIOS settings follow at the end of the post. So...
1) I have configured GRUB by adding intel_iommu=on, so the line reads GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on". Based on a post by another forum member, an extra question came up: with Proxmox on ZFS and systemd-boot, do I have to add the parameter in /etc/default/grub, in /etc/kernel/cmdline, or in both?
2) Setting that aside: my CPU is a Xeon E3-1260L, which according to others does not have very good IOMMU grouping capability. Do you think the extra options iommu=pt pcie_acs_override=downstream,multifunction will help?
3) Is a network card with 4 ports considered a multifunction device, like a GPU with its embedded audio part, or not? If not, the multifunction option above is irrelevant to my case, isn't it? And what determines whether a device is considered multifunction? Is there a CLI command to find out?
VFIO modules: I have added vfio, vfio_iommu_type1, vfio_pci and vfio_virqfd to /etc/modules.
4) Does update-initramfs -u -k all apply to my case as well (I mean, given that I am running systemd-boot), or is it irrelevant?
IOMMU interrupt remapping: I have added the following lines
echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" > /etc/modprobe.d/iommu_unsafe_interrupts.conf
echo "options kvm ignore_msrs=1" > /etc/modprobe.d/kvm.conf
5) Even though the path /sys/kernel/iommu_groups/ exists, running find /sys/kernel/iommu_groups/ -type l returns nothing.
Is that because I haven't yet added iommu=pt pcie_acs_override=downstream,multifunction, which might be needed due to the (as many say) odd behavior of the Xeon E3-1260L?
Or is it a motherboard issue with splitting devices into groups? And does that mean the whole motherboard is one big group and I can't pass anything through?
6) Do you have to blacklist drivers for all PCIe devices you want to pass through, or does that apply only to GPUs? My network card is using
Kernel driver in use: igb
Kernel modules: igb
So do I have to blacklist it with something like echo "blacklist igb" >> /etc/modprobe.d/blacklist.conf, or isn't that needed? What if the onboard NIC, which is also Intel, were using the same igb kernel driver (it isn't)? Wouldn't that be a problem too?
Finally, I added the NIC's vendor:device ID to VFIO: echo "options vfio-pci ids=8086:1521" > /etc/modprobe.d/vfio.conf
7) All four ports have the same vendor:device ID, so I only have to enter it once to include all 4 of them. Presumably a GPU's audio part has a different ID, which results in 2 different IDs to put in the vfio options, right?

I apologize for the length of the post, but I really believe that if all, or at least some, of these questions are answered, it will help newbies first of all and also prompt more experienced users to look into it, and in general give a better understanding of how Proxmox works, at least where passthrough is concerned.


Last but not least my H/W Spec are as follows
Machine:dell optiplex 7010 with
Cpu :Xeon E3 1260L socket LGA1155 4/8 Threads
support for VT-X / VT-d according to Intel's spec sheet
https://ark.intel.com/content/www/u...eon-processor-e3-1260l-8m-cache-2-40-ghz.html

Motherboard: Q77 express based chipset At least according to this official Intel link
https://www.intel.com/content/www/u...000005758/boards-and-kits/desktop-boards.html
seems to support VT-x/VT-d

Bios Settings:Version ->Latest
Boot Sequence -> UEFI
Advanced Boot options -> Enable LegacyOption ROMs
TPM Security ->Unchecked
Secure Boot Enable ->Disabled
Virtualization Support->Enable Virtualization
VT for Direct I/O ->Enabled
8) Trusted Execution ->Unchecked (I suppose this has to be enabled ???????) because of this
Intel® Desktop Boards require the following components to support Intel VT or Intel VT-d:
A third-party VMM (virtual machine manager) may also be required
Does this option need to be enabled?


And some of the results of the relevant commands
Code:
lspci -v

02:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
02:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
02:00.2 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
02:00.3 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
Kernel driver in use: igb
Kernel modules: igb

Code:
lspci -n -s 02:00   (card's Vendor IDs)

02:00.0 0200: 8086:1521 (rev 01)
02:00.1 0200: 8086:1521 (rev 01)
02:00.2 0200: 8086:1521 (rev 01)
02:00.3 0200: 8086:1521 (rev 01)

Code:
dmesg | grep -e DMAR -e IOMMU
[    0.018082] ACPI: DMAR 0x00000000D7FFFC48 0000B8 (v01 INTEL  SNB      00000001 INTL 00000001)
[    0.150014] DMAR: Host address width 36
[    0.150016] DMAR: DRHD base: 0x000000fed90000 flags: 0x0
[    0.150022] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap c0000020e60262 ecap f0101a
[    0.150025] DMAR: DRHD base: 0x000000fed91000 flags: 0x1
[    0.150030] DMAR: dmar1: reg_base_addr fed91000 ver 1:0 cap c9008020660262 ecap f0105a
[    0.150033] DMAR: RMRR base: 0x000000daf78000 end: 0x000000daf9efff
[    0.150035] DMAR: RMRR base: 0x000000db800000 end: 0x000000df9fffff
[    0.150038] DMAR-IR: IOAPIC id 2 under DRHD base  0xfed91000 IOMMU 1
[    0.150040] DMAR-IR: HPET id 0 under DRHD base 0xfed91000
[    0.150042] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[    0.150451] DMAR-IR: Enabled IRQ remapping in x2apic mode

Code:
find /sys/kernel/iommu_groups/ -type l        returns nothing why?


Code:
dmesg | grep Virtual                          returns nothing why?
 
3) Is a network card with 4 ports considered a multifunction device, like a GPU with its embedded audio part, or not? If not, the multifunction option above is irrelevant to my case, isn't it? And what determines whether a device is considered multifunction? Is there a CLI command to find out?
I passed through 3 ports of my i350-T4 to one VM and 1 port to another VM. Here the 4 ports are 4 functions, each with its own group.
6) Do you have to blacklist drivers for all PCIe devices you want to pass through, or does that apply only to GPUs? My network card is using
Kernel driver in use: igb
Kernel modules: igb
So do I have to blacklist it with something like echo "blacklist igb" >> /etc/modprobe.d/blacklist.conf, or isn't that needed? What if the onboard NIC, which is also Intel, were using the same igb kernel driver (it isn't)? Wouldn't that be a problem too?
Finally, I added the NIC's vendor:device ID to VFIO: echo "options vfio-pci ids=8086:1521" > /etc/modprobe.d/vfio.conf
I didn't blacklist the NIC drivers because my onboard NIC is also an i350, and everything works without blacklisting.
 
Many interrelated questions, but maybe I can answer a few.

1. Please read this post to determine if you are using systemd or GRUB.
2. iommu=pt might increase performance for devices that are not passed through to VMs. Overriding ACS (and IOMMU grouping) can help separate devices to enable passthrough but I do not recommend it for production use because you cannot guarantee device/security isolation/separation.
3. It is a multi-function device if it presents itself as multiple devices on the PCI(express)-bus, which it does because you see 02:00.0 .. 02:00.3. If these are not in separate IOMMU groups, you need to pass it through as a single multi-function device 02:00 to a single VM, otherwise you can pick and choose.
4. Always run update-initramfs -u to make sure your changes to the kernel parameters and module options are applied. If you use GRUB, you also need to run update-grub.
5. I think you forgot to add intel_iommu=on to the kernel parameters. iommu=pt does not enable PCI passthrough (see 2).
6. No, you don't need to blacklist, unless resetting the device you want to pass through does not work and you need the kernel to not load the driver. If you use options vfio-pci ids=... you might also need to add a softdep to prevent the driver from being loaded earlier than vfio-pci. However, you only need this if the device and/or driver cannot do a proper reset.
7. When using vfio-pci.ids as a kernel parameter or as an option in a file in /etc/modprobe.d/, it cannot differentiate between devices with the same ID. Just like blacklisting cannot differentiate between different devices that use the same driver. Therefore, you only need to write the ID once, but it might be overkill (see 6).
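
On the "any CLI command to find out?" part of question 3: besides seeing several functions (02:00.0 .. 02:00.3) in lspci, the PCI Header Type register of function 0 carries a multi-function flag in bit 7. The snippet below is only a sketch of how that byte could be decoded; header_kind is a made-up helper name, and it assumes the value was read with setpci from pciutils.

```shell
#!/bin/sh
# header_kind HEX
# HEX is the Header Type byte as printed by `setpci -s <bus:dev.fn> HEADER_TYPE`
# (two hex digits, no 0x prefix). Bit 7 (0x80) is the multi-function flag.
# header_kind is a hypothetical helper, not part of pciutils or Proxmox.
header_kind() {
    if [ $(( 0x$1 & 0x80 )) -ne 0 ]; then
        echo multifunction
    else
        echo single-function
    fi
}

# Example on real hardware (needs pciutils):
#   header_kind "$(setpci -s 02:00.0 HEADER_TYPE)"
```

Whether the functions can be split between VMs is a separate matter and depends on the IOMMU groups, as described in answer 3.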

I suggest you first focus on enabling IOMMU (see 5) and inspecting the IOMMU grouping using for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done;.
If the multi-function device (the four PCI network devices) is in a single group (or each function is in its own group) without other devices, then using PCI passthrough will probably just work. If not, you can experiment with options and kernel parameters. If you can show me the actual IOMMU groups, I might be able to help some more.
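
To spot shared groups quickly, the output of the loop above can be piped through a small filter. This is a rough sketch: shared_groups is a made-up name, and it assumes the input lines have exactly the "IOMMU group N bus:dev.fn ..." shape that the loop prints.

```shell
#!/bin/sh
# shared_groups: read "IOMMU group <n> <bus:dev.fn> ..." lines on stdin and
# print only the groups that contain more than one device, one per line,
# as "<n>: <addr> <addr> ...". Hypothetical helper for illustration.
shared_groups() {
    awk '$1 == "IOMMU" && $2 == "group" {
             count[$3]++
             devs[$3] = devs[$3] " " $4
         }
         END {
             for (g in count)
                 if (count[g] > 1)
                     print g ":" devs[g]
         }' | sort -n
}
```

A device whose group does not show up in this list is alone in its group and can be passed through on its own; devices that share a group have to go to the same VM together.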
 
First of all, thank you both for your time and answers.
I passed through 3 ports of my i350-T4 to one VM and 1 port to another VM. Here the 4 ports are 4 functions, each with its own group.
My issue is probably that I can't get the system to report the IOMMU groups, since yours returns results.
I didn't blacklist the NIC drivers because my onboard NIC is also an i350, and everything works without blacklisting.

So from your answer, the assumption is that you blacklist when the driver is used only by the device being passed through (that doesn't make sense).
In your situation, on the other hand, where both NICs share the driver, you should have a problem, yet from what you say you don't.

Many interrelated questions, but maybe I can answer a few.
Probably, but they all lead to the same outcome. A task like this requires knowing a lot of things up front for it to work.
1. Please read this post to determine if you are using systemd or GRUB.
I had already read this, and I am using systemd (/etc/kernel/cmdline is also present and populated).

2. iommu=pt might increase performance for devices that are not passed through to VMs. Overriding ACS (and IOMMU grouping) can help separate devices to enable passthrough but I do not recommend it for production use because you cannot guarantee device/security isolation/separation.
Totally get this, and as for production environments, there is an exploit for that ACS override as well. By the way, I know that I could just use the card's SR-IOV option and accomplish my goal that way, but I don't want to just yet, since I need this passthrough to work.

3. It is a multi-function device if it presents itself as multiple devices on the PCI(express)-bus, which it does because you see 02:00.0 .. 02:00.3. If these are not in separate IOMMU groups, you need to pass it through as a single multi-function device 02:00 to a single VM, otherwise you can pick and choose.
Also a very logical explanation, but as I mentioned a few lines above, I first need to figure out how to make the system output the IOMMU groups.

5. I think you forgot to add intel_iommu=on to the kernel parameters
Nope I did.

6. No, you don't need to blacklist, unless resetting the device you want to pass through does not work and you need the kernel to not load the driver. If you use options vfio-pci ids=... you might also need to add a softdep to prevent the driver from being loaded earlier than vfio-pci. However, you only need this if the device and/or driver cannot do a proper reset.
Very good stuff here. As for softdep, I just read about it and got the general idea, but I have no clue how it would be implemented here (which paths, which lines to add, etc.).
As for the reset part, I don't know if this is the same bug many AMD RX 400/500 cards suffer from, where someone restarts the VM but the card is not reset, so the card and the system can't unload the drivers and on the next restart you end up with a black screen. So here I guess I would end up without network at all. I remember there is a command to display whether a device can do a reset, or something like that.
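
For reference, two places where reset support usually shows up: the DevCap line of lspci -vv lists FLReset+ when the function advertises Function Level Reset, and the kernel exposes a reset file in the device's sysfs directory when it knows a reset method for it. The helpers below are a sketch only; both function names are made up, and the sysfs path assumes PCI domain 0000.

```shell
#!/bin/sh
# pci_has_flr: read `lspci -vv` output for one device on stdin and succeed
# if the device advertises Function Level Reset (FLReset+ in DevCap).
# Hypothetical helper name.
pci_has_flr() {
    grep -q 'FLReset+'
}

# pci_reset_file ADDR: print the sysfs reset file for a device given as
# bus:dev.fn (for example 02:00.0), assuming PCI domain 0000.
pci_reset_file() {
    printf '/sys/bus/pci/devices/0000:%s/reset\n' "$1"
}

# Example on real hardware (lspci usually needs root to show capabilities):
#   lspci -vv -s 02:00.0 | pci_has_flr && echo "FLR advertised"
#   [ -e "$(pci_reset_file 02:00.0)" ] && echo "kernel has a reset method"
```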

I suggest you first focus on enabling IOMMU
...and that is what I have been trying to achieve for 3 days now.

PS: Do you have an opinion on whether enabling Trusted Execution under Virtualization Support would help somehow?


Thank you both once again
 
If you have no IOMMU groups, then IOMMU is not (completely) enabled in UEFI/BIOS and/or changes have not been applied. Please check cat /proc/cmdline.
Enabling trusted execution will probably not change anything. Have you tried journalctl -b 0 | grep -i iommu and lsmod | grep vfio?
Both E3 Xeon 1260L and Q77 appear to support VT-d. Maybe it is just not supported by the Dell BIOS? Maybe the Dell support website can help?
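
The /proc/cmdline check can be scripted so that it does not depend on reading the line by eye. A minimal sketch, where has_kernel_param is a made-up helper name and the optional second argument exists only so the function can be tried against a pasted command line:

```shell
#!/bin/sh
# has_kernel_param PARAM [CMDLINE]
# Succeed if PARAM appears as a whole word in CMDLINE. When CMDLINE is not
# given, the running kernel's /proc/cmdline is used.
# Hypothetical helper, not a Proxmox tool.
has_kernel_param() {
    param=$1
    cmdline=${2-$(cat /proc/cmdline)}
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
#   has_kernel_param intel_iommu=on || echo "intel_iommu=on is NOT active"
```

If the parameter is missing here even though it was added to the bootloader configuration, the change has not reached the boot entries yet.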
 
So from your answer, the assumption is that you blacklist when the driver is used only by the device being passed through (that doesn't make sense).
In your situation, on the other hand, where both NICs share the driver, you should have a problem, yet from what you say you don't.
You blacklist drivers so that your Proxmox host can't use and initialize the device. If you put in a GPU without blacklisting, Proxmox will try to initialize it, and once it is initialized it can't be passed through to a VM. I have two i350s: one that the Proxmox host uses and one that should be passed through to VMs. If I blacklisted that Intel driver, I wouldn't be able to use one of the NICs on the host itself.
So I only blacklisted the ATI/Nvidia GPU drivers but not the Intel NIC drivers.
And it works fine without blacklisting them.
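
When two cards share the same driver and the same ID, one function can also be handed to vfio-pci by its PCI address through the kernel's driver_override sysfs attribute, which leaves the other card on igb. The following is a sketch under assumptions: bind_to_vfio and the SYSFS_ROOT variable are made up for illustration (SYSFS_ROOT only exists so the function can be tried against a fake directory tree), the vfio-pci module must already be loaded, and it has to run as root. Proxmox normally does the rebinding itself when the VM starts, so this is rarely needed by hand.

```shell
#!/bin/sh
# bind_to_vfio ADDR
# ADDR is a full PCI address such as 0000:02:00.0.
# 1. tell the kernel that only vfio-pci may claim this device,
# 2. detach the currently bound driver, if any,
# 3. ask the kernel to re-probe the device so vfio-pci picks it up.
# Hypothetical helper; SYSFS_ROOT defaults to the real /sys.
bind_to_vfio() {
    sysfs=${SYSFS_ROOT:-/sys}
    dev=$sysfs/bus/pci/devices/$1
    echo vfio-pci > "$dev/driver_override"
    if [ -e "$dev/driver" ]; then
        echo "$1" > "$dev/driver/unbind"
    fi
    echo "$1" > "$sysfs/bus/pci/drivers_probe"
}

# Example (as root, after `modprobe vfio-pci`):
#   bind_to_vfio 0000:02:00.0
```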
 
If you have no IOMMU groups, then IOMMU is not (completely) enabled in UEFI/BIOS and/or changes have not been applied. Please check cat /proc/cmdline.
Enabling trusted execution will probably not change anything. Have you tried journalctl -b 0 | grep -i iommu and lsmod | grep vfio?
Both E3 Xeon 1260L and Q77 appear to support VT-d. Maybe it is just not supported by the Dell BIOS? Maybe the Dell support website can help?
Well, it is one of those situations where you have made so many changes that you can't figure out which one did the trick. Anyway, I checked everything again and ran pve-efiboot-tool refresh again (or maybe it was the first time, I can't recall). I also haven't yet enabled that Trusted Execution BIOS option (even if it is irrelevant), since I am at work and the Optiplex 7010 sadly doesn't have remote management. And now I have groups!!!
Before I post the results, something else needs to be clarified first, because the new passthrough guide doesn't clearly show how to do it.
So...
I added the extra option intel_iommu=on in /etc/kernel/cmdline like this
root=ZFS=rpool/ROOT/pve-1 boot=zfs intel_iommu=on
Or should it maybe be like this
root=ZFS=rpool/ROOT/pve-1 boot=zfs
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"

And in /etc/default/grub I have the lines
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="Proxmox Virtual Environment"
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"
GRUB_CMDLINE_LINUX="root=ZFS=rpool/ROOT/pve-1 boot=zfs intel_iommu=on"
a) Are they needed here as well, or are they irrelevant because of systemd?
b) The guide mentions "If you are using systemd-boot make sure to sync the new initramfs to the bootable partitions".

In my case that is a ZFS mirror, so both disks need to contain the boot info (ESP) in order to be able to boot after a disk failure.
Which command is the guide talking about? Is it only pve-efiboot-tool refresh, or something else?

And now the results of the commands
Code:
cat /proc/cmdline
initrd=\EFI\proxmox\5.4.78-2-pve\initrd.img-5.4.78-2-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs intel_iommu=on

Code:
for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done;
IOMMU group 0 00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 Processor Family DRAM Controller [8086:0108] (rev 09)
IOMMU group 10 00:1f.0 ISA bridge [0601]: Intel Corporation Q77 Express Chipset LPC Controller [8086:1e47] (rev 04)
IOMMU group 10 00:1f.2 SATA controller [0106]: Intel Corporation 7 Series/C210 Series Chipset Family 6-port SATA Controller [AHCI mode] [8086:1e02] (rev 04)
IOMMU group 10 00:1f.3 SMBus [0c05]: Intel Corporation 7 Series/C216 Chipset Family SMBus Controller [8086:1e22] (rev 04)
IOMMU group 11 02:00.0 Ethernet controller [0200]: Intel Corporation I350 Gigabit Network Connection [8086:1521] (rev 01)
IOMMU group 12 02:00.1 Ethernet controller [0200]: Intel Corporation I350 Gigabit Network Connection [8086:1521] (rev 01)
IOMMU group 13 02:00.2 Ethernet controller [0200]: Intel Corporation I350 Gigabit Network Connection [8086:1521] (rev 01)
IOMMU group 14 02:00.3 Ethernet controller [0200]: Intel Corporation I350 Gigabit Network Connection [8086:1521] (rev 01)
IOMMU group 1 00:02.0 VGA compatible controller [0300]: Intel Corporation Xeon E3-1200 Processor Family Integrated Graphics Controller [8086:010a] (rev 09)
IOMMU group 2 00:14.0 USB controller [0c03]: Intel Corporation 7 Series/C210 Series Chipset Family USB xHCI Host Controller [8086:1e31] (rev 04)
IOMMU group 3 00:16.0 Communication controller [0780]: Intel Corporation 7 Series/C216 Chipset Family MEI Controller #1 [8086:1e3a] (rev 04)
IOMMU group 4 00:19.0 Ethernet controller [0200]: Intel Corporation 82579LM Gigabit Network Connection [8086:1502] (rev 04)
IOMMU group 5 00:1a.0 USB controller [0c03]: Intel Corporation 7 Series/C216 Chipset Family USB Enhanced Host Controller #2 [8086:1e2d] (rev 04)
IOMMU group 6 00:1b.0 Audio device [0403]: Intel Corporation 7 Series/C216 Chipset Family High Definition Audio Controller [8086:1e20] (rev 04)
IOMMU group 7 00:1c.0 PCI bridge [0604]: Intel Corporation 7 Series/C216 Chipset Family PCI Express Root Port 1 [8086:1e10] (rev c4)
IOMMU group 8 00:1c.4 PCI bridge [0604]: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 5 [8086:1e18] (rev c4)
IOMMU group 9 00:1d.0 USB controller [0c03]: Intel Corporation 7 Series/C216 Chipset Family USB Enhanced Host Controller #1 [8086:1e26] (rev 04)

From this I see that each port has its own group, not shared with anything else. I don't care much though, since I need to pass through the whole card, not specific ports. Also, journalctl -b 0 | grep -i iommu is not needed now, given the above outcome.

Code:
lsmod | grep vfio
vfio_pci               53248  0
vfio_virqfd            16384  1 vfio_pci
irqbypass              16384  2 vfio_pci,kvm
vfio_iommu_type1       32768  0
vfio                   32768  2 vfio_iommu_type1,vfio_pci

 
a) Are they needed here as well, or are they irrelevant because of systemd?
b) The guide mentions "If you are using systemd-boot make sure to sync the new initramfs to the bootable partitions".
Please have a look at this post. If you use systemd, you don't need to change the settings for GRUB. If you do use GRUB, just put them after root=ZFS=....
Glad to see you got it working! You can pass through the entire multi-function device to a single VM, if you want, using the GUI or by adding hostpci0: 02:00 to the VM configuration.
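
For the systemd-boot case, the edit to /etc/kernel/cmdline can be done in a repeatable way. This is a sketch only: append_kernel_param is a made-up helper, it assumes the file consists of a single line (as in the example shown earlier in the thread), and it rewrites the file with just that line plus the new parameter.

```shell
#!/bin/sh
# append_kernel_param FILE PARAM
# Append PARAM to the single-line kernel command line stored in FILE,
# unless it is already present. Hypothetical helper for illustration.
append_kernel_param() {
    file=$1
    param=$2
    line=$(head -n 1 "$file")
    case " $line " in
        *" $param "*) ;;    # already present, leave the file untouched
        *) printf '%s %s\n' "$line" "$param" > "$file" ;;
    esac
}

# Example (as root), followed by the sync step mentioned in this thread:
#   append_kernel_param /etc/kernel/cmdline intel_iommu=on
#   pve-efiboot-tool refresh
```

After a reboot, cat /proc/cmdline should show the parameter. As far as I know, newer Proxmox VE releases ship the same tool under the name proxmox-boot-tool.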
 
Please have a look at this post. If you use systemd, you don't need to change the settings for GRUB. If you do use GRUB, just put them after root=ZFS=....
Glad to see you got it working! You can pass through the entire multi-function device to a single VM, if you want, using the GUI or by adding hostpci0: 02:00 to the VM configuration.
You answered me with the same link in another member's thread, and I still believe it doesn't have the info needed and doesn't clarify what I asked; your answer did. You could have skipped the link both here and in that other thread.

If you check my initial post, I do mention that I am using systemd.

Thank you for pointing out where and how exactly I should place it.

If I pass through the device from the GUI, how am I supposed to tell Proxmox not to load it for itself during boot? I believe I have to stub it first and claim it for use for other purposes.
 
Apologies for forgetting that you already mentioned systemd. I thought you asked about both.
If I pass through the device from the GUI, how am I supposed to tell Proxmox not to load it for itself during boot? I believe I have to stub it first and claim it for use for other purposes.
Do you really need to do this? Proxmox should be able to do this automatically when you start the VM.
If you want to prevent Proxmox to load any driver for the device(s), you can add options vfio-pci ids=8086:1521 to a file in the directory /etc/modprobe.d/ or specify vfio-pci.ids=8086:1521 as a kernel parameter in /etc/kernel/cmdline. Run update-initramfs -u and reboot to activate the change.
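
As a concrete illustration of the modprobe.d variant, including the softdep mentioned in answer 6, the file contents could be generated like this. It is only a sketch: vfio_modprobe_conf is a made-up helper name, and the file name vfio.conf is simply the one already used earlier in this thread.

```shell
#!/bin/sh
# vfio_modprobe_conf IDS DRIVER
# Print modprobe.d lines that let vfio-pci claim the given vendor:device
# IDs and make sure vfio-pci is loaded before DRIVER.
# Hypothetical helper for illustration.
vfio_modprobe_conf() {
    ids=$1
    driver=$2
    printf 'options vfio-pci ids=%s\n' "$ids"
    printf 'softdep %s pre: vfio-pci\n' "$driver"
}

# Example (as root):
#   vfio_modprobe_conf 8086:1521 igb > /etc/modprobe.d/vfio.conf
#   update-initramfs -u
#   reboot
```

Keep in mind that the ID matches every device with that ID, so on a host that also uses an I350 for its own networking this would take that NIC away from the host as well.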
 
Do you really need to do this? Proxmox should be able to do this automatically when you start the VM.
Should, but does it? How many things should work from the GUI and just don't, or only partially? The Proxmox admins and devs should have an opinion about that, since they know where to look to check whether ticking or enabling options in the GUI has the corresponding outcome in the Proxmox conf files.
If you want to prevent Proxmox to load any driver for the device(s), you can add options vfio-pci ids=8086:1521 to a file in the directory /etc/modprobe.d/ or specify vfio-pci.ids=8086:1521 as a kernel parameter in /etc/kernel/cmdline. Run update-initramfs -u and reboot to activate the change.
I know that it's part of the official passthrough guide, but it's nice of you to point it out in case I forget. I have unloaded everything for now (in order to make the IOMMU groups show up, which they did), and now I am about to bind that ID to the vfio module.
 
Should, but does it? How many things should work from the GUI and just don't, or only partially? The Proxmox admins and devs should have an opinion about that, since they know where to look to check whether ticking or enabling options in the GUI has the corresponding outcome in the Proxmox conf files.

I know that it's part of the official passthrough guide, but it's nice of you to point it out in case I forget. I have unloaded everything for now (in order to make the IOMMU groups show up, which they did), and now I am about to bind that ID to the vfio module.
Yes it does work (automatically unbinding and resetting PCI devices), unless some of the hardware involved (CPU, motherboard, UEFI, device) is not working according to standards or specifications... which does happen too often in real life. Luckily, there is often a work-around found or created by the open-source community.
Please note that I am just a (currently paying) home-user that enjoys using Proxmox, and cannot possibly comment on the admins and devs of Proxmox.
 
