[TUTORIAL] Enabling SR-IOV for Intel NIC (X550-T2) on Proxmox 6

I've found that the write to `sriov_numvfs` can only be done once; if you want fewer or more VF entries, you'll have to reboot... unless I missed a removal step to reset the VFs.
You can reset by writing 0 to `sriov_numvfs` first.
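
A minimal sketch of that reset, assuming the PF is called ens2f0np0 (use your own interface name):

Code:
# drop all existing VFs (the count must be 0 before a new value is accepted)
echo 0 > /sys/class/net/ens2f0np0/device/sriov_numvfs
# create the new number of VFs
echo 4 > /sys/class/net/ens2f0np0/device/sriov_numvfs
# confirm
cat /sys/class/net/ens2f0np0/device/sriov_numvfs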

I've adapted my Solarflare Proxmox script for my QNAP Intel X710 cards:

nano /etc/systemd/system/sriov-vfs.service

Code:
[Unit]
Description=Enable SR-IOV and detach guest VFs from host
Requires=network.target
After=network.target
Before=pve-firewall.service
[Service]
Type=oneshot
RemainAfterExit=yes
# Create NIC VFs
ExecStart=/usr/bin/bash -c 'echo 8 > /sys/class/net/ens2f0np0/device/sriov_numvfs'
ExecStart=/usr/bin/bash -c 'echo 8 > /sys/class/net/ens2f1np1/device/sriov_numvfs'
# Set static MACs for VFs
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set ens2f0np0 vf 0 mac 76:9e:17:83:39:e5'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set ens2f0np0 vf 1 mac 46:2c:6d:24:6b:1b'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set ens2f0np0 vf 2 mac 3e:47:48:12:ed:94'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set ens2f0np0 vf 3 mac be:e3:6a:f3:8f:ac'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set ens2f0np0 vf 4 mac 62:8f:3d:bb:02:08'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set ens2f0np0 vf 5 mac ae:91:57:b9:14:7f'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set ens2f0np0 vf 6 mac 5a:c2:08:a9:68:a7'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set ens2f0np0 vf 7 mac b2:f0:18:af:cb:c5'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set ens2f1np1 vf 0 mac 16:47:7c:a8:95:98'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set ens2f1np1 vf 1 mac a6:c7:c5:7f:9c:22'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set ens2f1np1 vf 2 mac b6:0f:45:34:5e:19'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set ens2f1np1 vf 3 mac 2a:f7:37:84:31:30'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set ens2f1np1 vf 4 mac 8a:fa:f8:c5:0b:93'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set ens2f1np1 vf 5 mac b2:f5:d5:2f:79:06'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set ens2f1np1 vf 6 mac c2:92:f5:fa:32:20'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set ens2f1np1 vf 7 mac 2e:fb:29:1e:48:31'
# Detach VFs from host
ExecStart=/usr/bin/bash -c 'echo 0000:01:02.1 > /sys/bus/pci/devices/0000\\:01\\:02.1/driver/unbind'
ExecStart=/usr/bin/bash -c 'echo 0000:01:02.2 > /sys/bus/pci/devices/0000\\:01\\:02.2/driver/unbind'
ExecStart=/usr/bin/bash -c 'echo 0000:01:02.3 > /sys/bus/pci/devices/0000\\:01\\:02.3/driver/unbind'
ExecStart=/usr/bin/bash -c 'echo 0000:01:02.4 > /sys/bus/pci/devices/0000\\:01\\:02.4/driver/unbind'
ExecStart=/usr/bin/bash -c 'echo 0000:01:02.5 > /sys/bus/pci/devices/0000\\:01\\:02.5/driver/unbind'
ExecStart=/usr/bin/bash -c 'echo 0000:01:02.6 > /sys/bus/pci/devices/0000\\:01\\:02.6/driver/unbind'
ExecStart=/usr/bin/bash -c 'echo 0000:01:02.7 > /sys/bus/pci/devices/0000\\:01\\:02.7/driver/unbind'
ExecStart=/usr/bin/bash -c 'echo 0000:01:0a.1 > /sys/bus/pci/devices/0000\\:01\\:0a.1/driver/unbind'
ExecStart=/usr/bin/bash -c 'echo 0000:01:0a.2 > /sys/bus/pci/devices/0000\\:01\\:0a.2/driver/unbind'
ExecStart=/usr/bin/bash -c 'echo 0000:01:0a.3 > /sys/bus/pci/devices/0000\\:01\\:0a.3/driver/unbind'
ExecStart=/usr/bin/bash -c 'echo 0000:01:0a.4 > /sys/bus/pci/devices/0000\\:01\\:0a.4/driver/unbind'
ExecStart=/usr/bin/bash -c 'echo 0000:01:0a.5 > /sys/bus/pci/devices/0000\\:01\\:0a.5/driver/unbind'
ExecStart=/usr/bin/bash -c 'echo 0000:01:0a.6 > /sys/bus/pci/devices/0000\\:01\\:0a.6/driver/unbind'
ExecStart=/usr/bin/bash -c 'echo 0000:01:0a.7 > /sys/bus/pci/devices/0000\\:01\\:0a.7/driver/unbind'
# List new VFs
ExecStart=/usr/bin/lspci -D -d 8086:154c
# Destroy VFs
ExecStop=/usr/bin/bash -c 'echo 0 > /sys/class/net/ens2f0np0/device/sriov_numvfs'
ExecStop=/usr/bin/bash -c 'echo 0 > /sys/class/net/ens2f1np1/device/sriov_numvfs'
# Reload NIC VFs
ExecReload=/usr/bin/bash -c 'echo 0 > /sys/class/net/ens2f0np0/device/sriov_numvfs'
ExecReload=/usr/bin/bash -c 'echo 0 > /sys/class/net/ens2f1np1/device/sriov_numvfs'
ExecReload=/usr/bin/bash -c 'echo 8 > /sys/class/net/ens2f0np0/device/sriov_numvfs'
ExecReload=/usr/bin/bash -c 'echo 8 > /sys/class/net/ens2f1np1/device/sriov_numvfs'
ExecReload=/usr/bin/bash -c '/usr/bin/ip link set ens2f0np0 vf 0 mac 76:9e:17:83:39:e5'
ExecReload=/usr/bin/bash -c '/usr/bin/ip link set ens2f0np0 vf 1 mac 46:2c:6d:24:6b:1b'
ExecReload=/usr/bin/bash -c '/usr/bin/ip link set ens2f0np0 vf 2 mac 3e:47:48:12:ed:94'
ExecReload=/usr/bin/bash -c '/usr/bin/ip link set ens2f0np0 vf 3 mac be:e3:6a:f3:8f:ac'
ExecReload=/usr/bin/bash -c '/usr/bin/ip link set ens2f0np0 vf 4 mac 62:8f:3d:bb:02:08'
ExecReload=/usr/bin/bash -c '/usr/bin/ip link set ens2f0np0 vf 5 mac ae:91:57:b9:14:7f'
ExecReload=/usr/bin/bash -c '/usr/bin/ip link set ens2f0np0 vf 6 mac 5a:c2:08:a9:68:a7'
ExecReload=/usr/bin/bash -c '/usr/bin/ip link set ens2f0np0 vf 7 mac b2:f0:18:af:cb:c5'
ExecReload=/usr/bin/bash -c '/usr/bin/ip link set ens2f1np1 vf 0 mac 16:47:7c:a8:95:98'
ExecReload=/usr/bin/bash -c '/usr/bin/ip link set ens2f1np1 vf 1 mac a6:c7:c5:7f:9c:22'
ExecReload=/usr/bin/bash -c '/usr/bin/ip link set ens2f1np1 vf 2 mac b6:0f:45:34:5e:19'
ExecReload=/usr/bin/bash -c '/usr/bin/ip link set ens2f1np1 vf 3 mac 2a:f7:37:84:31:30'
ExecReload=/usr/bin/bash -c '/usr/bin/ip link set ens2f1np1 vf 4 mac 8a:fa:f8:c5:0b:93'
ExecReload=/usr/bin/bash -c '/usr/bin/ip link set ens2f1np1 vf 5 mac b2:f5:d5:2f:79:06'
ExecReload=/usr/bin/bash -c '/usr/bin/ip link set ens2f1np1 vf 6 mac c2:92:f5:fa:32:20'
ExecReload=/usr/bin/bash -c '/usr/bin/ip link set ens2f1np1 vf 7 mac 2e:fb:29:1e:48:31'
ExecReload=/usr/bin/bash -c 'echo 0000:01:02.1 > /sys/bus/pci/devices/0000\\:01\\:02.1/driver/unbind'
ExecReload=/usr/bin/bash -c 'echo 0000:01:02.2 > /sys/bus/pci/devices/0000\\:01\\:02.2/driver/unbind'
ExecReload=/usr/bin/bash -c 'echo 0000:01:02.3 > /sys/bus/pci/devices/0000\\:01\\:02.3/driver/unbind'
ExecReload=/usr/bin/bash -c 'echo 0000:01:02.4 > /sys/bus/pci/devices/0000\\:01\\:02.4/driver/unbind'
ExecReload=/usr/bin/bash -c 'echo 0000:01:02.5 > /sys/bus/pci/devices/0000\\:01\\:02.5/driver/unbind'
ExecReload=/usr/bin/bash -c 'echo 0000:01:02.6 > /sys/bus/pci/devices/0000\\:01\\:02.6/driver/unbind'
ExecReload=/usr/bin/bash -c 'echo 0000:01:02.7 > /sys/bus/pci/devices/0000\\:01\\:02.7/driver/unbind'
ExecReload=/usr/bin/bash -c 'echo 0000:01:0a.1 > /sys/bus/pci/devices/0000\\:01\\:0a.1/driver/unbind'
ExecReload=/usr/bin/bash -c 'echo 0000:01:0a.2 > /sys/bus/pci/devices/0000\\:01\\:0a.2/driver/unbind'
ExecReload=/usr/bin/bash -c 'echo 0000:01:0a.3 > /sys/bus/pci/devices/0000\\:01\\:0a.3/driver/unbind'
ExecReload=/usr/bin/bash -c 'echo 0000:01:0a.4 > /sys/bus/pci/devices/0000\\:01\\:0a.4/driver/unbind'
ExecReload=/usr/bin/bash -c 'echo 0000:01:0a.5 > /sys/bus/pci/devices/0000\\:01\\:0a.5/driver/unbind'
ExecReload=/usr/bin/bash -c 'echo 0000:01:0a.6 > /sys/bus/pci/devices/0000\\:01\\:0a.6/driver/unbind'
ExecReload=/usr/bin/bash -c 'echo 0000:01:0a.7 > /sys/bus/pci/devices/0000\\:01\\:0a.7/driver/unbind'
ExecReload=/usr/bin/lspci -D -d 8086:154c
[Install]
WantedBy=multi-user.target
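
To adapt the unbind lines, you need the PCI addresses of the VFs on your own system. A quick way to list them (a sketch, assuming the same interface name and VF device ID as above) is via the virtfn symlinks under the PF's device node, or via lspci:

Code:
# each virtfnN symlink points at the PCI device backing VF N of that port
for vf in /sys/class/net/ens2f0np0/device/virtfn*; do
    echo "$(basename "$vf") -> $(basename "$(readlink -f "$vf")")"
done
# or list every Intel 700-series VF in the system by vendor:device ID
lspci -D -d 8086:154c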

Enable with:

Code:
systemctl daemon-reload
systemctl enable sriov-vfs.service

Usage:

Code:
systemctl start sriov-vfs.service
systemctl stop sriov-vfs.service
systemctl reload sriov-vfs.service
systemctl status sriov-vfs.service

Source + more discussion about the QNAP X710 cards:

https://forums.servethehome.com/index.php?threads/qnap-qxg-10g2t-x710-discussion-thread.46735/
 
So apologies for asking what likely is a pretty silly question, but what would I use SR-IOV for?
It allows more than one VM (or multiple VMs and the host) to have "raw"/direct access to the network interface. Note that bandwidth between the VMs and the host using this NIC's VFs will be limited/constrained by the PCIe bandwidth going to and from that interface. This also means that the VM has to load the correct drivers for that physical interface, rather than the drivers for the virtio/realtek/e1000/etc. NICs usually "presented" to the VM by KVM/QEMU in "normal" virtualized interfaces.
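
For context, attaching one of those VFs to a guest on Proxmox is just PCI passthrough of the VF's address; a sketch, assuming VM ID 100 and a VF address created by the unit file above (the GUI's "PCI Device" option does the same thing):

```
# pass VF 0000:01:02.1 through to VM 100 as its first hostpci device
qm set 100 -hostpci0 0000:01:02.1
```
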
I have NICs that support it, and I use them on Proxmox boxes using the default vmbr0-to-hardware-interface bridge.

Would I be better served by enabling SR-IOV and forwarding these virtual network devices to each guest? What is the benefit? Latency/performance? CPU load?

Yes and no. It depends.
If you use VirtIO and the VMs are on the same physical host, then a Linux or Open vSwitch bridge is preferable, as you are then only limited by CPU and memory bandwidth (and the Open vSwitch or Linux bridge overheads), while using VFs you'll be hitting the PCIe bus to and from the hardware NIC... but you'll be able to do hardware offload of things like checksums inside the VM (okay, VirtIO doesn't do checksums, as it doesn't "need" to inside CPU/memory where the data is assumed "perfect", but then the host should do the checksums on that interface on the way out of the NIC; just "extra" scatter/gather, though).

SO, I've been using Open vSwitch bridging for the past >10 years; however, I'm now hit with a case where SR-IOV makes loads of sense, as I'll mostly have the VMs talking only/directly to the outside world on their interfaces (the host would just add a bridge in between), even on separate VLANs (see `ip link set <iface> vf <num> vlan <vlan-id>`), but I need the host to ALSO have some communication on that interface (might be yet another VLAN too).
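
For reference, the per-VF VLAN assignment mentioned above looks like this (a sketch, assuming the interface names from the unit file earlier):

```
# force VF 0 of the first port into VLAN 100 (tagging/untagging is done by the NIC)
ip link set ens2f0np0 vf 0 vlan 100
# remove the VLAN again by setting it to 0
ip link set ens2f0np0 vf 0 vlan 0
```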

To note: SR-IOV is a "subset" of PCI passthrough, i.e. PCI passthrough is giving that "whole" device to the VM... (with the all-functions setting, all the interfaces that use that "chip", in layman's/simplified terms)

To put it in context, here are two examples of quad-port interfaces: a Mellanox CX-4 and an Intel X710.

The X710 is a "single chip" that handles the 4x ports. You'll notice it as the ".#" at the end:
```
02:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
02:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
02:00.2 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
02:00.3 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
```

I don't have access to the real quad-port CX-4 servers at present, but it'll look similar to:
```
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:01.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:01.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
```
Notice the :0#: in the middle: it is two "chips" on the same PCIe bus, each with separate functions (the .# at the end).

You can either pass through the functions one at a time, i.e. 02:00.0, or 81:01.1 for that interface/port, or you can pass through the whole "chip" with all-functions.

The difference above is that the Intel X710 will pass through all 4 ports, i.e. functions .0/.1/.2/.3, while in the Mellanox case of 81:01.x + all-functions you'll only pass through functions .0 & .1 of 81:01 and not the other 2x ports of "chip" 81:00.
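
On Proxmox the two cases look like this on the CLI (a sketch, assuming VM ID 100 and the example addresses above; if I recall the syntax correctly, omitting the function number passes all functions of the device):

```
# single function = a single port
qm set 100 -hostpci0 02:00.0
# whole device, all functions (all 4 ports of the X710 in this example)
qm set 100 -hostpci0 02:00
```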

With SR-IOV VFs, you now split each of those ports into sub/virtual ports that look like:
```
82:00.2 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
82:00.3 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]

ip link show
8: ens1f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vmbr0 state UP mode DEFAULT group default qlen 1000
link/ether 98:03:9b:79:45:9a brd ff:ff:ff:ff:ff:ff
vf 0 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
vf 1 link/ether 00:53:44:dd:dd:33 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust on, query_rss off
vf 2 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
altname enp130s0f0np0
```


each with a NEW MAC address... and some other "funs" like spoof protection (i.e. a VM can't change its MAC) and "forcing" it into an 802.1q/802.1ad VLAN. I.e. now you pass through that virtual PCI interface to the VM (oh, you can't AT PRESENT do nested SR-IOV) and then you can have multiple VMs, each on a separate VLAN, NOT going through the host's bridge - some performance gains, but restrictive in other cases... though understand what and where :)
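
The spoof-checking and trust flags visible in the `ip link show` output above are set per VF on the host; a sketch, assuming the PF name from that output:

```
# reject frames whose source MAC doesn't match the MAC assigned to the VF
ip link set ens1f0np0 vf 1 spoofchk on
# allow the guest to change the VF's MAC and use other privileged features
ip link set ens1f0np0 vf 1 trust on
```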

And yes, I've learnt all of this, after >10 years of "ignoring" it, in the past weekend... but it's fun :)
 
ExecStart=/usr/bin/lspci -D -d 8086:154c
Yeah, THIS should be set as a variable of sorts, or point people to the right place to find that value, especially as the Intel vendor code 8086: is used for everything else, so for Intel-based NICs you need to specify the device code, otherwise you'll also see the CPU etc. etc. on an Intel-based system.
 
From the source forum thread I linked to:

"I obtained the lspci -D -d 8086:154c command by trying various combo's from the output of lspci -vvm -nn -s 0000:01:02.0"

Code:
#lspci -vvm -nn -s 0000:01:02.0
Device: 01:02.0
Class:  Ethernet controller [0200]
Vendor: Intel Corporation [8086]
Device: Ethernet Virtual Function 700 Series [154c]
SVendor:        QNAP Systems, Inc. [1baa]
SDevice:        Ethernet Virtual Function 700 Series [0000]
Rev:    02
ProgIf: 00
NUMANode:       0
IOMMUGroup:     158

If someone knows a better way etc., please share.
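
One possible alternative (a sketch, not a definitive answer): skip guessing the device code and grep the human-readable lspci output for virtual functions, or read the vendor/device IDs straight from sysfs for a VF address you already know:

Code:
# find SR-IOV virtual functions with their [vendor:device] IDs
lspci -nn | grep -i "virtual function"
# or read the IDs for one VF directly from sysfs
cat /sys/bus/pci/devices/0000:01:02.0/vendor /sys/bus/pci/devices/0000:01:02.0/device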