SR-IOV success stories?

rungekutta

Hi, very happy Proxmox user here.
However... what are the real success stories with SR-IOV networking that could be shared here? I recently attempted this, based on the steps in the documentation complemented by other more detailed guides online (e.g. how to set VLANs on VFs). However, the end results were... disappointing. Testing with iperf3, I saw lots of packet drops / retransmissions between VMs attached to VFs on the same NIC, with behaviour changing depending on whether the VFs were on the same or a different physical port of the NIC: same port bad, different port good. The behaviour also seemed to change with MTU and packet size - if I forced iperf3 to use smaller packet loads (below 1400), performance was much more consistent, although generally poor. Switching back to Linux bridges, and all was good again.
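For reference, roughly the kind of test I ran (addresses are placeholders; forcing a smaller MSS is what I mean by smaller packet loads):

```
# On one VM: start the iperf3 server
iperf3 -s

# On the other VM: default test
iperf3 -c 10.0.0.2 -t 30

# Same test, but forcing a smaller TCP MSS - much more consistent, if slow
iperf3 -c 10.0.0.2 -t 30 -M 1360
```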

Hardware: Supermicro X11-SSM motherboard, Xeon 1225 v6 CPU, Intel XXV710-DA2 NIC, all with latest firmware from official sources. Proxmox 8.4.1 and test VMs vanilla Debian 12.

What are your experiences?
 
Hello,
I'm not as far along as you... but hopefully will be soon.

NIC: Broadcom P225P - 2 x 25/10G PCIe

My findings so far with this NIC:
  • use the newest firmware (see the basic checks sketched below)
  • use the newest driver
  • use OVS if applicable (not Linux bridges)
  • check and make the right settings (there are around 100 of them) on the card itself and in the BIOS
    • number of VFs, partitioning, NPAR, MTU, bandwidth limits, offloads, QoS, "behaviour"...
    • for example: the P225P allows full offload from OVS and, depending on some settings, either always routes traffic to the next switch or acts as a local switch (with much higher bandwidth)
  • check/optimise the network settings on your directly attached switch(es)
It's not just a matter of activating SR-IOV (as I initially thought), and the documentation on the internet is very fragmented.
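As a starting point, the kind of basic checks I mean (the interface name is just an example):

```
# Driver and firmware version currently in use
ethtool -i enp65s0f0np0

# How many VFs the card supports, and how many are enabled right now
cat /sys/class/net/enp65s0f0np0/device/sriov_totalvfs
cat /sys/class/net/enp65s0f0np0/device/sriov_numvfs

# Enable e.g. 4 VFs (set back to 0 first if you later want a different number)
echo 0 > /sys/class/net/enp65s0f0np0/device/sriov_numvfs
echo 4 > /sys/class/net/enp65s0f0np0/device/sriov_numvfs
```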
 
Thanks. Following on from my first post, I got a little further. According to Intel's own documentation, the MTU must be the same on the physical function and all virtual functions, otherwise it leads to undefined behaviour (which is what I saw). Previously I had run MTU 9000 on all physical ports and then either 9000 or 1500 on the Linux bridges as relevant, which worked well, but that clearly isn't a pattern that works with SR-IOV.
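In other words, something like this everywhere (interface names are just examples), rather than mixing 9000 and 1500:

```
# On the Proxmox host, on the physical function
ip link set enp1s0f0 mtu 9000

# Inside every VM, on the interface backed by a VF of that port
ip link set ens16 mtu 9000
```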

Also, my idea was to let the host use the physical port and assign VFs to the VMs. It's not entirely clear whether this is supposed to work, or whether, once you've started down that route, you can *only* use VFs, including for the host. I saw some weird behaviour here as well.

All documentation from Intel and Mellanox assumes that you compile and use their proprietary drivers and tools, which I have no interest in doing; that's another complication.

Finally, you obviously can't use the Proxmox firewall any more, as it's by definition bypassed. Entirely logical, but a bit of a shame, as I've used it for VM-to-VM isolation (in effect running each VM in its own DMZ).

So, yeah, mixed bag still…
 
Hello everyone,

I searched for answers but only found these questions. We are currently building a small Proxmox cluster that will be built upon SR-IOV. We use Intel E810 NICs and the resource-mapping feature of Proxmox. The following had to be done by ourselves:

* Created a udev rule that activates the necessary number of SR-IOV VFs (see the sketch below)
* Created another udev rule, triggered when the VF's PCI ID is added, that assigns a reasonable number of queues to the VF
* Created a mapping script/systemd service that runs "pvesh create /cluster/mapping/pci" with the correct PCI devices, IOMMU group etc. at the right moment during startup, so that the resource pool gets filled. The IOMMU group often changes between reboots, so unfortunately this cannot be a static assignment.
* Created a guest hookscript that configures the VF for use inside the VM. We store the VLAN/MAC address/resource-pool name inside a "tag" and apply it during pre-start/post-start
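For illustration, a minimal sketch of what such a udev rule can look like (interface name and VF count are placeholders, not our exact config):

```
# Sketch only: create 8 VFs as soon as the PF network device appears
cat > /etc/udev/rules.d/70-sriov.rules <<'EOF'
ACTION=="add", SUBSYSTEM=="net", KERNEL=="enp23s0f0", ATTR{device/sriov_numvfs}="8"
EOF
```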

This way the VM can even be moved to another server, as long as the VLAN and the resource mappings are available there. We use the "legacy" switchdev mode of the Intel E810, which means the guest hookscript mainly runs three "ip link set" commands to configure VLAN, MAC and trust mode (otherwise iPXE complains), roughly as sketched below.
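Roughly what those three commands look like (PF name, VF index, VLAN and MAC are placeholders):

```
# Configure the VF before the VM starts (all values are placeholders)
ip link set enp23s0f0 vf 3 vlan 100
ip link set enp23s0f0 vf 3 mac 52:54:00:12:34:56
ip link set enp23s0f0 vf 3 trust on
```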

Regarding stability and functionality it all works as expected, but a few minor issues arise:
* Of course no network statistics in Proxmox, because everything is handled in hardware
* E810: The PF driver cannot reset the packet counters of a VF, even if the VF is unused. If you get assigned a different VF, the hardware packet counters inside the VM are ... wrong, because they only count upwards until reboot.
* E810: iPXE/the Linux kernel have issues with the MSI-X queue setup (iPXE sets it to "1", and the Linux kernel cannot undo this). This is only an issue if you use iPXE. We are working on releasing an update for the iPXE iavf driver.
 
Good to hear that it works for you! Do you achieve full switching speed between VFs on the same physical port? Between VFs across physical ports? Do you let the host use the physical port and assign VFs to the VMs, or do you only use VFs?

Personally I found all this so finicky, with so many undocumented quirks and pitfalls, that I wrote it off in the end as "not worth it". I guess with a much larger system and dozens or hundreds of VMs that equation could change... and/or this becomes better supported and less esoteric over time...
 
Hello rungekutta,

We currently limit the output speed to 5G with "ip link set .... max_tx_rate". This is more of a safety measure, to avoid the theoretical possibility of one VM eating all the bandwidth. I tested this option with 100 Mbit/s and 1G, and that amount of traffic was reached without issues in TCP file transfers.
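For completeness, the rate limit itself is just this (interface name and VF index are placeholders):

```
# Cap the VF at 5000 Mbit/s
ip link set enp23s0f0 vf 3 max_tx_rate 5000
```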

We currently try to avoid the software Linux bridge, because we have observed issues with IPv6 fragments.

In our setup we are trying to simulate/replace lots of small machines (gigabit networking, 6/8-core CPUs) with one quite big Proxmox server (Dell R770 with an Intel 6767P CPU and an Intel E810 network card).

In our case the host system uses the PF and has virtually no traffic - only the Proxmox web interface, SSH sessions etc. The virtual machines are the "network powerhouses" - the first installation on an older machine (Dell R640, also with an E810 network card) was quite promising, suggesting the plan will work.
 
Interesting, thanks. In my case the host runs Ceph, and I would like to do that over the same NIC that I also virtualise with SR-IOV for the guests. We have never had any stability issues with Linux bridges, and performance is quite good too, so SR-IOV is more of a theoretical exercise to see if I can get network latency even lower. It could be beneficial for the router, for example, which is also virtualised.
 