VM to VM bandwidth ~6-7 Gbps with virtualized router

dannyyy

Hello,

EDIT: Thought I was in a German speaking forum. Now translated into English.

Status Quo
On my Minisforum MS-01 (Intel 13900H, 96GB RAM), I run Proxmox with a virtualized router. On the router, only basic firewall rules are active, no IDS/IPS or any additional filters/proxies.

Both the ISP uplink and my switch are fully 10G. My virtualized router currently has 8 cores and 8GB of memory assigned (I also tested with higher resources, without success).
I tried OPNsense (which I had used so far) and VyOS independently.
My regular setup is a VLAN-aware bridge (vmbr1) with one virtual interface assigned per VLAN. But to test things out, I also passed vmbr1 as a trunk and let the router handle the VLANs with sub-interfaces (eth1.10, ...).
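For clarity, the two variants look roughly like this on the PVE side (the NIC name enp2s0f1 and VMID 100 are just placeholders, not my actual values):

Code:
# /etc/network/interfaces (excerpt) - VLAN-aware bridge
auto vmbr1
iface vmbr1 inet manual
    bridge-ports enp2s0f1
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094

# Variant A: one virtual NIC per VLAN, tagging handled by the bridge
qm set 100 --net1 virtio,bridge=vmbr1,tag=10

# Variant B: pass vmbr1 untagged as a trunk, router creates sub-interfaces (eth1.10, ...)
qm set 100 --net1 virtio,bridge=vmbr1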

This is a sketch of my network setup:
(Attached network sketch: 1748526481367.png)
Tests
I did several tests with iperf3. I wanted to test the uplink to my ISP from the router as well as from the VMs. Later I started testing the bandwidth between the VMs. In short: as long as the router has to do NAT at most, I get the desired speed, which confirms that my switch as well as the ISP uplink are fully functional and deliver the contracted bandwidth.
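The tests were plain iperf3 runs along these lines (addresses are just examples):

Code:
# on the target (VM, LXC, PC, ...)
iperf3 -s

# on the source: single stream, then 4 parallel streams
iperf3 -c 172.17.110.2
iperf3 -c 172.17.110.2 -P 4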

1. Proxmox Node --> ISP (9.3 Gbps)
2. PC --> PC (9.4 Gbps)
3. VM1 --> LXC1 / VM2 --> LXC2 (48-50 Gbps)
4. Router eth1 --> VM1 (48-50 Gbps)
---
5. VM1 --> VM2 / VM1 --> PC / VM1 --> ISP (6-7 Gbps)

But as soon as a connection goes across VLANs, the bandwidth drops. My Proxmox node itself is also in a VLAN and its connection to the ISP is excellent, while according to the list above VM to ISP is not, so I'm wondering where the difference comes from.

Root causes
I tried to rule out all the possible causes; that's why I played with two router OSes, different resource settings, and so on.
All the recommended tuning parameters (internet, ChatGPT, ...), such as playing with hardware offloading or multiqueue on the bridge, actually worsened the result (~5Gbps).

Just observing the general CPU usage doesn't reveal any hardware limitation. Neither the Proxmox node as a whole nor the individual VMs exceed 5-15%.

My last guess is that my machines utilize just one core, or only efficiency cores that don't have enough power. Would the utilization in a VM still display 10% in that case?
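To check whether a single host core (possibly an E-core) is maxed out while the averages stay low, I guess per-core statistics on the PVE host would be more telling than the summary graphs, e.g. (mpstat comes with the sysstat package):

Code:
# per-core utilization on the PVE host, refreshed every second
apt install sysstat
mpstat -P ALL 1

# list logical CPUs; on hybrid CPUs the E-cores are the ones without an SMT sibling
lscpu --all --extended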

Do you have experience with such a home lab setup? Any suggestions on what I could test next?

Thank you a lot!
Cheers Danny
 
You could try passing through the NIC via PCI passthrough, which is recommended if you want to run a high-performance router inside a VM.
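Roughly like this, assuming IOMMU/VT-d is already enabled (the PCI address and VMID are just examples):

Code:
# find the PCI address of the NIC
lspci | grep -i ethernet

# pass it through to the router VM (pcie=1 requires the q35 machine type)
qm set 101 --hostpci0 0000:02:00.0,pcie=1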
 
What will happen to all my VMs? If I pass through the NIC, then I guess it's exclusively used by my OPNsense VM. Which interface would I have to assign to all my other VMs?

In my scenario, where inter-VLAN routing is done on a virtualized router and all the clients (in different VLANs) are also virtual, does the physical NIC do anything at all?
 
Ah, sorry, I misread your initial post and thought the uplink was slow - I was a bit too premature there. Just to be sure: you're routing between VMs in different VLANs (e.g. VM1 and VM2) on the local network with your OPNsense / VyOS VM, and that is what's slow?

What does the VM configuration of your router look like? Could you also post the configuration of the VMs as well as your general network setup on the PVE host?

Code:
qm config <vmid> # for VMs + Router
cat /etc/network/interfaces

Could you also include the network setup of your VMs?
 
To make the debugging process easier, I built a much simpler and isolated setup. No real NIC involved.

Proxmox
vmbr0 Linux Bridge, VLAN aware, no Firewall

Ubuntu 1a (LXC)
eth0 --> vmbr0, VLAN 110, 172.17.110.1/24, GW: 172.17.110.254
CPU: 8 cores, 4GB memory

Ubuntu 1b (LXC)
eth0 --> vmbr0, VLAN 110, 172.17.110.2/24, GW: 172.17.110.254
CPU: 8 cores, 4GB memory

Ubuntu 2 (LXC)
eth0 --> vmbr0, VLAN 120, 172.17.120.1/24, GW: 172.17.120.254
CPU: 8 cores, 4GB memory

OPNsense (VM)
Clean install, no additional rules set apart from an allow rule for vtnet0 <--> vtnet1
vtnet0 --> vmbr0, VLAN 110, 172.17.110.254/24
vtnet1 --> vmbr0, VLAN 120, 172.17.120.254/24
CPU: 8 cores, 8GB memory
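In Proxmox config terms this corresponds roughly to the following (excerpts only, MACs omitted):

Code:
# Ubuntu 1a (pct config)
net0: name=eth0,bridge=vmbr0,ip=172.17.110.1/24,gw=172.17.110.254,tag=110

# Ubuntu 2 (pct config)
net0: name=eth0,bridge=vmbr0,ip=172.17.120.1/24,gw=172.17.120.254,tag=120

# OPNsense (qm config)
net0: virtio=<MAC>,bridge=vmbr0,tag=110
net1: virtio=<MAC>,bridge=vmbr0,tag=120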

The network is fully working. Every host (Ubuntu 1, Ubuntu 2, OPNsense) can ping and communicate with each other.

Tests
Default settings on OPNsense
On Ubuntu 1a: iperf3 -c 172.17.110.2 --> ~50Gbps
On Ubuntu 1a: iperf3 -c 172.17.110.2 -P 4 --> ~150Gbps

On Ubuntu 1a: iperf3 -c 172.17.120.1 --> ~6Gbps
On Ubuntu 1a: iperf3 -c 172.17.120.1 -P 4 --> ~6Gbps

On OPNsense: iperf3 -c 172.17.110.1 --> ~6Gbps
On OPNsense: iperf3 -c 172.17.120.1 --> ~6Gbps

On OPNsense: iperf3 -c 172.17.110.1 -P 4 --> 5.5Gbps
On OPNsense: iperf3 -c 172.17.120.1 -P 4 --> 5.5Gbps

---

Disabled firewall with pfctl -d
On Ubuntu 1a: iperf3 -c 172.17.120.1 --> ~5-5.5Gbps
On Ubuntu 1a: iperf3 -c 172.17.120.1 -P 4 --> ~5-5.5Gbps

--

I'm really clueless :(
 
Yes, the bottleneck is clearly the OPNsense VM. The connection between 1a and 1b is fast because it only uses the Linux bridge; all other connections are slow because they pass through the OPNsense.

Do you have the CPU type set to host on the OPNsense VM? Also, if the OPNsense VM has 8 cores, then it would make sense to set the queues on its network devices to 8 as well. What network device type are you using? VirtIO?
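On the host that would be something like this, keeping the existing MAC so the guest doesn't see a new NIC (VMID 105 is just an example):

Code:
# 8 virtio queues per NIC, matching the 8 vCPUs (reuse the current MAC from qm config 105)
qm set 105 --net0 virtio=<existing MAC>,bridge=vmbr0,tag=110,queues=8
qm set 105 --net1 virtio=<existing MAC>,bridge=vmbr0,tag=120,queues=8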
 
Thank you very much @shanreich I really appreciate your help!

Do you have CPU type host set on the OPNsense?
Yes, the type is set to host

What network device type are you using? virtio?
Yes, I'm using VirtIO

Also, if the OPNsense network device has 8 cores, then it would make sense to set the queues to 8 as well.
I tried it already in the past, but not yet in the isolated test setting. Running iperf3 with default settings decreased performance a bit, but increased it for the parallel tests:
On Ubuntu 1a: iperf3 -c 172.17.120.1 --> ~5.5Gbps
On Ubuntu 1a: iperf3 -c 172.17.120.1 -P 4 --> ~13.5Gbps
On Ubuntu 1a: iperf3 -c 172.17.120.1 -P 8 --> ~20Gbps

At least this shows somewhat better performance in the parallel benchmarks.
I still feel unsatisfied with this, but I don't know whether there is more to expect. I don't want to invest many more hours if reality is simply close to this result.
My gut feeling, though, is that a single-stream performance of at least 20+ Gbps should be possible, so that VM to VM transfers are fast and I can fully saturate my internet connection (10G). My ISP would also offer 25G for the same price, but with these numbers it's not worth investing in expensive SFP28 hardware.
 
The numbers with additional threads seem pretty alright to me. If you're worried about the efficiency cores, then you could try pinning the VM to your performance cores and see if that improves the speed. Otherwise I don't really have that much experience with performance tuning OPNsense, so maybe others with more experience could chime in.
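If you want to try the pinning, it would be something along these lines; on the 13900H the P-core threads are usually enumerated first, but verify that with lscpu (VMID and core range are examples):

Code:
# identify P-cores vs E-cores (E-cores have no SMT sibling and a lower max frequency)
lscpu --all --extended

# pin the OPNsense VM to 8 P-core threads
qm set 105 --affinity 0-7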
 
I did some additional testing, and it seems OPNsense or FreeBSD is really not made for high-performance routing. I cannot say whether it's caused by the virtualization or is a general issue.

OPNsense
Tried all kinds of tunings (https://binaryimpulse.com/2022/11/opnsense-performance-tuning-for-multi-gigabit-internet/) and all combinations of hardware offloading. Still got the best results with the original OPNsense settings.
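For context, the tuning from that article mostly revolves around FreeBSD netisr/RSS tunables along these lines (quoted from memory, set under System > Settings > Tunables and applied after a reboot; double-check against the article, and none of them improved things for me):

Code:
# spread packet processing over several netisr threads instead of one
net.isr.maxthreads=-1
net.isr.bindthreads=1
net.isr.dispatch=deferred
# receive side scaling (only effective if the kernel supports RSS)
net.inet.rss.enabled=1
net.inet.rss.bits=2
# disable the IBRS mitigation to reclaim some CPU
hw.ibrs_disable=1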

There are a lot of blog posts and forum entries, and they all report the same behaviour, some even on bare metal. The CPU can't be the issue either: some people have a cheap Intel i3, others have server-grade AMD Epyc CPUs.

Ubuntu Server
Same setup as in my last post. Just enabled routing with net.ipv4.ip_forward=1. Superior performance: single-threaded up to ~50Gbps and multi-threaded up to ~160Gbps. No other settings needed. But of course it acted as a pure router, no iptables or similar running.
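So effectively the whole "router" configuration was just this (persisting it is only needed if it should survive a reboot):

Code:
# enable IPv4 forwarding immediately
sysctl -w net.ipv4.ip_forward=1

# make it persistent
echo 'net.ipv4.ip_forward = 1' > /etc/sysctl.d/99-forwarding.conf
sysctl --system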

VyOS
Same setup as in my last post. A bit faster, but still far from 10Gbps single-threaded. A bit better than OPNsense in parallel tests (25+ Gbps).
Then I enabled hardware offloading (GRO, GSO, RPS, SG, TSO) for LAN1 and LAN2, see the commands below. I only left out LRO and haven't tried any other combinations.
Single-threaded tests now reach an amazing ~30Gbps and parallel tests ~100Gbps. Just to be clear, there is still no physical hardware involved; everything is running on a NIC-less Linux bridge on Proxmox. Why hardware offloading had an effect, I don't know. I guess the VirtIO drivers have good capabilities.
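The offloading part is just this in the VyOS config (eth0/eth1 stand for my two LAN interfaces; syntax of current VyOS releases, older ones may differ):

Code:
configure
# enable everything except LRO on both LAN interfaces
set interfaces ethernet eth0 offload gro
set interfaces ethernet eth0 offload gso
set interfaces ethernet eth0 offload rps
set interfaces ethernet eth0 offload sg
set interfaces ethernet eth0 offload tso
set interfaces ethernet eth1 offload gro
set interfaces ethernet eth1 offload gso
set interfaces ethernet eth1 offload rps
set interfaces ethernet eth1 offload sg
set interfaces ethernet eth1 offload tso
commit
save
exit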

Verdict
I'm really thinking about switching to VyOS. But there will be a steep learning curve and a lot of effort until I have configured everything I had before (DHCP, IGMP, DDNS, firewall, monitoring, ...).

Still hoping someone knows which magic buttons I have to push to make OPNsense perform better :D