pfSense VM slower than expected inter-vlan routing

mshorey

pve 8.2.2
I'm hoping someone else who runs pfSense as a VM on their Proxmox instance, and maybe isn't seeing the speeds they expect between VLANs, has some insight for me. With iperf3 between VMs on the same VLAN I can reach ~30Gbps. But between VMs on different VLANs (when the traffic has to be routed through the pfSense VM) I'm seeing maybe 5-6Gbps. CPU utilization on the pfSense VM never goes over 15% or so while it routes these iperf3 tests between the VLANs.

I have set multiqueue to 4 or 8 for each VM depending on its vCPU count, and that hasn't made a difference. All VMs use virtio NICs and tag their traffic for specific VLANs. I was previously doing everything over a single Linux bridge (vmbr0), but I added a second bridge (vmbr1) to pfSense just for my VMs' VLAN (100) to see if that would make a difference, and it did a little: previously I was seeing about 4Gbps, and now I'm seeing just under 6Gbps testing from a VM on VLAN1 (vmbr0) to VLAN100 (vmbr1), though with more retransmits than I'd expect.

Everything on my network is still at 1500 MTU. I would change it if I thought it would make a difference, but CPU utilization on my pfSense VM is so low that I don't think it points to needing jumbo frames, IMO.

I'm open to any suggestions y'all might have and extremely appreciative. My pfSense version is 2.7.2 and the VM config is as follows:
root@pve-1:/etc/pve/qemu-server# cat 107.conf
agent: 1
balloon: 0
boot: order=scsi0;ide2
cores: 8
cpu: host
hostpci0: 0000:01:00.1,pcie=1
ide2: local:iso/pfSense-CE-2.7.2-RELEASE-amd64.iso,media=cdrom,size=854172K
machine: q35
memory: 8192
meta: creation-qemu=8.1.2,ctime=1702956906
name: PFSENSE-2
net0: virtio=BC:24:11:EF:4B:41,bridge=vmbr0,queues=8
net1: virtio=BC:24:11:89:B5:BB,bridge=vmbr1,queues=8
numa: 0
onboot: 1
ostype: l26
scsi0: VMs:vm-107-disk-0,iothread=1,size=12G
scsihw: virtio-scsi-single
smbios1: uuid=7aa0d7e5-90b6-444c-98ec-2bcdab0a0e43
sockets: 1
startup: order=1,up=30
vmgenid: 331695e6-6024-4a0f-a672-cd39aac55e20
And here's one of the testing VMs:
root@pve-1:/etc/pve/qemu-server# cat 103.conf
agent: 1
balloon: 0
boot: order=scsi0
cores: 4
cpu: host
memory: 16384
meta: creation-qemu=7.0.0,ctime=1662999076
name: DOCKER
net0: virtio=36:C7:81:59:D4:93,bridge=vmbr0,queues=4
numa: 0
onboot: 1
ostype: l26
scsi0: VMs:vm-103-disk-0,discard=on,iothread=1,size=64G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=f2c40f91-177b-410f-ab0d-c930e7b6160f
sockets: 1
startup: order=4,up=30
tags:
vmgenid: 3abb9524-cb3e-46a2-8862-52d37bce2907
And the other testing VM:
root@pve-1:/etc/pve/qemu-server# cat 112.conf
agent: 1
balloon: 0
boot: order=scsi0;ide2;net0
cores: 4
cpu: host
ide2: none,media=cdrom
memory: 4096
meta: creation-qemu=7.1.0,ctime=1672979532
name: NPM
net0: virtio=2a:ed:b7:28:34:63,bridge=vmbr1,queues=4,tag=100
numa: 0
onboot: 1
ostype: l26
scsi0: VMs:vm-112-disk-0,discard=on,iothread=1,size=42G
scsihw: virtio-scsi-single
smbios1: uuid=47b48916-70d2-4543-9220-c350461b565e
sockets: 1
vmgenid: 28600f0b-74d5-494d-a64e-c4559cfc5edc
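When comparing several VM configs like the ones above, it can help to pull out just the NIC settings. The following is a minimal sketch, not a Proxmox tool: `summarize_nics` is a hypothetical helper, the sample text is a three-line excerpt of the pfSense config above, and treating a missing `queues` option as a single queue is an assumption.

```python
def summarize_nics(conf_text):
    """Return (cores, nics) from the text of a qemu-server VM config.

    Only 'cores' and 'netN' lines are inspected; everything else is ignored.
    """
    cores = None
    nics = []
    for line in conf_text.splitlines():
        # Split on the first colon only, since MAC addresses contain colons.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "cores":
            cores = int(value)
        elif key.startswith("net") and key[3:].isdigit():
            opts = dict(part.split("=", 1) for part in value.split(","))
            nics.append({
                "nic": key,
                "bridge": opts.get("bridge"),
                # Assumption: no 'queues' option means a single queue.
                "queues": int(opts.get("queues", 1)),
                "tag": opts.get("tag"),
            })
    return cores, nics


# Excerpt of the pfSense VM config (107.conf) shown above.
PFSENSE_CONF = """\
cores: 8
net0: virtio=BC:24:11:EF:4B:41,bridge=vmbr0,queues=8
net1: virtio=BC:24:11:89:B5:BB,bridge=vmbr1,queues=8
"""

cores, nics = summarize_nics(PFSENSE_CONF)
for nic in nics:
    print(nic["nic"], nic["bridge"], f"queues={nic['queues']}", f"cores={cores}")
```

Running the same helper over each VM's config makes it easy to see which bridge, VLAN tag, and queue count every NIC uses relative to the VM's core count.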
And I've attached a screenshot of the iperf3 testing performed after making the changes mentioned.
 

Attachment: Screenshot 2024-06-03 151518.png
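Since the results were shared as a screenshot, one way to compare runs in text form is iperf3's JSON output (`iperf3 --json`). The sketch below assumes a TCP test, where the report contains `end.sum_sent` and `end.sum_received`; `summarize_iperf3` is a hypothetical helper and the sample numbers are placeholders, not measurements from this thread.

```python
import json


def summarize_iperf3(report_json):
    """Extract throughput (Gbps) and retransmits from an iperf3 JSON report."""
    report = json.loads(report_json)
    sent = report["end"]["sum_sent"]
    received = report["end"]["sum_received"]
    return {
        "sent_gbps": sent["bits_per_second"] / 1e9,
        "received_gbps": received["bits_per_second"] / 1e9,
        # Retransmits are reported on the sender side for TCP tests.
        "retransmits": sent.get("retransmits", 0),
    }


# Placeholder report with made-up values, trimmed to the fields used here.
SAMPLE = json.dumps({
    "end": {
        "sum_sent": {"bits_per_second": 5.8e9, "retransmits": 1200},
        "sum_received": {"bits_per_second": 5.7e9},
    }
})

print(summarize_iperf3(SAMPLE))
```

Saving one report per test (same VLAN, different VLANs, before and after a change) makes the throughput and retransmit differences directly comparable.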
How does the CPU utilization of the whole system change when you run your tests?

You are moving from using layer 2 switching handled at the kernel level (near hardware level) to layer 3 switching/routing where the packets must be passed from the kernel to the pfSense VM and then back to the kernel. The result is far more work being carried out.
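To put "far more work" in packet terms, here is a rough calculation. It assumes every routed packet is a full 1500-byte frame and ignores Ethernet and protocol header overhead; offloads on the virtio path may change the effective packet size, so treat the numbers as an order-of-magnitude sketch only.

```python
def packets_per_second(throughput_gbps, packet_bytes=1500):
    """Packets per second needed to carry a given throughput."""
    return throughput_gbps * 1e9 / (packet_bytes * 8)


def per_packet_budget_us(throughput_gbps, packet_bytes=1500):
    """Time available per packet, in microseconds, if one core handles them all."""
    return 1e6 / packets_per_second(throughput_gbps, packet_bytes)


# 6 Gbps is roughly the routed result, 30 Gbps the same-VLAN result.
for gbps in (6, 30):
    pps = packets_per_second(gbps)
    budget = per_packet_budget_us(gbps)
    print(f"{gbps} Gbps at 1500 bytes: {pps:,.0f} packets/s, {budget:.2f} us per packet")
```

Under these assumptions, 6 Gbps is about 500,000 packets per second, and matching the 30 Gbps same-VLAN figure would take about 2.5 million packets per second, each of which has to cross into the router VM and back.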
 
It hovers around 20% when I'm running iperf3 with 4 parallel streams between the VMs on different VLANs.
 

Attachment: Screenshot 2024-06-03 191542.png
I think this all indicates that you should use VLANs to isolate traffic, rather than route high volumes of traffic between VLANs in software.

From the details posted, you are

- on the pfSense VM, locking one of the 8 vCPUs at 100% (15% overall load)

- on the overall system, locking 8 of the reported 44 CPUs at 100%, which will be from kernel-level work plus pfSense and all the test VMs you are running.

All this load and the resulting low performance come from the fact that you are trying to use pfSense to emulate a high-performance layer 3 switch, which is something normally built at the silicon level in an ASIC. You may see better performance by deploying something like openswitch, but you are also likely to see even greater CPU load.
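The arithmetic behind those two bullets can be sketched as follows. The helper names are hypothetical; the inputs are the figures reported earlier in the thread (about 15% on the 8-vCPU pfSense VM, about 20% on the 44-CPU host).

```python
def overall_utilization(busy_cores, total_cores):
    """Overall utilization (%) shown when some cores are fully busy."""
    return 100.0 * busy_cores / total_cores


def busy_core_equivalents(overall_percent, total_cores):
    """How many fully busy cores a given overall utilization corresponds to."""
    return overall_percent / 100.0 * total_cores


# One saturated vCPU out of 8 shows up as a low overall figure.
print(f"1 of 8 vCPUs busy: {overall_utilization(1, 8):.1f}% overall")    # 12.5%
print(f"8 of 44 CPUs busy: {overall_utilization(8, 44):.1f}% overall")   # 18.2%

# Going the other way from the reported percentages.
print(f"15% of 8 vCPUs:  {busy_core_equivalents(15, 8):.1f} cores")
print(f"20% of 44 CPUs:  {busy_core_equivalents(20, 44):.1f} cores")
```

The point is that a low overall percentage can still hide one or more fully saturated cores, so per-core utilization is the more useful number to watch during the test.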
 
All of this makes good sense. I appreciate your input, and there may be an L3 switch in my near future.
 
If you are hoping for a ~30Gbps or even ~10Gbps L3 switch you may find the cost a little on the high side. This is a market where people will install openswitch onto a dedicated server with a few high-speed NICs. They get to throw CPU cores at the problem, but without the virtualization overheads.
 
Came here to say that I have the exact same problem right now. It's been driving me crazy for quite a while trying to determine if the bottleneck is occurring on my switch, firewall, or Proxmox box. It appears to be a Proxmox issue.

Rather than pfSense, I run OPNsense. My CPU utilization is almost the same as yours (never runs high). Intra-VLAN switching (same VLAN) between VMs gives me ~30-45 Gbps when carried out by the Proxmox host with VirtIO. When switching intra-VLAN between two distinct devices separated by my switch, I saturate my 2.5 GbE NIC.

But once I move to inter-VLAN routing (different VLANs), I see a massive reduction in speed with many retries/retransmissions in iperf. For the record, I have Intel i225 Rev 3 (2.5 GbE) NICs in my Proxmox box and OPNsense box.

When iperf testing from the Proxmox host itself, I don't see any inter-VLAN routing bottlenecks. Here is a link to my current post. I also linked another person's post who is having the same issue. So far, the Proxmox admin who answered my question is surmising the problem stems from OPNsense. I don't think it does, but I will be completing a test this weekend to determine it once and for all.
 