(EVPN) SDN inter-node VM-to-VM throughput looks degraded

freakingObelix

Mar 11, 2025
Hello folks, well... I know there are many other posts about this, and I'm not sure I did everything that was suggested in them, but so far:
1. enabled multiqueue on both VM adapters
2. tested with iperf3 between VMs on the same node (~13 Gbps) and on different nodes (~1.6 Gbps, with or without multiqueue enabled)
3. both physical nodes are bare metal, with dual 10 Gbps NICs NOT shared with storage and a dedicated 10 Gbps NIC for the SDN
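
For reference, this is roughly how the multiqueue setting and the tests were done; the VM ID, vnet/bridge name and target IP are placeholders:

# enable 4 virtio queues on the VM's SDN NIC (VM ID and vnet name are examples)
qm set 101 --net0 virtio,bridge=vnet1,queues=4

# iperf3 server in one VM, client in the other
iperf3 -s
iperf3 -c 10.0.0.2 -t 30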

My HW is a bit old, but solid: dual Xeon X5670 CPUs in each node, and the NICs are HP (Emulex) NC553i.
I'm using the no-subscription repos, Proxmox VE 8.4.1, updated today; FRR 10.2.2-1+pve1.

Should I expect more throughput and assume I did something wrong, or that there is an issue with the kernel or some module? Or is this normal?

Thanks for helping!
 
I really don't know how VXLAN performs on such a "bit old" (2010) CPU ;)

Also, modern NICs have VXLAN offloading, so here the VXLAN encapsulation will be done entirely by the CPU.
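
If you want to check, something like this shows whether the NIC advertises UDP tunnel (VXLAN) segmentation offload; the interface name is just an example:

# check for VXLAN-capable offloads on the physical NIC used by the SDN underlay
ethtool -k eth2 | grep -i udp_tnl
# look for tx-udp_tnl-segmentation; "off [fixed]" would mean no hardware VXLAN offload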

Maybe try disabling the Spectre/Meltdown/... mitigations.
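
On a GRUB-booted host that would roughly mean the following (reboot required, and it has security implications):

# /etc/default/grub -- add mitigations=off to the kernel command line
GRUB_CMDLINE_LINUX_DEFAULT="quiet mitigations=off"
# then apply and reboot
update-grub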
 
I was truly afraid of such answer xD already tried disabling, no changes, so rolled back.
Maybe check your CPU stats to see whether you have one core at 100%; it's quite possible that old NICs don't have RSS support that understands VXLAN, so they can't spread the VXLAN traffic across multiple cores.
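
Something along these lines during an iperf3 run will show the per-core load and how many RX queues / what RSS hashing the NIC exposes (interface name is an example):

# per-core CPU usage, refreshed every second (from the sysstat package)
mpstat -P ALL 1
# number of RX/TX queues the NIC exposes
ethtool -l eth2
# RSS indirection table and hash configuration
ethtool -x eth2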
 
Hey @spirit yes, that's exactly what is going on.
I guess EVPN isn't optimal in this scenario.
One last question: does QinQ require hardware offloading too? Because if not (or "not that much", as I suspect), then there is my answer. If I need to span across sites I can use some L2 VPN... I had thought about this, but I don't know the limits of my own ignorance before beginning to build things. It's well beyond my knowledge.
Thanks a lot for your help.
 
Hello again! @spirit, after a maintenance window I switched every single VNet to QinQ but... nothing changed at all. My whole network supports an MTU of 9216 bytes, the bridge for QinQ has an MTU of 2000, and the QinQ zone is defined with an MTU of 1500 so it stays transparent to the VMs. Same-host traffic still maxes out the interface capacity, but inter-node traffic through the SDN always saturates a single core and hits this bottleneck. Isn't there anything else I can do?
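
For what it's worth, this is roughly how I'm checking the bottleneck on a cross-node test; the target IP is a placeholder, and the parallel streams are just to see whether the load spreads over more than one core:

# cross-node test with 4 parallel TCP streams instead of one
iperf3 -c 10.0.0.2 -P 4 -t 30
# in a second shell on the node, watch per-core usage while the test runs
mpstat -P ALL 2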
Thanks in advance.