Poor SDN Performance

Nov 12, 2024
Hello everyone,

I'm encountering an unusual networking issue in our 4-node cluster, and I’m hoping to get some insights or solutions from the community.

Cluster Setup:
  • Nodes: 4
  • Network Cards: 25 Gbps
  • Switch: MikroTik, 25 Gbps ports

Scenario 1: Using Linux Bridge
  • Configuration: Two Windows VMs connected via a Linux bridge through the MikroTik switch.
  • Performance: iperf tests show 15-20 Gbps.

Scenario 2: Using SDN Zone
  • Configuration: Both VMs placed in the same SDN zone, running on two separate nodes.
  • Performance: iperf tests drop to 1-1.5 Gbps.
  • Same Node Setup: When both VMs are on the same node within the SDN zone, iperf improves to 2-3 Gbps.

What I’ve Checked:
  • SDN Configuration: Verified that the SDN traffic actually uses the 25 Gbps interfaces, confirmed by monitoring the traffic on the switch ports.
  • Network Adapter: Using the virtio network adapter.
  • Multiqueue Options: Tried different multiqueue settings on the virtio NIC without any improvement (test commands sketched below).
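For reference, this is roughly how I run the tests and set multiqueue. The IP address, the VM ID and the VNet name are placeholders rather than my exact values, and the queue count should match the number of vCPUs:

    # on the first VM (iperf3 server)
    iperf3 -s

    # on the second VM (iperf3 client), single stream and then 4 parallel streams
    iperf3 -c 10.0.0.12 -t 30
    iperf3 -c 10.0.0.12 -t 30 -P 4

    # multiqueue on the virtio NIC, set from the Proxmox host
    # (this replaces the whole net0 definition, so re-specify it completely)
    qm set 101 --net0 virtio,bridge=vnet1,queues=4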

Questions:
  • Has anyone experienced similar issues with SDN zones in a multi-node setup?
  • Are there specific SDN configurations or optimizations that might help achieve higher throughput between nodes?
  • Could there be any limitations or bottlenecks within the SDN implementation that I might have overlooked?
Any assistance or suggestions would be greatly appreciated!


EDIT: I'm using VXLAN SDN zones. When testing with a "simple" zone on the same host, I can reach up to 10 Gbps between the two VMs.
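For reference, the zones are defined in /etc/pve/sdn/zones.cfg; mine look roughly like this (zone names and peer addresses are placeholders, not my exact config):

    vxlan: vxzone
            peers 192.168.1.11,192.168.1.12,192.168.1.13,192.168.1.14

    simple: simzone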

Thank you,
Luca
 
Hi Luca,

You may have a look at the documentation at https://pve.proxmox.com/pve-docs/chapter-pvesdn.html, in particular the details regarding the MTU size.

I have a gigabit setup; after reducing the MTU for every VM to 1430, I get full gigabit throughput (100%) on VM-to-VM transfers between nodes. A rough example of the in-guest change is below.
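On a Linux guest the change looks something like this (eth0 is a placeholder interface name, and the value still has to be made persistent in the distro's own network configuration):

    # lower the MTU on the guest interface
    ip link set dev eth0 mtu 1430

    # verify
    ip link show dev eth0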
 
Quick update: after setting the MTU on the bridge, and therefore also on the VMs, to 1430, I can reach 8-10 Gbit/s on the same host. That should be the limit of the virtual 10 Gbit/s network card in Windows.

But when I move the VMs to different hosts, the speed is still only 2-3 Gbit/s.
All my switches are set to an MTU of 1500, and the NICs on the servers are also set to 1500.

Does anyone have a clue what could be going on here?
 
[Attached screenshot: setting the MTU in the Windows adapter settings]

For Windows you need to set the MTU like this (see the screenshot above); setting the MTU at the VM interface level does not work for me. You may try different sizes to find what works for you; it is usually somewhere between 1430 and 1450.

For Linux, I can set the MTU at the VM level in Proxmox, either with a static value or with 1 to inherit the MTU from the SDN zone; that actually propagates to the VM, and you can see the MTU value on the interface.

If setting the MTU in Windows works for you, then you can set an MTU of 1550 on your switch (most people have consumer hardware and can NOT do this, hence the per-VM setting); that should solve the issue IMHO.

I am no expert, just sharing what I have learnt in my journey...
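To explain those numbers a bit: VXLAN adds about 50 bytes per packet (the tunnelled Ethernet header plus the outer IP, UDP and VXLAN headers), so an inner MTU of 1430 turns into at most 1480 on the underlay, which fits into a standard 1500, while keeping 1500 inside the VMs needs roughly 1550 on the switch and the host NICs. On the Proxmox side, something along these lines should set it per VM; the VM ID and VNet name are placeholders, and as far as I know mtu=1 means "inherit from the bridge":

    # static inner MTU on the virtio NIC
    qm set 101 --net0 virtio,bridge=vnet1,mtu=1430

    # or let the NIC inherit the MTU from the bridge / SDN VNet
    qm set 101 --net0 virtio,bridge=vnet1,mtu=1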
 
I changed the setting in Windows itself, as you explained in the picture, but it doesn't make any difference. The performance is the same.

I'm not sure if I'm missing something. I configured the SDN bridge and the VM settings, and set the MTU inside the VMs. When I check the MTU on the path from VM1 to VM2, it reports 1430, which would be correct (roughly checked as shown below). Is there anything else that could be the reason for this behaviour?
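For reference, this is roughly how I checked it from inside the Windows guests (the IP is a placeholder): 1402 bytes of payload plus 28 bytes of ICMP/IP headers gives 1430 on the wire, and -f sets the don't-fragment bit, so the second ping should fail if the path MTU really is 1430.

    ping -f -l 1402 10.0.0.12
    ping -f -l 1403 10.0.0.12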

I'm definitely lost here. I've tried everything I can think of to debug the problem.

Looking forward to any replies.

Luca
 
