Open vSwitch (OVS) with RSTP and OVSIntPort - ARP issue

pveFTW

Jan 20, 2024
Dear community,

We have a setup similar to the one described in "OVS with RSTP" and are seeing a strange issue.

The difference from the described setup is that, instead of 10GbE interconnects, we have 10GbE uplinks with 1GbE fallback uplinks to the switches.

So far the VMs are working fine.

But from one of the nodes we cannot ping another node over the OVSIntPorts.
The ARP request does not go through.
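
For anyone hitting the same symptom, the following is a minimal sketch of how the RSTP and ARP state could be collected on each affected node. The function name, the bridge name `vmbr0` and the internal port name `vlan10` are assumptions for illustration and are not taken from the setup above.

```shell
# Sketch: collect RSTP and ARP state on one node (run as root).
# ovs_arp_diag, vmbr0 and vlan10 are hypothetical names; adjust them.
ovs_arp_diag() {
    bridge="${1:-vmbr0}"     # OVS bridge that carries the RSTP ports
    intport="${2:-vlan10}"   # OVSIntPort that holds the host address

    # Bridge and port layout, including VLAN tags
    ovs-vsctl show
    # RSTP role and state of each port on the bridge
    ovs-appctl rstp/show "$bridge"
    # MAC learning table: was the peer's MAC learned, and on which port/VLAN?
    ovs-appctl fdb/show "$bridge"
    # Kernel neighbour table for the internal port
    ip neigh show dev "$intport"
}
```

Calling `ovs_arp_diag vmbr0 vlan10` on both nodes while a ping is running, together with `tcpdump -eni vlan10 arp`, should show whether the ARP request leaves the sender and whether it arrives at the receiver.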

Might this be related to this post?
The IntPort is configured in access mode and other hosts on the same subnet are reachable; only these two nodes cannot reach each other.
Strange, because from the third node both of the others are reachable.
Why is it recommended not to put an IP address on the bridge itself?
It would be in the untagged VLAN anyway.
What are the possible issues with doing so?

We also tried a setup with bonded links, but it seems that OVS still does not support RSTP on bonds.
Is there any news on this?

Thank you for your considerations in advance :)
 
Hi, I've run into the same issue. Yes, the thread you linked is definitely related. I've now configured an address on the OVSBridge of each node and it's working fine. I'm using a 3-node cluster where each node has a connection to each of the other nodes (ring topology, RSTP). I wanted to move the Ceph network onto a separate OVS VLAN (OVSIntPort), but that is where the problem showed up. :( VM traffic is fine, however: you can specify the VLAN tag in the VM's network device configuration, which works like a charm for inter-VM traffic.

Code:
auto ens2f0
iface ens2f0 inet manual
        ovs_type OVSPort
        ovs_bridge vmbr0
        ovs_mtu 9000
        ovs_options vlan_mode=native-untagged other_config:rstp-port-mcheck=true other_config:rstp-port-admin-edge=false other_config:rstp-enable=true other_config:rstp-port-auto-edge=false other_config:rstp-path-cost=150

auto ens2f1
iface ens2f1 inet manual
        ovs_type OVSPort
        ovs_bridge vmbr0
        ovs_mtu 9000
        ovs_options vlan_mode=native-untagged other_config:rstp-port-mcheck=true other_config:rstp-port-admin-edge=false other_config:rstp-enable=true other_config:rstp-port-auto-edge=false other_config:rstp-path-cost=150

auto vmbr0
iface vmbr0 inet static
        address 172.31.253.1/24
        ovs_type OVSBridge
        ovs_ports ens2f0 ens2f1
        ovs_mtu 9000
        up ovs-vsctl set Bridge ${IFACE} rstp_enable=true other_config:rstp-priority=32768 other_config:rstp-forward-delay=4 other_config:rstp-max-age=6
        post-up sleep 10
#VM-NETWORK
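
The VLAN tag for VM traffic mentioned above can also be set from the node shell. This is only a sketch: the function name, the VM ID `100` and the VLAN `20` are made-up values, and `virtio` is assumed as the NIC model.

```shell
# Sketch: put a VM's first NIC on a tagged VLAN of the OVS bridge.
# set_vm_vlan, VM ID 100 and VLAN 20 are hypothetical examples.
set_vm_vlan() {
    vmid="$1"
    vlan="$2"
    bridge="${3:-vmbr0}"
    # Caution: this replaces the whole net0 definition; without an
    # explicit macaddr=... a new MAC address is generated.
    qm set "$vmid" --net0 "virtio,bridge=${bridge},tag=${vlan}"
}
```

For example, `set_vm_vlan 100 20` would attach net0 of VM 100 to vmbr0 with VLAN tag 20. The same setting is available in the GUI as "VLAN Tag" on the network device.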
 
I just found a workaround, and since I spent quite some time troubleshooting this, I thought others might find it useful.
I followed the same steps as in the document the OP referenced, except that I don't have a secondary switch.
The workaround seems to be to add a VLAN-aware Linux bridge and use the OVS bridge as its bridge port. I know it might not be recommended, but hey, at least it works in my homelab, and I don't care too much if it is not an officially supported configuration.
The full network config I used is below.
Code:
auto lo
iface lo inet loopback

#Onboard NIC
auto eno1
iface eno1 inet manual
        ovs_type OVSPort
        ovs_bridge vmbr0
        ovs_mtu 9000
        ovs_options other_config:rstp-path-cost=20000 other_config:rstp-port-auto-edge=false other_config:rstp-enable=true other_config:rstp-port-mcheck=true other_config:rstp-port-admin-edge=false
        # Buggy Intel e1000 driver workaround
        gso-offload off
        tso-offload off

#10GBE Link1
auto enp1s0f0
iface enp1s0f0 inet manual
        ovs_type OVSPort
        ovs_bridge vmbr0
        ovs_mtu 9000
        ovs_options other_config:rstp-enable=true other_config:rstp-port-mcheck=true other_config:rstp-port-admin-edge=false other_config:rstp-path-cost=2000 other_config:rstp-port-auto-edge=false

#10GBE Link2
auto enp1s0f1
iface enp1s0f1 inet manual
        ovs_type OVSPort
        ovs_bridge vmbr0
        ovs_mtu 9000
        ovs_options other_config:rstp-port-admin-edge=false other_config:rstp-port-mcheck=true other_config:rstp-enable=true other_config:rstp-path-cost=2100 other_config:rstp-port-auto-edge=false

auto vmbr0
iface vmbr0 inet manual
        ovs_type OVSBridge
        ovs_ports eno1 enp1s0f0 enp1s0f1
        ovs_mtu 9000
        up ovs-vsctl set Bridge ${IFACE} rstp_enable=true other_config:rstp-priority=61440 other_config:rstp-forward-delay=4 other_config:rstp-max-age=6
        post-up sleep 10
   
auto vmbr1
iface vmbr1 inet manual
        bridge-ports vmbr0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
        mtu 9000

auto vmbr1.10
iface vmbr1.10 inet static
        address 192.168.10.3/24
        gateway 192.168.10.254
        mtu 9000
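
To check that the Linux bridge on top of the OVS bridge came up as intended, something like the following sketch could be used. The function name is made up; the defaults match the interface names and VLAN 10 from the config above.

```shell
# Sketch: inspect the VLAN-aware Linux bridge stacked on the OVS bridge.
# check_vlan_bridge is a hypothetical helper name.
check_vlan_bridge() {
    linux_br="${1:-vmbr1}"   # VLAN-aware Linux bridge
    ovs_br="${2:-vmbr0}"     # OVS bridge used as its bridge port
    vid="${3:-10}"           # VLAN that carries the host address

    # Detailed link info; a VLAN-aware bridge reports "vlan_filtering 1"
    ip -d link show "$linux_br"
    # VLAN IDs allowed on the OVS bridge as a port of the Linux bridge
    bridge vlan show dev "$ovs_br"
    # Address and state of the VLAN sub-interface, e.g. vmbr1.10
    ip addr show "${linux_br}.${vid}"
}
```

Running `check_vlan_bridge` with no arguments uses vmbr1, vmbr0 and VLAN 10.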

Iperf:
Code:
Connecting to host 192.168.10.4, port 5201
[  5] local 192.168.10.3 port 32992 connected to 192.168.10.4 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.15 GBytes  9.91 Gbits/sec   66   1.69 MBytes
[  5]   1.00-2.00   sec  1.15 GBytes  9.90 Gbits/sec    2   1.69 MBytes
[  5]   2.00-3.00   sec  1.15 GBytes  9.90 Gbits/sec    2   1.69 MBytes
[  5]   3.00-4.00   sec  1.15 GBytes  9.89 Gbits/sec    0   1.69 MBytes
[  5]   4.00-5.00   sec  1.15 GBytes  9.90 Gbits/sec    2   1.69 MBytes
[  5]   5.00-6.00   sec  1.15 GBytes  9.90 Gbits/sec    1   1.69 MBytes
[  5]   6.00-7.00   sec  1.15 GBytes  9.90 Gbits/sec    3   1.69 MBytes
[  5]   7.00-8.00   sec  1.15 GBytes  9.89 Gbits/sec    0   1.69 MBytes
[  5]   8.00-9.00   sec  1.15 GBytes  9.90 Gbits/sec    0   1.69 MBytes
[  5]   9.00-10.00  sec  1.15 GBytes  9.90 Gbits/sec    0   1.69 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  11.5 GBytes  9.90 Gbits/sec   76             sender
[  5]   0.00-10.00  sec  11.5 GBytes  9.89 Gbits/sec                  receiver
 
I found a workaround for my three-node test cluster in an RSTP ring configuration.
It's a hyperconverged PVE 8.1.4 cluster with Ceph.
After starting or rebooting a node, I have to restart the OVS service from the node's shell to get it working correctly.

# systemctl restart openvswitch-switch

Every node has a dual-port 10G SFP+ Intel X520 NIC and an Intel 1G LOM port.
I have two OVS bridges, one for the 10G RSTP ring (vmbr0) and one for the 1G LOM (vmbr1).
The LOM OVS bridge (vmbr1) is for management (mgmt, type IntPort) and the second corosync link (link1, type IntPort), and the other one is for all cluster-internal traffic (link0, Ceph, migration, etc.).
Everything works well, but after rebooting a node I have to restart the OVS service on that node to get network connectivity via the 10G RSTP ring (vmbr0).
 
YES! Thanks! I also spent some time debugging this. Restarting openvswitch-switch works like a charm.

Trying a bit more in the same direction, I found that

post-up sleep 60


on the bridge also sometimes did the trick when rebooting one node, but not reliably. I ended up with an /etc/rc.local that restarts Open vSwitch 60 s after boot.
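
The actual rc.local was not posted, so the following is only a guess at what such a delayed restart might look like. The function name is made up, and the service name is taken from the `systemctl restart openvswitch-switch` command earlier in the thread.

```shell
# Sketch: restart Open vSwitch after a delay (default 60 seconds).
# delayed_ovs_restart is a hypothetical helper name.
delayed_ovs_restart() {
    delay="${1:-60}"
    # Give the links and RSTP time to settle before restarting
    sleep "$delay"
    systemctl restart openvswitch-switch
}
```

Inside /etc/rc.local this would be called as `delayed_ovs_restart 60 &` before the final `exit 0`, so that boot is not blocked for a minute. On Debian-based systems the file also has to be executable for rc-local.service to run it.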
 
Update: The /etc/rc.local solution was not stable either. I ended up putting another bridge in front just for RSTP (vmbr2). That bridge is patched to the internal bridge vmbr0, which handles all VM traffic and drops all RSTP packets (BPDUs). An additional bridge, vmbr1, is used for the WAN. Stable so far.

Code:
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

auto eno1
iface eno1 inet manual
    ovs_type OVSPort
    ovs_bridge vmbr1
    ovs_options tag=500 vlan_mode=access

auto enp3s0
iface enp3s0 inet manual
    ovs_type OVSPort
    ovs_bridge vmbr2
    ovs_mtu 9000
    ovs_options tag=112 vlan_mode=native-untagged trunk=42,43,44,45,110,111,113,114,115,116,501 other_config:rstp-port-mcheck=true other_config:rstp-enable=true other_config:rstp-port-admin-edge=false

auto ens1f4
iface ens1f4 inet manual
    ovs_type OVSPort
    ovs_bridge vmbr2
    ovs_mtu 9000
    ovs_options tag=112 trunk=42,43,44,45,110,111,113,114,115,116,501 other_config:rstp-port-mcheck=true other_config:rstp-enable=true other_config:rstp-port-admin-edge=false

auto ens1f4d1
iface ens1f4d1 inet manual
    ovs_type OVSPort
    ovs_bridge vmbr2
    ovs_mtu 9000
    ovs_options tag=112 trunk=42,43,44,45,110,111,113,114,115,116,501 other_config:rstp-port-mcheck=true other_config:rstp-enable=true other_config:rstp-port-admin-edge=false

auto lan0
iface lan0 inet static
    address 192.168.112.21/24
    gateway 192.168.112.1
    ovs_type OVSIntPort
    ovs_bridge vmbr0
    ovs_mtu 1500
    ovs_options tag=112

auto san0
iface san0 inet static
    address 192.168.114.21/24
    ovs_type OVSIntPort
    ovs_bridge vmbr0
    ovs_mtu 1500
    ovs_options tag=114

auto pve0
iface pve0 inet static
    address 192.168.110.21/24
    ovs_type OVSIntPort
    ovs_bridge vmbr0
    ovs_mtu 9000
    ovs_options tag=110

auto patch0
iface patch0 inet manual
    ovs_type OVSPatchPort
    ovs_patch_peer patch2
    ovs_bridge vmbr0

auto patch2
iface patch2 inet manual
    ovs_type OVSPatchPort
    ovs_patch_peer patch0
    ovs_bridge vmbr2

auto vmbr0
iface vmbr0 inet manual
    ovs_type OVSBridge
    ovs_ports lan0 san0 pve0 patch0
    ovs_options other_config:forward-bpdu=false
    ovs_mtu 9000
#Internal

auto vmbr1
iface vmbr1 inet manual
    ovs_type OVSBridge
    ovs_ports eno1
    ovs_options other_config:forward-bpdu=false
#External

auto vmbr2
iface vmbr2 inet manual
    ovs_type OVSBridge
    ovs_ports patch2 ens1f4 ens1f4d1 enp3s0
    up ovs-vsctl set Bridge ${IFACE} rstp_enable=true other_config:rstp-priority=4096
    post-up sleep 10
    ovs_mtu 9000
#RSTP

source /etc/network/interfaces.d/*
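
After applying a config like the one above, the split between the RSTP bridge and the internal bridge could be verified with a sketch along these lines. The function name is made up; the defaults match the bridge and patch port names from the config.

```shell
# Sketch: verify the RSTP front bridge and the patched internal bridge.
# check_patch_setup is a hypothetical helper name.
check_patch_setup() {
    rstp_br="${1:-vmbr2}"    # bridge that runs RSTP on the physical ports
    inner_br="${2:-vmbr0}"   # internal bridge with the OVSIntPorts

    # RSTP should be enabled only on the front bridge
    ovs-vsctl get Bridge "$rstp_br" rstp_enable
    # The internal bridge should not forward BPDUs
    ovs-vsctl get Bridge "$inner_br" other_config:forward-bpdu
    # Each patch port should name the other one as its peer
    ovs-vsctl get Interface patch0 options:peer
    ovs-vsctl get Interface patch2 options:peer
    # Port roles and states on the RSTP bridge
    ovs-appctl rstp/show "$rstp_br"
}
```

Running `check_patch_setup` after a reboot, without restarting openvswitch-switch first, is one way to see whether the ring ports reach a stable RSTP state on their own.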
 
Many thanks, you just saved my day!
I was extending our mesh network (with wg-meshconf) and switched to ifupdown2 to make handling new configuration easier. I began to see erratic behavior until I found your post.
I tried your solution: I use 2 bridges (with different MTUs), so I added 2 more patched bridges in front of them and "everything fell into place" :)
 
