SDN, EVPN and Multiple BGP Controllers Issue

duckworld
Sep 29, 2025
I have a three-node Proxmox 9 cluster. Each node has a 1GbE NIC and a dual-port 25GbE NIC (ConnectX-4 Lx). For several days I've been trying, unsuccessfully, to use Proxmox's SDN capabilities to create a mesh network between the three nodes (by connecting them directly with SFP28 cables) and get all of the following working:
  1. Ceph storage running on the mesh network at 25GbE
  2. Inter-VM communication running on the mesh network at 25GbE
  3. VMs can reach the LAN + WAN
  4. (e)BGP to my OPNsense router so that external clients can reach the VMs
  5. (e)BGP with multiple BGP controllers / Exit Nodes (for HA)
  6. VMs can talk to Ceph controller (e.g. for Kubernetes) at 25GbE
Creating a mesh network with an OpenFabric SDN Fabric is really easy, and I can get Ceph communicating on the Fabric without issue. I've read several guides (like this one) to run EVPN and VXLAN on the Fabric and I'm able to get VMs/containers running on different hosts to talk to each other at 25GbE. I can also add a single BGP controller pointing to my OPNsense router and announce my routes, so long as the BGP controller and the Exit Node for the EVPN zone are on the SAME node.

However, once I try to add multiple Exit Nodes and/or multiple BGP controllers, everything falls apart.

In the majority of my testing, I've been using a container 10.255.69.31 and VM 10.255.69.41 on prox01, and a container 10.255.69.34 and VM 10.255.69.44 on prox04 (all IPs within the VXLAN subnet).

The TL;DR is that no matter what I try, I lose some or all connectivity between my LAN and my VMs. The most consistent issue I run into: with VMs, two Exit Nodes and two BGP controllers running on prox01 and prox04, my OPNsense receives routes that look something like this:

Valid  Best  Internal  Network          Next Hop    Metric  LocPrf  Weight  Path   Origin
y      y     n         10.255.69.0/24   10.4.10.31  0               0       65430  ?
y      n     n         10.255.69.0/24   10.4.10.34  0               0       65430  ?
y      y     n         10.255.69.31/32  10.4.10.34  0               0       65430  IGP
y      y     n         10.255.69.41/32  10.4.10.34  0               0       65430  IGP
y      y     n         10.255.69.34/32  10.4.10.31  0               0       65430  IGP
y      y     n         10.255.69.44/32  10.4.10.31  0               0       65430  IGP

The /32 routes all point at the wrong node (for example, 10.255.69.31 lives on prox01 but is advertised with next hop 10.4.10.34, which is prox04), and I can't ping any of these IPs from my desktop. Depending on the config, I can sometimes get a single ping through before they stop responding. If I stop the ping and wait ~5 minutes, I can get a single ping through again.
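To make the mismatch in that table explicit, a small diagnostic can compare each /32 host route's next hop with the node the guest actually runs on. This is a hypothetical helper, not part of Proxmox or OPNsense: the name `check_host_routes` and its two input files (a placement list of "guest_ip node_ip" pairs built from the IPs above, and a copy of the router's routes as "prefix next_hop" pairs) are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical diagnostic (not part of Proxmox or OPNsense).
# Flags /32 host routes whose BGP next hop is not the node hosting the guest.
#   $1: placement file, one "guest_ip node_ip" pair per line
#   $2: routes file, one "prefix next_hop" pair per line (copied from the router)
check_host_routes() {
    awk '
        # First file: remember which node each guest runs on.
        NR == FNR { node[$1] = $2; next }
        # Second file: only look at host routes.
        $1 ~ /\/32$/ {
            ip = $1
            sub(/\/32$/, "", ip)
            if ((ip in node) && node[ip] != $2)
                print ip " via " $2 " (guest is on " node[ip] ")"
        }
    ' "$1" "$2"
}
```

With a placement file listing 10.255.69.31 and 10.255.69.41 on 10.4.10.31 and 10.255.69.34 and 10.255.69.44 on 10.4.10.34, plus the six routes in the table, it would flag all four /32 routes while ignoring the /24 aggregates.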

I've tried literally hundreds of configurations to get this to work (changes to the SDN setup such as BGP and the EVPN zone, host networking, Linux tunables, etc.), including:
  • EVPN controller + BGP controllers + OPNsense all same ASN
  • EVPN controller + BGP controllers same ASN and OPNsense different
  • EVPN controller + BGP controllers + OPNsense all different ASNs
  • BGP controllers + OPNsense same ASN and EVPN controller different
  • Each BGP controller with its own ASN, plus unique ASNs for the EVPN controller and OPNsense
There are several other posts (a handful are listed below) from the past couple of years where other people have reported seeing similar issues:
I'd really like to get multiple Exit Nodes + BGP controllers working for HA reasons, as well as for hopefully better networking performance. Has anyone been able to get a similar setup to work reliably?



For reference, here's my current (stable) setup that hits goals 1 through 4 (but not 5 and 6), using a single BGP controller + exit node:
  • OPNsense: 10.4.10.1
  • prox01: 10.4.10.31
  • prox03: 10.4.10.33
  • prox04: 10.4.10.34
Code:
# On prox01
> cat /etc/network/interfaces

auto lo
iface lo inet loopback

iface enx5847ca7b312c inet manual

iface enp1s0f0np0 inet manual
        mtu 9000

iface enp1s0f1np1 inet manual
        mtu 9000

auto vmbr0
iface vmbr0 inet static
        bridge-ports enx5847ca7b312c
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094

auto vmbr0.10
iface vmbr0.10 inet static
        address 10.4.10.31/24
        gateway 10.4.10.1

Code:
# On prox01
> cat /etc/network/interfaces.d/sdn

auto vrf_evpnzone
iface vrf_evpnzone
        vrf-table auto
        post-up ip route del vrf vrf_evpnzone unreachable default metric 4278198272

auto vrfbr_evpnzone
iface vrfbr_evpnzone
        bridge-ports vrfvx_evpnzone
        bridge_stp off
        bridge_fd 0
        mtu 8950
        vrf vrf_evpnzone

auto vrfvx_evpnzone
iface vrfvx_evpnzone
        vxlan-id 10000
        vxlan-local-tunnelip 10.255.0.31
        bridge-learning off
        bridge-arp-nd-suppress on
        mtu 8950

auto vxlan_vxnet1
iface vxlan_vxnet1
        vxlan-id 10500
        vxlan-local-tunnelip 10.255.0.31
        bridge-learning off
        bridge-arp-nd-suppress on
        mtu 8950

auto vxnet1
iface vxnet1
        address 10.255.69.1/24
        post-up iptables -t nat -A POSTROUTING -s '10.255.69.0/24' -o vmbr0.10 -j SNAT --to-source 10.4.10.31
        post-down iptables -t nat -D POSTROUTING -s '10.255.69.0/24' -o vmbr0.10 -j SNAT --to-source 10.4.10.31
        post-up iptables -t raw -I PREROUTING -i fwbr+ -j CT --zone 1
        post-down iptables -t raw -D PREROUTING -i fwbr+ -j CT --zone 1
        hwaddress BC:24:11:D8:8F:70
        bridge_ports vxlan_vxnet1
        bridge_stp off
        bridge_fd 0
        mtu 8950
        ip-forward on
        arp-accept on
        vrf vrf_evpnzone

auto dummy_prox-of
iface dummy_prox-of inet static
        address 10.255.0.31/32
        link-type dummy
        ip-forward 1

auto dummy_prox-of
iface dummy_prox-of inet6 static
        address fc69:cefe:255::31/128
        link-type dummy
        ip-forward 1

auto enp1s0f0np0
iface enp1s0f0np0
        ip-forward 1

auto enp1s0f1np1
iface enp1s0f1np1
        ip-forward 1
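The `mtu 8950` on the VXLAN interfaces and bridges follows from the 9000-byte MTU on the 25GbE ports minus the VXLAN encapsulation overhead. A minimal sketch of that arithmetic, assuming an IPv4 underlay (as here, with `vxlan-local-tunnelip 10.255.0.31`); `vxlan_inner_mtu` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: MTU left for guests when VXLAN frames are carried
# over an underlay interface with the given IP MTU.
vxlan_inner_mtu() {
    underlay_mtu=$1
    outer_ip=${2:-20}   # outer IP header: 20 (IPv4 underlay) or 40 (IPv6)
    udp=8               # outer UDP header
    vxlan=8             # VXLAN header
    inner_eth=14        # the inner Ethernet header rides inside the payload
    echo $(( underlay_mtu - outer_ip - udp - vxlan - inner_eth ))
}

vxlan_inner_mtu 9000      # IPv4 underlay, as in this setup: 8950
vxlan_inner_mtu 9000 40   # IPv6 underlay: 8930
vxlan_inner_mtu 1500      # standard 1500-byte underlay: 1450
```

Any guest or bridge MTU above the computed value would need fragmentation or get dropped on the underlay, which is why every VXLAN-facing interface in the config uses the same 8950.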

Code:
> cat /etc/pve/sdn/*

evpn: proxevpn
        asn 65430
        fabric prox-of

bgp: bgpprox01
        asn 65430
        node prox01
        peers 10.4.10.1
        bgp-multipath-as-path-relax 1
        ebgp 1
        loopback dummy_prox-of

openfabric_fabric: prox-of
        csnp_interval 2
        hello_interval 1
        ip6_prefix fc69:cefe:255::/64
        ip_prefix 10.255.0.0/24

openfabric_node: prox-of_prox01
        interfaces name=enp1s0f0np0
        interfaces name=enp1s0f1np1
        ip 10.255.0.31
        ip6 fc69:cefe:255::31

openfabric_node: prox-of_prox03
        interfaces name=enp1s0f0np0
        interfaces name=enp1s0f1np1
        ip 10.255.0.33
        ip6 fc69:cefe:255::33

openfabric_node: prox-of_prox04
        interfaces name=enp65s0f0np0
        interfaces name=enp65s0f1np1
        ip 10.255.0.34
        ip6 fc69:cefe:255::34
{"zones":{"evpnzone":{"subnets":{"10.255.69.0/24":{"ips":{"10.255.69.1":{"gateway":1}}}}},"evpnPRD":{"subnets":{}},"epvnzone":{"subnets":{}}}}
subnet: evpnzone-10.255.69.0-24
        vnet vxnet1
        gateway 10.255.69.1
        snat 1

vnet: vxnet1
        zone evpnzone
        tag 10500

evpn: evpnzone
        controller proxevpn
        vrf-vxlan 10000
        exitnodes prox01
        ipam pve
        mac BC:24:11:D8:8F:70
        mtu 8950

Code:
# On all nodes
> cat /etc/sysctl.d/zzz-network.conf

net.ipv4.ip_forward=1
net.ipv4.conf.default.rp_filter=0
net.ipv4.conf.all.rp_filter=0

Code:
# On prox01
> cat /etc/frr/frr.conf

frr version 10.3.1
frr defaults datacenter
hostname prox01
log syslog informational
service integrated-vtysh-config
!
!
vrf vrf_evpnzone
 vni 10000
exit-vrf
!
router bgp 65430
 bgp router-id 10.4.10.31
 no bgp default ipv4-unicast
 coalesce-time 1000
 bgp disable-ebgp-connected-route-check
 bgp bestpath as-path multipath-relax
 neighbor BGP peer-group
 neighbor BGP remote-as external
 neighbor BGP bfd
 neighbor 10.4.10.1 peer-group BGP
 neighbor VTEP peer-group
 neighbor VTEP remote-as 65430
 neighbor VTEP bfd
 neighbor VTEP update-source dummy_prox-of
 neighbor 10.255.0.33 peer-group VTEP
 neighbor 10.255.0.34 peer-group VTEP
 !
 address-family ipv4 unicast
  network 10.4.10.31/32
  neighbor BGP activate
  neighbor BGP soft-reconfiguration inbound
  import vrf vrf_evpnzone
 exit-address-family
 !
 address-family ipv6 unicast
  import vrf vrf_evpnzone
 exit-address-family
 !
 address-family l2vpn evpn
  neighbor VTEP activate
  neighbor VTEP route-map MAP_VTEP_IN in
  neighbor VTEP route-map MAP_VTEP_OUT out
  advertise-all-vni
 exit-address-family
exit
!
router bgp 65430 vrf vrf_evpnzone
 bgp router-id 10.255.0.31
 no bgp hard-administrative-reset
 no bgp graceful-restart notification
 !
 address-family ipv4 unicast
  redistribute connected
 exit-address-family
 !
 address-family ipv6 unicast
  redistribute connected
 exit-address-family
 !
 address-family l2vpn evpn
  default-originate ipv4
  default-originate ipv6
 exit-address-family
exit
!
ip prefix-list loopbacks_ips seq 10 permit 0.0.0.0/0 le 32
ip prefix-list only_default seq 1 permit 0.0.0.0/0
!
ipv6 prefix-list only_default_v6 seq 1 permit ::/0
!
route-map MAP_VTEP_IN deny 1
 match ip address prefix-list only_default
exit
!
route-map MAP_VTEP_IN deny 2
 match ipv6 address prefix-list only_default_v6
exit
!
route-map MAP_VTEP_IN permit 3
exit
!
route-map MAP_VTEP_OUT permit 1
exit
!
route-map correct_src permit 1
 match ip address prefix-list loopbacks_ips
 set src 10.4.10.31
exit
!
ip protocol bgp route-map correct_src
router openfabric prox-of
 net 49.0001.0102.5500.0031.00
exit
!
interface dummy_prox-of
 ipv6 router openfabric prox-of
 ip router openfabric prox-of
 openfabric passive
exit
!
interface enp1s0f0np0
 ipv6 router openfabric prox-of
 ip router openfabric prox-of
 openfabric hello-interval 1
 openfabric csnp-interval 2
exit
!
interface enp1s0f1np1
 ipv6 router openfabric prox-of
 ip router openfabric prox-of
 openfabric hello-interval 1
 openfabric csnp-interval 2
exit
!
access-list pve_openfabric_prox-of_ips permit 10.255.0.0/24
!
ipv6 access-list pve_openfabric_prox-of_ip6s permit fc69:cefe:255::/64
!
route-map pve_openfabric permit 100
 match ip address pve_openfabric_prox-of_ips
 set src 10.255.0.31
exit
!
route-map pve_openfabric6 permit 110
 match ipv6 address pve_openfabric_prox-of_ip6s
 set src fc69:cefe:255::31
exit
!
ip protocol openfabric route-map pve_openfabric
!
ipv6 protocol openfabric route-map pve_openfabric6
!
!
line vty
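The `net 49.0001.0102.5500.0031.00` line in the OpenFabric section appears to be derived from the node's fabric IP 10.255.0.31: each octet is zero-padded to three digits and the resulting twelve digits are regrouped into a system ID. The sketch below reproduces that mapping under that assumption (inferred from this one config, not taken from Proxmox's code); `ip_to_net` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: derive an OpenFabric/IS-IS NET from an IPv4 address
# by zero-padding each octet to three digits and regrouping into 4-digit blocks.
#   10.255.0.31 -> 010255000031 -> system ID 0102.5500.0031
# Area "49.0001" and selector ".00" are copied from the frr.conf above.
ip_to_net() (
    area=${2:-49.0001}
    IFS=.            # runs in a subshell, so this IFS change does not leak
    set -- $1        # split the dotted quad into $1..$4
    digits=$(printf '%03d%03d%03d%03d' "$1" "$2" "$3" "$4")
    a=$(echo "$digits" | cut -c1-4)
    b=$(echo "$digits" | cut -c5-8)
    c=$(echo "$digits" | cut -c9-12)
    echo "$area.$a.$b.$c.00"
)

ip_to_net 10.255.0.31   # 49.0001.0102.5500.0031.00
ip_to_net 10.255.0.34   # 49.0001.0102.5500.0034.00
```

Deriving the system ID from the unique fabric IP is a convenient way to guarantee that no two nodes in the fabric share one.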
 
I believe I've found the fix! It turns out that net.ipv4.conf.all.rp_filter and net.ipv4.conf.default.rp_filter don't work like you'd probably expect. The "default" value is only copied to interfaces created after it's set, and this settings file seems to be read too late during startup, after most or all of the network interfaces have already been created. Setting "all" to 0 doesn't help on its own either, because the kernel applies the higher of the "all" value and each interface's own value.

Modifying my /etc/sysctl.d/zzz-network.conf to:

Code:
net.ipv4.ip_forward=1
net.ipv4.conf.all.rp_filter=0
net.ipv4.conf.default.rp_filter=0
net.ipv4.conf.*.rp_filter=0

fixes the issue: I can now have multiple Exit Nodes and BGP controllers at the same time, and everything works as expected!
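To confirm the change took effect on every interface, including SDN ones such as vxnet1 and vrfbr_evpnzone, a read-only check can compare each interface's value with `all`, since the kernel enforces the higher of the two. A minimal POSIX sh sketch; `effective_rp_filter` is a hypothetical name, and it reads /proc/sys/net/ipv4/conf unless given another directory:

```shell
#!/bin/sh
# Sketch: list interfaces whose *effective* IPv4 rp_filter is not 0.
# The kernel checks max(conf/all/rp_filter, conf/<iface>/rp_filter),
# so each interface's own value matters, not just "all".
#   $1: sysctl conf directory (defaults to the live /proc tree)
# Prints "<iface> <effective value>" (1 = strict, 2 = loose) per offender.
effective_rp_filter() (
    conf_dir=${1:-/proc/sys/net/ipv4/conf}
    if [ ! -r "$conf_dir/all/rp_filter" ]; then
        echo "cannot read $conf_dir/all/rp_filter" >&2
        exit 1
    fi
    all=$(cat "$conf_dir/all/rp_filter")
    for f in "$conf_dir"/*/rp_filter; do
        [ -r "$f" ] || continue
        iface=$(basename "$(dirname "$f")")
        # "all" and "default" are not real interfaces; "default" is only
        # the template for interfaces created later.
        case $iface in all|default) continue ;; esac
        val=$(cat "$f")
        eff=$(( val > all ? val : all ))
        if [ "$eff" -ne 0 ]; then
            echo "$iface $eff"
        fi
    done
)
```

Run `effective_rp_filter` with no argument on each node after a reboot; no output means no interface still has reverse-path filtering in effect.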

There's still the issue where the BGP routes show the node that the VM is NOT running on as the "Next Hop", but I don't care about that right now; I might just use a prefix list in OPNsense to filter out the /32 routes and keep only the /24 routes.



For anyone in the future: I've kept my SDN setup the same as in the post above, except that all nodes are now EVPN zone Exit Nodes and two nodes (currently) run BGP controllers. My OPNsense router uses AS 65401, and my EVPN controller and all of the BGP controllers use AS 65430.

Here's the output of cat /etc/pve/sdn/* for reference:

Code:
evpn: proxevpn
        asn 65430
        fabric prox-of

bgp: bgpprox01
        asn 65430
        node prox01
        peers 10.4.10.1
        bgp-multipath-as-path-relax 1
        ebgp 1
        loopback dummy_prox-of

bgp: bgpprox04
        asn 65430
        node prox04
        peers 10.4.10.1
        bgp-multipath-as-path-relax 1
        ebgp 1
        loopback dummy_prox-of

openfabric_fabric: prox-of
        csnp_interval 2
        hello_interval 1
        ip6_prefix fc69:cefe:255::/64
        ip_prefix 10.255.0.0/24

openfabric_node: prox-of_prox01
        interfaces name=enp1s0f0np0
        interfaces name=enp1s0f1np1
        ip 10.255.0.31
        ip6 fc69:cefe:255::31

openfabric_node: prox-of_prox03
        interfaces name=enp1s0f0np0
        interfaces name=enp1s0f1np1
        ip 10.255.0.33
        ip6 fc69:cefe:255::33

openfabric_node: prox-of_prox04
        interfaces name=enp65s0f0np0
        interfaces name=enp65s0f1np1
        ip 10.255.0.34
        ip6 fc69:cefe:255::34
{"zones":{"evpnzone":{"subnets":{"10.255.69.0/24":{"ips":{"10.255.69.1":{"gateway":1}}}}},"evpnPRD":{"subnets":{}},"epvnzone":{"subnets":{}}}}
subnet: evpnzone-10.255.69.0-24
        vnet vxnet1
        gateway 10.255.69.1
        snat 1

vnet: vxnet1
        zone evpnzone
        tag 10500

evpn: evpnzone
        controller proxevpn
        vrf-vxlan 10000
        exitnodes prox03,prox04,prox01
        ipam pve
        mac BC:24:11:D8:8F:70
        mtu 8950

Finally, I have no idea if this is needed, but in OPNsense I have the tunable net.route.multipath set to 1.
 