SDN: Nodes do not advertise their own routes

virtual_door

New Member
Nov 5, 2024
I'm posting a separate thread to nail down the issue talked about here. The short of the problem is that in an SDN configuration, nodes do not advertise their own type 2 EVPN routes to BGP (non-EVPN) peers. This causes inefficient routing. And if you're unfortunate enough to use a switch which doesn't support BGP ECMP *cough cough Mikrotik*, then this problem is compounded further.

In the previous thread, @spirit suggested that a workaround is to use a switch that supports EVPN and use it as a normal VTEP. For some of us, that doesn't work (again, looking at you Mikrotik).

Alternatively, spirit suggested defining only one or two exit nodes and using those to peer with your switch. This works functionally, but doesn't address the root problem that causes the inefficient routing.
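For reference, that workaround maps to the exitnodes option of an EVPN zone (Datacenter → SDN → Zones in the GUI, or /etc/pve/sdn/zones.cfg). A minimal sketch, with made-up zone/controller/node names, and an option set that may vary by PVE version:

Code:
evpn: zone1
        controller evpnctrl
        vrf-vxlan 10000
        exitnodes pve1,pve2
        mtu 1450
        nodes pve1,pve2,pve3

The idea being that only pve1 and pve2 then handle north-south traffic, so they are the only nodes that need a BGP session to the switch.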

And I totally understand this part from the previous thread:
> So every node shares only the routes to IP addresses of VMs that should not be shared by the specific node.
> This is normal, because the local VM IP is bridged, so you don't need any route to access the IP.
My argument would be that although that is true from the perspective of the node itself, it's not true from the perspective of whatever peers with that single node.

Anyway, I'd like to highlight this issue in FRR, which describes this exact problem. It hasn't been triaged yet, but it may actually be a bug in FRR and unexpected behavior. If FRR fixes it, that would fix the problem here, too.
 
What does your SDN configuration look like (in particular the controllers)? Do you have an EVPN controller?
 
All three hosts running Proxmox act as EVPN controllers. They are all connected to the same switch, but use L3 interfaces for connectivity. Each host peers with the switch via BGP.

The switch does not support EVPN, so the hosts can only announce regular IPv4/IPv6 unicast BGP routes.

Here's the FRR config on one of the hosts:
Code:
!
frr version 8.5.2
frr defaults datacenter
hostname pod2
log syslog informational
service integrated-vtysh-config
!
vrf vrf_dev
 vni 100
exit-vrf
!
vrf vrf_prod
 vni 200
exit-vrf
!
router bgp 64550
 bgp router-id 10.1.254.10
 no bgp hard-administrative-reset
 no bgp default ipv4-unicast
 coalesce-time 1000
 no bgp graceful-restart notification
 neighbor VTEP peer-group
 neighbor VTEP remote-as 64550
 neighbor VTEP bfd
 neighbor VTEP update-source 10.1.254.10
 neighbor core peer-group
 neighbor 10.1.254.11 peer-group VTEP
 neighbor 10.1.254.12 peer-group VTEP
 neighbor 10.1.252.1 remote-as 64600
 neighbor 10.1.252.1 peer-group core
 !
 address-family ipv4 unicast
  network 10.1.254.10/32
  neighbor core activate
  neighbor core allowas-in
  neighbor core route-map CORE-NETS-IN in
  neighbor core route-map CORE-NETS-OUT out
  import vrf vrf_prod
 exit-address-family
 !
 address-family ipv6 unicast
  neighbor core activate
  neighbor core allowas-in
  neighbor core route-map CORE-NETS-IN in
  neighbor core route-map CORE-NETS-OUT out
  import vrf vrf_prod
 exit-address-family
 !
 address-family l2vpn evpn
  neighbor VTEP activate
  neighbor VTEP route-map MAP_VTEP_IN in
  neighbor VTEP route-map MAP_VTEP_OUT out
  advertise-all-vni
 exit-address-family
exit
!
router bgp 64550 vrf vrf_dev
 bgp router-id 10.1.254.10
 no bgp hard-administrative-reset
 no bgp graceful-restart notification
exit
!
router bgp 64550 vrf vrf_prod
 bgp router-id 10.1.254.10
 no bgp hard-administrative-reset
 no bgp graceful-restart notification
 !
 address-family ipv4 unicast
  redistribute connected
 exit-address-family
 !
 address-family ipv6 unicast
  redistribute connected
 exit-address-family
 !
 address-family l2vpn evpn
  default-originate ipv4
  default-originate ipv6
 exit-address-family
exit
!
ip prefix-list only_default seq 1 permit 0.0.0.0/0
!
ipv6 prefix-list only_default_v6 seq 1 permit ::/0
!
route-map CORE-NETS-IN permit 1
exit
!
route-map CORE-NETS-OUT permit 1
exit
!
route-map MAP_VTEP_IN deny 1
 match ip address prefix-list only_default
exit
!
route-map MAP_VTEP_IN deny 2
 match ipv6 address prefix-list only_default_v6
exit
!
route-map MAP_VTEP_IN permit 3
exit
!
route-map MAP_VTEP_OUT permit 1
exit
!
end

On each host, the BGP table shows only the routes for the L2 VNIs and the /32 type 2 EVPN routes originating from the other hosts, never its own.
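For anyone wanting to verify this themselves, a few read-only vtysh checks (syntax as of FRR 8.x; the VRF name and neighbor address are taken from the config above):

Code:
# EVPN type 2 (MAC/IP) routes, locally originated and learned:
vtysh -c 'show bgp l2vpn evpn route type macip'

# /32s present in the VRF's IPv4 BGP table; per the behavior described
# above, only the other hosts' VM routes show up here, not the local ones:
vtysh -c 'show bgp vrf vrf_prod ipv4 unicast'

# What actually gets advertised to the non-EVPN switch after
# "import vrf vrf_prod":
vtysh -c 'show bgp ipv4 unicast neighbors 10.1.252.1 advertised-routes'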
 
Sorry, I didn't see your original message.

I really don't think there is another way currently. Running the exit nodes on the Proxmox nodes themselves, where the VMs are running, is more of a trick to be able to reach the outside network without needing extra node hardware.

All the FRR and Cumulus/Mellanox docs always use dedicated exit nodes (these can be dedicated hosts or physical routers doing EVPN routing).

(For example, put two nodes behind your two MikroTik routers to add the EVPN functionality.)

I'll follow the FRR issue to see if an improvement is coming in a future version.
 
Hello!

I believe I'm seeing the same thing. Just wanted to make sure it's the same case :)

Having 4 nodes:

peerA
peerB
peerC
peerD

I deploy a single VM (192.168.0.101/24) on peerC, and from the external BGP node I see these routes being pushed:

peerA: 192.168.0.0/24, 192.168.0.101/32
peerB: 192.168.0.0/24, 192.168.0.101/32
peerC: 192.168.0.0/24
peerD: 192.168.0.0/24, 192.168.0.101/32

Traffic always works via ECMP and internal forwarding, which is expected for the /24 example, but there seems to be a missed opportunity given that the specific /32 is already being pushed. The odd thing is that every node except the actual VM host is pushing the specific route.

Is this expected? Could it just be a configuration thing?

It would make more sense to have either:

- Option A
* push just the /24 through all nodes and live with the random entry point

- Option B
* push the /24 through all nodes as a "catch-all" route
* push the /32 with the correct host as the next hop
* efficient routing for known /32 workloads

But instead we get:

- Option C
* push the /24 and live with the random entry point
* push overlapping, useless /32s that pollute the routing table (a filter sketch follows below)
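If the stray /32s are the main annoyance, Option A can at least be enforced deliberately with an outbound filter. A rough sketch reusing the CORE-NETS-OUT route-map from the config earlier in the thread, and assuming 192.168.0.0/24 is the only overlay subnet (an equivalent ipv6 prefix-list would be needed for v6):

Code:
ip prefix-list NO-HOST-ROUTES seq 5 deny 192.168.0.0/24 ge 32
ip prefix-list NO-HOST-ROUTES seq 10 permit any
!
route-map CORE-NETS-OUT permit 1
 match ip address prefix-list NO-HOST-ROUTES
exit

This suppresses the host routes toward the external peer and leaves only the /24. It doesn't touch the root cause, just the table pollution.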
 
Just wondering if there have been any changes in this space? (Trying it out now, and I can only see the /24 routes on the BGP peer external to the cluster.)

Edit: Some /32 routes appear to have made it out of the cluster, but it seems random. I've also seen behaviour where a VM is reachable from outside the cluster and 2 of 3 nodes are advertising the /32, but after migrating it to one of those nodes, that node stops advertising the /32 and the other two begin advertising it. Essentially, the /32 is never advertised by the one node that contains the VM. Can someone explain the logic behind what's advertised and when?
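A quick way to check which side of this each node is on (assuming the vrf_prod naming from the config earlier in the thread; <VM_IP> is a placeholder for the VM's address):

Code:
# On the node currently hosting the VM this typically returns nothing,
# since the IP is reachable via the local bridge rather than a routed /32;
# on the other nodes it shows the /32 derived from the type 2 EVPN route:
vtysh -c 'show ip route vrf vrf_prod <VM_IP>/32'

# Whether that /32 is in the table being advertised to the external peer:
vtysh -c 'show bgp ipv4 unicast <VM_IP>/32'

If that holds, it's the same behaviour as earlier in the thread: the hosting node has no /32 to export because the IP is bridged locally, so only the non-hosting nodes advertise it, and the set of advertisers flips whenever the VM migrates.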
 