BGP-EVPN, VxLAN and FRR troubles

pkcz

Mar 16, 2020
Hello all,

I'm trying the SDN technology preview on PVE 8.4, and I ran into a problem like many others before me.
BGP-EVPN + VXLAN works only if all PVE nodes are in one local network (one VLAN).
If you put some nodes in different networks, it breaks.

The core of the problem is that FRR can't see neighbours that are one routed hop away (reachable only via the default gateway).

In my setup, node1 (100.64.1.101), node2 (100.64.1.102) and node3 (100.64.1.103) work fine together.
But it stops working after adding node4 (192.168.1.104): the existing nodes don't see the new one in BGP.

Code:
node1# vtysh -c "show ip bgp nexthop"
Current BGP nexthop cache:
 100.64.1.102 valid [IGP metric 0], #paths 1, peer 100.64.1.102
  if vmbr0
  Last update: Wed Nov  6 15:18:15 2024

 100.64.1.103 valid [IGP metric 0], #paths 4, peer 100.64.1.103
  if vmbr0
  Last update: Wed Nov  6 15:18:15 2024

 192.168.1.104 invalid, #paths 0, peer 192.168.1.104
  Last update: Mon Nov 11 14:28:54 2024
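With more nodes, spotting unresolved peers in this output by eye gets tedious. Below is a minimal filter sketch, assuming the output layout shown above (FRR 8.5 here; other FRR versions may format it differently). list_invalid_nexthops is a hypothetical helper name, not an FRR command.

```shell
# Hypothetical helper: reads the text of "show ip bgp nexthop" on stdin
# and prints the address of every nexthop that FRR marks as invalid.
# Assumes the layout shown above: address in column 1, status in column 2.
list_invalid_nexthops() {
  # "invalid," carries a trailing comma, so match on the prefix only;
  # lines whose status is "valid" do not match /^invalid/.
  awk '$2 ~ /^invalid/ { print $1 }'
}
```

On a node, pipe the vtysh output into it: vtysh -c "show ip bgp nexthop" | list_invalid_nexthops. For the output above, this prints 192.168.1.104.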

The FRR option ip nht resolve-via-default doesn't help; this workaround must be used instead:

Code:
no ip nht resolve-via-default
ip route 192.168.1.104/32 100.64.1.1
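With several nodes behind the gateway, the same two-part workaround needs one static route per remote peer. Below is a minimal generator sketch, assuming every remote peer is reachable through the same local gateway. emit_nht_workaround is a hypothetical helper name.

```shell
# Hypothetical helper: prints the FRR config lines for the static-route
# workaround. First argument is the local gateway, followed by one or more
# remote peer addresses. Assumes every remote peer sits behind that gateway.
emit_nht_workaround() {
  gw="$1"
  shift
  # Explicitly keep nexthop tracking from resolving via the default route...
  printf '%s\n' 'no ip nht resolve-via-default'
  # ...and give each remote peer its own /32 route through the gateway.
  for peer in "$@"; do
    printf 'ip route %s/32 %s\n' "$peer" "$gw"
  done
}
```

Running emit_nht_workaround 100.64.1.1 192.168.1.104 prints exactly the two lines above. Using one /32 per peer, rather than a broader prefix, keeps the extra routes limited to the BGP peer addresses.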


And now comes the most interesting part.

1) If I take the frr.conf generated by Proxmox SDN and copy it (changing only the router-id) to openSUSE Tumbleweed (frr 10.0.2),
everything works as it should.

2) If I take the frr.conf generated by Proxmox SDN and copy it (changing only the router-id) to Debian 12 (frr/stable,stable-security,now 8.4.4-1.1~deb12u1), it does not work, but adding the config option ip nht resolve-via-default fixes it.

3) On PVE 8.4 (frr/stable,now 8.5.2-1+pve1), the workaround mentioned above must be used to get things working.

PS. "It is working" means the node sees all other nodes on BGP, even those behind the default gateway.
 
Thanks a lot for sharing this!

I was trying to get VXLAN working between 2 remote sites connected via WireGuard using this EVPN SDN, and it wouldn't work.
I modified /etc/frr/frr.conf on all my nodes and all is working now :)
 
I'm running into the same or a similar issue.

I have three sites/networks, all configured with EVPN. This issue affects two sites, which are also exit nodes that receive a default route via BGP. The third site (witness) works perfectly.

Both the static route and resolve-via-default fix the issue for me. But the changes are removed from the configuration whenever any changes are applied from the UI. Removing the workaround from the configuration does not affect existing BGP sessions, but an FRR service restart or a reboot causes loss of connectivity.

How did you make the workaround persistent?

I've tried adding the commands to frr.conf.local, but they don't appear to have any effect.

Our configuration is going to be fluid for a while, and I don't want a reload triggered by a VNet change to leave the config in a state where a reboot causes a network partition.
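Until the workaround survives UI applies, one way to catch its removal before a restart is to check the generated file after each apply. Below is a minimal sketch, assuming the static-route variant and assuming the lines appear unindented in /etc/frr/frr.conf exactly as written in the first post. check_nht_workaround is a hypothetical helper name.

```shell
# Hypothetical helper: returns 0 only if the given frr.conf contains both
# workaround lines for one peer. Arguments: config file, peer, gateway.
# Uses whole-line fixed-string matching, so indented or reworded lines
# will not be recognised.
check_nht_workaround() {
  conf="$1"
  peer="$2"
  gw="$3"
  grep -qxF 'no ip nht resolve-via-default' "$conf" &&
    grep -qxF "ip route ${peer}/32 ${gw}" "$conf"
}
```

For example, after applying SDN changes: check_nht_workaround /etc/frr/frr.conf 192.168.1.104 100.64.1.1 || echo "NHT workaround missing". Adjust the peer and gateway to the site in question.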

Thanks,

Erik