SDN / EVPN - can we use VRFs to keep EVPN/BGP away from Hypervisor Management Routing?

rendrag
Dec 10, 2024
Hello,

I'm currently labbing up Proxmox to see if we can replace our VMware/NSX-T deployments with it. The initial test looked really promising. Then I went to do a closer-to-production deployment, with separated management and routing networks, and it has all fallen apart. Here's the basic networking overview of what I'm trying to do.

[Diagram: basic networking overview (MGMT network, plus 10G-P1/10G-P2 on VLAN 2)]
With the cluster formed and management configured, it was working great. I then set up the EVPN and BGP controllers, and it just wasn't working. I SSH'd into a hypervisor, looked at the routing table, and realised it isn't creating a VRF/route table to hold all of the base EVPN routing - it's just inserting the default routes from BGP into the base routing table, so there are now two separate sets of default routes: one out the management network, and one out the public network (i.e. via the IPs of 10G-P1 and 10G-P2 on VLAN 2 in the above diagram).

The basic premise we're aiming for is that the hypervisors must only be reachable on the MGMT network, and must only be able to talk outbound via the MGMT network. VMs behind EVPN must only be able to talk outbound via the VLAN 2 networking (or on a trunked VLAN, but I'm not testing that right now, as I figure that's 'normal' functionality).
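For illustration, the end state I'm after would look roughly like this on a hypervisor (the VRF name is just whatever the zone creates - vrf_evx1 in my lab - and the expected results are described in comments, not copied from a real box):
Bash:
# Main table: only the MGMT default route (via vmbr0) should live here
ip route show

# Zone VRF: the default route towards VLAN 2 should live here instead,
# so VM traffic never touches the hypervisor's own routing
ip route show vrf vrf_evx1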

Did I miss a tickbox somewhere to tell it that the EVPN routing must be separate from the hypervisor routing? Or is this not possible with the Proxmox SDN as currently implemented? Would I be better off just using VXLAN vnets, and then running a couple of VyOS VMs inside the cluster to do the BGP+EVPN part of the equation?

Thank you!
 
SDN should create a separate VRF for each zone. So if you create an EVPN controller and attach it to a zone, it should insert the learned routes into the VRF of the zone, not the default VRF. So EVPN routing should already be separate from the hypervisor routing. Which routes are you seeing in the default routing table that shouldn't be there? The routes for the underlay network? The BGP controller is for creating an underlay network, which gets handled by the default routing table.
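To check what is ending up where, you can compare the default table with the per-zone VRF tables, roughly like this (the VRF name below is a placeholder - substitute whatever your zone's VRF is called):
Bash:
# Per-VRF routing tables as FRR sees them
vtysh -c 'show ip route vrf all'

# Routes learned via the L2VPN EVPN address family
vtysh -c 'show bgp l2vpn evpn'

# Kernel view of a single zone VRF
ZONE_VRF=vrf_myzone   # replace with your zone's VRF name
ip route show vrf "$ZONE_VRF"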
 
Thanks Stefan,

Yeah, I was expecting a separate VRF for each zone, but the BGP controllers seem to just be putting the routing for the EVPN zones into the default routing table.
I feel like I've missed something important here that I'm not quite putting my finger on. Are the BGP controllers not the routing link between the EVPN zone(s) and external upstream routers?

Edit: Actually, you're partly right - the routes for the interfaces in the EVPN zones are in VRFs - but the routes being imported by the BGP controllers (i.e. the default routes) are going into the main routing table, which is affecting the hypervisor connectivity, and outbound routing from the EVPN VRFs is using the main routing table, which holds a mix of BGP and static routes.

i.e. the routing tables look like this.
Inside FRR's vtysh:
Bash:
proxmox-hv-01# show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

B>* 0.0.0.0/0 [20/0] via 100.127.254.113, bond0.2279, weight 1, 00:17:59
  *                  via 100.127.254.114, bond0.2279, weight 1, 00:17:59
C>* 10.68.70.0/24 is directly connected, vmbr0, 00:18:05
B>* 27.50.65.0/28 [20/0] is directly connected, vnet1 (vrf vrf_evx1), weight 1, 00:18:04
C>* 100.126.9.16/28 is directly connected, bond0.2278, 00:18:05
C>* 100.127.254.112/29 is directly connected, bond0.2279, 00:18:05

And back out in the shell:
Bash:
root@proxmox-hv-01:~# ip route
default via 10.68.70.1 dev vmbr0 proto kernel onlink
default nhid 28 proto bgp metric 20
        nexthop via 100.127.254.114 dev bond0.2279 weight 1
        nexthop via 100.127.254.113 dev bond0.2279 weight 1
10.68.70.0/24 dev vmbr0 proto kernel scope link src 10.68.70.121
27.50.65.0/28 nhid 22 dev vnet1 proto bgp metric 20
100.126.9.16/28 dev bond0.2278 proto kernel scope link src 100.126.9.20
100.127.254.112/29 dev bond0.2279 proto kernel scope link src 100.127.254.115

I shouldn't technically even see that 27.50.65.0/28 route in the main routing table, but I guess because FRR is leaking it into the main table for the BGP controller, it's ending up there?
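To be clear about what I mean by 'leaking' - I haven't confirmed this is exactly what the generated frr.conf does, but the FRR mechanism that would produce this looks roughly like the following (the ASN is a placeholder from my lab):
Bash:
# Sketch only: a default-VRF BGP instance importing routes from the zone
# VRF, which is how a vnet prefix like 27.50.65.0/28 can end up in the
# main routing table.
# (Assumes a 'router bgp 65000 vrf vrf_evx1' instance also exists.)
vtysh <<'EOF'
configure terminal
router bgp 65000
 address-family ipv4 unicast
  import vrf vrf_evx1
 exit-address-family
end
EOF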

Should I not be creating VLAN 2279 (the VLAN that BGP connects over) as a simple VLAN, and instead create it as an SDN VLAN so that it sits in a VRF? Or would the BGP controller still end up pushing the routing into the main routing table?
 
but the routes being imported by the BGP controllers (i.e. the default routes) are going into the main routing table, which is affecting the hypervisor connectivity, and outbound routing from the EVPN VRFs is using the main routing table, which holds a mix of BGP and static routes.

I shouldn't technically even see that 27.50.65.0/28 route in the main routing table, but I guess because FRR is leaking it into the main table for the BGP controller, it's ending up there?

There are different address families in play: IPv4/IPv6 and L2VPN EVPN. The BGP controller is for IPv4/IPv6 routes and imports them into the default routing table. The EVPN controller is for L2VPN EVPN routes and imports them into the respective VRF (depending on the RT). If you want to externally announce routes into your EVPN network, you need to do this via the L2VPN EVPN address family, not the IPv4/IPv6 families. It seems like the problem is that you are announcing the routes in the IPv4/IPv6 address family instead, which causes them to be imported into the default routing table. You need to set up your external peer to announce the routes in the correct address family.

The BGP controller is for when you want to use BGP as the IGP, not for announcing routes for the overlay network (= EVPN). So if you don't want to use BGP as your IGP, you don't need a BGP controller at all.
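As a rough illustration (shown in FRR syntax, your switches will differ; ASNs are placeholders, and the peer address is the hypervisor's bond0.2279 address from your output): which table the routes land in on the Proxmox side follows from which address family the session is activated in on the external peer:
Bash:
# Sketch of an external peer's config, not a drop-in.
# Routes announced under "address-family ipv4 unicast" land in the default
# routing table on the hypervisor; routes announced under "address-family
# l2vpn evpn" are imported into the matching zone VRF (by route-target).
vtysh <<'EOF'
configure terminal
router bgp 65010
 neighbor 100.127.254.115 remote-as 65000
 address-family ipv4 unicast
  neighbor 100.127.254.115 activate
 exit-address-family
 address-family l2vpn evpn
  neighbor 100.127.254.115 activate
 exit-address-family
end
EOF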
 
Thanks Stefan,

So it sounds like the answer is no - it's not possible to have the Proxmox EVPN peer with an external BGP peer without mixing with the hypervisor management routing table at this time? I did try adding the BGP peers to the EVPN controller's peer list, but this caused the FRR config to add them as peers in the VTEP peer group and add them to the l2vpn evpn address family, which of course caused the switches to reject the BGP connections, as they were expecting ipv4/ipv6 unicast address-family sessions.
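Simplified, the stanza that got generated was shaped roughly like this (placeholder ASN, details trimmed, not a verbatim copy) - the external peers only ever get activated under l2vpn evpn, which is why the switches dropped the sessions:
Bash:
# Rough shape of the generated config, simplified for illustration.
vtysh <<'EOF'
configure terminal
router bgp 65000
 neighbor VTEP peer-group
 neighbor VTEP remote-as 65000
 neighbor 100.127.254.113 peer-group VTEP
 address-family l2vpn evpn
  neighbor VTEP activate
 exit-address-family
end
EOF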

I'll have a look at the option of running a pair of HA VyOS VMs in-stack per cluster as EVPN peers to do the EVPN-to-BGP routing, although I really wanted something UI-based, as that leaves it in SysEng control without needing the NOC, which makes it more likely to be accepted as a replacement for VMware.

Thanks,

Damien