I'm posting a separate thread to nail down the issue talked about here. In short: in an SDN configuration, nodes do not advertise their own type 2 EVPN routes (the MAC/IP routes for their local VMs) to BGP (non-EVPN) peers. This causes inefficient routing, because a peer only learns a path to a VM through nodes that don't host it. And if you're unfortunate enough to use a switch which doesn't support BGP ECMP *cough cough Mikrotik*, the problem is compounded further.
In the previous thread, @spirit suggested that a workaround is to use a switch that supports EVPN and use it as a normal VTEP. For some of us, that doesn't work (again, looking at you Mikrotik).
Alternatively, @spirit suggested defining only one or two exit nodes and using those to peer with your switch. This works functionally, but doesn't address the root problem that causes the inefficient routing.
And I totally understand this part in the previous thread:
>>So every node shares only the routes to IP addresses of VMs, that should not be shared by the specific node.
This is normal, because local vm ip is bridged, so you don't need any route to access ip.
My argument would be that although that is true from the perspective of the node itself, it's not true from the perspective of whatever peers with that single node.
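To make the peer's point of view concrete, here is a toy Python model of the behavior described in this thread. It is not FRR code and not a real config: the node names, the VM addresses, and the assumption that the switch peers with every node over plain BGP are all made up for illustration. Each node exports only the VM IPs it learned from other nodes, never its own bridged VMs.

```python
# Toy model only (hypothetical names and addresses), not FRR behavior verbatim.
# Which VM each node hosts locally (bridged, so the node needs no route to it).
vms_by_node = {
    "node1": ["10.0.0.11"],
    "node2": ["10.0.0.12"],
    "node3": ["10.0.0.13"],
}

def advertised_by(node, vms_by_node):
    """Host routes a node exports to its non-EVPN BGP peer.

    Assumption taken from the thread: a node advertises the VM IPs it
    learned from the other nodes, but not the IPs of its own local VMs.
    """
    return sorted(
        ip
        for other, ips in vms_by_node.items()
        if other != node
        for ip in ips
    )

def next_hops_at_switch(vms_by_node):
    """Build the switch's view: VM IP -> list of nodes advertising it."""
    table = {}
    for node in vms_by_node:
        for ip in advertised_by(node, vms_by_node):
            table.setdefault(ip, []).append(node)
    return table

table = next_hops_at_switch(vms_by_node)
for ip, hops in sorted(table.items()):
    print(ip, "via", hops)
```

In this model the switch never sees the hosting node as a next hop for its own VM, so every packet from the switch takes one extra hop through a different node. A switch without BGP ECMP would additionally pin each VM IP to just one of the listed nodes.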
Anyway, I'd like to highlight this issue in FRR about this exact problem. It looks like it hasn't been triaged quite yet, but this may actually be a bug in FRR and unexpected behavior. If FRR fixes this, it would fix the problem here, too.