Bugfix for EVPN SDN Multiple Exit Nodes

niwamo · Dec 8, 2023

When setting up the EVPN SDN a few weeks back, I encountered an issue: Selecting more than one exit node for a zone broke all external connectivity, regardless of whether a primary node was selected, and regardless of whether SNAT was turned on. (And yes, I set my rp_filters correctly.)

A bit of research confirmed that others had encountered the same issue; See here and and here.

I looked into it, found the bug, and have thoroughly verified the fix on the latest release of libpve-network-perl (0.9.5).

The problem:

By advertising a default route on all gateway nodes, packets are guaranteed to loop until TTL death
VXLAN interfaces only tunnel packets across vrfs when no entries are matched in the forwarding table
(See https://www.kernel.org/doc/Documentation/networking/vxlan.txt)
If multiple nodes are advertising default routes, then no node will not have a default route, and nodes will always prefer the default route over popping the packet across the vxlan interface, leading to packet looping

Solution:

ONLY 'default-originate' if the node is a "primary exit node"
This allows multiple exit nodes to function properly with or without a primary exit node; It also allows SNAT to work with multiple exit nodes with or without a primary exit node. I've seen some of the SDN developers claim SNAT "requires" a primary exit-node - that is absolutely not true, it is working in my lab after fixing the actual root cause.

Fortunately, the fix is quite easy:

Original (/usr/share/perl5/PVE/Network/SDN/Controllers/EvpnPlugin.pm)

Perl:

    @controller_config = ();
    #add default originate to announce 0.0.0.0/0 type5 route in evpn
    push @controller_config, "default-originate ipv4";
    push @controller_config, "default-originate ipv6";
    push(@{$config->{frr}->{router}->{"bgp $asn vrf $vrf"}->{"address-family"}->{"l2vpn evpn"}}, @controller_config);

Fixed

Perl:

    if ($exitnodes_primary eq $local_node) {
        @controller_config = ();
        #add default originate to announce 0.0.0.0/0 type5 route in evpn
        push @controller_config, "default-originate ipv4";
        push @controller_config, "default-originate ipv6";
        push(@{$config->{frr}->{router}->{"bgp $asn vrf $vrf"}->{"address-family"}->{"l2vpn evpn"}}, @controller_config);
    }

If you want to do some quick testing, I also scripted out the updates with perl. Note that I'm also changing get_standard_option('pve-node') to get_standard_option('pve-node-list'); In the latest release of PVE, you're unable to not select a primary exit node (in the UI at least) unless you make this change.

Bash:

perl -i -pe "s/\'exitnodes-primary\' => get_standard_option\(\'pve-node\'/\'exitnodes-primary\' => get_standard_option\(\'pve-node-list\'/" /usr/share/perl5/PVE/Network/SDN/Zones/EvpnPlugin.pm;
perl -i -p0e 's/^*\s\@controller_config = \(\);\s*\#add default originate to announce 0.0.0.0\/0 type5 route in evpn\s*push \@controller_config, "default-originate ipv4";\s*push \@controller_config, "default-originate ipv6";\s*push\(\@\{\$config->\{frr\}->\{router\}->\{"bgp \$asn vrf \$vrf"\}->\{"address-family"\}->\{"l2vpn evpn"\}\}, \@controller_config\);/\tif \(\$exitnodes_primary eq \$local_node\) \{\n\t\t\@controller_config = \(\);\n\t\t\#add default originate to announce 0.0.0.0\/0 type5 route in evpn\n\t\tpush \@controller_config, "default-originate ipv4";\n\t\tpush \@controller_config, "default-originate ipv6";\n\t\tpush\(\@\{\$config->\{frr\}->\{router\}->\{"bgp \$asn vrf \$vrf"\}->\{"address-family"\}->\{"l2vpn evpn"\}\}, \@controller_config\);\n\t\}/s' /usr/share/perl5/PVE/Network/SDN/Controllers/EvpnPlugin.pm;
systemctl restart pveproxy.service pvedaemon.service;

I'm new to the forum and unsure how to tag members. I'd like to pull in spirit, who I know has been very active on the SDN project. If anyone can help me out there, I'd appreciate it.

I found several other (smaller) bugs, and I have some additional suggestions - things I've implemented in my own lab that I believe make the user experience easier and/or nicer. If all goes well, with this bugfix, hoping to work with the team on the others as well.

spirit · Dec 8, 2023

Hi,

I have already fixed this bug
https://forum.proxmox.com/threads/s...using-multiple-exit-nodes.137362/#post-612296

but it's not yet available in official repo

you can try this package

"
wget https://mutulin1.odiso.net/libpve-network-perl_0.9.5_all.deb
dpkg -i libpve-network-perl_0.9.5_all.deb
"

content on pending patches:
https://lists.proxmox.com/pipermail/pve-devel/2023-December/060906.html
https://lists.proxmox.com/pipermail/pve-devel/2023-December/060910.html

The problem is a regression/but in frr 8.5 on pve8
The bug in frr was the filtering of import of default route from other node.
Each exit-node need to announce the default, but they shouldn't import the route announced by other exit-nodes. (or you'll have a loop).

ONLY 'default-originate' if the node is a "primary exit node"

This allows multiple exit nodes to function properly with or without a primary exit node; It also allows SNAT to work with multiple exit nodes with or without a primary exit node. I've seen some of the SDN developers claim SNAT "requires" a primary exit-node - that is absolutely not true, it is working in my lab after fixing the actual root cause.

That make not sense to disable default route, that's just the main role of an exit-node, announce the default to route to outside.
It's like you use only 1 exit-node if you disable default route on other nodes . (and that's why snat is working, because you have only 1exit-node).

S-nat need a primary exit-node because of conntrack., traffic need to always go out/in on the same node.

niwamo · Dec 8, 2023

1. You actually don't need default routes inside the vrf at all - VXLAN interfaces tunnel packets across the veth pair and into the default vrf if no route in the vrf's routing table matches the destination. In my current lab, none of my exit nodes has a default route inside the vrf. It not only works fine, it works as VXLAN interfaces are designed to. The only use for a default route in this scenario is to force all nodes to send their traffic through a single exit node, i.e. the primary exit node.

2. SNAT absolutely does not need a primary exit-node. It is working right now in my lab with multiple exit nodes, and no primary exit node. Yes, traffic does need to always go out and back in on the same node, but that's not a barrier. If i have node 1 (192.168.0.101) and node 2 (192.168.0.102), both are exit nodes, and both perform SNAT, then packets exiting on node 1 now have a SNAT'd return address of 192.168.0.101 and packets exiting on node 2 now have a SNAT'd return address of 192.168.0.102, ensuring packets are returned to the same nodes they exited. It's only an issue if you (for some reason) SNAT'd to the same IP on both nodes, and I can't think of a reason you'd do that.

I would be happy to show you a live proof of both of these claims.

spirit · Dec 8, 2023

niwamo said:
1. You actually don't need default routes inside the vrf at all - VXLAN interfaces tunnel packets across the veth pair and into the default vrf if no route in the vrf's routing table matches the destination. In my current lab, none of my exit nodes has a default route inside the vrf. It not only works fine, it works as VXLAN interfaces are designed to. The only use for a default route in this scenario is to force all nodes to send their traffic through a single exit node, i.e. the primary exit node.

niwamo said:
2. SNAT absolutely does not need a primary exit-node. It is working right now in my lab with multiple exit nodes, and no primary exit node. Yes, traffic does need to always go out and back in on the same node, but that's not a barrier. If i have node 1 (192.168.0.101) and node 2 (192.168.0.102), both are exit nodes, and both perform SNAT, then packets exiting on node 1 now have a SNAT'd return address of 192.168.0.101 and packets exiting on node 2 now have a SNAT'd return address of 192.168.0.102, ensuring packets are returned to the same nodes they exited. It's only an issue if you (for some reason) SNAT'd to the same IP on both nodes, and I can't think of a reason you'd do that.

Well, that's works indeed if all your nodes are exit-nodes, and they are routing the traffic from their local vms.
(but, I'm not sure It'll survive if you live migrate a vm, as the natted ip will change and conntrack could break on already established connections).

You need to think about setup where you have 1 or 2 exit-nodes (maybe doing bgp with upstream routers), and maybe 20 others nodes non-exit nodes.
That's why default are announced.
(But indeed, for local vms on exit-nodes, you don't need it)

It could be great if you can test the .deb so see if it's also fixing the bug for you.

niwamo · Dec 8, 2023

Default routes - That's a good point. If the new package fixes the filtering of other nodes' default routes on exit nodes, that is indeed a better solution. I'll test your package this weekend.

SNAT - Yep, a live migration would break active connections. However, I think it's worth documenting as an option (particularly for non-HA resources), rather than telling people it's completely impossible.

Some additional things I've noticed:

SNAT affects overlapping IP ranges in separate zones. Ideally, we could create overlapping subnets in separate zones with no concern for how that might affect behavior (i.e., like a VPC in AWS). Because SNAT is currently applied on the outbound interface (as it must be), those rules are unable to differentiate between the source zone. A solution would be to use two rules, one on the inbound vrf interface to mark the packet, then to reference the mark on the outbound interface. I.e.

Code:

iptables -t nat -A PREROUTING -i <vrf interface> -s <subnet> -j MARK --set-mark <vxlan tag #>
iptables -t nat -A POSTROUTING -o <out interface> -s <subnet> -m mark --mark <vxlan tag #> -j SNAT --to-source <node IP>

A pretty major one: Plugin does not remove iptables rules when SNAT selections are changed (or deleted). Additionally, because the plugin creates post-up and post-down rules, and ifreload (what PVE uses to make network changes) never executes post-down rules, you end up with a bunch of duplicate iptables rules. I think the best way to fix both of these issues is to add a comment to all SDN-managed rules and request a change in Proxmox's network API to clear all rules with that tag on network reloads. I've tested it locally and have yet to identify any side-effects.

Code:

post-up $iptables -t nat -A POSTROUTING -s '$cidr' -o $outiface -m mark --mark $tag -j SNAT --to-source $outip -m comment --comment 'PVE_SDN_MANAGED'

It would also be nice to have an option in the GUI to advertise SDN subnets to external BGP peers defined with BGP controllers. I've implemented a version of this locally to advertise routes to my edge router. This one would take more doing, given the requirement for GUI changes.
I'm also quite confused about the intent around "exit nodes local routing," but that's perhaps best saved for its own conversation.

spirit · Dec 8, 2023

rather than telling people it's completely impossible.

I don't remember, it's in the doc ?
I need to rework doc a litttle bit, adding schemas and differents setups / usecase.

niwamo said:
Some additional things I've noticed:

SNAT affects overlapping IP ranges in separate zones. Ideally, we could create overlapping subnets in separate zones with no concern for how that might affect behavior (i.e., like a VPC in AWS). Because SNAT is currently applied on the outbound interface (as it must be), those rules are unable to differentiate between the source zone. A solution would be to use two rules, one on the inbound vrf interface to mark the packet, then to reference the mark on the outbound interface. I.e.

Code:

iptables -t nat -A PREROUTING -i <vrf interface> -s <subnet> -j MARK --set-mark <vxlan tag #> iptables -t nat -A POSTROUTING -o <out interface> -s <subnet> -m mark --mark <vxlan tag #> -j SNAT --to-source <node IP>

yes, I was planning to work on this, thanks for the rules ! (do you use them in production ?)

niwamo said:
A pretty major one: Plugin does not remove iptables rules when SNAT selections are changed (or deleted). Additionally, because the plugin creates post-up and post-down rules, and ifreload (what PVE uses to make network changes) never executes post-down rules, you end up with a bunch of duplicate iptables rules. I think the best way to fix both of these issues is to add a comment to all SDN-managed rules and request a change in Proxmox's network API to clear all rules with that tag on network reloads. I've tested it locally and have yet to identify any side-effects.

niwamo said:

yes, it's the next thing on roadmap. I think I'll manage nat rules with a daemon (maybet thourgh pve-firewall daemon), because post-up/down are more a hack and not a google place to handle this.

niwamo said:
It would also be nice to have an option in the GUI to advertise SDN subnets to external BGP peers defined with BGP controllers. I've implemented a version of this locally to advertise routes to my edge router. This one would take more doing, given the requirement for GUI changes.

It doesn't work with the "advertise-subnet"option on the zone ?
If I remember, it should announce the /32 of each vm by default, and subnets if you enable the option.

Can you post your changes ? I'll be happy to integrate them .

(I'm using evon in production at work, but I'm using physical arista routers as evpn exit-nodes directly, so I don't always see bug/regression on this part until a proxmox user report it)

niwamo said:
I'm also quite confused about the intent around "exit nodes local routing," but that's perhaps best saved for its own conversation.

It was a specific feature asked by a proxmox forum user, to be able to join a vm in the evpn zone, from the exit node directly.

niwamo · Dec 8, 2023

spirit said:
I don't remember, it's in the doc ?
I need to rework doc a litttle bit, adding schemas and differents setups / usecase.

The current docs say Primary Exit Node is necessary for SNAT. I seem to recall it being said in more than that section, but could be remembering incorrectly. Anyways, next time the docs are updated, I'd suggest removing "necessary if you want to use SNAT" from the EVPN Zones section and adding something to the effect of "Note that using SNAT with multiple EVPN gateway nodes may cause interruptions to active connections during a live migration" to the Subnets section.

spirit said:
yes, I was planning to work on this, thanks for the rules ! (do you use them in production ?)

Just in the lab for now; I try not to use my unofficial hacks in prod

In any case, the production cluster I manage is not actually business critical.
Here's the actual source for the rules I've been using (Zones/EvpnPlugin.pm --> generate_sdn_config):

Perl:

push @iface_config, "post-up $iptables -t nat -A PREROUTING -s '$cidr' -i $vnetid -j MARK --set-mark $tag -m comment --comment 'PVE_SDN_MANAGED'";
push @iface_config, "post-up $iptables -t nat -A POSTROUTING -s '$cidr' -o $outiface -m mark --mark $tag -j SNAT --to-source $outip -m comment --comment 'PVE_SDN_MANAGED'";

spirit said:
yes, it's the next thing on roadmap. I think I'll manage nat rules with a daemon (maybet thourgh pve-firewall daemon), because post-up/down are more a hack and not a google place to handle this.

Nice, I can see it going both ways. Using the comments is a bit hacky but requires <10 lines of change to the current codebase. Using the daemon would be more powerful/flexible, but probably more complex. I haven't messed with the pve-firewall codebase, so can't say for sure. Either way, I'd be interested in contributing if it's helpful.

spirit said:
It doesn't work with the "advertise-subnet"option on the zone ?
If I remember, it should announce the /32 of each vm by default, and subnets if you enable the option.

Can you post your changes ? I'll be happy to integrate them .

(I'm using evon in production at work, but I'm using physical arista routers as evpn exit-nodes directly, so I don't always see bug/regression on this part until a proxmox user report it)

It was a specific feature asked by a proxmox forum user, to be able to join a vm in the evpn zone, from the exit node directly.

I will have to write this up over the weekend... what I'm running into is actually a combination of the local routing and advertise subnets settings.

I believe the /32 advertisements do work if configured a particular way, but the routes are advertised to external peers in the most amusing way possible - for example, if "vm1" is running on node1, every node except node1 will advertise a route to it, so you're guaranteed to never take the most efficient, or "correct," network path.

I don't think I was able to advertise the subnets until I made my local changes, but I will test again with the latest package when I'm doing the write-up.

Using the routers as evpn nodes is definitely the way to go, if they support it.

niwamo · Dec 12, 2023

Okay, reporting back --

spirit said:
wget https://mutulin1.odiso.net/libpve-network-perl_0.9.5_all.deb
dpkg -i libpve-network-perl_0.9.5_all.deb

Your still-in-dev package fixes the default route filtering for multiple exit-node configurations; however, it works too well and also filters default routes on non-exit nodes. So it fixes one problem and creates another.

Given the same two-node zone, with only one exit node selected, my non-exit node had this route table with your in-dev package:

Code:

unreachable default metric 4278198272
<EVPN ZONE SUBNET>/24 dev advert proto kernel scope link src <EVPN ZONE SUBNET GATEWAY>

and this route table with the "prod" package:

Code:

default nhid 43 via <MY EXIT NODE's IP> dev vrfbr_advert proto bgp metric 20 onlink
unreachable default metric 4278198272
<EVPN ZONE SUBNET>/24 dev advert proto kernel scope link src <EVPN ZONE SUBNET GATEWAY>
</32 ROUTE FROM EVPN> nhid 43 via <MY EXIT NODE's IP> dev vrfbr_advert proto bgp metric 20 onlink

-------------------------------------------

Following up on "Exit Nodes Local Routing" - I've been testing this in several scenarios, and I believe it ought to be removed, or at least the documentation ought to be significantly adjusted.

The testing and test results are fairly convoluted, so I'll do my best to explain what I found.

Description in the docs: "This is a special option if you need to reach a VM/CT service from an exit node." However, as tested:

Exit nodes can reach EVPN zone VMs/CTs running locally, whether or not "exit nodes local routing" is selected, and do so more efficiently (fewer hops) if "exit nodes local routing" is not selected.
Exit nodes cannot reach EVPN zone VMs/CTs running on other exit nodes, whether or not "exit nodes local routing" is selected. There is a workaround for this if "exit nodes local routing" is not selected (see below); I have not found a workaround for the case where "exit nodes local routing" is selected.
- Slight qualification: the statement above is true when using your in-development package, and when using my cruder "no default routes" fix. When using the prod package, multi-exit-node configurations don't work at all, so this can't really be tested (though I tried anyways)
There is exactly one use case I've found where "exit nodes local routing" meets its stated use case: Reaching EVPN zone VMs/CTs from an exit node, with the prerequisite that those VMs/CTs are not running on a different exit node.
- However, the same workaround from #2 still works for the non-"exit nodes local routing" scenario (tested with the prod package), so the incentive to create this option seems limited.

Root cause(s):

As far as I can tell, "exit nodes local routing" does two things under the hood:

1. Prevents vrf routes from being advertised in / "imported to" the non-vrf BGP config. This prevents the routes from being advertised to BGP peers; It also prevents frr from placing those routes in the node's default routing table. Important to note: Because "redistribute connected" is configured by default in this section, frr will place route(s) to the vrf's subnet(s) in the node's routing table any time that exitnodes_local_routing isn't set.

Perl:

# generate_controller_zone_config

if (!$exitnodes_local_routing) {
    @controller_config = ();
    #import /32 routes of evpn network from vrf1 to default vrf (for packet return)
    push @controller_config, "import vrf $vrf";
    push(@{$config->{frr}->{router}->{"bgp $asn"}->{"address-family"}->{"ipv4 unicast"}}, @controller_config);
    push(@{$config->{frr}->{router}->{"bgp $asn"}->{"address-family"}->{"ipv6 unicast"}}, @controller_config);

    @controller_config = ();
    #redistribute connected to be able to route to local vms on the gateway
    push @controller_config, "redistribute connected";
    push(@{$config->{frr}->{router}->{"bgp $asn vrf $vrf"}->{"address-family"}->{"ipv4 unicast"}}, @controller_config);
}

2. If exitnodes_local_routing is set, and the config is being generated for a gateway, the controller will add a route to the subnet via the xvrf interfaces.

Perl:

# generate_controller_vnet_config

return if !$exitnodes_local_routing;

my $local_node = PVE::INotify::nodename();
my $is_gateway = $exitnodes->{$local_node};

return if !$is_gateway;

my $subnets = PVE::Network::SDN::Vnets::get_subnets($vnetid, 1);
my @controller_config = ();
foreach my $subnetid (sort keys %{$subnets}) {
    my $subnet = $subnets->{$subnetid};
    my $cidr = $subnet->{cidr};
    push @controller_config, "ip route $cidr 10.255.255.2 xvrf_$zoneid";
}
push(@{$config->{frr_ip_protocol}}, @controller_config);

So, takeaway #1: All gateways will have routes to the EVPN Zone subnets, whether or not "exit nodes local routing" is set. What differs is whether that route will be directly via the subnet's gateway (not set) or whether the route will be via the 10.255.255/30 interface pair (set).

Takeaway #2: Exit-node connections to a VM/CT running on another exit-node will not work in either setup due to IP address conflicts.

In the "exit nodes local routing" configuration, the sending node will initiate a connection from its 10.255.255.1 interface. This packet will reach the VM running on the other exit node, but because both exit nodes have a 10.255.255.1 interface, the packet will not be routed back to the right interface; It will be dropped.
Similarly, in the non-"exit nodes local routing" configuration, the sending node will initiate a connection from its zone interface (e.g., if the zone subnet in question is 192.168.168.0/24, the sending node will initiate the connection from the gateway configured for that subnet - in my case, 192.168.168.1). This packet will make it over the vxlan tunnel, but on the far side, the host will be faced with a packet from 192.168.168.1, a destination of (let's say) 192.168.168.10, and a next hop of 192.168.168.1 (as both nodes will have an interface with the subnet's gateway address). This packet will be dropped. Even if it weren't dropped immediately, it would be dropped on the return route.
There is a workaround for the non-"exit nodes local routing" configuration: specify a source interface other than the subnet's gateway, e.g. ping <destination> -I <some other IP on the node>
I am not entirely sure why this workaround does not work for the "exit nodes local routing" configuration, but it fails to work on 100% of tests. Dropwatch indicates the reason is "ip_rcv_finish_core.constprop.0+1ce" - maybe you'll know what to do with that information, but I was stumped.

Of course, the specific use case of "exit node --> VM on non-exit node" works fine with "exit nodes local routing" because the IP address conflict no longer applies. However, there are many cases where "exit nodes local routing" either doesn't work, or works - but less well than the defaults.

Let me know if that made sense, or whether you have follow-up questions, clarifications, test requests, etc.

When I have time, I'll write-up what I believe are issues (and what I think are the solutions) with the subnet advertisement settings/configs.

niwamo · Dec 12, 2023

niwamo said:
Your still-in-dev package fixes the default route filtering for multiple exit-node configurations; however, it works too well and also filters default routes on non-exit nodes. So it fixes one problem and creates another.

Given the same two-node zone, with only one exit node selected, my non-exit node had this route table with your in-dev package:

Code:

unreachable default metric 4278198272 <EVPN ZONE SUBNET>/24 dev advert proto kernel scope link src <EVPN ZONE SUBNET GATEWAY>

and this route table with the "prod" package:

Code:

default nhid 43 via <MY EXIT NODE's IP> dev vrfbr_advert proto bgp metric 20 onlink unreachable default metric 4278198272 <EVPN ZONE SUBNET>/24 dev advert proto kernel scope link src <EVPN ZONE SUBNET GATEWAY> </32 ROUTE FROM EVPN> nhid 43 via <MY EXIT NODE's IP> dev vrfbr_advert proto bgp metric 20 onlink

Well, I have to backtrack on this -

After reading your patches (https://lists.proxmox.com/pipermail/pve-devel/2023-December/060906.html & https://lists.proxmox.com/pipermail/pve-devel/2023-December/060910.html), I couldn't see why it wasn't working, so I restored a snapshot and tested again. Everything is working as expected. I must have had a conflicting change in my lab. Nice work, thanks for the patch!

lunam · Jan 31, 2024

Hi there,

Just wondering if this patch is in the repos now or if I would need to apply these manually still.

Thank you

spirit · Jan 31, 2024

it's only in git
https://git.proxmox.com/?p=pve-network.git;a=commit;h=e614da43f13e3c61f9b78ee9984364495eff91b6

but no package yet (next version should be libpve-network-perl 0.9.6)

alchemydc · Mar 29, 2024

Looks like they just pushed 0.9.6 package. Thanks for the follow up @spirit .

alchemydc · Apr 1, 2024

I have been doing more testing with SDN and PVE8, and while I can confirm that the updated libpve-network-perl v0.9.6 does indeed fix the routing loop bug mentioned in this thread, there are other issues, perhaps specific to our architecture.

Our architecture consists of multiple PVE nodes, all of which are exit nodes.

With PVE7, deploying an EVPN/BGP SDN, we can configure multiple exit nodes in the Proxmox UI, and things work as expected. Each of our PVE nodes act as exit nodes, and outbound traffic from workloads on PVE1 will go via PVE1's public interface, workloads on PVE2 will go via PVE2's public interface, etc.

With PVE8 (pve-manager/8.1.10/4b06efb5db453f29 (running kernel: 6.5.13-3-pve)) we are forced to select a "primary exit node" in the UI. Once the SDN config is applied to the cluster, we get undesired results. Assuming we set PVE1 to be the "primary exit node", any outbound traffic from VM's on the SDN vnet will be routed via PVE1, even for workloads running on PVE2, PVE3...PVE-n, etc.

Why was the change made in PVE8 to require a "primary exit node"? I understand that you raised some concerns re SNAT during a live migration, etc. but I believe that these limitations are well understood and manageable. In our environment, each of our PVE nodes has its own high speed connection to the Internet, so we want each PVE node to be able to function as exit node for the workloads on their respective vnets without specifying a "primary exit node" that then becomes a bottleneck our architecture.

@spirit can you help?

Thanks

DC

spirit · Apr 2, 2024

alchemydc said:
I have been doing more testing with SDN and PVE8, and while I can confirm that the updated libpve-network-perl v0.9.6 does indeed fix the routing loop bug mentioned in this thread, there are other issues, perhaps specific to our architecture.

Our architecture consists of multiple PVE nodes, all of which are exit nodes.

With PVE7, deploying an EVPN/BGP SDN, we can configure multiple exit nodes in the Proxmox UI, and things work as expected. Each of our PVE nodes act as exit nodes, and outbound traffic from workloads on PVE1 will go via PVE1's public interface, workloads on PVE2 will go via PVE2's public interface, etc.

With PVE8 (pve-manager/8.1.10/4b06efb5db453f29 (running kernel: 6.5.13-3-pve)) we are forced to select a "primary exit node" in the UI. Once the SDN config is applied to the cluster, we get undesired results. Assuming we set PVE1 to be the "primary exit node", any outbound traffic from VM's on the SDN vnet will be routed via PVE1, even for workloads running on PVE2, PVE3...PVE-n, etc.

Why was the change made in PVE8 to require a "primary exit node"? I understand that you raised some concerns re SNAT during a live migration, etc. but I believe that these limitations are well understood and manageable. In our environment, each of our PVE nodes has its own high speed connection to the Internet, so we want each PVE node to be able to function as exit node for the workloads on their respective vnets without specifying a "primary exit node" that then becomes a bottleneck our architecture.

@spirit can you help?

Thanks

DC

It's an gui bug forcing this field.
a patch has been sent on the pve-devel mailing, but not yet commited
https://lists.proxmox.com/pipermail/pve-devel/2024-February/061924.html

as workaround, you can edit /etc/pve/sdn/zones.cfg , and remove the primary exit-node.
Then apply sdn config.

mme · Apr 6, 2024

@spirit thanks for investigating into this.
Please could you check and push upstream the following?

Code:

# /src/PVE/Network/SDN/Controllers/EvpnPlugin.pm
        if (!$exitnodes_primary || $exitnodes_primary eq $local_node) {
[..]
-           push @{$routemap_config_v6}, "match ip address prefix-list only_default_v6";
+           push @{$routemap_config_v6}, "match ipv6 address prefix-list only_default_v6";

The elsif is fine, just the v4/v6 typo in if-clause.

TIA!

spirit · Apr 7, 2024

mme said:
@spirit thanks for investigating into this.
Please could you check and push upstream the following?

Code:

# /src/PVE/Network/SDN/Controllers/EvpnPlugin.pm if (!$exitnodes_primary || $exitnodes_primary eq $local_node) { [..] - push @{$routemap_config_v6}, "match ip address prefix-list only_default_v6"; + push @{$routemap_config_v6}, "match ipv6 address prefix-list only_default_v6";

The elsif is fine, just the v4/v6 typo in if-clause.

TIA!

oh good catch ! can you open a bug report on bugzilla.proxmox.com ? I'll send a patch tomorrow.

mme · Apr 8, 2024

Sure, thanks!
https://bugzilla.proxmox.com/show_bug.cgi?id=5361

Search

Search

Bugfix for EVPN SDN Multiple Exit Nodes

niwamo

New Member

spirit

Distinguished Member

niwamo

New Member

spirit

Distinguished Member

niwamo

New Member

spirit

Distinguished Member

niwamo

New Member

niwamo

New Member

niwamo

New Member

lunam

New Member

spirit

Distinguished Member

alchemydc

New Member

alchemydc

New Member

spirit

Distinguished Member

mme

New Member

spirit

Distinguished Member

mme

New Member