Frr update to 10.4.1-1 broke external routing?

ns33

Just updated our whole cluster to the latest pull I did with POM; part of that included the frr package 10.4.1-1+pve1. I was previously on 10.3.1-1+pve4.

After doing the update, I noticed I could no longer route out of the VMs to anything physical. The next hop would be the subnet defined in the VNet, but that's it. The hop after that should be the default gateway, but that wasn't the case.

I just happened to look at one of the 'SRV networking - reload' tasks in the history and noticed two things: the frr version was still 10.3.1, and with every reload I would receive 'INFO: "frr version 10.4.1" cannot be removed'. I checked the frr.conf in /etc/frr and the version listed at the top of the file is in fact 10.3.1. I guess I can't manually change that, since with each reload of the SDN my change is reverted.

So I removed frr and frr-pythontools, targeted 10.3.1-1+pve4 for both, and now I'm able to route to outside devices from a VM.
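Roughly what that looked like (the exact invocation may have differed; the apt-mark hold is just a suggestion to stop the next upgrade from pulling 10.4.1 back in):
Code:
# pin both packages back to the known-good version
apt install frr=10.3.1-1+pve4 frr-pythontools=10.3.1-1+pve4
# optional: hold them so a later upgrade doesn't reinstall 10.4.1
apt-mark hold frr frr-pythontools
# make sure the downgraded daemon is the one running
systemctl restart frr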

Did I miss something I should have done during the update? I'm thinking the frr.conf would just need the version changed to 10.4.1, but how would I do that? Or is this a bug?
 
But this should also be fixed by restarting frr using systemctl restart frr. If this problem persists, please paste the full frr config and the full error log.
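A minimal sequence for that, per node (the journalctl line is only a suggested way to grab the error log):
Code:
systemctl restart frr
systemctl status frr --no-pager
journalctl -u frr -b --no-pager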
 
Ah, this was my mistake, I didn't realize frr 10.4.1 would make it to no-subscription so fast! Should be fixed with this patch: https://lore.proxmox.com/pve-devel/20251209105731.19965-1-g.goller@proxmox.com/
Awesome!!

Two probably dumb questions:
  1. How would I go about getting the patch into my environment? Build from source? My means of transferring anything to the cluster is through sneaker-net. Or would the patched package be found in http://download.proxmox.com/debian/?
  2. What is the typical workflow from a patch being made to becoming available in the no-subscription repo?
 
Hi!
The patch will need to be merged, and then there will be a new build of pve-network available in no-sub.
But note that this error is just a reload debug statement; it doesn't make your routes unavailable. Please restart frr and check whether the problem still persists, because this could also be an frr regression.
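Once that build lands, pulling it in should just be a normal apt upgrade; the package name below (libpve-network-perl) is an assumption about where the SDN code ships, so check apt list --upgradable rather than taking it verbatim:
Code:
apt update
apt list --upgradable | grep -i network
apt install libpve-network-perl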
 
Understood!

I reinstalled frr 10.4.1, did a daemon-reload, and restarted the service on each node. I still lose the ability to reach an external device.
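For what it's worth, the running daemon version and the frr.conf header (the one the SDN reload keeps rewriting) can be checked per node with something like:
Code:
# version of the running FRR daemons
vtysh -c "show version"
# version line written at the top of the generated config
head -n 1 /etc/frr/frr.conf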

Opening each 'SRV networking - Reload' task, the only one I see a warning on is on the node I have selected as the exit node.

Currently only one of the zones has an exit node defined. The warning is the first line:
vrf_<ZoneName> : warning: vrf_<ZoneName>: post-up cmd 'ip route delete vrf vrf_<ZoneName> unreachable default metric 4278198272' failed: returned 2 (RTNETLINK answers: No such process)

Do note vrf_<ZoneName> is not the actual name, just a sanitized version.

I don't see this warning/error using 10.3.1. Where should I begin my investigation into the cause? I can provide logs/configs, but I'd just need to rip stuff out of them.
 
This still shouldn't really be a problem. Please post the SDN config (everything in /etc/pve/sdn/) and the /etc/frr/frr.conf file. Also, I presume you get the default route from some external router; does that route appear in the FRR RIB, i.e. is it visible with show bgp l2vpn evpn route or show ip route vrf vrf_<your_vrf_name>? Thanks!
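It might also be worth looking at the kernel's view of the VRF on the exit node, i.e. whether the unreachable default route from that warning is actually installed (vrf_<your_vrf_name> is again just a placeholder):
Code:
ip route show vrf vrf_<your_vrf_name>
ip route show type unreachable vrf vrf_<your_vrf_name>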
 
Will do!

Do you need the .running-config in /etc/pve/sdn? Sanitizing that would be a minor headache. I can certainly do so, just figured I'd ask before going through it if it isn't needed.
 
No, it's fine -- you can omit the .running-config!
 
Does this happen in all zones? If not, in which zone is the route missing (I presume this happens on ZoneName1, because that's the exit node)? Do you have firewalls enabled? Could you show the output of these commands:
Code:
vtysh -c "show ip route vrf vrf_<vrf_name>"
vtysh -c "show bgp summary"
vtysh -c "show bgp l2vpn evpn route"
 
I'm not sure if it occurs in other zones; honestly, nothing is configured to use the other zones. In the end, all zones will have exit nodes defined, and the plan is to have multiple for load balancing.

I don't have any firewalls enabled; they're disabled at the datacenter level, node level, and interface level, and there's nothing external.
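For reference, the quick way to confirm the firewall service state on a node (it just reports whether pve-firewall is enabled and running):
Code:
pve-firewall status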

Here are the outputs as well
 


Hmm, this looks good as far as I can see. Where are the VMs that cannot reach outside the zone? Are they on the exit node or on other nodes (if they are on other nodes, could you please share the RIB, i.e. vtysh -c "show ip route vrf vrf_<vrfName>")? What IP address are you trying to reach that is outside the EVPN zone?
 
The VMs are spread out among the nodes. I can't ping out from any of the VMs; internal traffic is always fine on both versions.

The IPs I'm trying to reach fall within a slew of subnets and are on different VLANs defined on the switch. IP routing is enabled on the switch stack.

It's just that anything within the SDN never makes it out the default gateway on 10.4.1. Between versions, the addresses stop being advertised to the external network. I can definitely confirm the local network side routes correctly: through each node's shell, I can reach anything on a different subnet or VLAN.
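One way to see what the exit node is actually announcing upstream, assuming it even has a plain IPv4 BGP session to the external router (the peer address below is a placeholder):
Code:
vtysh -c "show bgp summary"
vtysh -c "show bgp ipv4 unicast neighbors <external-router-ip> advertised-routes"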
 
Hi!
1) What does your external network look like? Which IP are you pinging there (just the subnet is also fine)?
2) How do you connect the exit node to your external network?
3) How are you advertising the routes from the exit node to the external network?
4) What does your default routing table look like on the exit node, i.e. vtysh -c "show ip route"?
5) Could you get a tcpdump of the ping on the exit node? Run tcpdump -envi any, then ping from a VM (both commands are collected in the block below).
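For convenience, questions 4) and 5) as commands (the icmp filter on tcpdump is just a suggestion to cut down the noise):
Code:
vtysh -c "show ip route"
tcpdump -envi any icmp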

If you are worried about leaking stuff, you can also send me the output as a DM.
Thanks!