update sdn controller object failed

Aug 9, 2024
I'm trying to use an OSPF fabric as the underlay for EVPN. I had this working before moving to 9.0, but when I define my OSPF fabric and then try to update my EVPN controller with the node IPs (using either the interface IP or the loopback IP), I get this error:
update sdn controller object failed: must have exactly one of peers / fabric defined at /usr/share/perl5/PVE/Network/SDN/Controllers/EvpnPlugin.pm line 499. (500)

In OSPF I have 3 nodes using a loopback network of 10.127.1.0/24. The interfaces are all on VLAN3000 with an IP in 10.126.1.0/24. I'm not sure why it's not letting me add the peer IPs in EVPN now.

[Screenshots: 1754531934028.png, 1754531755818.png]

Interface dummy_PVE is up, line protocol is up
Link ups: 0 last: (never)
Link downs: 0 last: (never)
vrf: default
index 245 metric 0 mtu 1500 speed 0 txqlen 1000
flags: <UP,LOWER_UP,BROADCAST,RUNNING,NOARP>
MPLS Not specified by CLI
Multicast config is Not specified by CLI
Shutdown config is Not specified by CLI
Type: Ethernet
HWaddr: c6:12:bc:17:a3:50
inet 10.127.1.5/32 unnumbered
inet6 fe80::c412:bcff:fe17:a350/64
Interface Type dummy
Interface Slave Type None
protodown: off

pve3# show interface vlan3000
Interface vlan3000 is up, line protocol is up
Link ups: 0 last: (never)
Link downs: 0 last: (never)
vrf: default
index 231 metric 0 mtu 9000 speed 50000 txqlen 1000
flags: <UP,LOWER_UP,BROADCAST,RUNNING,MULTICAST>
MPLS Not specified by CLI
Multicast config is Not specified by CLI
Shutdown config is Not specified by CLI
Type: Ethernet
HWaddr: e4:3d:1a:78:7f:30
inet 10.126.1.5/24
inet6 fe80::e63d:1aff:fe78:7f30/64
Interface Type Vlan
Interface Slave Type None
VLAN Id 3000
protodown: off
Parent interface: vmbr
 
I think I figured this out, but using an OSPF fabric now breaks EVPN when using an external router. We have two Dell OS10 switches participating in the EVPN fabric, and it seems we have to choose: either use the OSPF fabric, or add the switches as peers and go back to defining OSPF via an FRR override.

Can we get the ability to add other nodes, or even a pair of RRs, while still using OSPF?
 
Does it not work at all? Selecting the Fabric in the EVPN controller is equivalent to selecting all nodes in the fabric as peers - so essentially a full-mesh between all nodes. If you want to use Route Reflectors (even with routes learned via OSPF) then simply add the IPs of the RRs in the peers field, and all nodes should peer with the RRs.
 
The Perl code does not allow you to add an IP at all if a fabric is configured - whether or not it is added to the EVPN controller. This causes issues for us because we want to use our route reflectors, but now we must choose between using an OSPF sdn fabric or choosing our own peers. Why can't we do both?


Perl:
        die "must have exactly one of peers / fabric defined"
            if ($controller->{peers} && $controller->{fabric})
            || !($controller->{peers} || $controller->{fabric});
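
For anyone less used to Perl, here is a rough Python restatement of that condition. It is only a sketch of the three quoted lines, not the plugin itself: the function names and the sample controller dictionaries are made up for illustration, and Perl's truthiness rules differ slightly from Python's (for example, the string "0" is false in Perl). As quoted, the check looks only at the controller's own peers and fabric fields.

```python
def check_controller(controller: dict) -> None:
    """Illustrative restatement of the quoted Perl check (hypothetical helper)."""
    has_peers = bool(controller.get("peers"))
    has_fabric = bool(controller.get("fabric"))
    # Fail when both fields are set, or when neither is set.
    if (has_peers and has_fabric) or not (has_peers or has_fabric):
        raise ValueError("must have exactly one of peers / fabric defined")


def is_accepted(controller: dict) -> bool:
    """Return True if the controller passes the check, False otherwise."""
    try:
        check_controller(controller)
    except ValueError:
        return False
    return True


# Sample controllers; the values are taken from this thread where possible.
cases = {
    "peers only": {"asn": 65100, "peers": "10.127.1.5,10.127.1.6"},
    "fabric only": {"asn": 65100, "fabric": "PVE"},
    "both": {"asn": 65100, "peers": "10.127.1.5", "fabric": "PVE"},
    "neither": {"asn": 65100},
}
results = {name: is_accepted(cfg) for name, cfg in cases.items()}
for name, ok in results.items():
    print(f"{name}: {'accepted' if ok else 'rejected'}")
```

So a controller that ends up with both a fabric and peers at the moment of the check is rejected, as is one with neither.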
 
You do not have to configure the fabric in the EVPN controller if you want to use Route Reflectors - simply add the IP of the Route Reflectors to the peer field and leave the fabric field empty. It will still use the routes learned via OSPF to connect to the Route Reflectors.
 
That’s kinda the problem. If you create an OSPF fabric after the EVPN controller, you cannot enter any IPs, whether or not a fabric is selected. We get the error simply because the OSPF fabric exists.

Even then, it would be great if we could do both. I guess I don’t see why it would have to be an either/or option.

Also, I wish we could choose to have BFD in OSPF or BGP, or at least be able to turn it off. Our switches don’t support multihop BFD, and using loopbacks with BFD makes FRR use that. It works well if you only use Proxmox EVPN, but not as well with external EVPN controllers. Nice to have anyway.
 
That’s kinda the problem. If you create an OSPF fabric after the EVPN controller, you cannot enter any IPs, whether or not a fabric is selected. We get the error simply because the OSPF fabric exists.
I'm not sure I understand 100% correctly. What does your topology look like, and what do you want to peer with what? From what I understand, you created an OSPF fabric and you have two Route Reflectors inside your OSPF network that you want to use as route reflectors for EVPN. If so, what are the IPs of the Route Reflectors?

In that case you would simply enter the IPs of the route reflectors as peers in the EVPN controller, make sure the fabric field is empty, and save. I just tested this on my test cluster and it seems to work. In your screenshot you still have the fabric selected in the dropdown.

The fabric setting in the EVPN controller is just a shortcut for inserting the loopback IPs of all nodes in the fabric as peers. It's equivalent to entering all of the loopback IPs as peers in the peers field. I understand how the UI might be misleading and could be improved - I'll look into improving this and making it clearer.
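
To make that equivalence concrete, here is a small Python sketch that lists the loopback IPs a fabric would stand for, using the fabrics.cfg contents posted later in this thread. It is an illustration only and not how Proxmox resolves fabrics internally. The helper name is hypothetical, and the parsing assumes that node sections are named `<fabric>_<node>` and carry the loopback in an `ip` property, as in the posted file.

```python
FABRICS_CFG = """\
ospf_fabric: PVE
area 0.0.0.0
ip_prefix 10.127.1.0/24

ospf_node: PVE_pve1
interfaces name=vlan3000,ip=10.126.1.5/24
ip 10.127.1.5

ospf_node: PVE_pve2
interfaces name=vlan3000,ip=10.126.1.6/24
ip 10.127.1.6

ospf_node: PVE_pve3
interfaces name=vlan3000,ip=10.126.1.7/24
ip 10.127.1.7
"""


def fabric_loopback_ips(cfg_text: str, fabric_id: str) -> list[str]:
    """Collect the `ip` property of every ospf_node section of one fabric.

    Hypothetical helper: assumes node section IDs look like "<fabric>_<node>".
    """
    ips = []
    in_fabric_node = False
    for raw_line in cfg_text.splitlines():
        parts = raw_line.split(None, 1)
        if not parts:
            continue  # blank line between sections
        key = parts[0]
        value = parts[1].strip() if len(parts) == 2 else ""
        if key.endswith(":"):
            # Section header, e.g. "ospf_node: PVE_pve1".
            in_fabric_node = key == "ospf_node:" and value.startswith(fabric_id + "_")
        elif in_fabric_node and key == "ip" and value:
            ips.append(value)
    return ips


ips = fabric_loopback_ips(FABRICS_CFG, "PVE")
print(",".join(ips))
```

Selecting the fabric PVE in the controller should then behave like entering that comma-separated list as peers by hand.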

Also, I wish we could choose to have BFD in OSPF or BGP, or at least be able to turn it off. Our switches don’t support multihop BFD, and using loopbacks with BFD makes FRR use that. It works well if you only use Proxmox EVPN, but not as well with external EVPN controllers. Nice to have anyway.
Can you open a Bugzilla entry for that?
 
I understand how it should behave, but it just won't let me enter an IP. I have the SDN fabric currently in use. I open the EVPN controller, delete the fabric from evpnctlr, add an IP for my RR, and when I hit OK I get the same error.

[Screenshot: 1754596847402.png]
 
Thanks for the clarification, I wasn't sure whether that's what you attempted. I will look to see if we can reproduce this internally, because I wasn't able to reproduce this when I tried yesterday. This should definitely work, so I'll look into it!

Can you post the output of the following files:

Code:
cat /etc/pve/sdn/fabrics.cfg
cat /etc/pve/sdn/controllers.cfg

Are there any errors in the syslog?
 
no errors in syslog.

root@pve1:~# cat /etc/pve/sdn/fabrics.cfg
ospf_fabric: PVE
	area 0.0.0.0
	ip_prefix 10.127.1.0/24

ospf_node: PVE_pve1
	interfaces name=vlan3000,ip=10.126.1.5/24
	ip 10.127.1.5

ospf_node: PVE_pve2
	interfaces name=vlan3000,ip=10.126.1.6/24
	ip 10.127.1.6

ospf_node: PVE_pve3
	interfaces name=vlan3000,ip=10.126.1.7/24
	ip 10.127.1.7
root@pve1:~#

root@pve1:~# cat /etc/pve/sdn/controllers.cfg
evpn: evpnctlr
	asn 65100
	fabric PVE
 
I tried reproducing this behavior on my machine, but wasn't successful. Could you maybe check the developer tools of your browser to see the body of the request that updates the EVPN controller?
 
I think the issue is you can’t delete the fabric from evpn controller after adding it. E.g. removing it and saving, then re-opening the controller a second time shows it still configured.

We just need the ability to use both OSPF fabric and our own evpn nodes, the OR limitation means you either use OSPF or define your own nodes. It defeats the purpose of configuring an underlay.
 
I think the issue is you can’t delete the fabric from evpn controller after adding it. E.g. removing it and saving, then re-opening the controller a second time shows it still configured.
Yes, but I tried exactly that several times and it worked on my machine. For now, it should be fine to remove the line configuring the fabric from controllers.cfg as a workaround. It seems like it might be an issue with the UI sending the wrong update request, which is why it would be interesting to see the output from the network tab.

We just need the ability to use both OSPF fabric and our own evpn nodes, the OR limitation means you either use OSPF or define your own nodes. It defeats the purpose of configuring an underlay.
I know it's an inconvenience but with the current implementation we had to add that restriction for now in order to not break existing setups - I would've preferred otherwise as well. We are looking into removing this restriction for the future though - sorry for the inconvenience. It should still be possible to use both OSPF and your own nodes by just adding them all as peers, without any fabric whatsoever.
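
As a rough illustration of the "all peers, no fabric" approach, the Python sketch below merges the three node loopbacks from this thread with two external peers and renders a controller section without a fabric line. The two 192.0.2.x addresses are placeholders (the real route reflector IPs were never posted), the function name is hypothetical, and the tab-indented `peers` line reflects my assumption about the controllers.cfg layout. Treat the output as something to compare against, and make the actual change through the GUI or API.

```python
import ipaddress


def render_evpn_controller(name: str, asn: int, peers: list[str]) -> str:
    """Render an EVPN controller section that uses peers only (hypothetical helper)."""
    unique_peers = []
    for peer in peers:
        # ip_address() raises ValueError on anything that is not a valid IP.
        addr = str(ipaddress.ip_address(peer))
        if addr not in unique_peers:
            unique_peers.append(addr)
    if not unique_peers:
        raise ValueError("at least one peer is required when no fabric is set")
    return f"evpn: {name}\n\tasn {asn}\n\tpeers {','.join(unique_peers)}\n"


# Loopbacks of the three Proxmox nodes, as posted in fabrics.cfg.
node_loopbacks = ["10.127.1.5", "10.127.1.6", "10.127.1.7"]
# Placeholder addresses standing in for the external route reflectors.
external_peers = ["192.0.2.1", "192.0.2.2"]

stanza = render_evpn_controller("evpnctlr", 65100, node_loopbacks + external_peers)
print(stanza, end="")
```

The OSPF fabric itself stays defined under fabrics and keeps providing reachability; only the controller stops referencing it.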