[TUTORIAL] [Full mesh (routed setup) + EVPN] it is feasible even by using SDN!

interesting tutorial - i am hoping this can help me expose my thunderbolt mesh network (used for ceph public and cluster).

i have 3 node thunderbolt mesh based on this guide
it works great

this is the existing openfabric routing on each proxmox node, where x is the node number.
(note i only use the fc00::x/128 addresses for the ceph network, the IPv4 is there just because)

Code:
ip forwarding
ipv6 forwarding
!
interface en05
ip router openfabric 1
ipv6 router openfabric 1
exit
!
interface en06
ip router openfabric 1
ipv6 router openfabric 1
exit
!
interface lo
ip address 10.0.0.8x/32
ipv6 address fc00::8x/128
ip router openfabric 1
ipv6 router openfabric 1
openfabric passive
exit
!
router openfabric 1
net 49.0000.0000.000x.00
exit
!
exit
this is currently saved in both frr.conf and frr.conf.local - my understanding is that frr.conf.local is not overwritten by SDN and is merged?
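For anyone following along, the x placeholder in the config above expands per node roughly as below. This is only an illustrative sketch: `node_values` is a made-up helper name, and it assumes a single-digit node number (1 to 3 in my case).

```python
# Hypothetical helper (not part of FRR or Proxmox): expand the "x"
# placeholder from the openfabric config for one node number.
def node_values(node: int) -> dict:
    # Assumption: the placeholder is a single digit, as in the config.
    if not 1 <= node <= 9:
        raise ValueError("node number must be a single digit")
    return {
        "ipv4_loopback": f"10.0.0.8{node}/32",
        "ipv6_loopback": f"fc00::8{node}/128",
        "openfabric_net": f"49.0000.0000.000{node}.00",
    }


# Print the values for the three nodes in the mesh.
for n in (1, 2, 3):
    print(n, node_values(n))
```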

this is the global section of my ceph cluster
Code:
[global]
        auth_client_required = cephx
        auth_cluster_required = cephx
        auth_service_required = cephx
        cluster_network = fc00::/64
        fsid = 5e55fd50-d135-413d-bffe-9d0fae0ef5fa
        mon_allow_pool_delete = true
        mon_host = fc00::83 fc00::82 fc00::81
        ms_bind_ipv4 = false
        ms_bind_ipv6 = true
        osd_pool_default_min_size = 2
        osd_pool_default_size = 3
        public_network = fc00::/64
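As a quick sanity check on the addressing above, the monitor addresses can be tested against the public network with Python's standard `ipaddress` module. A minimal sketch; the variable names are mine, the values are copied from the config.

```python
import ipaddress

# Values copied from the [global] section above.
public_network = ipaddress.ip_network("fc00::/64")
mon_hosts = ["fc00::83", "fc00::82", "fc00::81"]

# True for each monitor whose address falls inside public_network.
in_network = {h: ipaddress.ip_address(h) in public_network for h in mon_hosts}
print(in_network)
```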

what i would like to achieve is:
  1. keep the ceph public network on the thunderbolt - i want the proxmox QEMU processes to use that route and network (for example, in failure scenarios it's better to have a node connect to another node across the thunderbolt network)
  2. enable VMs (with existing globally routable IPv6 addresses) on the proxmox node to access the mesh network (fc00::81 thru fc00::83)
  3. enable clients on the LAN (with existing globally routable IPv6 addresses) to access the mesh network addresses (fc00::81 thru fc00::83)
questions:
  1. is my understanding of frr.conf and frr.conf.local correct (and which takes precedence if there is a conflict?)
  2. can i just follow the guide as-is?
  3. will the existing openfabric routing co-exist with the new bgp based routing or will the new bgp routing replace the openfabric routing?
  4. i see the options for exit nodes and advertising - i assume i need to use these in some way? (but i am unsure how they would be configured)
  5. my unifi router is capable of frr under the hood and bgp in the UI - i assume that can assist in providing route information to the clients, but i am unsure of the right way to go about this?
  6. is there a way to apply the SDN to one node at a time - i.e. to be able to do any frr.conf fix ups as i go along and keep my ceph running?
  7. any other gotchas i need to think about?
I am hoping this config could be applied and nothing would break....
Code:
root@pve1:/etc/pve/sdn# cat controllers.cfg
evpn: myEVPN
        asn 65000
        peers fc00::81 fc00::82 fc00::83

root@pve1:/etc/pve/sdn# cat zones.cfg
evpn: evpnZ1
        controller myEVPN
        vrf-vxlan 1000
        advertise-subnets 1
        exitnodes pve2,pve1,pve3
        exitnodes-local-routing 1
        ipam pve
        mac BC:24:11:55:29:D0
        mtu 65000

root@pve1:/etc/pve/sdn# cat vnets.cfg
vnet: vxnet1
        zone evpnZ1
        tag 10500

root@pve1:/etc/pve/sdn# cat subnets.cfg
subnet: evpnZ1-fc00::81-128
        vnet vxnet1
        gateway fc00::81

subnet: evpnZ1-fc00::82-128
        vnet vxnet1
        gateway fc00::82

subnet: evpnZ1-fc00::83-128
        vnet vxnet1
        gateway fc00::83
 
ouch, no, each node got a version of this error and frr did not restart, luckily restarting frr.service got my old openfabric back up
vrfvx_evpnZ1 : error: vrfvx_evpnZ1: invalid vxlan-local-tunnelip dead:beef:830:1::83: Expected 4 octets in '2600:a801:830:1::83'
vrfbr_evpnZ1 : error: vrfbr_evpnZ1: bridge port vrfvx_evpnZ1 does not exist
vrfbr_evpnZ1 : warning: vrfbr_evpnZ1: apply bridge ports settings: bridge configuration failed (missing ports)
vxlan_vxnet1 : error: vxlan_vxnet1: invalid vxlan-local-tunnelip dead:beef:830:1::83: Expected 4 octets in '2600:a801:830:1::83'
vxnet1 : error: vxnet1: bridge port vxlan_vxnet1 does not exist
vxnet1 : warning: vxnet1: apply bridge ports settings: bridge configuration failed (missing ports)
vrf_evpnZ1 : warning: vrf_evpnZ1: post-up cmd 'ip route del vrf vrf_evpnZ1 unreachable default metric 4278198272' failed: returned 2 (RTNETLINK answers: No such process

TASK ERROR: command 'ifreload -a' failed: exit code 1
what's interesting is it errored on my real IPv6 address assigned to vmbr0, not the addresses defined in my thunderbolt net or the SDN

Oh i see it expected 4 octets! Does this mean SDN doesn't work when the host has an IPv6 address on vmbr0?
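
The wording of that error matches what Python's standard `ipaddress` module raises when an IPv6 string is parsed as an IPv4 address. A minimal sketch of that behaviour, assuming (i have not confirmed this in the tooling itself) that the tunnel IP check does something equivalent:

```python
import ipaddress

# The address quoted in the error output above.
addr = "2600:a801:830:1::83"

# Parsing it as IPv4 fails; keep the error text for inspection.
try:
    ipaddress.IPv4Address(addr)
    message = ""
except ipaddress.AddressValueError as exc:
    message = str(exc)
print(message)

# The same string parses fine as IPv6.
parsed_v6 = ipaddress.IPv6Address(addr)
print(parsed_v6.version)
```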

--edit ---
confirmed, guess i will revisit all of this when proxmox 9.1 comes around :-(
SDN VXLAN over IPv6 | Proxmox Support Forum
 