[TUTORIAL] [Full mesh (routed setup) + EVPN] it is feasible even by using SDN!

interesting tutorial - i am hoping this can help me expose my thunderbolt mesh network (used for the ceph public and cluster networks).

i have a 3-node thunderbolt mesh based on this guide
it works great

this is the existing openfabric routing on each proxmox node, where x is the node number.
(note i only use the fc00::x/128 addresses for the ceph network, the IPv4 addresses are there just because)

Code:
ip forwarding
ipv6 forwarding
!
interface en05
ip router openfabric 1
ipv6 router openfabric 1
exit
!
interface en06
ip router openfabric 1
ipv6 router openfabric 1
exit
!
interface lo
ip address 10.0.0.8x/32
ipv6 address fc00::8x/128
ip router openfabric 1
ipv6 router openfabric 1
openfabric passive
exit
!
router openfabric 1
net 49.0000.0000.000x.00
exit
!
exit
this is currently saved in both frr.conf and frr.conf.local - my understanding is frr.conf.local is not overwritten by SDN and gets merged into the generated config?
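(before letting SDN touch frr.conf i am keeping a copy of the working config so i can roll back quickly - just a minimal sketch, the backup location is my own choice)
Code:
# back up the working openfabric config before applying any SDN changes,
# so it can be restored if the generated config breaks the mesh
cp /etc/frr/frr.conf /root/frr.conf.openfabric-backup
cp /etc/frr/frr.conf.local /root/frr.conf.local-backup
systemctl status frr --no-pager   # confirm frr is healthy before the change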

this is the global section of my ceph cluster config
Code:
[global]
        auth_client_required = cephx
        auth_cluster_required = cephx
        auth_service_required = cephx
        cluster_network = fc00::/64
        fsid = 5e55fd50-d135-413d-bffe-9d0fae0ef5fa
        mon_allow_pool_delete = true
        mon_host = fc00::83 fc00::82 fc00::81
        ms_bind_ipv4 = false
        ms_bind_ipv6 = true
        osd_pool_default_min_size = 2
        osd_pool_default_size = 3
        public_network = fc00::/64
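(as a sanity check that ceph really is using the mesh addresses, and to re-check after any routing change - a quick sketch, nothing here is SDN specific)
Code:
# confirm the monitors are on the fc00:: mesh addresses from the config above
ceph mon dump | grep -E 'fc00::8[123]'
# confirm the ceph daemons are actually listening on the mesh loopbacks
ss -ltnp | grep -E 'ceph-(mon|osd)'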

what i would like to achieve is:
  1. keep the ceph public network on the thunderbolt - i want the proxmox QEMU processes to use that route and network (for example in failure scenarios it's better to have a node connect to another node across the thunderbolt network)
  2. enable VMs (with existing globally routable IPv6 addresses) on the proxmox node to access the mesh network (fc00::81 thru fc00::83)
  3. enable clients on the LAN (with existing globally routable IPv6 addresses) to access the mesh network addresses (fc00::81 thru fc00::83)
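(once 2 and 3 are in place i plan to verify them roughly like this, from a VM and from a LAN client - just a sketch using the mesh loopbacks from above)
Code:
# from a VM and from a LAN client: can the mesh loopbacks be reached?
ping -6 -c 3 fc00::81
ping -6 -c 3 fc00::82
ping -6 -c 3 fc00::83
# and check which path the traffic actually takes
traceroute -6 fc00::81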
questions:
  1. is my understanding of frr.conf and frr.conf.local correct (and which takes precedence if there is a conflict)?
  2. can i just follow the guide as-is?
  3. will the existing openfabric routing co-exist with the new bgp based routing or will the new bgp routing replace the openfabric routing?
  4. i see the options for exit nodes and advertising - i assume i need to use these in some way? (but i am unsure how they would be configured)
  5. my unifi router runs frr under the hood and exposes bgp in the UI - i assume that can assist in providing route information to the clients, but i am unsure of the right way to go about this?
  6. is there a way to apply the SDN to one node at a time - i.e. to be able to do any frr.conf fix ups as i go along and keep my ceph running?
  7. any other gotchas i need to think about?
I am hoping this config could be applied and nothing would break....
Code:
[root@pve1 17:37:04]$ cat *.cfg
evpn: cephEVPN
        asn 65000
        peers fc00::81, fc00::82, fc00::83

subnet: cephEVPN-fc00::81-128
        vnet cephVNET
        gateway fc00::81

subnet: cephEVPN-fc00::82-128
        vnet cephVNET
        gateway fc00::82

subnet: cephEVPN-fc00::83-128
        vnet cephVNET
        gateway fc00::83

vnet: cephVNET
        zone cephEVPN
        tag 10500

evpn: cephEVPN
        controller cephEVPN
        vrf-vxlan 1000
        ipam pve
        mac BC:24:11:91:2A:E8
        mtu 8950
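(if i understand the docs right, the apply step can also be driven from the CLI, which should make it easier to watch what each node does - a sketch, the pvesh path is from memory so treat it as an assumption)
Code:
# apply the pending SDN configuration cluster-wide (same as the Apply button,
# if i have the API path right), then check what was generated on this node
pvesh set /cluster/sdn
cat /etc/network/interfaces.d/sdn
vtysh -c 'show running-config'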
 
ouch, no - each node got a version of this error and frr did not restart; luckily restarting frr.service brought my old openfabric setup back up
vrfvx_evpnZ1 : error: vrfvx_evpnZ1: invalid vxlan-local-tunnelip dead:beef:830:1::83: Expected 4 octets in '2600:a801:830:1::83'
vrfbr_evpnZ1 : error: vrfbr_evpnZ1: bridge port vrfvx_evpnZ1 does not exist
vrfbr_evpnZ1 : warning: vrfbr_evpnZ1: apply bridge ports settings: bridge configuration failed (missing ports)
vxlan_vxnet1 : error: vxlan_vxnet1: invalid vxlan-local-tunnelip dead:beef:830:1::83: Expected 4 octets in '2600:a801:830:1::83'
vxnet1 : error: vxnet1: bridge port vxlan_vxnet1 does not exist
vxnet1 : warning: vxnet1: apply bridge ports settings: bridge configuration failed (missing ports)
vrf_evpnZ1 : warning: vrf_evpnZ1: post-up cmd 'ip route del vrf vrf_evpnZ1 unreachable default metric 4278198272' failed: returned 2 (RTNETLINK answers: No such process

TASK ERROR: command 'ifreload -a' failed: exit code 1
what's interesting is that it errored on my real IPv6 address assigned to vmbr0, not the addresses defined in my thunderbolt net or the SDN

Oh i see it expected 4 octets! Does this mean SDN doesn't work when the host has an IPv6 address on vmbr0?
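(to see where that address comes from i am checking the config SDN generated - a quick sketch, the grep just looks for the attribute from the error message)
Code:
# the vxlan-local-tunnelip that SDN picked ends up in the generated interfaces file;
# check which address it chose (it appears to take the vmbr0 address, not the mesh loopback)
grep -n 'vxlan-local-tunnelip' /etc/network/interfaces.d/sdn
ip -6 addr show vmbr0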

-- edit --
confirmed. guess i will revisit all of this when proxmox 9.1 comes around :-(
SDN VXLAN over IPv6 | Proxmox Support Forum

I compiled ifupdown2 from the patched branch on github, which got me past the error above
my SDN stays in the 'pending' state
 
Ok so i followed the tutorial.

Got some interesting results.
  • the EVPN shows as available in the UI
  • it trashed my VM connectivity - all VMs need to be rebooted every time i do an apply
I see these errors; there is no bgp / evpn section in frr.conf and bgpd is not started. i assume the root cause is the two vxlan creation failed errors


vxlan_cephVNET : error: vxlan_cephVNET: vxlan creation failed: pack expected 25 items for packing (got 37)
cephVNET : error: cephVNET: bridge port vxlan_cephVNET does not exist
cephVNET : warning: cephVNET: apply bridge ports settings: bridge configuration failed (missing ports)
vrfvx_cephEVPN : error: vrfvx_cephEVPN: vxlan creation failed: pack expected 25 items for packing (got 37)
vrfbr_cephEVPN : error: vrfbr_cephEVPN: bridge port vrfvx_cephEVPN does not exist
vrfbr_cephEVPN : warning: vrfbr_cephEVPN: apply bridge ports settings: bridge configuration failed (missing ports)

TASK ERROR: command 'ifreload -a' failed: exit code
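(this is how i checked that the bgp / evpn part never made it into frr - a simple sketch, both files are the standard frr locations)
Code:
# bgpd needs to be enabled in /etc/frr/daemons and the bgp/evpn config needs to
# land in frr.conf; check whether either of those actually happened
grep bgpd /etc/frr/daemons
grep -A5 'router bgp' /etc/frr/frr.conf
vtysh -c 'show running-config' | grep -A5 'router bgp'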

-- this is more IPv6 issues...
switching to IPv4 in the SDN config avoided those errors
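(for reference, the change was just swapping the controller peers over to the IPv4 loopbacks from the openfabric config - roughly like this, everything else left as it was)
Code:
# controllers.cfg after switching the peers to the IPv4 loopbacks
cat /etc/pve/sdn/controllers.cfg
evpn: cephEVPN
        asn 65000
        peers 10.0.0.81, 10.0.0.82, 10.0.0.83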
 