Mesh Ceph Network with FRR Openfabric - Route question

cheabred

Looks like my route setup wants to route through one of the hosts to get to another. Even though I have all 3 100G DAC cables attached, it won't connect directly to the other host for some reason.
[Screenshots of the routing tables attached]

pve1 => .1
pve2 => .2
pve3 => .3

pve1 port 1 -> pve2 port 1
pve1 port 2 -> pve3 port 1
pve3 port 2 -> pve2 port 2

This is my first time using FRR at all. I understand the concept, but I don't see why it's not realizing there is a lower-cost path to reach the .1 server.


I have switched the cables around a ton (waiting about 5 minutes between each move).

I followed the wiki article here (https://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server#Routed_Setup_.28with_Fallback.29).

Any help would be appreciated.
 
Your routing tables indicate that, for whatever reason, the link between pve1 and pve2 is being ignored. The first two things that come to mind are a bad physical connection, or an FRR configuration that has the same host registered as a neighbour twice due to an incomplete copy/paste where the last bit wasn't edited.

Have you confirmed that your underlay works? That would rule out physical connection issues. Run an iperf3 test between pve1 and pve2, but on the underlay IPs.
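For example, something like this (the addresses are placeholders; substitute whatever underlay IPs you actually put on the pve1 to pve2 link):
Code:
# on pve1, bind the server to its underlay address (assumed 10.15.15.1 here)
iperf3 -s -B 10.15.15.1

# on pve2, test against pve1's underlay address
iperf3 -c 10.15.15.1 -t 30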
 
Here are the FRR configs for those two hosts.


There are no IP addresses assigned to the interfaces themselves; FRR is handling that.

I did do an iperf test using the FRR addresses, and everything is routing fine, except it's an extra hop for some reason.
I'm tempted to switch to the "Routed Setup (Simple)" method instead of FRR, as I feel that would be easier, and also give faster switchovers if a cable goes bad or someone unplugs it for some reason.
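For anyone checking the same thing, the openfabric state can be inspected on each node with something along these lines (exact output differs between FRR versions):
Code:
# which direct links actually formed an adjacency
vtysh -c 'show openfabric neighbor'

# the computed topology and path costs
vtysh -c 'show openfabric topology'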


[Screenshots of the two FRR configs attached]
 
I am no networking expert, and I came across your post while looking for some insight on Ceph over an SDN for my 5-node cluster.

Having disclaimed that, I got FRR working between 4 nodes (sources listed below) on top of a routed full mesh of 10G direct connections. Not that it is relevant here; it is simply a workaround for my lack of a 10G switch.

So, I had assumed that there needed to be an underlay network for an overlay network to work, and I am still firm on that belief. I guess FRR nodes could use IPv6 link-local addresses if they wanted, but that is speculation.

Until the cavalry arrives, copy my plan:
* set up a routed mesh and ensure each direct link has sufficient throughput (a minimal sketch follows this list)
* set up the FRR nodes on top of that tested underlay
* create the vnet and subnet
* confirm with a speed test on the VMs
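For the first step, this is roughly the shape of one node's /etc/network/interfaces if you follow the wiki's routed (simple) pattern; the interface names and addresses are placeholders, and the idea is that each node keeps a single mesh IP on both direct links while /32 routes pick the right cable:
Code:
# this node's mesh IP is 10.15.15.50; the two peers are .51 and .52 (placeholders)
auto ens18
iface ens18 inet static
    address 10.15.15.50/24
    up ip route add 10.15.15.51/32 dev ens18
    down ip route del 10.15.15.51/32

auto ens19
iface ens19 inet static
    address 10.15.15.50/24
    up ip route add 10.15.15.52/32 dev ens19
    down ip route del 10.15.15.52/32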

I would first do it on IPv4 for peace of mind and use IPv6 afterwards if that is a requirement.

From there, I personally plan to keep Ceph on the routed network and to use that network for backup cluster communication as well. It is the internal network and needn't be modified, which allows for a complete remake of the SDN for my experiments. But you do you.

I am going to prepare a blog post for this and might update my approach based on your experience, so any confirmation is welcome. Here is my blog post on the routed mesh, based on the official docs, just as another source to compare: https://blog.cbugk.com/post/proxmox-ceph-routed-ipv6/.

And below are the sources that got me to a working FRR setup on the 10G mesh. Unfortunately, I don't have a blog post ready for it yet:
* https://vincent.bernat.ch/en/blog/2017-vxlan-bgp-evpn
* https://bennetgallein.de/blog/setting-up-evpn-on-proxmox-sdn-a-comprehensive-guide
* https://myarbitrarystuff.com/2019/03/02/programmable-fabric-vxlan-bgp-evpn-part-1/

Edit: Oh, the neighbour I was talking about is from the bennetgallein link:
Code:
router bgp 6500
...
neighbor 172.16.0.2 peer-group VTEP
neighbor 172.16.0.3 peer-group VTEP

Which proves your point about the underlay not being part of the FRR config. The official docs show that this correlation is instead set in the EVPN controller definition, as in:
Code:
ID: myevpnctl
ASN#: 65000
Peers: 192.168.0.1,192.168.0.2,192.168.0.3
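If memory serves, the same controller can also be created from the shell through the API; the option names below are from memory, so double-check them against the pvesh help output before relying on this:
Code:
pvesh create /cluster/sdn/controllers --controller myevpnctl --type evpn --asn 65000 --peers 192.168.0.1,192.168.0.2,192.168.0.3
# apply the pending SDN configuration
pvesh set /cluster/sdn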

It seems I need to do some testing before I can provide actually useful information.
 
I had an epiphany :oops:. FRR does not necessarily need a pre-existing underlay; it can take care of the interfaces itself, and this is what @cheabred was doing.

In the meantime, I confirmed that my prior approach was based on bennetgallein's blog. When I use the routed setup's IP on interface lo in that example, the EVPN SDN works.

But I must admit, using solely FRR (while still renaming interfaces via systemd for uniformity) seems like the better approach. It would work for Ceph the same way the routed setup does.

I am a bit busy at the moment, but I will drop an update in a few weeks when I have something, even if it's a dead end.
 
By the way, there is actually an apalrd video on this ( https://www.apalrd.net/posts/2023/cluster_routes/ ). He uses IPv6 as well, but with link-local addresses rather than setting up point-to-point addresses himself, so configuring lo is enough to make it work. Creating a cluster is not necessary either, of course.

Though I was able to break the configuration by giving area 1.0.0.0 to vmbr0. I ended up commenting it out altogether to keep the 10G mesh only.

That change made it possible to run (admittedly unreliable) iperf3 tests before frr.service shut itself down between my frustrated restarts of the service on different machines. So I cannot say why you experienced a non-functional link at the beginning, but the journalctl -fau frr.service output and /etc/frr/frr.conf pasted into ChatGPT were good enough in my case. I should have asked for your logs :D. Anyway, thanks for the brainstorm.
 
OK, coming back because I finally figured out how to get all this working very well. Below are my `frr.conf` and network configuration files, along with some key points about the setup.

### FRR Configuration (`frr.conf`)
Code:
ip forwarding
ipv6 forwarding
!

interface ens1f0np0
 ip router openfabric 1
 ipv6 router openfabric 1
exit
!

interface ens1f1np1
 ip router openfabric 1
 ipv6 router openfabric 1
exit
!

interface vmbr1
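 ! passive: vmbr1 just carries this node's mesh address; openfabric advertises
 ! its prefix without forming adjacencies over the bridge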
 ip router openfabric 1
 ipv6 router openfabric 1
 openfabric passive
exit
!

router openfabric 1
 net 49.0000.0000.0003.00
exit
!
---

### Network Interfaces Configuration (`/etc/network/interfaces`)
Code:
auto lo
iface lo inet loopback

iface eno1 inet manual
iface eno2 inet manual
iface eno3 inet manual
iface eno4 inet manual

auto eno49
iface eno49 inet manual

auto eno50
iface eno50 inet manual

auto ens1f0np0
iface ens1f0np0 inet manual
    mtu 65520

auto ens1f1np1
iface ens1f1np1 inet manual
    mtu 65520

auto bond0
iface bond0 inet manual
    bond-slaves eno49 eno50
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer2

auto vmbr0
iface vmbr0 inet static
    address 10.20.10.23/24
    gateway 10.20.10.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094

# 100G Mesh
auto vmbr1
iface vmbr1 inet6 static
    address fc00::83/128
    bridge-ports lo
    bridge-stp off
    bridge-fd 0
    post-up /usr/bin/systemctl restart frr.service

source /etc/network/interfaces.d/*
---

### Key Notes About the Configuration

- **100G Mesh Network**:
The 100G mesh network is configured as a `vmbr` (bridge interface), making it versatile for both general use and cluster operations. With dual 100G interfaces, I can use this network for tasks like RAM (live) migration, which benefits greatly from the high bandwidth (see the migration-network snippet after these notes).

- **Unique Values Per Node**:
Each node has unique values for:
- `vmbr1`'s IPv6 address (`fc00::83/128` for this node).
- `net` in `frr.conf` (`49.0000.0000.0003.00` for this node).

- **IPv6 and IPv4**:
Although I’m using IPv6 primarily, this setup works for IPv4 as well.

- **Hosts File**:
Don’t forget to add the appropriate IP addresses and hostnames for all nodes in each node’s `/etc/hosts` file.
Code:
10.20.10.21 pve1
10.20.10.22 pve2
10.20.10.23 pve3
fc00::81 pve1
fc00::82 pve2
fc00::83 pve3
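
To actually push live-migration (RAM) traffic over the mesh, something like the following in /etc/pve/datacenter.cfg should do it. I'm quoting the option from memory and haven't verified that an IPv6 CIDR is accepted there, so treat it as a pointer rather than a recipe:
Code:
# /etc/pve/datacenter.cfg
migration: secure,network=fc00::/64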
---

### Future Plans
I plan to explore using my 10G LACP-bonded network (main network) as a backup link for the mesh. This way, it can provide redundancy in case the mesh goes down.
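I haven't built this yet, but the rough idea is to also add vmbr0 to openfabric with a much worse metric, so it only carries mesh traffic when the direct 100G links fail. An untested sketch, assuming I have the openfabric metric syntax right:
Code:
interface vmbr0
 ip router openfabric 1
 ipv6 router openfabric 1
 openfabric metric 1000
exit
!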

---

This cleaned-up configuration works very well for me, and I hope it helps others set up similar networks. Let me know if you have suggestions for further improvements!
 