Mesh Ceph Network with FRR Openfabric - Route question

cheabred

It looks like my route setup wants to route over one of the hosts to get to another. Even though I have all three 100G DAC cables attached, it won't connect directly to the other host for some reason.
[Attached screenshots: 1727051153958.png, 1727050991289.png, 1727051162883.png]

pve1 => .1
pve2 => .2
pve3 => .3

pve1 port 1 -> pve2 port 1
pve1 port 2 -> pve3 port 1
pve3 port 2 -> pve2 port 2
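
A minimal sketch of what the cabling above should give, assuming every cable counts as one equal-cost hop. It models the three links as a graph and counts hops with a breadth-first search. The `hop_counts` helper and the "degraded" case (the same mesh with the pve1 <-> pve2 link left out) are illustrative only and are not taken from any FRR output.

```python
from collections import deque

# Cabling as listed above: each pair of hosts shares one direct DAC link.
LINKS = [("pve1", "pve2"), ("pve1", "pve3"), ("pve3", "pve2")]


def hop_counts(links, source):
    """Breadth-first search: hop count from source to every reachable host."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    hops = {source: 0}
    queue = deque([source])
    while queue:
        host = queue.popleft()
        for peer in sorted(neighbours.get(host, ())):
            if peer not in hops:
                hops[peer] = hops[host] + 1
                queue.append(peer)
    return hops


# All three cables in use: every other host should be one hop away.
full_mesh = hop_counts(LINKS, "pve2")

# Hypothetical case: the pve1 <-> pve2 link is missing or ignored.
degraded = hop_counts(
    [link for link in LINKS if set(link) != {"pve1", "pve2"}], "pve2"
)

print("full mesh:", full_mesh)
print("degraded: ", degraded)
```

With all links present, every host is one hop from every other. A two-hop path only appears once a link drops out of the graph, which is the kind of result the routing tables seem to show.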

This is my first time using FRR at all. I understand the concept, but I don't see why it's not realizing there is a lower-cost path to get to the .1 server from the


I have switched cables around a ton (waiting about 5 minutes between each move).

I followed the wiki guide here (https://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server#Routed_Setup_.28with_Fallback.29).

Any help would be appreciated.
 
Your routing tables indicate that, for whatever reason, the link between pve1 and pve2 is being ignored. The first two causes that come to mind are a bad physical connection, or an FRR configuration that has the same host registered as a neighbour twice because of an incomplete copy/paste where the last bit was not edited.

Have you confirmed that your underlay works? That would rule out physical connection issues. Run an iperf3 test between pve1 and pve2, but on the underlay IPs.
 
Here are the two FRR configs for those two hosts.

There are no IP addresses assigned to the interfaces themselves; FRR is handling that.

I did run an iperf test using the FRR addresses, and everything routes fine, except that there is an extra hop for some reason.
I'm tempted to switch to the "Routed Setup (Simple)" method instead of FRR, as I feel that would be easier and would also switch over faster if a cable goes bad or someone unplugs it for some reason.


1727141749912.png

1727141765209.png
 
I am no networking expert, and I came across your post while looking for some insight on Ceph over an SDN for my 5-node cluster.

With that disclaimer out of the way: I got FRR working between 4 nodes (sources listed below) on top of a routed full mesh of 10G direct connections. Not that it is relevant; the mesh is simply a workaround for my lack of a 10G switch.

So, I had assumed that there needed to be an underlay network for an overlay network to work, and I still firmly believe that. I guess FRR nodes could use IPv6 link-local addresses if they wanted, but that is speculation on my part.

Until the cavalry arrives, copy my plan:
* set a routed mesh, ensure each direct link has sufficient throughput
* set FRR nodes on top of that tested underlay
* create the vnet and subnet
* confirm speed test on the VMs

I would first do it on IPv4 for peace of mind and move to IPv6 afterwards if that is a requirement.

From there on, I am personally planning to keep Ceph on the routed network and to use that network for backup cluster communication as well. It is the internal network and needn't be modified, which allows for a complete remake of the SDN for my experiments. But you do you.

I am going to prepare a blog post on this, and I might update my approach according to your experience, so any confirmation is welcome. Here is my blog post on the routed mesh based on the official docs, just as another source to compare: https://blog.cbugk.com/post/proxmox-ceph-routed-ipv6/.

Below are the sources that got me to a working FRR setup on the 10G mesh. Unfortunately, I don't have a blog post for that ready yet:
* https://vincent.bernat.ch/en/blog/2017-vxlan-bgp-evpn
* https://bennetgallein.de/blog/setting-up-evpn-on-proxmox-sdn-a-comprehensive-guide
* https://myarbitrarystuff.com/2019/03/02/programmable-fabric-vxlan-bgp-evpn-part-1/

Edit: Oh, the neighbour I was talking about is from the bennetgallein link:
Code:
router bgp 6500
...
neighbor 172.16.0.2 peer-group VTEP
neighbor 172.16.0.3 peer-group VTEP

This supports your point that the underlay is not part of the FRR config. The official docs show that the correlation is set in the EVPN definition, as in:
Code:
ID: myevpnctl
ASN#: 65000
Peers: 192.168.0.1,192.168.0.2,192.168.0.3

It seems I need to do some testing before I can provide actually useful information.
 
I had an epiphany :oops:. FRR does not necessarily need an underlay; it can configure the interfaces itself, and this is what @cheabred was doing.

In the meantime, I confirmed that my prior approach was based on bennetgallein's blog. When I use the IP of the routed setup for the lo interface in that example, EVPN SDN works.

But I must admit that using FRR alone (while still renaming interfaces via systemd for uniformity) seems like a better approach. It would allow for Ceph the same way the routed setup does.

I am a bit busy at the moment, but I will drop an update in a few weeks when I get somewhere, even if it is a dead end.
 
Btw, there actually is an apalrd video ( https://www.apalrd.net/posts/2023/cluster_routes/ ). He uses IPv6 as well, but with link-local addresses rather than setting up point-to-point addresses himself, so lo is enough to make it work. Creating a cluster is not necessary either, of course.

I did manage to break the configuration by assigning area 1.0.0.0 to vmbr0, though. I ended up commenting it out altogether to keep only the 10G mesh.

That change made iperf3 tests possible, if unreliable, before frr.service shut itself down between my frustrated restarts of the service on different machines. So I cannot say why you experienced a non-functional link at the beginning, but pasting the journalctl -fau frr.service output and /etc/frr/frr.conf into ChatGPT was good enough in my case. I should have asked for logs :D. Anyway, thanks for the brainstorm.
 
OK, coming back because I finally figured out how to get all this working very well. Below are my `frr.conf` and network configuration files, along with some key points about the setup.

### FRR Configuration (`frr.conf`)
Code:
ip forwarding
ipv6 forwarding
!

interface ens1f0np0
 ip router openfabric 1
 ipv6 router openfabric 1
exit
!

interface ens1f1np1
 ip router openfabric 1
 ipv6 router openfabric 1
exit
!

interface vmbr1
 ip router openfabric 1
 ipv6 router openfabric 1
 openfabric passive
exit
!

router openfabric 1
 net 49.0000.0000.0003.00
exit
!
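
When keeping one such file per node, a quick way to review it is to list which interfaces take part in OpenFabric, which are passive, and which `net` is set. The following is a minimal sketch, assuming the file follows the simple stanza layout shown above. The `summarise` helper is hypothetical and is not an FRR tool; it only does line-based parsing and does not validate the config the way FRR itself would.

```python
# The frr.conf from this post, embedded so the sketch is self-contained.
FRR_CONF = """\
ip forwarding
ipv6 forwarding
!
interface ens1f0np0
 ip router openfabric 1
 ipv6 router openfabric 1
exit
!
interface ens1f1np1
 ip router openfabric 1
 ipv6 router openfabric 1
exit
!
interface vmbr1
 ip router openfabric 1
 ipv6 router openfabric 1
 openfabric passive
exit
!
router openfabric 1
 net 49.0000.0000.0003.00
exit
!
"""


def summarise(conf_text):
    """Return ({interface: flags}, net) from a simple frr.conf layout."""
    interfaces = {}
    net = None
    current = None  # interface stanza currently being read, if any
    for raw in conf_text.splitlines():
        line = raw.strip()
        if line.startswith("interface "):
            current = line.split()[1]
            interfaces[current] = {"openfabric": False, "passive": False}
        elif line in ("exit", "!"):
            current = None
        elif current and "router openfabric" in line:
            interfaces[current]["openfabric"] = True
        elif current and line == "openfabric passive":
            interfaces[current]["passive"] = True
        elif line.startswith("net "):
            net = line.split()[1]
    return interfaces, net


interfaces, net = summarise(FRR_CONF)
active = sorted(
    name for name, flags in interfaces.items()
    if flags["openfabric"] and not flags["passive"]
)
passive = sorted(name for name, flags in interfaces.items() if flags["passive"])

print("active mesh interfaces:", active)
print("passive interfaces:    ", passive)
print("net:                   ", net)
```

Running the same summary against each node's file makes it easy to spot a `net` that was copied from another node without being edited.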
---

### Network Interfaces Configuration (`/etc/network/interfaces`)
Code:
auto lo
iface lo inet loopback

iface eno1 inet manual
iface eno2 inet manual
iface eno3 inet manual
iface eno4 inet manual

auto eno49
iface eno49 inet manual

auto eno50
iface eno50 inet manual

auto ens1f0np0
iface ens1f0np0 inet manual
    mtu 65520

auto ens1f1np1
iface ens1f1np1 inet manual
    mtu 65520

auto bond0
iface bond0 inet manual
    bond-slaves eno49 eno50
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer2

auto vmbr0
iface vmbr0 inet static
    address 10.20.10.23/24
    gateway 10.20.10.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094

auto vmbr1
iface vmbr1 inet6 static
    address fc00::83/128
    bridge-ports lo
    bridge-stp off
    bridge-fd 0

# 100G Mesh
post-up /usr/bin/systemctl restart frr.service

source /etc/network/interfaces.d/*
---

### Key Notes About the Configuration

- **100G Mesh Network**:
  The 100G mesh network is configured as a `vmbr` (bridge interface), making it versatile for both general use and cluster operations. With dual 100G interfaces, I can use this network for tasks like RAM migration, which benefits greatly from the high bandwidth.

- **Unique Values Per Node**:
  Each node has unique values for:
  - `vmbr1`'s IPv6 address (`fc00::83/128` for this node).
  - `net` in `frr.conf` (`49.0000.0000.0003.00` for this node).

- **IPv6 and IPv4**:
  Although I’m using IPv6 primarily, this setup works for IPv4 as well.

- **Hosts File**:
  Don’t forget to add the appropriate IP addresses and hostnames for all nodes in each node’s `/etc/hosts` file.
Code:
10.20.10.21 pve1
10.20.10.22 pve2
10.20.10.23 pve3
fc00::81 pve1
fc00::82 pve2
fc00::83 pve3
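
To keep the per-node values consistent, they can be derived from the node number. This is a minimal sketch, assuming the pattern visible in this post: node 3 uses `fc00::83/128` and `net 49.0000.0000.0003.00`, and the hosts file maps `fc00::81` to `fc00::83` onto pve1 to pve3. The `net` values for nodes 1 and 2 are extrapolated from that pattern and are an assumption, and `node_values` and `hosts_lines` are hypothetical helper names.

```python
def node_values(index):
    """Per-node settings derived from the node number (pattern assumed from node 3)."""
    if not 1 <= index <= 9:
        raise ValueError("this sketch only covers single-digit node numbers")
    return {
        "hostname": f"pve{index}",
        "mesh_address": f"fc00::8{index}/128",    # address on vmbr1
        "net": f"49.0000.0000.{index:04d}.00",    # `net` line in frr.conf
    }


def hosts_lines(indexes):
    """IPv6 /etc/hosts entries for the mesh addresses."""
    return [f"fc00::8{i} pve{i}" for i in indexes]


nodes = [node_values(i) for i in (1, 2, 3)]

# Every node must end up with its own address and its own net.
assert len({n["mesh_address"] for n in nodes}) == len(nodes)
assert len({n["net"] for n in nodes}) == len(nodes)

for n in nodes:
    print(n["hostname"], n["mesh_address"], n["net"])
print("\n".join(hosts_lines((1, 2, 3))))
```

The uniqueness assertions are the point of the exercise: two nodes sharing the same `net` or the same `vmbr1` address is an easy copy/paste mistake to make.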
---

### Future Plans
I plan to explore using my 10G LACP-bonded network (main network) as a backup link for the mesh. This way, it can provide redundancy in case the mesh goes down.

---

This cleaned-up configuration works very well for me, and I hope it helps others set up similar networks. Let me know if you have suggestions for further improvements!
 