Dear Community,
I'd like to share with you my recent discoveries.
For a while, I've had a few hardware components lying around to provide 10GbE connectivity between the 3 Proxmox servers in my cluster.
Obviously, it was time to upgrade from 2.5GbE to 10GbE.
Unfortunately, I'm still waiting for an efficient and affordable 2.5/10GbE switch to appear on the market.
In the meantime, let's build the cluster with full mesh (routed setup) connectivity, with all VMs bridged within an EVPN/VxLan managed by Proxmox SDN on top of it.
Note: The following guidelines require some advanced networking knowledge; I tried to simplify as much as possible.
Infrastructure
Rich (BB code):
┌────────────────────────┐
│ Node1 │
├────────┬────────┬──────┤
│enp2s0f0│enp2s0f1│ vmbr0├───────────────┐
└─────┬──┴──┬─────┴──────┘ |
│ │ |
┌───────┬─────┐ │ │ ┌─────┬───────┐ |
│ │ eno1├────────┘ └────────┤eno1 │ │ |
│ Node2 ├─────┤ ├─────┤ Node3 │ |
│ │ eno2├───────────────────────┤eno2 │ │ |
| ├─────┤ ├─────┤ | |
│ |vmbr0| |vmbr0| | |
└───────┴──┬──┘ └──┬──┴───────┘ |
| | |
| | |
└───────┐ ┌────────────┘ |
| | |
| | ┌────────────────────┘
| | |
┌────────────────────────┐
│ SW │
└────────────────────────┘
Node Name | Management IP | Bridge | NIC 1 Name | NIC 2 Name |
---|---|---|---|---|
Node 1 | 192.168.0.100 | vmbr0 | enp2s0f0 | enp2s0f1 |
Node 2 | 192.168.0.101 | vmbr0 | eno1 | eno2 |
Node 3 | 192.168.0.102 | vmbr0 | eno1 | eno2 |
Step 1: Prepare the underlying network with OpenFabric
Follow the Proxmox guide Full Mesh Network for Ceph Server with a few adaptations below.
OpenFabric extends the IS-IS protocol and provides an efficient link-state routing protocol between nodes without flooding the network.
Depending on your Proxmox version, you may need to install FRR on each node with the following command.
Code:
apt install frr
Update the FRR daemon settings in "/etc/frr/daemons" to enable the OpenFabric daemon.
Code:
[...]
fabricd=yes
[...]
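After enabling the daemon, restart FRR so the change takes effect (assuming FRR is managed through the usual systemd "frr" service):
Bash:
systemctl restart frr.service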
Important note: the FRR settings in "/etc/frr/frr.conf" are overridden by Proxmox SDN, which is why they are "not" compatible with Proxmox EVPN.
However, it is possible to add local settings in "/etc/frr/frr.conf.local", which Proxmox SDN handles well.
Create the local FRR config file and update the PVE interface definitions on each node according to the table below.
Node Name | Loopback IP (<lo_IP>) | OpenFabric Network ID (<o_NID>) | NIC 1 Name (<NIC1>) | NIC 2 Name (<NIC2>) | NIC MTU (<MTU>) |
---|---|---|---|---|---|
Node 1 | 172.16.0.1/32 | 49.0001.1111.1111.1111.00 | enp2s0f0 | enp2s0f1 | 9000 |
Node 2 | 172.16.0.2/32 | 49.0001.2222.2222.2222.00 | eno1 | eno2 | 9000 |
Node 3 | 172.16.0.3/32 | 49.0001.3333.3333.3333.00 | eno1 | eno2 | 9000 |
Update "/etc/frr/frr.conf" and create "/etc/frr/frr.conf.local" based on the following template on all nodes:
Code:
interface lo
 ip address <lo_IP>
 ip router openfabric 1
 openfabric passive
!
interface <NIC1>
 ip router openfabric 1
 openfabric csnp-interval 2
 openfabric hello-interval 1
 openfabric hello-multiplier 2
!
interface <NIC2>
 ip router openfabric 1
 openfabric csnp-interval 2
 openfabric hello-interval 1
 openfabric hello-multiplier 2
!
line vty
!
router openfabric 1
 net <o_NID>
 lsp-gen-interval 1
 max-lsp-lifetime 600
 lsp-refresh-interval 180
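A simple way to create the local file is to copy the main one after filling it in (done on each node); Proxmox SDN should then merge the local settings back whenever it regenerates "/etc/frr/frr.conf":
Bash:
cp /etc/frr/frr.conf /etc/frr/frr.conf.local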
Update "/etc/network/interfaces" based on the following on all nodes:
Code:
[...]
auto <NIC1>
iface <NIC1> inet static
    mtu <MTU>

auto <NIC2>
iface <NIC2> inet static
    mtu <MTU>
[...]
post-up /usr/bin/systemctl restart frr.service
source /etc/network/interfaces.d/*
Note: Adjust the MTU according to the lowest capability among all your interconnected NICs.
Apply all changes on all nodes, without rebooting, by running the following command.
Bash:
ifreload -a
Check the results on one of your nodes with the following FRR command.
Bash:
vtysh -c 'show openfabric route'
Code:
Area 1:
IS-IS L2 IPv4 routing table:
Prefix Metric Interface Nexthop Label(s)
--------------------------------------------------------
172.16.0.1/32 0 - - -
172.16.0.2/32 20 enp2s0f0 172.16.0.2 -
172.16.0.3/32 20 enp2s0f1 172.16.0.3 -
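Optionally, verify that jumbo frames actually pass between the nodes: a non-fragmenting ping sized at the 9000-byte MTU minus 28 bytes of IP/ICMP headers should succeed (example from Node 1 towards Node 2's loopback; adjust the IP and size to your MTU):
Bash:
ping -M do -s 8972 -c 3 172.16.0.2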
Step 2: Set up your EVPN
2.1 - Create an EVPN Controller
Behind the scenes, an EVPN Controller is a BGP instance which manages routes within tunnelled networks (in this case, VxLan networks).
- Open your Proxmox Admin web UI
- Open Datacenter > SDN > Options section
- Add an EVPN Controller
- ID: myEVPN
- ASN: 65000
(The BGP ASN number must be within a private range not already used within your network)
- Peers: 172.16.0.1, 172.16.0.2, 172.16.0.3
(All node loopback IPs)
2.2 - Create an EVPN zone
An EVPN zone is a VxLan zone whose routing is handled by an EVPN Controller.
- Open your Proxmox Admin web UI
- Open Datacenter > SDN > Zones section
- Add an EVPN zone
- ID: evpnPRD
- Controller: myEVPN
- VRF-VXLAN Tag: 10000
- MTU: 8950
Important Notes:
- The EVPN "Primary Exit Node" seems to be required within the Web UI, select one of your nodes which will carry on outgoing EVPN traffic.
If you don't want PVE handles outgoing traffic directly, make sure you do not configure any related VNet's subnet gateway. - Adjust the MTU according to your NIC's MTU defined previously minus 50 bytes if under IPv4, minus 70 under IPv6.
MTU Considerations for VXLAN
2.3 - Create a VxLan VNet
- Open your Proxmox Admin web UI
- Open Datacenter > SDN > VNets section
- Add a VNet
- Name: vxnet1
- Zone: evpnPRD
- Tag: 10500 (VxLAN ID)
2.4 - Add subnets within your VxLan VNet
Follow the Proxmox SDN documentation SDN Controllers - Subnets.
Important Note: If you don't want PVE to handle outgoing traffic directly, make sure you do not configure a gateway on any related VNet's subnet.
2.5 - Apply SDN changes to all your nodes
- Open your Proxmox Admin web UI
- Open Datacenter > SDN
- Click on Apply
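After applying, Proxmox SDN regenerates "/etc/frr/frr.conf" on every node; you can quickly inspect the generated BGP section before moving on (plain grep, nothing Proxmox-specific):
Bash:
grep -A 15 'router bgp' /etc/frr/frr.conf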
2.6 - Fixing up FRR config
The Proxmox SDN EVPN plugin does not seem to resolve properly the loopback IPs provided in the EVPN Controller, which results in a messed-up FRR config file.
On each node, update "/etc/frr/frr.conf" as follows, based on the table in Step 1.
- bgp router-id XXXX.XXXX.XXXX => must be the IP of the loopback address defined for OpenFabric
- neighbor XXXX.XXXX.XXXX => one line per remaining neighbor
Code:
[...]
interface lo
 ip address 172.16.0.1/32
 ip router openfabric 1
 openfabric passive
[...]
router bgp 65000
 bgp router-id 172.16.0.1
 no bgp hard-administrative-reset
 no bgp graceful-restart notification
 no bgp default ipv4-unicast
 coalesce-time 1000
 neighbor VTEP peer-group
 neighbor VTEP remote-as 65000
 neighbor VTEP bfd
 neighbor 172.16.0.2 peer-group VTEP
 neighbor 172.16.0.3 peer-group VTEP
[...]
router bgp 65000 vrf vrf_evpnPRD
 bgp router-id 172.16.0.1
 no bgp hard-administrative-reset
 no bgp graceful-restart notification
exit
[...]
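Once the file is corrected, restart FRR on each node so it picks up the fixed configuration, then confirm the router-id and neighbors from the running config:
Bash:
systemctl restart frr.service
vtysh -c 'show running-config' | grep -E 'router-id|neighbor 172.16'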
Step 3: Check connectivity
- From one node, ping all other nodes using their loopback IPs defined in Step 1
- Check EVPN Controller with the command:
vtysh -c 'show bgp summary'
You should see all neighbors. Example from Node 1:
Code:
L2VPN EVPN Summary (VRF default):
BGP router identifier 172.16.0.1, local AS number 65000 vrf-id 0
BGP table version 0
RIB entries 11, using 2112 bytes of memory
Peers 2, using 1449 KiB of memory
Peer groups 1, using 64 bytes of memory

Neighbor            V    AS  MsgRcvd  MsgSent  TblVer  InQ OutQ  Up/Down State/PfxRcd  PfxSnt Desc
Node2(172.16.0.2)   4 65000    50922    50841       0    0    0 1d18h20m            5       4 N/A
Node3(172.16.0.3)   4 65000    50886    50834       0    0    0 1d18h20m            8       4 N/A

Total number of neighbors 2
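If you want to dig deeper, FRR can also list the learned EVPN routes and the local VNIs/VTEPs (standard vtysh commands; the exact entries depend on your VNets and running VMs):
Bash:
vtysh -c 'show bgp l2vpn evpn'
vtysh -c 'show evpn vni'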
- Create VMs (KVM or LXC) on each node
- Attach their NICs to vxnet1
- Manually assign an IP within the range defined in Step 2.4
- Try to ping each VM
Side notes
Packet loss
In case of disappearing packets or wrong CRC checks within virtual machines, check your NIC hardware on each node. It's important to know that some cards integrate nice features for tunneled links, but they can be a nightmare to troubleshoot if they are not identical between nodes (e.g. Intel vs Mellanox vs Emulex).
In my case, I was playing with virtual routers (VyOS, Bird, Calico, Cilium) for a complex topology on top of the PVE VxLan networks, involving BGP, VRRP, LDP, eBPF, and so on.
I had 2 PVE hosts with an Intel X520 (chip: Intel 82599ES) and one with an HPE 557SFP+ (chip: Emulex Skyhawk).
The root cause of my problems was the HPE NIC on "Node 1", which implements the VxLan UDP checksum offload feature (rx-udp_tunnel-port-offload), whereas the others don't.
To fix my issue, I had to disable it on the HPE NIC with the following commands.
Bash:
ethtool -K enp2s0f0 rx-udp_tunnel-port-offload off
ethtool -K enp2s0f1 rx-udp_tunnel-port-offload off
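To check whether a given NIC exposes this offload at all, list its offload features first (interface name is from my setup, adjust to yours):
Bash:
ethtool -k enp2s0f0 | grep -i udp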
Proxmox firewall & OpenVSwitch conflicts
Attention: if you use the Proxmox firewall, VM NICs will be handled by OpenVSwitch, and your VMs attached to EVPN VNets will suffer from weird behaviors.
OpenFabric warnings
You will see some of the following messages in syslog.
Code:
fabricd[1234]: [QBAZ6-3YZR3] OpenFabric: Could not find two T0 routers
I didn't have enough time to troubleshoot that point.
That early draft-white-openfabric-06.txt protocol seems to require a spine/leaf network topology.
As we are using OpenFabric with directly attached nodes and single loopback IPs, it makes sense that this message shows up.
Have fun.