Trying to configure failover for Ceph

ConnectedEarth

New Member
Sep 23, 2023
Hi,
I have a 3-node Proxmox cluster in a data center. The nodes reach the internet through a public gateway over a 1 Gbps connection. Each host is also equipped with a 10 Gbps interface connected to a private switch. Please see the attached diagram.

(Attachment: Cluster Setup.png)

I am planning to configure the Ceph private network over the 10 Gbps private link.

What I am trying to achieve:

Redundancy for the Ceph private network: if the 10 Gbps link fails, traffic should move over to the 1 Gbps link automatically.
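For context, the intention is that the Ceph public/cluster network would live on the routed loopback range, so a link failure just changes which path the routing table uses. A minimal sketch of the relevant part of /etc/pve/ceph.conf, assuming 10.15.15.0/24 as that range (Ceph itself is not set up yet):

Code:
[global]
        # assumption: the OpenFabric-routed loopback range carries the Ceph traffic
        cluster_network = 10.15.15.0/24
        public_network  = 10.15.15.0/24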

After reading through various threads, I tried implementing FRR and OpenFabric to achieve this.

Here is my OpenFabric configuration.

Node 2

Code:
Current configuration:
!
frr version 8.4.4
frr defaults traditional
hostname CE-FS-Node2
log syslog informational
no ipv6 forwarding
service integrated-vtysh-config
!
interface eno4
 ip router openfabric 1
 openfabric csnp-interval 2
 openfabric hello-interval 1
 openfabric hello-multiplier 2
exit
!
interface enp129s0
 ip router openfabric 1
 openfabric csnp-interval 2
 openfabric hello-interval 1
 openfabric hello-multiplier 2
exit
!
interface lo
 ip router openfabric 1
 openfabric passive
exit
!
router openfabric 1
 net 49.0001.2222.2222.2222.00
 lsp-gen-interval 1
 max-lsp-lifetime 600
 lsp-refresh-interval 180
exit
!
end

Node 1

Code:
Current configuration:
!
frr version 8.4.4
frr defaults traditional
hostname CE-FS-Node1
log syslog informational
no ipv6 forwarding
service integrated-vtysh-config
!
interface eno3
 ip router openfabric 1
 openfabric csnp-interval 2
 openfabric hello-interval 1
 openfabric hello-multiplier 2
exit
!
interface enp129s0
 ip router openfabric 1
 openfabric csnp-interval 2
 openfabric hello-interval 1
 openfabric hello-multiplier 2
exit
!
interface lo
 ip router openfabric 1
 openfabric passive
exit
!
router openfabric 1
 net 49.0001.1111.1111.1111.00
 lsp-gen-interval 1
 max-lsp-lifetime 600
 lsp-refresh-interval 180
exit
!
end

Here is /etc/network/interfaces:

Node 2

Code:
auto lo
iface lo inet loopback
auto lo:0
iface lo:0 inet static
        address 10.15.15.6

iface eno4 inet manual


iface eno3 inet manual


iface eno1 inet manual


iface eno2 inet manual


auto enp129s0
iface enp129s0 inet manual


auto vmbr0
iface vmbr0 inet static
        address xx.xxx.xxx.xx/29
        gateway xx.xxx.xxx.xx
        bridge-ports eno4
        bridge-stp off
        bridge-fd 0


auto vmbr1
iface vmbr1 inet static
        address 10.0.0.6/24
        bridge-ports enp129s0
        bridge-stp off
        bridge-fd 0
# 10 Gbps private LAN


auto vmbr2
iface vmbr2 inet static
        address 192.168.1.10/24
        bridge-ports none
        bridge-stp off
        bridge-fd 0


post-up   echo 1 > /proc/sys/net/ipv4/ip_forward
post-up   iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -o vmbr0 -j MASQUERADE
post-down iptables -t nat -D POSTROUTING -s 10.0.0.0/24 -o vmbr0 -j MASQUERADE

Node 1

Code:
auto lo
iface lo inet loopback
auto lo:0
iface lo:0 inet static
        address 10.15.15.3

iface eno4 inet manual


iface eno3 inet manual


iface eno1 inet manual


iface eno2 inet manual


auto enp129s0
iface enp129s0 inet manual


auto vmbr0
iface vmbr0 inet static
        address xx.xxx.xxx.99/29
        gateway xx.xxx.xxx.97
        bridge-ports eno4
        bridge-stp off
        bridge-fd 0


auto vmbr1
iface vmbr1 inet static
        address 10.0.0.6/24
        bridge-ports enp129s0
        bridge-stp off
        bridge-fd 0
# 10 Gbps private LAN


auto vmbr2
iface vmbr2 inet static
        address 192.168.1.10/24
        bridge-ports none
        bridge-stp off
        bridge-fd 0


post-up   echo 1 > /proc/sys/net/ipv4/ip_forward
post-up   iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -o vmbr0 -j MASQUERADE
post-down iptables -t nat -D POSTROUTING -s 10.0.0.0/24 -o vmbr0 -j MASQUERADE


And these nodes are showing up in the network topology.

From Node 2:

Code:
root@CE-FS-Node2:~# vtysh -c "show openfabric topology"
Area 1:
IS-IS paths to level-2 routers that speak IP
Vertex               Type         Metric Next-Hop             Interface Parent
CE-FS-Node2                                                           
10.15.15.6/32        IP internal  0                                     CE-FS-Node2(4)
CE-FS-Node1          TE-IS        10     CE-FS-Node1          enp129s0  CE-FS-Node2(4)
                                         CE-FS-Node1          eno4     
10.15.15.3/32        IP TE        20     CE-FS-Node1          enp129s0  CE-FS-Node1(4)
                                         CE-FS-Node1          eno4     

IS-IS paths to level-2 routers with hop-by-hop metric
Vertex               Type         Metric Next-Hop             Interface Parent

But when I ping Node 1 from Node 2, I get a "Destination Host Unreachable" error:

Code:
root@CE-FS-Node2:~# ping 10.15.15.3
PING 10.15.15.3 (10.15.15.3) 56(84) bytes of data.
From 10.15.15.6 icmp_seq=1 Destination Host Unreachable
From 10.15.15.6 icmp_seq=2 Destination Host Unreachable
From 10.15.15.6 icmp_seq=3 Destination Host Unreachable
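For completeness, these are the checks I would run next to see whether fabricd actually handed a route for the peer loopback to the kernel, and what prefix length the lo:0 alias ended up with (standard iproute2 and FRR vtysh commands; nothing here is specific to my setup beyond the addresses):

Code:
# Does the kernel have a route to the peer loopback, and out of which interface?
ip route get 10.15.15.3

# What does FRR/zebra think it installed for that prefix?
vtysh -c "show ip route 10.15.15.3/32"

# What prefix length did the lo:0 alias get? (the stanza above sets no mask)
ip -4 addr show dev lo

If the route shows up in vtysh but not in ip route get, the problem sits between FRR and the kernel; if it shows up in neither, the adjacency is not actually being used for forwarding.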

If someone can help point out the issue in my configuration, that would be really appreciated.

Perhaps the way I am trying to configure this will not work because these machines are connected through switches. Is that what is wrong here? If so, what else can I use to achieve the desired effect?

Thanks
 
My two cents: is your 10G NIC setup redundant, or is each of those links to the private switch a single link?

If they are single links, I would invest in a dual-port card, but only if your switches are in some sort of MLAG (or whatever comparable technology your switch stack offers).

That way you can lose a link and/or a switch and be fine. I'm not sure you can do exactly what you are looking for, but I'll let others chime in.
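To illustrate (purely a sketch: the second port name enp130s0 and the addresses are made up, and 802.3ad/LACP needs matching configuration on the switch side, e.g. MLAG across the two switches), a bonded private link in /etc/network/interfaces could look roughly like this:

Code:
auto bond0
iface bond0 inet manual
        # both 10G ports go into the bond; the second port name is assumed
        bond-slaves enp129s0 enp130s0
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer3+4
        # without MLAG-capable switches, bond-mode active-backup is the safer choice

auto vmbr1
iface vmbr1 inet static
        address 10.0.0.6/24
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
# 10 Gbps private LAN over the bond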

Also, Ceph over a 1G NIC would... probably grind to a halt. We had issues even with 10G, but we had a full-NVMe setup, so it wasn't hard to bury the link.
 
Thanks for the suggestion! I am a total noob and had no idea that even 10 Gbps might not be enough; I've just started getting into Ceph. Maybe I should just stick with a simple cluster and not go to Ceph.

Yes, they are single links.
 
Ceph is fine. If you have SSDs (which I would recommend), it's pretty easy to fill the pipe with Ceph traffic and/or VM traffic.

1G links aren't much, and I'd personally avoid them if possible.

We have 100G links, and between vMotions/Ceph traffic during maintenance we will burst to 8 GB/s.
 
Yes, each server is equipped with 2 TB SSDs. The failover I am trying to achieve is just to allow for some breathing room. I'm still not sure why the nodes can't ping each other; I'm sure it's a noob mistake on my part that I don't have enough understanding to figure out.
 
