Trying to configure failover for Ceph

ConnectedEarth

New Member
Sep 23, 2023
Hi,
I have a 3-node Proxmox cluster in a data center. The nodes reach the internet through a public gateway over a 1 Gbps connection. Each host is also equipped with a 10 Gbps interface connected to a private switch. Please see the attached diagram.

(Attachment: Cluster Setup.png)

I am planning to configure the Ceph private network over the 10 Gbps private link.

What I am trying to achieve:

Redundancy for the Ceph private network: if the 10 Gbps link fails, traffic should move over to the 1 Gbps link automatically.
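For context, the intention is that the Ceph public/cluster network would live on the routed loopback range, so a link failure just changes which path the routing table uses. A minimal sketch of the relevant part of /etc/pve/ceph.conf, assuming 10.15.15.0/24 as that range (Ceph itself is not set up yet):

Code:
[global]
        # assumption: the OpenFabric-routed loopback range carries the Ceph traffic
        cluster_network = 10.15.15.0/24
        public_network  = 10.15.15.0/24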

After reading through various threads, I tried implementing FRR and OpenFabric to achieve this.

Here is my OpenFabric configuration.

Node 2

Code:
Current configuration:
!
frr version 8.4.4
frr defaults traditional
hostname CE-FS-Node2
log syslog informational
no ipv6 forwarding
service integrated-vtysh-config
!
interface eno4
 ip router openfabric 1
 openfabric csnp-interval 2
 openfabric hello-interval 1
 openfabric hello-multiplier 2
exit
!
interface enp129s0
 ip router openfabric 1
 openfabric csnp-interval 2
 openfabric hello-interval 1
 openfabric hello-multiplier 2
exit
!
interface lo
 ip router openfabric 1
 openfabric passive
exit
!
router openfabric 1
 net 49.0001.2222.2222.2222.00
 lsp-gen-interval 1
 max-lsp-lifetime 600
 lsp-refresh-interval 180
exit
!
end

Node 1

Code:
Current configuration:
!
frr version 8.4.4
frr defaults traditional
hostname CE-FS-Node1
log syslog informational
no ipv6 forwarding
service integrated-vtysh-config
!
interface eno3
 ip router openfabric 1
 openfabric csnp-interval 2
 openfabric hello-interval 1
 openfabric hello-multiplier 2
exit
!
interface enp129s0
 ip router openfabric 1
 openfabric csnp-interval 2
 openfabric hello-interval 1
 openfabric hello-multiplier 2
exit
!
interface lo
 ip router openfabric 1
 openfabric passive
exit
!
router openfabric 1
 net 49.0001.1111.1111.1111.00
 lsp-gen-interval 1
 max-lsp-lifetime 600
 lsp-refresh-interval 180
exit
!
end

Here is /etc/network/interfaces:

Node 2

Code:
auto lo
iface lo inet loopback
auto lo:0
iface lo:0 inet static
        address 10.15.15.6

iface eno4 inet manual


iface eno3 inet manual


iface eno1 inet manual


iface eno2 inet manual


auto enp129s0
iface enp129s0 inet manual


auto vmbr0
iface vmbr0 inet static
        address xx.xxx.xxx.xx/29
        gateway xx.xxx.xxx.xx
        bridge-ports eno4
        bridge-stp off
        bridge-fd 0


auto vmbr1
iface vmbr1 inet static
        address 10.0.0.6/24
        bridge-ports enp129s0
        bridge-stp off
        bridge-fd 0
# 10 Gbps private LAN


auto vmbr2
iface vmbr2 inet static
        address 192.168.1.10/24
        bridge-ports none
        bridge-stp off
        bridge-fd 0


post-up   echo 1 > /proc/sys/net/ipv4/ip_forward
post-up   iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -o vmbr0 -j MASQUERADE
post-down iptables -t nat -D POSTROUTING -s 10.0.0.0/24 -o vmbr0 -j MASQUERADE

Node 1

Code:
auto lo
iface lo inet loopback
auto lo:0
iface lo:0 inet static
        address 10.15.15.3

iface eno4 inet manual


iface eno3 inet manual


iface eno1 inet manual


iface eno2 inet manual


auto enp129s0
iface enp129s0 inet manual


auto vmbr0
iface vmbr0 inet static
        address xx.xxx.xxx.99/29
        gateway xx.xxx.xxx.97
        bridge-ports eno4
        bridge-stp off
        bridge-fd 0


auto vmbr1
iface vmbr1 inet static
        address 10.0.0.6/24
        bridge-ports enp129s0
        bridge-stp off
        bridge-fd 0
# 10 Gbps private LAN


auto vmbr2
iface vmbr2 inet static
        address 192.168.1.10/24
        bridge-ports none
        bridge-stp off
        bridge-fd 0


post-up   echo 1 > /proc/sys/net/ipv4/ip_forward
post-up   iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -o vmbr0 -j MASQUERADE
post-down iptables -t nat -D POSTROUTING -s 10.0.0.0/24 -o vmbr0 -j MASQUERADE


And these nodes are showing up in the network topology.

From Node 2:

Code:
root@CE-FS-Node2:~# vtysh -c "show openfabric topology"
Area 1:
IS-IS paths to level-2 routers that speak IP
Vertex               Type         Metric Next-Hop             Interface Parent
CE-FS-Node2                                                           
10.15.15.6/32        IP internal  0                                     CE-FS-Node2(4)
CE-FS-Node1          TE-IS        10     CE-FS-Node1          enp129s0  CE-FS-Node2(4)
                                         CE-FS-Node1          eno4     
10.15.15.3/32        IP TE        20     CE-FS-Node1          enp129s0  CE-FS-Node1(4)
                                         CE-FS-Node1          eno4     

IS-IS paths to level-2 routers with hop-by-hop metric
Vertex               Type         Metric Next-Hop             Interface Parent

But when I ping Node 1 from Node 2, I get a "Destination Host Unreachable" error:

Code:
root@CE-FS-Node2:~# ping 10.15.15.3
PING 10.15.15.3 (10.15.15.3) 56(84) bytes of data.
From 10.15.15.6 icmp_seq=1 Destination Host Unreachable
From 10.15.15.6 icmp_seq=2 Destination Host Unreachable
From 10.15.15.6 icmp_seq=3 Destination Host Unreachable
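For completeness, these are the checks I would run next to see whether fabricd actually handed a route for the peer loopback to the kernel, and what prefix length the lo:0 alias ended up with (standard iproute2 and FRR vtysh commands; nothing here is specific to my setup beyond the addresses):

Code:
# Does the kernel have a route to the peer loopback, and out of which interface?
ip route get 10.15.15.3

# What does FRR/zebra think it installed for that prefix?
vtysh -c "show ip route 10.15.15.3/32"

# What prefix length did the lo:0 alias get? (the stanza above sets no mask)
ip -4 addr show dev lo

If the route shows up in vtysh but not in ip route get, the problem sits between FRR and the kernel; if it shows up in neither, the adjacency is not actually being used for forwarding.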

If someone can help point out the issue in my configuration, that would be really appreciated.

Perhaps the way I am trying to configure this will not work because these machines are connected through switches. Is that what is wrong here? If so, what else can I use to achieve the desired effect?

Thanks
 
My two cents: is your 10G NIC setup redundant, or is each of those links to the private switch a single link?

If they are single links, I would invest in a dual-port card, but only if your switches are in some sort of MLAG (or whatever comparable technology your switch stack offers).

That way you can lose a link and/or a switch and be fine. I'm not sure you can do exactly what you are looking for, but I'll let others chime in.
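To illustrate (purely a sketch: the second port name enp130s0 and the addresses are made up, and 802.3ad/LACP needs matching configuration on the switch side, e.g. MLAG across the two switches), a bonded private link in /etc/network/interfaces could look roughly like this:

Code:
auto bond0
iface bond0 inet manual
        # both 10G ports go into the bond; the second port name is assumed
        bond-slaves enp129s0 enp130s0
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer3+4
        # without MLAG-capable switches, bond-mode active-backup is the safer choice

auto vmbr1
iface vmbr1 inet static
        address 10.0.0.6/24
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
# 10 Gbps private LAN over the bond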

Also, Ceph over a 1G NIC would... probably grind to a halt. We had issues even with 10G, but we had a full-NVMe setup, so it wasn't hard to bury the link.
 
Thanks for the suggestion! I am a total noob and had no idea that even 10 Gbps might not be enough; I've just started getting into Ceph. Maybe I should just stick with a simple cluster and not go to Ceph.

Yes, they are single links.
 
Ceph is fine. If you have SSDs (which I would recommend), it's pretty easy to fill the pipe with Ceph traffic and/or VM traffic.

1G links aren't much, and I'd personally avoid them if possible.

We have 100G links, and between vMotions/Ceph traffic during maintenance we will burst to 8 GB/s.
 
Yes, each server is equipped with 2 TB SSDs. The failover I am trying to achieve is just to allow for some breathing room. I'm still not sure why the nodes can't ping each other; I'm sure it's a noob mistake on my part that I don't have enough understanding to figure out.
 
