Ceph with 2 Cluster Networks

anowak

New Member
Mar 13, 2025
Hi All,

I was googling around and understood that it's best to have two fault domains when using Ceph, so I set up two separate networks. The Proxmox GUI only asks for a single back-end cluster network, which is fine. I then go into the /etc/ceph/ceph.conf file and put
Code:
[global]
   ...
   ...
   cluster_network = 10.10.10.0/24, 10.10.20.0/24
   ...

Save and restart Ceph services

Code:
systemctl restart ceph-*

When I try adding an OSD I get an error:

Code:
command '/sbin/ip address show to '10.10.10.0/26, 10.10.20.0/26' up' failed: exit code 1 (500)
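
Side note on that 500 error: judging by the message, Proxmox hands the whole comma-separated value to a single `ip address show to <prefix>` call, and that selector only accepts one prefix. A minimal Python sketch (my own illustration, not Proxmox code) showing that the combined string doesn't parse as one network, while each part on its own does:

```python
import ipaddress

# The value as entered in ceph.conf (from the post above)
value = "10.10.10.0/24, 10.10.20.0/24"

# As one string it is not a valid CIDR prefix, which is roughly
# what the failing `ip address show to '...'` call runs into.
try:
    ipaddress.ip_network(value)
    print("parsed as a single network")
except ValueError:
    print("not a single valid network")

# Each comma-separated part on its own is perfectly valid.
parts = [str(ipaddress.ip_network(p.strip())) for p in value.split(",")]
print(parts)  # ['10.10.10.0/24', '10.10.20.0/24']
```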

Does this mean I can only configure it via the .conf file?
 
Hi abamalu,

Yes, I get that you need a public network as the front end, but what I want is to configure two cluster networks.
My systems have four NICs: two in a bond for management/front-end traffic, and two on separate networks for back-end traffic.


If this is not how Ceph is supposed to operate, then I'll need to reconfigure the network to put the cluster network in a single subnet.

The Ceph documentation states this should be possible:
https://docs.ceph.com/en/latest/rados/configuration/network-config-ref/#id3
 
Sorry, that was a typo... I was just giving an example and stuffed up the addressing. The public network is 192.168.20.0/24.

Just going through the doco again - is that best practice or is a bond a better idea?
 
With two cluster networks (like cluster_network = 10.10.10.0/24, 10.10.20.0/24, but ensure that the two networks are separated) you avoid the single-stream limit of LACP hashing. You should also see an advantage for replication, because there are more routes to your OSDs. Should one NIC go offline, Ceph switches the traffic immediately.

I prefer the setup without bond.
 
Rather than quoting, I'll try to address all possible alternatives.

Ceph carries traffic on two separate networks: public (host) and private (OSD-to-OSD). Think of these as the host bus and disk bus of a RAID subsystem.

While you can have both commingled, they're technically two separate traffic streams. If your question was about carrying those forms of traffic separately, then yes, this is supported and recommended.
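
As a concrete reference, the separated layout in ceph.conf is just two different subnets, one per role; something like this, reusing the subnets mentioned earlier in the thread:

```ini
[global]
    # front-end: clients, MONs, MGRs
    public_network = 192.168.20.0/24
    # back-end: OSD replication and recovery only
    cluster_network = 10.10.10.0/24
```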

If, however, your intention is to have multiple subnets for the SAME traffic type... see this thread: https://forum.proxmox.com/threads/ceph-multi-public-network-setup-cephfs-on-separate-network.180546

tl;dr: it's sorta possible, but really not.

Next question: to LAG or not to LAG? That depends on your reason for doing so. If you have multiple switches and your intention is to create an HA path model, then definitely LAG. It also depends on how many physical interfaces are present on your hosts; if some hosts have 2 and some have 4 (as an example), then it only makes sense to use a single LAG across two interfaces on all hosts, regardless of additional ports being available on some, and that's only if each interface is attached to a different switch. Otherwise, use separate networks on individual interfaces for the disparate traffic types.
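
If you do go the LAG route on Proxmox, the bond is defined in /etc/network/interfaces. A rough sketch for one LACP bond (interface names and the address are placeholders, and the switch ports must be configured for 802.3ad):

```text
auto bond0
iface bond0 inet static
    address 10.10.10.11/24
    bond-slaves enp1s0 enp2s0
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
```

Note that layer3+4 hashing spreads flows across the links per connection, which is also why a single stream never exceeds one link's bandwidth.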
 
Thanks alexskysilk, I think I now understand. Being new to Ceph, I was foolish enough to ask AI for initial advice, which led me down a rabbit hole of setting up traditional storage with dual fault domains for the storage back-end. Going through those links, it looks like Ceph OSDs need to be able to talk on all paths, so even if they are on separate subnets they need to be routed.

I've gone with what is shown in the Ceph documentation.


- 2x NIC for Public network in an LACP bond.
- 2x NIC for Cluster network in an LACP bond.

What I thought I wanted was to "avoid the single stream limit with LACP hashing"... basically I wanted more paths, but it looks like this is not how Ceph works.

Appreciate everyone's input.
 
- 2x NIC for Public network in an LACP bond.
- 2x NIC for Cluster network in an LACP bond.
Just be sure you do NOT mix other traffic in with these, most especially corosync. If you have more than 4 interfaces, keep the other forms of traffic on different interfaces. If you don't, consider using only two interfaces for Ceph and two interfaces for other traffic.