A preamble for anyone interested. Ceph uses two networks for communication:
- public_network, where clients talk to the cluster. Monitors, clients and OSDs all communicate over this network.
- cluster_network, where internal inter-OSD communication takes place, e.g. replication and heartbeat traffic. By default it uses the same subnet as the public network, but you can optionally use a different subnet for performance reasons.
See [1] for more details.
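To make the relationship between the two settings concrete, here is a minimal Python sketch that reads both networks from ceph.conf-style text and reports whether a separate cluster network is configured. The subnets and the simplified file content are made up for illustration, and the helper name `read_networks` is hypothetical. It also assumes the keys are spelled with underscores and live in the `[global]` section; real files may differ.

```python
import configparser
import ipaddress

# Simplified, made-up ceph.conf content; real files contain many more options.
SAMPLE_CONF = """\
[global]
public_network = 192.0.2.0/24
cluster_network = 10.10.10.0/24
"""


def read_networks(conf_text):
    """Return (public, cluster) networks; cluster falls back to public if unset."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    section = parser["global"]
    # strict=False tolerates values written with host bits set, e.g. 10.10.10.11/24.
    public = ipaddress.ip_network(section["public_network"], strict=False)
    cluster_raw = section.get("cluster_network")
    cluster = ipaddress.ip_network(cluster_raw, strict=False) if cluster_raw else public
    return public, cluster


public, cluster = read_networks(SAMPLE_CONF)
print("public :", public)
print("cluster:", cluster)
print("separate cluster network:", public != cluster)
```

This only handles a single subnet per setting; comma-separated lists would need extra parsing.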
To change Ceph's cluster network:
- Make sure each pair of nodes can ping each other on the desired subnet.
- Change the cluster_network in /etc/ceph/ceph.conf. Since ceph.conf is a symlink to the replicated /etc/pve cluster filesystem, this change will be replicated to all cluster nodes. Make sure you don't edit the IPs of the monitors in this step.
- To see if the change is picked up properly, restart a single OSD (systemctl restart ceph-osd@<ID>) and check that it is recognized as `up` and `in`. If this works, restart all OSDs on that node and check that they are recognized as `up` and `in` as well. You can check whether the services are listening on the correct network by running `ss -tulpn | grep ceph`. Repeat this with the OSDs on all nodes.
For a production cluster, it is advisable to be careful and make sure that the cluster health returns to HEALTH_OK after each step. Note that it might take a few seconds for Ceph to adjust to the changes and report HEALTH_OK. As long as you have multiple active and reachable monitors at all times, the Ceph cluster should stay operational.
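Reading the `ss -tulpn | grep ceph` output by eye gets tedious with many OSDs, so here is a rough Python sketch that groups the listening sockets by OSD process and reports which networks each one is bound to. The subnets, PIDs and sample lines are made up for illustration, and the helper name `osd_networks` is hypothetical. It assumes IPv4 addresses and one socket per line.

```python
import ipaddress
import re
from collections import defaultdict

# Hypothetical subnets; substitute the values from your own ceph.conf.
PUBLIC_NET = ipaddress.ip_network("192.0.2.0/24")
CLUSTER_NET = ipaddress.ip_network("10.10.10.0/24")

# Made-up sample lines in the shape printed by `ss -tulpn | grep ceph`.
SAMPLE = """\
tcp LISTEN 0 512 192.0.2.11:6789 0.0.0.0:* users:(("ceph-mon",pid=2179,fd=27))
tcp LISTEN 0 512 192.0.2.11:6802 0.0.0.0:* users:(("ceph-osd",pid=4001,fd=18))
tcp LISTEN 0 512 10.10.10.11:6800 0.0.0.0:* users:(("ceph-osd",pid=4001,fd=20))
tcp LISTEN 0 512 192.0.2.11:6806 0.0.0.0:* users:(("ceph-osd",pid=4002,fd=18))
"""

# Captures the local IPv4 address, the daemon name and the PID.
LINE_RE = re.compile(
    r'\s(\d{1,3}(?:\.\d{1,3}){3}):\d+\s.*\("(ceph-[a-z]+)",pid=(\d+),'
)


def osd_networks(ss_output):
    """Map each ceph-osd PID to the set of networks it listens on."""
    seen = defaultdict(set)
    for line in ss_output.splitlines():
        match = LINE_RE.search(line)
        if not match or match.group(2) != "ceph-osd":
            continue  # skip monitors, managers and unparsable lines
        addr = ipaddress.ip_address(match.group(1))
        pid = int(match.group(3))
        if addr in PUBLIC_NET:
            seen[pid].add("public")
        elif addr in CLUSTER_NET:
            seen[pid].add("cluster")
        else:
            seen[pid].add("other")
    return dict(seen)


result = osd_networks(SAMPLE)
for pid, nets in sorted(result.items()):
    status = "ok" if {"public", "cluster"} <= nets else "check"
    print(pid, sorted(nets), status)
```

With a separate cluster network configured, each OSD process would be expected to listen on both subnets. An OSD that shows up only on the public one has probably not been restarted yet or did not pick up the new setting.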
[1] https://docs.ceph.com/en/latest/rados/configuration/network-config-ref/
Hello,
I am trying to follow along with what you posted, but I am not sure it is working correctly for me. After restarting all of my OSDs I have the following output from "ss -tulpn | grep ceph". I am not sure how to read this. Are some OSDs working with the cluster network, and some are not?
Advice appreciated.
tcp LISTEN 0 512 206.180.209.205:3300 0.0.0.0:* users:(("ceph-mon",pid=2179,fd=26))
tcp LISTEN 0 512 206.180.209.205:6810 0.0.0.0:* users:(("ceph-osd",pid=738815,fd=19))
tcp LISTEN 0 512 206.180.209.205:6811 0.0.0.0:* users:(("ceph-osd",pid=738157,fd=19))
tcp LISTEN 0 512 206.180.209.205:6808 0.0.0.0:* users:(("ceph-osd",pid=738157,fd=18))
tcp LISTEN 0 512 206.180.209.205:6809 0.0.0.0:* users:(("ceph-osd",pid=739458,fd=19))
tcp LISTEN 0 512 206.180.209.205:6814 0.0.0.0:* users:(("ceph-osd",pid=736985,fd=18))
tcp LISTEN 0 512 206.180.209.205:6815 0.0.0.0:* users:(("ceph-osd",pid=739458,fd=23))
tcp LISTEN 0 512 206.180.209.205:6812 0.0.0.0:* users:(("ceph-osd",pid=739458,fd=22))
tcp LISTEN 0 512 206.180.209.205:6813 0.0.0.0:* users:(("ceph-osd",pid=738815,fd=22))
tcp LISTEN 0 512 206.180.209.205:6802 0.0.0.0:* users:(("ceph-osd",pid=737743,fd=18))
tcp LISTEN 0 512 206.180.209.205:6803 0.0.0.0:* users:(("ceph-osd",pid=737743,fd=19))
tcp LISTEN 0 512 206.180.209.205:6800 0.0.0.0:* users:(("ceph-mgr",pid=2178,fd=27))
tcp LISTEN 0 512 206.180.209.205:6801 0.0.0.0:* users:(("ceph-mgr",pid=2178,fd=28))
tcp LISTEN 0 512 206.180.209.205:6806 0.0.0.0:* users:(("ceph-osd",pid=739458,fd=18))
tcp LISTEN 0 512 206.180.209.205:6807 0.0.0.0:* users:(("ceph-osd",pid=738815,fd=18))
tcp LISTEN 0 512 206.180.209.205:6804 0.0.0.0:* users:(("ceph-osd",pid=737743,fd=22))
tcp LISTEN 0 512 206.180.209.205:6805 0.0.0.0:* users:(("ceph-osd",pid=737743,fd=23))
tcp LISTEN 0 512 206.180.209.205:6789 0.0.0.0:* users:(("ceph-mon",pid=2179,fd=27))
tcp LISTEN 0 512 206.180.209.205:6832 0.0.0.0:* users:(("ceph-osd",pid=739152,fd=22))
tcp LISTEN 0 512 206.180.209.205:6833 0.0.0.0:* users:(("ceph-osd",pid=739152,fd=23))
tcp LISTEN 0 512 206.180.209.205:6826 0.0.0.0:* users:(("ceph-osd",pid=740230,fd=18))
tcp LISTEN 0 512 206.180.209.205:6827 0.0.0.0:* users:(("ceph-osd",pid=740230,fd=19))
tcp LISTEN 0 512 206.180.209.205:6824 0.0.0.0:* users:(("ceph-osd",pid=738030,fd=22))
tcp LISTEN 0 512 206.180.209.205:6825 0.0.0.0:* users:(("ceph-osd",pid=738030,fd=23))
tcp LISTEN 0 512 206.180.209.205:6830 0.0.0.0:* users:(("ceph-osd",pid=739152,fd=18))
tcp LISTEN 0 512 206.180.209.205:6831 0.0.0.0:* users:(("ceph-osd",pid=739152,fd=19))
tcp LISTEN 0 512 206.180.209.205:6828 0.0.0.0:* users:(("ceph-osd",pid=740230,fd=22))
tcp LISTEN 0 512 206.180.209.205:6829 0.0.0.0:* users:(("ceph-osd",pid=740230,fd=23))
tcp LISTEN 0 512 206.180.209.205:6818 0.0.0.0:* users:(("ceph-osd",pid=736985,fd=22))
tcp LISTEN 0 512 206.180.209.205:6819 0.0.0.0:* users:(("ceph-osd",pid=738815,fd=23))
tcp LISTEN 0 512 206.180.209.205:6816 0.0.0.0:* users:(("ceph-osd",pid=736985,fd=19))
tcp LISTEN 0 512 206.180.209.205:6817 0.0.0.0:* users:(("ceph-osd",pid=738157,fd=22))
tcp LISTEN 0 512 206.180.209.205:6822 0.0.0.0:* users:(("ceph-osd",pid=738030,fd=18))
tcp LISTEN 0 512 206.180.209.205:6823 0.0.0.0:* users:(("ceph-osd",pid=738030,fd=19))
tcp LISTEN 0 512 206.180.209.205:6820 0.0.0.0:* users:(("ceph-osd",pid=736985,fd=23))
tcp LISTEN 0 512 206.180.209.205:6821 0.0.0.0:* users:(("ceph-osd",pid=738157,fd=23))
tcp LISTEN 0 512 192.168.1.2:6830 0.0.0.0:* users:(("ceph-osd",pid=739152,fd=24))
tcp LISTEN 0 512 192.168.1.2:6831 0.0.0.0:* users:(("ceph-osd",pid=739152,fd=25))
tcp LISTEN 0 512 192.168.1.2:6828 0.0.0.0:* users:(("ceph-osd",pid=739152,fd=20))
tcp LISTEN 0 512 192.168.1.2:6829 0.0.0.0:* users:(("ceph-osd",pid=739152,fd=21))
tcp LISTEN 0 512 192.168.1.2:6826 0.0.0.0:* users:(("ceph-osd",pid=740230,fd=24))
tcp LISTEN 0 512 192.168.1.2:6827 0.0.0.0:* users:(("ceph-osd",pid=740230,fd=25))
tcp LISTEN 0 512 192.168.1.2:6824 0.0.0.0:* users:(("ceph-osd",pid=740230,fd=20))
tcp LISTEN 0 512 192.168.1.2:6825 0.0.0.0:* users:(("ceph-osd",pid=740230,fd=21))
tcp LISTEN 0 512 192.168.1.2:6822 0.0.0.0:* users:(("ceph-osd",pid=738030,fd=24))
tcp LISTEN 0 512 192.168.1.2:6823 0.0.0.0:* users:(("ceph-osd",pid=738030,fd=25))
tcp LISTEN 0 512 192.168.1.2:6820 0.0.0.0:* users:(("ceph-osd",pid=738030,fd=20))
tcp LISTEN 0 512 192.168.1.2:6821 0.0.0.0:* users:(("ceph-osd",pid=738030,fd=21))
tcp LISTEN 0 512 192.168.1.2:6818 0.0.0.0:* users:(("ceph-osd",pid=736985,fd=25))
tcp LISTEN 0 512 192.168.1.2:6819 0.0.0.0:* users:(("ceph-osd",pid=738157,fd=25))
tcp LISTEN 0 512 192.168.1.2:6816 0.0.0.0:* users:(("ceph-osd",pid=736985,fd=24))
tcp LISTEN 0 512 192.168.1.2:6817 0.0.0.0:* users:(("ceph-osd",pid=738157,fd=24))
tcp LISTEN 0 512 192.168.1.2:6814 0.0.0.0:* users:(("ceph-osd",pid=736985,fd=21))
tcp LISTEN 0 512 192.168.1.2:6815 0.0.0.0:* users:(("ceph-osd",pid=738815,fd=25))
tcp LISTEN 0 512 192.168.1.2:6812 0.0.0.0:* users:(("ceph-osd",pid=736985,fd=20))
tcp LISTEN 0 512 192.168.1.2:6813 0.0.0.0:* users:(("ceph-osd",pid=738815,fd=24))
tcp LISTEN 0 512 192.168.1.2:6810 0.0.0.0:* users:(("ceph-osd",pid=739458,fd=24))
tcp LISTEN 0 512 192.168.1.2:6811 0.0.0.0:* users:(("ceph-osd",pid=739458,fd=25))
tcp LISTEN 0 512 192.168.1.2:6808 0.0.0.0:* users:(("ceph-osd",pid=738815,fd=21))
tcp LISTEN 0 512 192.168.1.2:6809 0.0.0.0:* users:(("ceph-osd",pid=738157,fd=21))
tcp LISTEN 0 512 192.168.1.2:6806 0.0.0.0:* users:(("ceph-osd",pid=738815,fd=20))
tcp LISTEN 0 512 192.168.1.2:6807 0.0.0.0:* users:(("ceph-osd",pid=738157,fd=20))
tcp LISTEN 0 512 192.168.1.2:6804 0.0.0.0:* users:(("ceph-osd",pid=739458,fd=20))
tcp LISTEN 0 512 192.168.1.2:6805 0.0.0.0:* users:(("ceph-osd",pid=739458,fd=21))
tcp LISTEN 0 512 192.168.1.2:6802 0.0.0.0:* users:(("ceph-osd",pid=737743,fd=24))
tcp LISTEN 0 512 192.168.1.2:6803 0.0.0.0:* users:(("ceph-osd",pid=737743,fd=25))
tcp LISTEN 0 512 192.168.1.2:6800 0.0.0.0:* users:(("ceph-osd",pid=737743,fd=20))
tcp LISTEN 0 512 192.168.1.2:6801 0.0.0.0:* users:(("ceph-osd",pid=737743,fd=21))