How to change ceph Internal cluster network

maomaocake

Member
Feb 13, 2022
As the title suggests, how do I change Ceph's internal cluster network? I just added a faster NIC and can't figure out how to get the cluster to switch networks.
 
A preamble for anyone interested: Ceph uses two networks for communication:
  • public_network: where clients talk to the cluster. Monitors, clients, and OSDs all communicate over this network.
  • cluster_network: where internal inter-OSD communication is done, e.g. for replication and heartbeats. By default it is set to use the same subnet as the public network, but optionally one could use a different subnet for performance reasons.

See [1] for more details.
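
Before changing anything, it can help to see what the two options are currently set to. The following is only a minimal sketch: `get_ceph_net` is a hypothetical helper name, and it assumes the options sit in /etc/ceph/ceph.conf as plain `key = value` lines, written either with an underscore or with a space in the key.

```shell
# Print the value of a network option from a Ceph config file.
# Hypothetical helper; accepts "cluster_network" and "cluster network".
# Usage: get_ceph_net <option> [conf_file]
get_ceph_net() {
    local key="$1"
    local conf="${2:-/etc/ceph/ceph.conf}"
    awk -F= -v want="$key" '
        {
            k = $1
            gsub(/^[ \t]+|[ \t]+$/, "", k)   # trim whitespace around the key
            gsub(/ /, "_", k)                # "cluster network" -> "cluster_network"
            if (k == want) {
                v = $2
                gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim whitespace around the value
                print v
                exit                         # first match wins
            }
        }' "$conf"
}
```

Running `get_ceph_net public_network` and `get_ceph_net cluster_network` would print the configured subnets. An empty result for cluster_network would suggest that it is not set explicitly in that file.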

To change Ceph's cluster network:
  1. Make sure each pair of nodes can ping each other on the desired subnet.
  2. Change the cluster_network in /etc/ceph/ceph.conf. Since ceph.conf is a symlink to the replicated /etc/pve cluster filesystem, this change will be replicated to all cluster nodes. Make sure you don't edit the IPs of the monitors in this step.
  3. To see if the change is picked up properly, restart a single OSD (systemctl restart ceph-osd@<ID>) and check that it is recognized as `up` and `in`. If this works, restart all OSDs on that node and check that they are recognized as `up` and `in` as well. You can check whether the services are listening on the correct network by running `ss -tulpn | grep ceph`. Repeat this with the OSDs on all nodes.
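
Steps 1 and 3 above could be scripted roughly as follows. This is a sketch, not a tested procedure: the function names are made up for illustration, `ping -W` assumes the Linux iputils version of ping, and the up/in check assumes that `ceph osd dump` prints one line per OSD starting with `osd.<ID> up in ...`.

```shell
# Step 1: ping every peer address on the new subnet and report failures.
check_peers() {
    local ip
    local failed=0
    for ip in "$@"; do
        if ping -c 1 -W 2 "$ip" >/dev/null 2>&1; then
            echo "ok   $ip"
        else
            echo "FAIL $ip"
            failed=1
        fi
    done
    return "$failed"
}

# Step 3: succeed only if osd.<ID> is listed as both "up" and "in".
osd_is_up_in() {
    ceph osd dump | awk -v osd="osd.$1" '
        $1 == osd && $2 == "up" && $3 == "in" { found = 1 }
        END { exit !found }'
}

# Restart one OSD, then poll until it is up and in, or give up.
# Usage: restart_and_verify_osd <ID> [max_tries]
restart_and_verify_osd() {
    local id="$1"
    local tries="${2:-30}"
    systemctl restart "ceph-osd@${id}"
    while [ "$tries" -gt 0 ]; do
        osd_is_up_in "$id" && return 0
        sleep 2
        tries=$((tries - 1))
    done
    return 1
}
```

For example, `check_peers 192.168.1.3 192.168.1.4` (placeholder addresses on the new subnet) followed by `restart_and_verify_osd 0` would cover one OSD. Only move on to the remaining OSDs on the node if that returns successfully.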

For a production cluster, it is advisable to be careful and make sure that the cluster health returns to HEALTH_OK after each step. Note that it might take a few seconds for Ceph to adjust itself to the changes and report HEALTH_OK. As long as you have multiple active and reachable monitors at all times, the Ceph cluster should stay operational.
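
Waiting for HEALTH_OK between steps can be automated with a small polling loop. A minimal sketch, assuming `ceph health` prints a line starting with HEALTH_OK once the cluster has settled; the function name and the default timeout and interval are arbitrary choices:

```shell
# Poll "ceph health" until it reports HEALTH_OK or the timeout expires.
# Usage: wait_for_health_ok [timeout_seconds] [poll_interval_seconds]
wait_for_health_ok() {
    local timeout="${1:-120}"
    local interval="${2:-5}"
    local waited=0
    while :; do
        case "$(ceph health 2>/dev/null)" in
            HEALTH_OK*) return 0 ;;
        esac
        # Stop once the time budget is used up.
        [ "$waited" -ge "$timeout" ] && return 1
        sleep "$interval"
        waited=$((waited + interval))
    done
}
```

Calling `wait_for_health_ok 300 || echo "cluster still not healthy"` after each restart keeps you from moving on while the cluster is still degraded.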

[1] https://docs.ceph.com/en/latest/rados/configuration/network-config-ref/
 
Hello,

I am trying to follow along with what you posted, but I am not sure it is working correctly for me. After restarting all of my OSDs, I get the following output from "ss -tulpn | grep ceph". I am not sure how to read it. Are some OSDs working with the cluster network and some not?

Advice appreciated.

tcp LISTEN 0 512 206.180.209.205:3300 0.0.0.0:* users:(("ceph-mon",pid=2179,fd=26))
tcp LISTEN 0 512 206.180.209.205:6810 0.0.0.0:* users:(("ceph-osd",pid=738815,fd=19))
tcp LISTEN 0 512 206.180.209.205:6811 0.0.0.0:* users:(("ceph-osd",pid=738157,fd=19))
tcp LISTEN 0 512 206.180.209.205:6808 0.0.0.0:* users:(("ceph-osd",pid=738157,fd=18))
tcp LISTEN 0 512 206.180.209.205:6809 0.0.0.0:* users:(("ceph-osd",pid=739458,fd=19))
tcp LISTEN 0 512 206.180.209.205:6814 0.0.0.0:* users:(("ceph-osd",pid=736985,fd=18))
tcp LISTEN 0 512 206.180.209.205:6815 0.0.0.0:* users:(("ceph-osd",pid=739458,fd=23))
tcp LISTEN 0 512 206.180.209.205:6812 0.0.0.0:* users:(("ceph-osd",pid=739458,fd=22))
tcp LISTEN 0 512 206.180.209.205:6813 0.0.0.0:* users:(("ceph-osd",pid=738815,fd=22))
tcp LISTEN 0 512 206.180.209.205:6802 0.0.0.0:* users:(("ceph-osd",pid=737743,fd=18))
tcp LISTEN 0 512 206.180.209.205:6803 0.0.0.0:* users:(("ceph-osd",pid=737743,fd=19))
tcp LISTEN 0 512 206.180.209.205:6800 0.0.0.0:* users:(("ceph-mgr",pid=2178,fd=27))
tcp LISTEN 0 512 206.180.209.205:6801 0.0.0.0:* users:(("ceph-mgr",pid=2178,fd=28))
tcp LISTEN 0 512 206.180.209.205:6806 0.0.0.0:* users:(("ceph-osd",pid=739458,fd=18))
tcp LISTEN 0 512 206.180.209.205:6807 0.0.0.0:* users:(("ceph-osd",pid=738815,fd=18))
tcp LISTEN 0 512 206.180.209.205:6804 0.0.0.0:* users:(("ceph-osd",pid=737743,fd=22))
tcp LISTEN 0 512 206.180.209.205:6805 0.0.0.0:* users:(("ceph-osd",pid=737743,fd=23))
tcp LISTEN 0 512 206.180.209.205:6789 0.0.0.0:* users:(("ceph-mon",pid=2179,fd=27))
tcp LISTEN 0 512 206.180.209.205:6832 0.0.0.0:* users:(("ceph-osd",pid=739152,fd=22))
tcp LISTEN 0 512 206.180.209.205:6833 0.0.0.0:* users:(("ceph-osd",pid=739152,fd=23))
tcp LISTEN 0 512 206.180.209.205:6826 0.0.0.0:* users:(("ceph-osd",pid=740230,fd=18))
tcp LISTEN 0 512 206.180.209.205:6827 0.0.0.0:* users:(("ceph-osd",pid=740230,fd=19))
tcp LISTEN 0 512 206.180.209.205:6824 0.0.0.0:* users:(("ceph-osd",pid=738030,fd=22))
tcp LISTEN 0 512 206.180.209.205:6825 0.0.0.0:* users:(("ceph-osd",pid=738030,fd=23))
tcp LISTEN 0 512 206.180.209.205:6830 0.0.0.0:* users:(("ceph-osd",pid=739152,fd=18))
tcp LISTEN 0 512 206.180.209.205:6831 0.0.0.0:* users:(("ceph-osd",pid=739152,fd=19))
tcp LISTEN 0 512 206.180.209.205:6828 0.0.0.0:* users:(("ceph-osd",pid=740230,fd=22))
tcp LISTEN 0 512 206.180.209.205:6829 0.0.0.0:* users:(("ceph-osd",pid=740230,fd=23))
tcp LISTEN 0 512 206.180.209.205:6818 0.0.0.0:* users:(("ceph-osd",pid=736985,fd=22))
tcp LISTEN 0 512 206.180.209.205:6819 0.0.0.0:* users:(("ceph-osd",pid=738815,fd=23))
tcp LISTEN 0 512 206.180.209.205:6816 0.0.0.0:* users:(("ceph-osd",pid=736985,fd=19))
tcp LISTEN 0 512 206.180.209.205:6817 0.0.0.0:* users:(("ceph-osd",pid=738157,fd=22))
tcp LISTEN 0 512 206.180.209.205:6822 0.0.0.0:* users:(("ceph-osd",pid=738030,fd=18))
tcp LISTEN 0 512 206.180.209.205:6823 0.0.0.0:* users:(("ceph-osd",pid=738030,fd=19))
tcp LISTEN 0 512 206.180.209.205:6820 0.0.0.0:* users:(("ceph-osd",pid=736985,fd=23))
tcp LISTEN 0 512 206.180.209.205:6821 0.0.0.0:* users:(("ceph-osd",pid=738157,fd=23))
tcp LISTEN 0 512 192.168.1.2:6830 0.0.0.0:* users:(("ceph-osd",pid=739152,fd=24))
tcp LISTEN 0 512 192.168.1.2:6831 0.0.0.0:* users:(("ceph-osd",pid=739152,fd=25))
tcp LISTEN 0 512 192.168.1.2:6828 0.0.0.0:* users:(("ceph-osd",pid=739152,fd=20))
tcp LISTEN 0 512 192.168.1.2:6829 0.0.0.0:* users:(("ceph-osd",pid=739152,fd=21))
tcp LISTEN 0 512 192.168.1.2:6826 0.0.0.0:* users:(("ceph-osd",pid=740230,fd=24))
tcp LISTEN 0 512 192.168.1.2:6827 0.0.0.0:* users:(("ceph-osd",pid=740230,fd=25))
tcp LISTEN 0 512 192.168.1.2:6824 0.0.0.0:* users:(("ceph-osd",pid=740230,fd=20))
tcp LISTEN 0 512 192.168.1.2:6825 0.0.0.0:* users:(("ceph-osd",pid=740230,fd=21))
tcp LISTEN 0 512 192.168.1.2:6822 0.0.0.0:* users:(("ceph-osd",pid=738030,fd=24))
tcp LISTEN 0 512 192.168.1.2:6823 0.0.0.0:* users:(("ceph-osd",pid=738030,fd=25))
tcp LISTEN 0 512 192.168.1.2:6820 0.0.0.0:* users:(("ceph-osd",pid=738030,fd=20))
tcp LISTEN 0 512 192.168.1.2:6821 0.0.0.0:* users:(("ceph-osd",pid=738030,fd=21))
tcp LISTEN 0 512 192.168.1.2:6818 0.0.0.0:* users:(("ceph-osd",pid=736985,fd=25))
tcp LISTEN 0 512 192.168.1.2:6819 0.0.0.0:* users:(("ceph-osd",pid=738157,fd=25))
tcp LISTEN 0 512 192.168.1.2:6816 0.0.0.0:* users:(("ceph-osd",pid=736985,fd=24))
tcp LISTEN 0 512 192.168.1.2:6817 0.0.0.0:* users:(("ceph-osd",pid=738157,fd=24))
tcp LISTEN 0 512 192.168.1.2:6814 0.0.0.0:* users:(("ceph-osd",pid=736985,fd=21))
tcp LISTEN 0 512 192.168.1.2:6815 0.0.0.0:* users:(("ceph-osd",pid=738815,fd=25))
tcp LISTEN 0 512 192.168.1.2:6812 0.0.0.0:* users:(("ceph-osd",pid=736985,fd=20))
tcp LISTEN 0 512 192.168.1.2:6813 0.0.0.0:* users:(("ceph-osd",pid=738815,fd=24))
tcp LISTEN 0 512 192.168.1.2:6810 0.0.0.0:* users:(("ceph-osd",pid=739458,fd=24))
tcp LISTEN 0 512 192.168.1.2:6811 0.0.0.0:* users:(("ceph-osd",pid=739458,fd=25))
tcp LISTEN 0 512 192.168.1.2:6808 0.0.0.0:* users:(("ceph-osd",pid=738815,fd=21))
tcp LISTEN 0 512 192.168.1.2:6809 0.0.0.0:* users:(("ceph-osd",pid=738157,fd=21))
tcp LISTEN 0 512 192.168.1.2:6806 0.0.0.0:* users:(("ceph-osd",pid=738815,fd=20))
tcp LISTEN 0 512 192.168.1.2:6807 0.0.0.0:* users:(("ceph-osd",pid=738157,fd=20))
tcp LISTEN 0 512 192.168.1.2:6804 0.0.0.0:* users:(("ceph-osd",pid=739458,fd=20))
tcp LISTEN 0 512 192.168.1.2:6805 0.0.0.0:* users:(("ceph-osd",pid=739458,fd=21))
tcp LISTEN 0 512 192.168.1.2:6802 0.0.0.0:* users:(("ceph-osd",pid=737743,fd=24))
tcp LISTEN 0 512 192.168.1.2:6803 0.0.0.0:* users:(("ceph-osd",pid=737743,fd=25))
tcp LISTEN 0 512 192.168.1.2:6800 0.0.0.0:* users:(("ceph-osd",pid=737743,fd=20))
tcp LISTEN 0 512 192.168.1.2:6801 0.0.0.0:* users:(("ceph-osd",pid=737743,fd=21))
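
A listing this long is easier to read when grouped by daemon, PID, and local address. The following is a sketch with a made-up function name; it assumes the column layout shown above, with the local address:port in the fifth field and the process in a `users:(("name",pid=N,...))` suffix.

```shell
# Count listening sockets per (daemon, PID, local address) from
# "ss -tulpn" output read on stdin.
summarize_ceph_listeners() {
    awk '/"ceph-/ {
        addr = $5
        sub(/:[0-9]+$/, "", addr)        # drop the port, keep the address
        split($0, q, "\"")               # q[2] is the daemon name
        pid = $0
        sub(/.*pid=/, "", pid)           # cut everything up to the PID
        sub(/,.*/, "", pid)              # cut everything after the PID
        count[q[2] " pid=" pid " " addr]++
    }
    END { for (k in count) print k, count[k] }' | LC_ALL=C sort
}
```

Used as `ss -tulpn | summarize_ceph_listeners`. For the listing above this should give four listeners on 206.180.209.205 and four on 192.168.1.2 for each of the eight ceph-osd PIDs. That would be consistent with every OSD being bound to both networks, the public one for clients and monitors and the cluster one for inter-OSD traffic, rather than with some OSDs having been missed.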
 
Could you please post your `/etc/pve/ceph.conf` and the output of `pveceph status`?
 
