Putting Ceph on dedicated network

macos_user

New Member
Mar 6, 2024
So I had my 3-node cluster set up and running with Ceph. There is nothing of value on it, so data loss is currently not a concern; it has not been placed into use yet. I now tried moving my Ceph network to an isolated VLAN on a switch with no network capabilities. I will also be doing the same for Corosync and the cluster network, but wanted to do one thing at a time.

The whole cluster was originally on 192.168.1.0/24, on vmbr0.

I moved Ceph to enp66s0f1 on each node and set that interface's IP to 192.168.10.x/24 for each node (this interface is different than the original), but did not set it up as a bridge, bond, etc.
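On each node that just means a plain static interface entry in /etc/network/interfaces, roughly like this (the address here is only a placeholder, different per node):

auto enp66s0f1
iface enp66s0f1 inet static
    address 192.168.10.11/24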

I checked that I can ping each node's Ceph IP from the other nodes' Ceph IPs.

I then changed /etc/ceph/ceph.conf to keep the cluster network at 192.168.1.0/24 (as it was), set the public network to 192.168.10.0/24, and changed the MON IPs to 192.168.10.x (as set up for each node; I can't remember if that line wanted a subnet mask or not).
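The relevant parts of /etc/ceph/ceph.conf ended up looking roughly like this (node name and exact addresses here are just placeholders):

[global]
    cluster_network = 192.168.1.0/24
    public_network  = 192.168.10.0/24
    mon_host = 192.168.10.11 192.168.10.12 192.168.10.13

[mon.node1]
    public_addr = 192.168.10.11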

Now, all 3 managers that were set up are missing. I checked on the switch and the 3 ports are up. The switch has a routed interface for that VLAN at 192.168.10.254/24.

I've tried restarting nodes, restarting MONs, Ceph services, etc. Ceph just keeps showing an unknown status for the cluster, pools, MONs, etc. When I try to create a manager, it spins and spins and eventually times out.

I'm at a loss here. This has to be something simple I'm missing. Any help would be appreciated.
 
Overall, the procedure sounds okay, except for the MONs.

The reason is that the MONs also have their own internal monmap. The way it will work is to switch the public and cluster networks in the [global] section of the Ceph config, then restart the OSDs/MGRs/MDS on each node.
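In this case, that would mean pointing both networks at the new subnet in the [global] section of /etc/ceph/ceph.conf, roughly like this (a separate cluster network can still be introduced later):

[global]
    public_network  = 192.168.10.0/24
    cluster_network = 192.168.10.0/24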

After that, destroy and re-create each MON, one at a time. This way, the internal monmap and the MON-specific config in the ceph.conf file will be handled by the Ceph / PVE tooling.

Whenever you restart services, especially OSDs, do it only one node at a time and wait for Ceph to be OK before you continue.
The target units can be very helpful; for example, systemctl restart ceph-osd.target restarts all OSDs on a node.
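A per-node sequence could look roughly like this (the MDS restart only applies if CephFS is in use):

systemctl restart ceph-mgr.target   # all MGRs on this node
systemctl restart ceph-osd.target   # all OSDs on this node
systemctl restart ceph-mds.target   # only if an MDS runs on this node
ceph -s                             # wait for HEALTH_OK before moving to the next node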

So I would suggest that you switch the mon_host back to the previous settings, and the IPs in the MON-specific sections as well. Restart all services and check the status of the cluster. Then destroy and re-create the MONs one node at a time.
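The destroy / re-create step itself can be done with the pveceph tooling (or via the GUI under the node's Ceph -> Monitor panel); roughly, on each node in turn, something like:

pveceph mon destroy <nodename>   # remove this node's old MON
pveceph mon create               # re-create it; it picks up the public_network from ceph.conf
ceph -s                          # make sure all MONs are back in quorum before the next node

(Exact syntax can differ slightly between PVE versions.)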

You can use ss -tulpn | grep ceph to verify that all services are listening on the new network once you are done.
 
Overall, the procedure sounds okay, except for the MONs.
I appreciate this. I had reverted everything back and it is operational as it should be.

I had left the cluster network in the global section of the config as I was going to move that to a separate network as well at a later date. I was going to take this one step at a time as I'm still learning this. Do the public and cluster networks need to be on the same network, or can they be different networks?
 
The cluster network can be put into a different subnet. It is used for the inter-OSD replication traffic, so putting it on a different physical network can move that load away from the Ceph public network, which is used for all the other Ceph traffic: the services communicating with each other and the clients (VMs) talking to Ceph.
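In ceph.conf that just means two different subnets in the [global] section, for example (the second subnet here is only a placeholder):

[global]
    public_network  = 192.168.10.0/24   # MONs, MGRs and clients (VMs) talk here
    cluster_network = 192.168.20.0/24   # OSD-to-OSD replication and heartbeat traffic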
 
The cluster network can be put into a different subnet.
Thank you. That's what I thought. I'm not totally sure what happened here. I'm going to try this again one step at a time; maybe I missed something minor the first time. The weird part is that all my managers disappeared and I wasn't able to delete or create any monitors.

Appreciate the help!
 
