Moving cluster_network to public_network


Jul 25, 2023

We are going to cut over to new 100G networking infrastructure. We are currently on Proxmox VE 7.4 (latest) with Ceph 17.2.6 on a 4-node cluster.
A cluster that I'm now (newly) responsible for has two network interfaces: one for VM access, the other for Ceph. The existing card will be ripped out and replaced with a dual-port 100G Mellanox.

Since we have a rather large pipe, and for simplicity's sake, my plan is to collapse the "ceph" network onto the main one. Oddly enough, our non-production cluster is already configured this way. The existing network (public, I suppose?) is already locked down pretty tightly, and I'm not worried about Ceph traffic going over it.

Since we have to swap out the existing card for the new 100G card, the work will need to be performed via the IPMI/OOB interface.

This is our setup. Note that the IPs aren't ours, but you'll get the picture from the configuration.

# cat /etc/pve/ceph.conf
     auth_client_required = cephx
     auth_cluster_required = cephx
     auth_service_required = cephx
     cluster_network =
     fsid = 49518e81-d415-4220-9a47-a95cad855dc6
     mon_allow_pool_delete = true
     mon_host =
     ms_bind_ipv4 = true
     ms_bind_ipv6 = false
     osd_pool_default_min_size = 2
     osd_pool_default_size = 3
     public_network =

     keyring = /etc/pve/priv/$cluster.$name.keyring

     keyring = /var/lib/ceph/mds/ceph-$id/keyring

     host = hypervisor1
     mds standby for name = pve

     host = hypervisor2
     mds_standby_for_name = pve

     host = hypervisor3
     mds_standby_for_name = pve

     host = hypervisor4
     mds_standby_for_name = pve

     public_addr =

     public_addr =

     public_addr =

     public_addr =

Right now, according to the UI and the ceph.conf above, the monitors and the metadata servers are already listening on the public network. This is fine and desired. If I check the ceph-osd processes, I see them listening on BOTH the public_network and the cluster_network. Reading the Ceph documentation, it looks like the cluster_network is used only by OSDs, but in my configuration they seem to be on both?

# ss -tunap|grep ceph-osd |head
tcp   LISTEN    0      512      *     users:(("ceph-osd",pid=3874904,fd=21))
tcp   LISTEN    0      512      *     users:(("ceph-osd",pid=3874904,fd=19))
tcp   LISTEN    0      512      *     users:(("ceph-osd",pid=3874904,fd=24))
tcp   LISTEN    0      512      *     users:(("ceph-osd",pid=3874904,fd=22))
tcp   LISTEN    0      512      *     users:(("ceph-osd",pid=3874904,fd=25))
tcp   LISTEN    0      512      *     users:(("ceph-osd",pid=3874904,fd=23))
tcp   LISTEN    0      512      *     users:(("ceph-osd",pid=3874908,fd=20))
tcp   LISTEN    0      512      *     users:(("ceph-osd",pid=3874908,fd=18))
tcp   LISTEN    0      512      *     users:(("ceph-osd",pid=3874908,fd=21))
tcp   LISTEN    0      512      *     users:(("ceph-osd",pid=3874908,fd=19))
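
To make the "both networks" observation easier to read, here is a minimal sketch that groups the ceph-osd listeners by local address. The function name osd_listen_addrs is made up for illustration, and it assumes the column layout of `ss -H -tlnp` (local address in the fourth column). As far as I understand it, an OSD with a cluster_network configured binds on the public side for clients and monitors and on the cluster side for replication, so seeing two addresses per OSD would be expected rather than a misconfiguration.

```shell
# Hypothetical helper: summarize which local addresses ceph-osd listens on.
# Reads `ss -H -tlnp` output on stdin (column 4 = Local Address:Port).
osd_listen_addrs() {
    awk '
        /"ceph-osd"/ {
            addr = $4
            sub(/:[0-9]+$/, "", addr)   # drop the trailing :port
            count[addr]++
        }
        END { for (a in count) print count[a], a }
    ' | LC_ALL=C sort -k2
}

# Usage on a node, as root (prints "<listener count> <address>" per line):
#   ss -H -tlnp | osd_listen_addrs
```

If only one address shows up, the OSDs are effectively using a single network already.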

With that said, I have the following questions.

1. Can I reconfigure the network by editing /etc/network/interfaces directly, assuming ifupdown2 is installed? I won't have access to the UI. It looks like it's possible, thanks to this link:
2. If I edit ceph.conf so that public_network and cluster_network match, what needs to be restarted? I know I need to restart the OSDs, but do I need to touch the monitors and the metadata servers, given that they are already on the proper network?
3. Will the cluster_network need to route to the public_network so that, as I move nodes one at a time, the OSDs that come up on the public network can talk to those still on the private one? This might not be needed, since it looks like they might already be listening on both.
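
For question 2, the ceph.conf change itself could look something like the sketch below. It is only a text filter: it reads a copy of ceph.conf on stdin and prints it with cluster_network set to whatever public_network contains. The function name collapse_cluster_network and the file names in the usage comment are made up, and the 192.0.2.0/24 style subnets are documentation placeholders, not our real ranges. It does not touch /etc/pve/ceph.conf and does not restart anything.

```shell
# Hypothetical helper: print ceph.conf with cluster_network = public_network.
# Reads the config on stdin, writes the modified config to stdout.
collapse_cluster_network() {
    awk '
        { lines[NR] = $0 }
        /^[[:space:]]*public_network[[:space:]]*=/ {
            pub = $0
            sub(/^[^=]*=[[:space:]]*/, "", pub)   # keep only the value
        }
        END {
            for (i = 1; i <= NR; i++) {
                # Rewrite the value only if a public_network was found.
                if (pub != "" && lines[i] ~ /^[[:space:]]*cluster_network[[:space:]]*=/)
                    sub(/=.*/, "= " pub, lines[i])
                print lines[i]
            }
        }'
}

# Usage (review the diff before copying anything back):
#   collapse_cluster_network < ceph.conf.copy > ceph.conf.new
#   diff -u ceph.conf.copy ceph.conf.new
```

Since the daemons read their network settings at startup, the edit by itself should not change running OSDs until they are restarted.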

Based on the answers to the questions above, I think this is what needs to happen. The steps below might need to change a bit depending on those answers.

1. Check for valid VM backup via PBS
2. Migrate VMs off of the 1st targeted host.
3. Power down host
4. Install the new 100G adapter
5. Reconfigure /etc/network/interfaces with the new adapter names and apply the changes. Check whether bond0 is online. If everything works here, the UI should be available.
6. If the OSDs are already listening on both networks, I just need to wait for Ceph health to clear. Update cluster_network to match public_network. Make sure OSD traffic can pass between the old and new networks (this might not be needed, depending on the answers above; are they already doing this?). Either way, the configuration will be updated to reflect the network change.
7. Restart OSDs
8. Validate Ceph goes green
9. Remove the old configuration from the host
10. Migrate VMs back over
11. Repeat all steps above for the next host.
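
Since I'll be working from the console without the UI, steps 5 and 8 could be checked with something like the sketch below. Both function names are made up. bond_summary assumes the usual layout of the Linux bonding status file (/proc/net/bonding/bond0), and wait_for_health_ok just polls `ceph health` until it reports HEALTH_OK; the retry count and delay defaults are arbitrary.

```shell
# Hypothetical helper for step 5: list MII status of the bond and its slaves.
# Reads the contents of /proc/net/bonding/<bond> on stdin.
bond_summary() {
    awk '
        /^Slave Interface:/ { slave = $3 }
        /^MII Status:/ {
            if (slave == "") print "bond", $3   # first status line is the bond itself
            else             print slave, $3
        }'
}

# Hypothetical helper for step 8: poll `ceph health` until HEALTH_OK.
# $1 = number of attempts (default 60), $2 = seconds between attempts (default 10).
wait_for_health_ok() {
    wfh_tries=${1:-60}
    wfh_delay=${2:-10}
    while [ "$wfh_tries" -gt 0 ]; do
        wfh_status=$(ceph health 2>/dev/null)
        case $wfh_status in
            HEALTH_OK*) echo "$wfh_status"; return 0 ;;
        esac
        wfh_tries=$((wfh_tries - 1))
        if [ "$wfh_tries" -gt 0 ]; then
            sleep "$wfh_delay"
        fi
    done
    return 1
}

# Usage on a node:
#   bond_summary < /proc/net/bonding/bond0
#   wait_for_health_ok 60 10 || echo "cluster still not healthy"
```

The idea would be to hold off on moving to the next host until wait_for_health_ok succeeds.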

What else am I missing here? Look forward to any feedback!

