[SOLVED] Ceph Public/Private And What Goes Over the network

Donovan Hoare

Nov 16, 2017
Good Day All.

I'm new to Ceph. I have set up a 6-node cluster.
It's a mixture of SSD and SAS drives.
With the SAS drives I use an SSD partition for the DB.

Now what I'm experiencing is that my VMs are slow.
Boot is slow, opening programs is slow, etc.

The 10.0.45.0/24 network is 10 Gbit.
The public network, 192.168.14.0/24, is on a bonded 1 Gbit network.
This also happens to be the IP range that I connect to the Proxmox GUI on.

I assumed the public network would not carry much traffic
and that all Ceph traffic was on the cluster network.
However, I read a post saying the public network is traffic-heavy, and when the poster moved it to a separate 40G network, things got faster.

So my questions are:
a) What flows over the public network?
b) Would my VM boot speed and opening programs be faster if the public network were on a different physical network?

c) If b = yes, how would I change the public network safely?

Regards

Here is my config:
Code:
[global]
    auth_client_required = cephx
    auth_cluster_required = cephx
    auth_service_required = cephx
    cluster_network = 10.0.45.21/24
    fsid = 32e62262-67a6-4129-9464-773375643266
    mon_allow_pool_delete = true
    mon_host = 192.168.14.21 192.168.14.22 192.168.14.23
    ms_bind_ipv4 = true
    ms_bind_ipv6 = false
    osd_pool_default_min_size = 2
    osd_pool_default_size = 3
    public_network = 192.168.14.21/23
 
Have a look at the diagram at https://docs.ceph.com/en/latest/rados/configuration/network-config-ref/
The clients would be the VMs in your case.

The Ceph Public network is mandatory and carries a lot of traffic. The optional Ceph Cluster network can be used to move the inter-OSD replication traffic to a different network and spread the load.

You can move the Ceph Public network to a different subnet, for example onto the current Ceph Cluster network, so that it uses 10G instead of 1G.
To do this, set public_network to the same value as the cluster_network line. Then restart the OSD services.

E.g. per node with systemctl restart ceph-osd.target. The MONs and MGRs can be destroyed and recreated one by one.

Between each restart and destroy/recreate step, wait for the cluster to be healthy before you continue.
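
As a rough sketch of the OSD part of that sequence (not an official procedure): the helper names below are made up for illustration, and the only assumption is that the ceph CLI is available on the node and public_network has already been changed in /etc/pve/ceph.conf.

```shell
#!/bin/sh
# Sketch only: helpers for the per-node restart step described above.
# Function names are hypothetical; `ceph health` and
# `systemctl restart ceph-osd.target` are the real commands involved.

# Block until `ceph health` reports HEALTH_OK.
# $1 = seconds between polls (default 10).
wait_for_health_ok() {
    interval="${1:-10}"
    until ceph health | grep -q '^HEALTH_OK'; do
        echo "cluster not healthy yet, waiting ${interval}s..."
        sleep "$interval"
    done
    echo "cluster is HEALTH_OK"
}

# Restart all OSDs on the local node, then wait for the cluster to recover.
# $1 = seconds between polls (default 10).
restart_local_osds() {
    systemctl restart ceph-osd.target
    wait_for_health_ok "${1:-10}"
}
```

Run restart_local_osds on one node, let it return, then move on to the next node. If the cluster has an unrelated warning it will never reach HEALTH_OK, so in that case check ceph -s by hand instead. The MONs and MGRs can then be destroyed and recreated one at a time, in the GUI or (on a current Proxmox VE) with pveceph mon destroy / pveceph mon create and pveceph mgr destroy / pveceph mgr create.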
 
So, to confirm how to do this:

On EACH node I edit /etc/pve/ceph.conf
and change
Code:
public_network = 192.168.14.21/23
to
public_network = 10.0.45.21/24

Save the file,
then run
Code:
systemctl restart ceph-osd.target

Then do I restart the first node, then destroy and recreate a MON, then restart the second, then destroy and recreate the MON there?

Or do I restart all the ceph-osd services first, and only then destroy and recreate the monitors and managers, waiting for the status to become healthy before I do the next monitor and manager?
 
OK, so I went ahead.
The exact order: you edit the file on any node.

Then restart the services on each node one by one, each time waiting for the cluster to become healthy (no yellow degraded items).

Then you do the same with the monitors, and after the monitors, the managers.
 
I assume it worked? You can also use ss -tulpn | grep ceph to see if there are still Ceph services listening on the old IP addresses.
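
A small sketch of that check, narrowed to the old subnet. The function name is made up for illustration, and 192.168.14. is the old address prefix from this thread:

```shell
# List Ceph daemons still listening on an old address prefix.
# Reads `ss -tlnp` output on stdin; $1 is the old prefix, e.g. "192.168.14."
# (keep the trailing dot so 192.168.140.x does not match).
ceph_listeners_on() {
    grep ceph | grep -F " $1"
}

# Usage (as root, so that process names are shown):
#   ss -tlnp | ceph_listeners_on 192.168.14.
```

No output means no Ceph daemon on that node is bound to the old network any more. ceph mon dump additionally shows which addresses the monitors have registered.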
 
I'm a bit confused about what the traffic flow looks like. In my case Ceph is only used for VMs/CTs inside a Proxmox cluster.

- can or should the Ceph public network be the same as the Proxmox mgmt or cluster ring network?
- can the Ceph public network be a non-routed switched only network and be not connected to the Proxmox mgmt or cluster ring network?
 
https://docs.ceph.com/en/quincy/architecture/#smart-daemons-enable-hyperscale The diagram at the end of this section shows the flow of data.
The network diagram might also help: https://docs.ceph.com/en/reef/rados/configuration/network-config-ref/

Client (VM) traffic to the primary OSD goes over the Ceph Public network, as does all the other management traffic between Ceph clients and Ceph services, and between the Ceph services themselves.

If that network becomes loaded, the optional Ceph Cluster network can be used to move the replication traffic between OSDs to a different network. In the first diagram, that is the traffic from the primary OSD to the secondary and tertiary OSDs.

Both networks need to be fast!
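
For illustration only, a split setup in ceph.conf could look like the fragment below. The 10.0.46.0/24 subnet is invented for the example and does not come from this thread:

```ini
[global]
    # Client (VM) traffic to the OSDs, plus MON/MGR and other service traffic
    public_network = 10.0.45.0/24
    # Optional: OSD-to-OSD replication traffic (hypothetical subnet)
    cluster_network = 10.0.46.0/24
```

If cluster_network is left out, the replication traffic simply uses the public network as well.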

- can or should the Ceph public network be the same as the Proxmox mgmt or cluster ring network?
No. You can configure a Corosync link (the Proxmox VE cluster network) on it, but it should only be one of several links for redundancy, never the only one!

- can the Ceph public network be a non-routed switched only network and be not connected to the Proxmox mgmt or cluster ring network?
Sure. Configure a different IP subnet on the network and configure it when setting up Ceph. If you have a smaller number of nodes, and don't need any VMs or external clients to also access Ceph, you can consider setting up a full-mesh network. https://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server

This way you can put the money you save on switches into faster NICs for Ceph :)
 
Hope it's OK that I resurrect this thread and add another question here -

I'm currently setting up Kubernetes on guests on the Proxmox cluster and want them to provision storage directly on Ceph instead of on their local disks. Does that mean that all the Kubernetes guests would need a leg on the Ceph public network (adding a whole bunch of potential bridge points between networks), or is performance through a router/firewall good enough for clients?
 
does that mean that I would need the Kubernetes guests to all have a leg on the ceph public network
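Yes, in the sense that they need IP connectivity to it. For Kubernetes nodes to interact with Ceph directly (via RBD or CephFS), they must be able to reach the MONs and OSDs on the Ceph public network. A leg on that network is the most direct way to get there; a routed path also works. Without that connectivity, the Kubernetes workers won't be able to map the block devices or mount the filesystems natively.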

or is performance through a router/firewall good enough for clients?
It depends entirely on your current baseline. Routing or firewalling your storage traffic will introduce a latency penalty compared to keeping it on a local, flat Layer 2 network.

However, if your VMs/containers are already traversing virtual bridges or vSwitches inside Proxmox to get to the rest of your network, passing through a virtual router on the same host won't feel any worse than what you are already experiencing. That said, for a storage cluster where microseconds matter, keeping the Ceph public network flat, non-routed, and isolated is always the gold standard.
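
Whichever path you pick, it is worth confirming from inside a guest that the monitors are actually reachable. A minimal sketch, assuming nc (netcat) is installed in the guest and the monitors use the default ports 3300 (msgr2) and 6789 (msgr1); the function name and the example IPs are placeholders:

```shell
# Check from a Kubernetes guest whether the Ceph monitors are reachable.
# $@ = monitor IPs. Probes the default MON ports 3300 and 6789.
# Returns non-zero if any probe fails.
check_mons_reachable() {
    rc=0
    for mon in "$@"; do
        for port in 3300 6789; do
            if nc -z -w 2 "$mon" "$port"; then
                echo "ok   $mon:$port"
            else
                echo "FAIL $mon:$port"
                rc=1
            fi
        done
    done
    return "$rc"
}

# Usage (placeholder addresses):
#   check_mons_reachable 10.0.45.21 10.0.45.22 10.0.45.23
```

This only covers the monitors. The guests also need to reach the OSDs, which by default listen in the 6800-7300 port range on every storage node, so any firewall in the path has to allow that range too.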

Choosing between a virtual bridge and PCIe passthrough for your network interface involves tradeoffs that go way beyond simple throughput:
  • Virtual Bridges: Give you flexibility, native Proxmox migration capabilities, and easy management, but they introduce host CPU overhead for packet processing.
  • NIC Passthrough (SR-IOV): Gives you near-bare-metal performance and lower latency, but you lose the ability to easily live-migrate the VM, and you tie the guest directly to the physical hardware.
An Architectural Reality Check: If your specific use case requires a single container to push enough IO and network throughput that a virtual bridge becomes your bottleneck, it might be time to zoom out. At that scale of intensity, virtualization layers like Proxmox VE add unnecessary translation tax. If a single workload is that demanding, deploying Kubernetes directly on bare metal without the hypervisor layer is usually the more efficient design.
 
Thanks. It's a lab env, although some workloads will probably be production due to rigidity in the rest of the org (not a great situation, but not something I can change). I wasn't (and am still not) planning on passing through NICs; my main concern was whether or not to have a router there.

(Also due to rigidity: I already asked for some VLAN assignments at the time of setup, and we're 2 months in and I'm still begging for my VLANs and LACP.)