That being said, I did so many tests with Proxmox/Ceph that I am not sure I tested exactly this case. I only have 2x1 GbE with bonding, but it should still make a noticeable difference. So give me a few days and I may be able to give you something more conclusive.
So, let me follow through on that promise.
The test bed for the results below is:
Hardware
3x HP ProLiant DL380 G6 servers, each with 2x Xeon X5670 six-core 2.93 GHz
96 GB RAM, 2x P410 RAID controllers with 512 MB battery-backed write cache
each with 6x 2TB 2.5" desktop HDDs, journals on same disks (see below).
each with 8x1 GBit/s ethernet ports operating at maximum speed (tested)
Network
Zyxel 48 port L2 switch with 1Gbit/s per port
separate port-based VLANs for each test with separate LACP groups as necessary.
Ceph RBD pool
size 3 (3x replication)
256 PGs
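In case anyone wants to replicate the pool setup, it boils down to this (pool name "rbd" is just an example):

    # create the benchmark pool with 256 placement groups and 3x replication
    ceph osd pool create rbd 256 256
    ceph osd pool set rbd size 3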
About the OSDs: many other sources and my own tests on different hardware indicate that you can keep your journals on the same spinners if you have disk controllers with write-back cache that properly order your writes. Unfortunately, these P410 controllers do not seem to do well here, although I cannot rule out a misconfiguration yet. As it is, a Ceph OSD benchmark (ceph tell osd.x bench) only yields about 33 MB/s write performance per OSD, which is probably capping the test results in some cases.
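For reference, that benchmark is simply the following, looped over the local OSD IDs (adjust the IDs to your layout; the benchmark I/O is local to each OSD's disk/journal, so the network is not involved):

    # by default this writes 1 GB in 4 MB blocks to the OSD's store
    ceph tell osd.0 bench
    # quick loop over the six OSDs of one node
    for i in 0 1 2 3 4 5; do ceph tell osd.$i bench; done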
If anyone would like to discuss this in detail we can start a different thread. I have an interesting comparison to an Areca controller here.
In all tests, I kept the setup the same and just changed the networks, changing the Ceph monitors and Proxmox storage config as appropriate.
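Concretely, the parts that change between tests are the network definitions in ceph.conf and the monitor addresses Proxmox uses for the RBD storage; roughly like this (all addresses are placeholders for my test VLANs):

    # /etc/ceph/ceph.conf (relevant excerpt)
    [global]
        # ceph-public: monitors, clients and the OSDs' front side
        public network = 10.10.10.0/24
        # ceph-cluster: OSD replication and backfill traffic
        cluster network = 10.10.20.0/24

    # /etc/pve/storage.cfg (RBD storage entry, monitors live in ceph-public)
    rbd: ceph-rbd
        monhost 10.10.10.11 10.10.10.12 10.10.10.13
        pool rbd
        content images
        username admin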
When I write "2gbps", this means 2x1gbps with LACP (bond-xmit_hash_policy layer3+4).
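On the Proxmox nodes such a bond looks roughly like this in /etc/network/interfaces (interface names and addresses are examples; the switch side needs a matching LACP group, see above):

    # 2x 1 Gbit/s LACP bond, e.g. for ceph-cluster
    auto bond0
    iface bond0 inet static
        address 10.10.20.11
        netmask 255.255.255.0
        bond-slaves eth2 eth3
        bond-mode 802.3ad
        bond-miimon 100
        bond-xmit-hash-policy layer3+4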
All network write tests push zeros through netcat over a fully saturated 1 Gbit/s link into a Linux VM (raw throughput tested at 112 MB/s) and from there to a file on its RBD-backed disk.
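A minimal sketch of that kind of test (hostnames, ports and paths are examples, not my exact commands):

    # inside the VM: write whatever comes in to a file on the RBD-backed disk
    # (-l -p is netcat-traditional syntax; the OpenBSD variant uses just -l 5000)
    nc -l -p 5000 > /mnt/rbd-disk/zeros.bin

    # on an external sender attached to the 1 Gbit/s client network
    dd if=/dev/zero bs=1M count=20000 | nc <vm-ip> 5000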
first test: 1gbps proxmox-public, 2gbps ceph-public, 2gbps ceph-cluster
- RADOS bench 60s write: 94.5 MB/s, read: 312 MB/s
- network write through VM: 99.5 MB/s
second test: 1gbps proxmox-public, 1gbps ceph-public, 2gbps ceph-cluster
- RADOS bench 60s write: 106.6 MB/s, read: 171.3 MB/s
- network write through VM: 90.2 MB/s
third test: 1gbps proxmox-public and ceph-public, 2gbps ceph-cluster
- RADOS bench 60s write: 116.8 MB/s, read: 163.3 MB/s
- network write through VM: 77.9 MB/s
fourth test: 1gbps proxmox-public, 4gbps ceph-public and ceph-cluster
- RADOS bench 60s write: 115.2 MB/s, read: 329.0 MB/s
- network write through VM: 76.9 MB/s
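For completeness, the RADOS numbers above come from the standard benchmark, along these lines (pool name as an example):

    # 60 s write benchmark; keep the objects so the read test has data
    rados bench -p rbd 60 write --no-cleanup
    # 60 s sequential read benchmark against those objects
    rados bench -p rbd 60 seq
    # remove the benchmark objects afterwards
    rados -p rbd cleanup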
Unfortunately, the RADOS write benchmark is pretty inconclusive; I suspect my slow OSDs are capping and distorting the results. In another test cluster with a similar network setup to the first test, but more capable hard disk controllers, it goes above 160 MB/s write speed.
Still, when ceph-public and the VM client traffic share the same network, there is a clear tendency towards slowdown. Keep in mind that my traffic was mostly uni-directional; the effect would probably be worse in a real-world scenario with more bi-directional load.
An interesting observation is test 4: it shows the limits of LACP and Ceph in small clusters. Ceph over LACP can in fact exceed the throughput of a single Ethernet link (not obvious here due to the OSD performance cap), but the low network/VM throughput shows it comes nowhere near utilizing all 4 links. This is because LACP's layer3+4 policy pins each individual TCP flow to a single slave link, so any single stream is limited to 1 Gbit/s; the aggregate only helps with many concurrent flows between many hosts. With more hosts this would probably change, but here the extra links are clearly of little use.
In conclusion: the Proxmox wiki is most certainly right: if you have a 10 Gbit/s interface, put ceph-public and ceph-cluster on the same interface.
However, if you are fooling around with LACP in small clusters, separate your networks as much as possible, which includes keeping your VM client traffic out of them.
P.S.: sorry for using 2 different forum accounts. I work at different institutions and probably need to consolidate them.