Apply/Commit Latency on Ceph

brucexx

Renowned Member
Mar 19, 2015
We are running Hammer on a dedicated 3-node cluster on top of Proxmox, using 10Gb for the cluster and public networks (separate cards). All 16 OSD drives are identical 10K SAS disks, spread evenly across the 3 nodes.

We are not having any issues, but I see the latency is almost never in single digits; it is almost always >10 ms and up to 85 ms. Is this normal? I have no way of comparing it with other clusters.

I saw people complaining about high latency, but they were posting screenshots where the latency was way above 100 ms.

Thanks for your help.
 
What latency are you referring to? Network latency between the servers?
How did you measure the latency?

In a similar setup, our average latency on the storage network is 0.011 ms.
 
@SilverNodashi, out of curiosity, which tool do you use for measuring that? ioping?
 
We are not having any issues, but I see the latency is almost never in single digits; it is almost always >10 ms and up to 85 ms. Is this normal? I have no way of comparing it with other clusters.
What version of Ceph are you using, and do you have a separate journal device? Check the disk manufacturer's docs for the drive's rated latency; that already gives you an indication of how far away you are from the latency of the bare disk.
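
If you are not sure whether the journals are colocated, a quick check on one of the OSD nodes looks something like this (assuming the default filestore layout under /var/lib/ceph; adjust the paths to your setup):

    ceph -v                                   # which Ceph release the node is running
    ls -l /var/lib/ceph/osd/ceph-*/journal    # symlink to a partition = separate journal device,
                                              # plain file in the data dir = journal colocated on the OSD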
 
The network latency is fine, 0.1 ms between servers. I am referring to latency on the OSDs, i.e. the apply/commit latency. This is when the cluster is idle.
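
(For reference, the network figure is easy to sanity-check with a plain ping between the cluster and public addresses; the IPs below are just placeholders:

    ping -c 100 -i 0.2 10.10.10.2     # peer on the cluster/storage network
    ping -c 100 -i 0.2 192.168.1.2    # peer on the public network

On healthy 10Gb links the avg in the summary line is typically well below 1 ms.)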

[Attachment: Screen Shot 2017-11-09 at 7.14.39 AM.png]
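
(The same kind of per-OSD apply/commit numbers can also be checked on the command line: ceph osd perf prints one line per OSD with fs_commit_latency(ms) and fs_apply_latency(ms) columns, so something like

    watch -n 5 ceph osd perf

is a simple way to keep an eye on them while the cluster is under load.)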
 
OK, so I found an explanation of what exactly the apply/commit latency is (before, I was just comparing it to what other people were posting), and the numbers make sense for non-SSD drives with the journal on the OSDs (no separate journal drive).

The question now is: what can I do to improve these numbers? Any ideas?

The goal is to future-proof the Ceph storage to handle triple today's load. We are currently using it for about 70 VMs but would like to run 150-200 VMs in a year or two. I am not even sure if this is possible without switching to SSDs and adding an additional server. Would updating to the new Ceph Luminous help?

Thx
 
The new Ceph Luminous uses BlueStore as its default; this eliminates the double-write penalty seen with filestore (no extra journal device). Luminous also introduces device classes, so you can create pools that use only one class of device. This is helpful when you want separate pools for different I/O use cases, e.g. VMs with databases compared to archives.
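
A rough sketch of the device-class pools, with placeholder rule and pool names, looks like this:

    ceph osd crush class ls                                          # classes Luminous detected (hdd/ssd/nvme)
    ceph osd crush rule create-replicated ssd-only default host ssd  # CRUSH rule restricted to the ssd class
    ceph osd pool create vm-ssd 128 128 replicated ssd-only          # new pool using that rule
    ceph osd pool set some-existing-pool crush_rule ssd-only         # or retarget an existing pool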

You need to benchmark your cluster (fio, rados bench, cbt) and establish a baseline. After that you can estimate how much hardware you need to host double or triple your current load.
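
A minimal baseline could look like the following; pool and image names are placeholders, the fio run needs a build with rbd support, and --no-cleanup keeps the bench objects around so the read test has something to read:

    rados bench -p testpool 60 write -b 4M -t 16 --no-cleanup
    rados bench -p testpool 60 seq -t 16
    rados -p testpool cleanup

    # small-block random writes against an RBD image, closer to a VM workload
    rbd create testpool/fio-test --size 10G
    fio --name=rbd-4k-randwrite --ioengine=rbd --clientname=admin --pool=testpool \
        --rbdname=fio-test --rw=randwrite --bs=4k --iodepth=32 --runtime=60 --time_based

Run the same commands again after any change (separate journals/SSDs, BlueStore, more nodes) so the numbers stay comparable.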
 
