Ceph server feedback

Florent

Member
Apr 3, 2012
Hi,

I'm using Ceph with Proxmox and it seems to work very well so far.

I have two suggestions that may help people:

* Work with partitions and not whole disks when creating OSDs.

I worked around that using:

Code:
# Prepare the partition as an XFS-backed OSD for this cluster, then activate it
ceph-disk prepare --cluster ceph --cluster-uuid 42081905-1a6b-4b9e-8984-145afe0f22f6 --fs-type xfs /dev/sdb3

ceph-disk activate /dev/sdb3

And adding this to /etc/pve/ceph.conf:


Code:
[osd.0] 
          host = test1 
          devs = /dev/sdb3 
          osd mkfs type = xfs

And it works fine with Proxmox.
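
To check that the OSD actually joined the cluster afterwards, the standard status commands are enough (osd.0 should show up as "up" and "in"):

Code:
ceph osd tree
ceph -s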

* When creating an image via RBD, let the user choose striping settings.

As described here: http://ceph.com/docs/master/man/8/rbd/#striping

People may want object sizes larger than 4 MB (the default value, which I think is too low for disk image files).

Just three new fields in the disk creation dialog would let people choose that :)
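
For example, something along these lines on the command line (the option names are from the rbd man page linked above; the image name and sizes are just examples I made up):

Code:
# 10 GB image with 8 MB objects (--order 23 means 2^23 bytes),
# striped 1 MB at a time across 4 objects; non-default striping
# requires a format 2 image
rbd create --size 10240 --order 23 --stripe-unit 1048576 --stripe-count 4 --image-format 2 rbd/vm-100-disk-1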

Thank you :)
 
* Work with partitions and not whole disks when creating OSDs.

IMHO a bad suggestion. Would you mind explaining why?

* When creating an image via RBD, let the user choose striping settings.

We tested that several times, but never observed any performance gain.
So why/when should someone do that?
 
IMHO a bad suggestion. Would you mind explaining why?
No real reason, just to let people choose their preferred layout. There are no pros or cons either way, I think. Just a matter of choice.
It could be useful (it's my case) for a system with only 2 disks. I create 2 OSDs, one on a partition of each disk, leaving the rest for the system itself (in software RAID, for example), as sketched below.
Of course it is neither more nor less performant, but why not work on partitions rather than whole disks?
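
Something like this (device names are just an example):

Code:
# sda1 + sdb1 -> md0, software RAID1 holding the system
# sda2        -> osd.0
# sdb2        -> osd.1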

We tested that several times, but never observed any performance gain.
So why/when should someone do that?
I was thinking about performance, yes. I haven't tested it yet, so OK :)
 
No real reason, just to let people choose their preferred layout. There are no pros or cons either way, I think. Just a matter of choice.
It could be useful (it's my case) for a system with only 2 disks. I create 2 OSDs, one on a partition of each disk, leaving the rest for the system itself (in software RAID, for example).
Of course it is neither more nor less performant, but why not work on partitions rather than whole disks?
Using partitions as part of a storage array can be very bad for overall cluster performance. This is especially true with spinning disks, since a single disk, depending on whether it is nearline SAS or SAS, is limited to roughly 100-200 IOPS. Partitions on the same disk compete with each other for those IOPS, and since nothing arbitrates access to the individual partitions, you can end up in a situation where disk access to one partition steals all the IOPS, leaving the other partitions on the same disk completely unresponsive.
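
This is easy to see with a benchmark. A quick sketch, assuming fio is installed (device names are examples; random reads keep it non-destructive):

Code:
# 4k random reads against two partitions of the same spindle at once;
# compare the per-job IOPS against running each job alone
fio --name=p1 --filename=/dev/sdb2 --rw=randread --bs=4k --ioengine=libaio --iodepth=16 --direct=1 --runtime=60 --time_based &
fio --name=p2 --filename=/dev/sdb3 --rw=randread --bs=4k --ioengine=libaio --iodepth=16 --direct=1 --runtime=60 --time_based &
wait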
 
No real reason, just to let people choose their preferred layout. There are no pros or cons either way, I think. Just a matter of choice.

Using partitions is bad because you cannot easily replace the OSD. It is also a bad idea because of the performance issue mentioned by mir.
 
Coming back to this idea:

* When creating an image via RBD, let the user choose striping settings.

As described here: http://ceph.com/docs/master/man/8/rbd/#striping

People may want object sizes larger than 4 MB (the default value, which I think is too low for disk image files).

Just three new fields in the disk creation dialog would let people choose that :)

Even if it does not increase performance, it is also a storage strategy. I'm thinking about the "stripe_count" parameter, which makes it possible to distribute stripes among multiple OSDs, as explained here: http://eu.ceph.com/docs/master/architecture/#data-striping

The first diagram there shows stripe_count=1 (the default) and the second shows stripe_count=4, if I understand the documentation correctly.

This is not the same thing as the number of replicas per stripe, but how the data is laid out.
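
To make that concrete, here is my reading of the layout (the numbers are my own example, not from the docs):

Code:
# object size = 4 MB, stripe_unit = 1 MB, stripe_count = 4
#
# image offset:  0-1M  1-2M  2-3M  3-4M  4-5M  5-6M  6-7M  7-8M ...
# lands in:      obj0  obj1  obj2  obj3  obj0  obj1  obj2  obj3 ...
#
# With the default stripe_count=1, offsets 0-4M would all go into obj0
# before obj1 is touched, so a sequential write hits one object
# (and typically one OSD) at a time.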

What do you think about that?

Forget my idea about partitions :p You're right :)
 
I totally agree with dietmar and mir on the partition-based OSD idea. I know what you are trying to express, Florent, but it would cause a world of heartache if partitions were used as OSDs instead of entire disks. It would not increase performance; instead, I think it might reduce it significantly due to multiple streams of I/O hitting the same disk. Ceph is really not meant to be used with a very small number of HDDs, such as 2 in your case. Not that it cannot be, but it is geared toward production environments where redundancy and uptime are important.
In a real scenario with several pools and a few dozen OSDs, managing the cluster can become a task of its own. Add partitions to that scenario and it is just a nightmare.

I have 4 pools, with 3 data tiers to place data based on fast, medium, and slow access requirements. If I had used partitions for this task, it would have brought performance to its knees, because all the pools and 18 of the OSDs are part of two clusters on the same hardware.
 