CEPH placement group and storage useful capacity

Alright.
By the way, is cache tiering worthwhile with replicated pools (I will store VMs on them), since I'll mostly use hard drives (maybe SSDs for logs)?
I may invest in a few SSDs to create a cache tier pool to increase performance.
 
Never mind:
KNOWN BAD WORKLOADS
The following configurations are known to work poorly with cache tiering.

  • RBD with replicated cache and erasure-coded base: This is a common request, but usually does not perform well. Even reasonably skewed workloads still send some small writes to cold objects, and because small writes are not yet supported by the erasure-coded pool, entire (usually 4 MB) objects must be migrated into the cache in order to satisfy a small (often 4 KB) write. Only a handful of users have successfully deployed this configuration, and it only works for them because their data is extremely cold (backups) and they are not in any way sensitive to performance.
  • RBD with replicated cache and base: RBD with a replicated base tier does better than when the base is erasure coded, but it is still highly dependent on the amount of skew in the workload, and very difficult to validate. The user will need to have a good understanding of their workload and will need to tune the cache tiering parameters carefully.
I found that yesterday a few minutes after posting my question...
 
So for now I think I'll stick to hard drives and maybe SSDs for journals.

But I wonder what is written in those journals. Is it logs, metadata? Will storing journals on the OSD really impact performance?
 
Don't. Create two pools, one for the SSDs (enterprise class) and one for the HDDs.
http://docs.ceph.com/docs/luminous/rados/operations/cache-tiering/#known-bad-workloads
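Something along these lines should do it with device classes on Luminous (pool names and PG counts below are just placeholders, adjust for your cluster):

# One replicated rule per device class, with host as the failure domain
ceph osd crush rule create-replicated replicated_ssd default host ssd
ceph osd crush rule create-replicated replicated_hdd default host hdd

# One pool on each rule (PG counts are placeholders)
ceph osd pool create vm-ssd 128 128 replicated replicated_ssd
ceph osd pool create vm-hdd 256 256 replicated replicated_hdd
ceph osd pool application enable vm-ssd rbd
ceph osd pool application enable vm-hdd rbd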

Seems like a good idea, since running applications that need high I/O on spinning disks would be nonsense. I thought about something like this (attachment: upload_2019-1-16_16-23-52.png).

Obviously I'll need to modify the CRUSH map to map a specific pool to a specific type of OSD.

I found what I was looking for https://forum.proxmox.com/threads/ceph-ssd-and-hdd-pools.42032/
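To actually map a pool onto such a rule once it exists, something like this should work (assuming the rule name replicated_ssd from that thread and a hypothetical pool called vm-ssd):

# Check which CRUSH rule the pool currently uses
ceph osd pool get vm-ssd crush_rule
# Point it at the SSD rule; Ceph then rebalances the data onto the matching OSDs
ceph osd pool set vm-ssd crush_rule replicated_ssd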
 
rule replicated_ssd {
    id 2
    type replicated
    min_size 1
    max_size 10
    step take default class ssd
    step chooseleaf firstn 0 type host
    step emit
}

I'm not sure what the two lines in red do.
 
Aside from an SSD-only pool, is it possible, when creating a pool, to specifically target some OSDs? Or does that go against the fundamental principle of Ceph, which is to dynamically rebalance data?
 
Yeah, so you can make a pool which targets a specific class (HDD, SSD, NVMe), but you can't specifically target an OSD by its ID or its name in the CRUSH map, which makes sense since an average production cluster hosts way more than 10 OSDs.

By the way, SAS and SATA hard drives would both still be considered as HDD by Ceph?
Am I right?
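To check how Ceph classed the disks myself, something like this should show it (osd.4 below is just an example ID):

# List the device classes present in the cluster
ceph osd crush class ls
# Show every OSD with its class, host and weight
ceph osd tree
# If a class ever needs changing by hand (osd.4 is an example ID)
ceph osd crush rm-device-class osd.4
ceph osd crush set-device-class ssd osd.4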
 
Is it possible to select which OSDs your pool will be assigned to (in case I stay with a hard-drive-only cluster)?

Targeting a host instead of a class would be good enough.
 
Yeah, so you can make a pool which targets a specific class (HDD, SSD, NVMe), but you can't specifically target an OSD by its ID or its name in the CRUSH map, which makes sense since an average production cluster hosts way more than 10 OSDs.
OSD IDs are reusable and not fixed. When you edit the CRUSH map, things like this can be possible. If you don't have a deep understanding of what will be happening with your data, I advise against it. To give you an idea, an older post, but still valid for the most part:
http://cephnotes.ksperis.com/blog/2015/02/02/crushmap-example-of-a-hierarchical-cluster-map

Is it possible to select which OSDs your pool will be assigned to (in case I stay with a hard-drive-only cluster)?
See above.
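If you do go down that road, the usual workflow is to pull the compiled map out of the cluster and decompile it to text for editing (file names are arbitrary):

# Extract the compiled CRUSH map from the cluster
ceph osd getcrushmap -o crushmap.bin
# Decompile it into an editable text file
crushtool -d crushmap.bin -o crushmap.txt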
 
Thanks Alwin!

From what I've read, the data placement and replication with such a scenario would be awful.

https://www.sebastien-han.fr/blog/2012/12/07/ceph-2-speed-storage-with-crush/

This article looks promising, without the downside of the uneven replication from the link you posted.
Since OSDs have a number (device 0 osd.0 class hdd, device 1 osd.1 class hdd), if an OSD fails and gets removed, do the numbers of all remaining OSDs stay the same, or are they decremented by 1?
 
I tried to aggregate OSDs into buckets like this:
pool ssd {
    id -9
    alg straw2
    hash 0 # rjenkins1
    item osd.0 weight 0.455
    item osd.2 weight 0.454
}

pool sas {
    id -10
    alg straw2
    hash 0 # rjenkins1
    item osd.1 weight 0.455
    item osd.3 weight 0.454
}

with a rule for each of them:

rule ssd {
    ruleset 3
    type replicated
    min_size 1
    max_size 10
    step take ssd
    step choose firstn 0 type osd
    step emit
}

rule sas {
    ruleset 4
    type replicated
    min_size 1
    max_size 10
    step take sas
    step choose firstn 0 type osd
    step emit
}

But as soon as I try to recompile the CRUSH map, I get this error: bucket type 'pool' is not defined
 
I found why ceph printed an error message: at the beginning of the CRUSH map, I need to add pool to the list of bucket types:

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root
type 11 pool
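
With that type added, the edited map can be recompiled, sanity-checked and injected back. Roughly this, reusing the placeholder file names from the decompile step (the --rule number is the ruleset of the rule being tested, 3 for ssd here):

# Recompile the edited text map
crushtool -c crushmap.txt -o crushmap.new
# Dry-run the ssd rule (ruleset 3) with 2 replicas to check the mappings
crushtool -i crushmap.new --test --show-mappings --rule 3 --num-rep 2
# Inject the new map; Ceph will start rebalancing data accordingly
ceph osd setcrushmap -i crushmap.new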
 
