Multiple Ceph pools possible?

Volker Lieder

Nov 6, 2017
Hi,

we want to create two Ceph pools:
one with SAS OSDs and one with SSD OSDs.
Is it possible to manage that in Proxmox 5.1 via the web interface or CLI, or do we have to edit the Ceph configuration by hand?
Regards,
Volker
 
(assuming you are on PVE 5.1/Ceph Luminous)

you need to configure CRUSH rules accordingly, but then you can create and use pools via the GUI. if your OSDs are correctly identified as HDD/SSD, you can use the new device class feature to easily create such a ruleset (see http://ceph.com/community/new-luminous-crush-device-classes/), then just select the right rule when creating a pool and everything should work as expected :) if they are not, you need to manually tell Ceph which device belongs to which class (also described in the same link).
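
for example, a rough sketch (the OSD id, rule names and the failure domain "host" are only placeholders, adapt them to your setup):

Code:
# only needed if an OSD's class was not auto-detected correctly:
ceph osd crush rm-device-class osd.12
ceph osd crush set-device-class ssd osd.12
# one replicated rule per device class (root "default", failure domain "host"):
ceph osd crush rule create-replicated replicated-ssd default host ssd
ceph osd crush rule create-replicated replicated-hdd default host hdd

afterwards you can simply pick the matching rule when creating the pool.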
 
Hello,

Thank you for asking this question.
I have the exact same need.

There are 12 SSD OSDs and 4 HDD OSDs in my architecture (PVE 5.1 with integrated Ceph Luminous).
I updated the CRUSH map, adding a datacenter level, and then created two replication rules with these commands:

Code:
ceph osd crush rule create-replicated replicated-ssd datacenter host ssd
ceph osd crush rule create-replicated replicated-hdd datacenter host hdd

Afterwards I created two pools (rbd-ssd and rbd-hdd) with the aforementioned replication rules.
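
For reference, creating such pools on the CLI looks roughly like this (the pg_num of 128 is only an example, not necessarily the value I used):

Code:
ceph osd pool create rbd-ssd 128 128 replicated replicated-ssd
ceph osd pool create rbd-hdd 128 128 replicated replicated-hdd
# on Luminous, tag the pools for RBD use
ceph osd pool application enable rbd-ssd rbd
ceph osd pool application enable rbd-hdd rbd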

However, I am encountering some issues:

I followed the 1.3 model of this article for my OSD tree. I created host buckets with names different from the default (real) hostnames, and noticed that the OSD features in the GUI no longer work. I cannot handle the split between SSD and HDD with an OSD tree other than the default one. Furthermore, when I reboot the PVE nodes, the default OSD tree is recreated and all OSDs are placed back into it.

When I add an RBD storage via the PVE GUI, the total storage space shown is not just the sum of my SSD or HDD capacity, but the sum of all OSDs.
I have also noticed that performance became very bad once I included the HDD OSDs (write average of 80 MB/s). It feels like the Ceph integration in PVE is not able to use different pools, based on custom replication rules, to target different drive types (SSD and SAS).

Best regards,
Saiki
 
please post your OSD tree and crush maps.
 
Hello Fabian,

Thanks for your reply.
Please find the two versions of my CRUSH map configuration.

For the old version, these are the commands I used to set up the replication rules:

Code:
ceph osd crush rule create-replicated replicated-ssd root-ssd datacenter
ceph osd crush rule create-replicated replicated-hdd root-hdd datacenter

For the new version:

Code:
ceph osd crush rule create-replicated replicated-ssd default datacenter ssd
ceph osd crush rule create-replicated replicated-hdd default datacenter hdd
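# general form: ceph osd crush rule create-replicated <rule-name> <root> <failure-domain-type> [<device-class>]
# i.e. the old rules split on dedicated roots (root-ssd / root-hdd), the new ones use the default root plus a device class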

The CRUSH map reset after rebooting the PVE nodes concerns the old version: the default CRUSH map (like the new version, but without the datacenter level) was created again and all OSDs were placed back into that tree. This issue does not occur with the new version of the CRUSH map.

I deliberately did not add the SAS OSDs to the new CRUSH map tree due to the performance issue.
My performance issue occurs with both CRUSH map trees.

Best regards,
Saiki
 

Attachments

  • crush-map-old.png
  • crush-map-new.png
that is not a crush map - you can find the crush map on the "configuration" tab. if you censor your host names, please replace them with meaningful identifiers.
 
My bad, please find the crush map configuration below:

Code:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class ssd
device 1 osd.1 class ssd
device 2 osd.2 class ssd
device 3 osd.3 class ssd
device 4 osd.4 class ssd
device 5 osd.5 class ssd
device 6 osd.6 class ssd
device 7 osd.7 class ssd
device 8 osd.8 class ssd
device 9 osd.9 class ssd
device 10 osd.10 class ssd
device 11 osd.11 class ssd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host datacenter1-node2 {
id -3 # do not change unnecessarily
id -4 class ssd # do not change unnecessarily
# weight 2.620
alg straw2
hash 0 # rjenkins1
item osd.0 weight 0.873
item osd.1 weight 0.873
item osd.2 weight 0.873
}
host datacenter1-node1 {
id -5 # do not change unnecessarily
id -6 class ssd # do not change unnecessarily
# weight 2.620
alg straw2
hash 0 # rjenkins1
item osd.3 weight 0.873
item osd.4 weight 0.873
item osd.5 weight 0.873
}
datacenter datacenter1 {
id -11 # do not change unnecessarily
id -14 class ssd # do not change unnecessarily
# weight 5.239
alg straw2
hash 0 # rjenkins1
item datacenter1-node2 weight 2.620
item datacenter1-node1 weight 2.620
}
host datacenter2-node1 {
id -7 # do not change unnecessarily
id -8 class ssd # do not change unnecessarily
# weight 2.620
alg straw2
hash 0 # rjenkins1
item osd.6 weight 0.873
item osd.7 weight 0.873
item osd.8 weight 0.873
}
host datacenter2-node2 {
id -9 # do not change unnecessarily
id -10 class ssd # do not change unnecessarily
# weight 2.620
alg straw2
hash 0 # rjenkins1
item osd.9 weight 0.873
item osd.10 weight 0.873
item osd.11 weight 0.873
}
datacenter datacenter2 {
id -12 # do not change unnecessarily
id -13 class ssd # do not change unnecessarily
# weight 5.239
alg straw2
hash 0 # rjenkins1
item datacenter2-node1 weight 2.620
item datacenter2-node2 weight 2.620
}
root default {
id -1 # do not change unnecessarily
id -2 class ssd # do not change unnecessarily
# weight 10.478
alg straw2
hash 0 # rjenkins1
item datacenter1 weight 5.239
item datacenter2 weight 5.239
}

# rules
rule replicated_rule {
id 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}
rule replicated-ssd {
id 1
type replicated
min_size 1
max_size 10
step take default class ssd
step chooseleaf firstn 0 type datacenter
step emit
}
rule replicated-hdd {
id 2
type replicated
min_size 1
max_size 10
step take default class hdd
step chooseleaf firstn 0 type datacenter
step emit
}

# end crush map
 
I followed the 1.3 model of this article for my OSD tree.
To which article are you referring?

Furthermore, when I reboot the PVE nodes, the default OSD tree is recreated and all OSDs are placed back into it.
Is this still the case?

When I add an RBD storage via the PVE GUI, the total storage space shown is not just the sum of my SSD or HDD capacity, but the sum of all OSDs.
This is known and being worked on.

I have also noticed that performance became very bad once I included the HDD OSDs (write average of 80 MB/s). It feels like the Ceph integration in PVE is not able to use different pools, based on custom replication rules, to target different drive types (SSD and SAS).
PVE can handle different pools with different rulesets. Here, most likely all OSDs have been used; you can check with:
Code:
ceph osd crush tree --show-shadow
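
To verify which rule a pool actually uses and where its PGs are placed, for example (the pool name is only an example):
Code:
ceph osd pool get rbd-ssd crush_rule
ceph pg ls-by-pool rbd-ssd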

I deliberately did not add the SAS OSDs to the new CRUSH map tree due to the performance issue.
My performance issue occurs with both CRUSH map trees.
See above.

The CRUSH map can be tested before it is applied to the cluster:
Code:
crushtool -i crush.map --test --show-X
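
For example (the rule id and replica count are placeholders; --show-X stands for the various --show-* options of crushtool):
Code:
ceph osd getcrushmap -o crush.map
crushtool -i crush.map --test --rule 1 --num-rep 3 --show-mappings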
 
Hi Alwin,

Thank you for your help.

To which article are you referring?

I am referring to this article: http://cephnotes.ksperis.com/blog/2015/02/02/crushmap-example-of-a-hierarchical-cluster-map

Is this still the case?
I eventually solved this issue. I chose to use device classes to separate HDD and SSD, and it works fine.
This issue probably occurred because I had changed the node names, and I guess PVE could not match them with the real node names.
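
In case it helps someone, I believe the class assignments can be double-checked like this:
Code:
ceph osd crush class ls
ceph osd crush class ls-osd ssd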

This is known and being worked on.
I understand that it is a known issue and the PVE team is currently working to fix it, am I correct?

Thanks again.

Best regards,
Saiki
 
Thought so. ;) That involves the use of virtual host names in the crush map, which is not covered by the GUI.

I eventually solved this issue. I chose to use device classes to separate HDD and SSD, and it works fine.
This issue probably occurred because I had changed the node names, and I guess PVE could not match them with the real node names.
This is also the recommended way, as it involves less configuration hassle.

I understand that it is a known issue and the PVE team is currently working to fix it, am I correct?
Yep, exactly.
 
