Multiple Ceph pools possible?

Volker Lieder · Nov 6, 2017

Hi,

We want to create two Ceph pools: one backed by SAS OSDs and one backed by SSD OSDs.
Is it possible to manage that in Proxmox VE 5.1 via the web interface or the CLI, or do we have to write the Ceph configuration by hand?
Regards,
Volker

fabian · Nov 6, 2017

(assuming you are on PVE 5.1/Ceph Luminous)

You need to configure CRUSH rules accordingly, but after that you can create and use pools from the GUI. If your OSDs are correctly identified as HDD/SSD, you can use the new device class feature to easily create such a rule (see http://ceph.com/community/new-luminous-crush-device-classes/). Then just select the right rule when creating a pool, and everything should work as expected.

If they are not, you need to tell Ceph manually which device belongs to which class (also described in the same link).
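A minimal CLI sketch of both steps, assuming Ceph Luminous, the CRUSH root "default" and "host" as the failure domain. The function names set_osd_class and create_class_pool, the OSD name, the pool names and the PG count are hypothetical examples, not something PVE provides:

```shell
# Sketch only: hypothetical helpers around real Luminous ceph commands.

# Reassign the device class of one OSD (only needed if it was detected wrongly).
set_osd_class() {
    osd="$1"     # OSD name, e.g. osd.12
    class="$2"   # wanted class, e.g. hdd or ssd
    # Ceph refuses to overwrite an existing class, so clear it first
    ceph osd crush rm-device-class "$osd" || return 1
    ceph osd crush set-device-class "$class" "$osd"
}

# Create one class-restricted replicated rule and a pool that uses it.
# Assumes root "default" and failure domain "host".
create_class_pool() {
    class="$1"   # device class, e.g. ssd or hdd
    pool="$2"    # new pool name, e.g. rbd-ssd
    pgs="$3"     # placement group count, e.g. 128
    # Rule that only selects OSDs of $class below root "default",
    # placing each replica on a different host
    ceph osd crush rule create-replicated "replicated-$class" default host "$class" || return 1
    # Replicated pool bound to that rule (the GUI can do this step instead)
    ceph osd pool create "$pool" "$pgs" "$pgs" replicated "replicated-$class" || return 1
    # Luminous expects each pool to be tagged with the application using it
    ceph osd pool application enable "$pool" rbd
}
```

For example, set_osd_class osd.12 hdd (only for a misdetected OSD), then create_class_pool ssd rbd-ssd 128 and create_class_pool hdd rbd-hdd 128. Creating the pool in the GUI and selecting the rule there should work just as well.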

Saiki · Nov 7, 2017

Hello,

Thank you for asking this question.
I have the exact same need.

There are 12 SSD OSDs and 4 HDD OSDs in my setup (PVE 5.1 with integrated Ceph Luminous).
I updated the CRUSH map by adding datacenter levels, and then created two replication rules with these commands:

Code:

ceph osd crush rule create-replicated replicated-ssd datacenter host ssd
ceph osd crush rule create-replicated replicated-hdd datacenter host hdd

Afterwards I created two pools (rbd-ssd and rbd-hdd) with the aforementioned replication rules.

However, I have run into some issues:

I followed the 1.3 model of this article for my OSD tree. I used host names that differ from the default ones, and noticed that the GUI feature for OSDs no longer works. I cannot handle the split between SSD and HDD with an OSD tree other than the default one. Furthermore, when I reboot the PVE nodes, the default OSD tree is created again and all OSDs are placed back into it.

When I add RBD storage via the PVE GUI, the displayed storage size is not just the sum of my SSD or HDD space, but the sum of all OSDs.
I have also noticed that performance became very bad once I included the HDD OSDs (average write speed of 80 MB/s). It feels like the Ceph integration in PVE cannot use different pools, based on custom replication rules, to target different drive types (SSD and SAS).

Best regards,
Saiki

fabian · Nov 8, 2017

Please post your OSD tree and CRUSH map.

Saiki · Nov 8, 2017

Hello Fabian,

Thanks for your reply.
Please find attached the two versions of my CRUSH map configuration.

For the old version, these are the commands I used to set up the replication rules:

Code:

ceph osd crush rule create-replicated replicated-ssd root-ssd datacenter
ceph osd crush rule create-replicated replicated-hdd root-hdd datacenter

For the new version:

Code:

ceph osd crush rule create-replicated replicated-ssd default datacenter ssd
ceph osd crush rule create-replicated replicated-hdd default datacenter hdd

The CRUSH map reset after rebooting the PVE nodes affects the old version: the default CRUSH map (the new version without the datacenter level) is recreated and all OSDs are placed in that tree. This issue does not occur with the new version of the CRUSH map.

I deliberately left the SAS OSDs out of the new CRUSH map tree because of the performance issue.
The performance issue occurs with both CRUSH map trees.

Best regards,
Saiki

fabian · Nov 8, 2017

That is not a CRUSH map; you can find the CRUSH map on the "Configuration" tab. If you censor your host names, please replace them with meaningful identifiers.

Saiki · Nov 8, 2017

My bad, here is the CRUSH map configuration:

Code:

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class ssd
device 1 osd.1 class ssd
device 2 osd.2 class ssd
device 3 osd.3 class ssd
device 4 osd.4 class ssd
device 5 osd.5 class ssd
device 6 osd.6 class ssd
device 7 osd.7 class ssd
device 8 osd.8 class ssd
device 9 osd.9 class ssd
device 10 osd.10 class ssd
device 11 osd.11 class ssd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host datacenter1-node2 {
	id -3 # do not change unnecessarily
	id -4 class ssd # do not change unnecessarily
	# weight 2.620
	alg straw2
	hash 0 # rjenkins1
	item osd.0 weight 0.873
	item osd.1 weight 0.873
	item osd.2 weight 0.873
}
host datacenter1-node1 {
	id -5 # do not change unnecessarily
	id -6 class ssd # do not change unnecessarily
	# weight 2.620
	alg straw2
	hash 0 # rjenkins1
	item osd.3 weight 0.873
	item osd.4 weight 0.873
	item osd.5 weight 0.873
}
datacenter datacenter1 {
	id -11 # do not change unnecessarily
	id -14 class ssd # do not change unnecessarily
	# weight 5.239
	alg straw2
	hash 0 # rjenkins1
	item datacenter1-node2 weight 2.620
	item datacenter1-node1 weight 2.620
}
host datacenter2-node1 {
	id -7 # do not change unnecessarily
	id -8 class ssd # do not change unnecessarily
	# weight 2.620
	alg straw2
	hash 0 # rjenkins1
	item osd.6 weight 0.873
	item osd.7 weight 0.873
	item osd.8 weight 0.873
}
host datacenter2-node2 {
	id -9 # do not change unnecessarily
	id -10 class ssd # do not change unnecessarily
	# weight 2.620
	alg straw2
	hash 0 # rjenkins1
	item osd.9 weight 0.873
	item osd.10 weight 0.873
	item osd.11 weight 0.873
}
datacenter datacenter2 {
	id -12 # do not change unnecessarily
	id -13 class ssd # do not change unnecessarily
	# weight 5.239
	alg straw2
	hash 0 # rjenkins1
	item datacenter2-node1 weight 2.620
	item datacenter2-node2 weight 2.620
}
root default {
	id -1 # do not change unnecessarily
	id -2 class ssd # do not change unnecessarily
	# weight 10.478
	alg straw2
	hash 0 # rjenkins1
	item datacenter1 weight 5.239
	item datacenter2 weight 5.239
}

# rules
rule replicated_rule {
	id 0
	type replicated
	min_size 1
	max_size 10
	step take default
	step chooseleaf firstn 0 type host
	step emit
}
rule replicated-ssd {
	id 1
	type replicated
	min_size 1
	max_size 10
	step take default class ssd
	step chooseleaf firstn 0 type datacenter
	step emit
}
rule replicated-hdd {
	id 2
	type replicated
	min_size 1
	max_size 10
	step take default class hdd
	step chooseleaf firstn 0 type datacenter
	step emit
}

# end crush map

Alwin · Nov 9, 2017

Saiki said:
I followed the 1.3 model of this article for my OSD tree.

Which article are you referring to?

Saiki said:
Furthermore, when I reboot the PVE nodes, the default OSD tree is created again and all OSDs are placed back into it.

Is this still the case?

Saiki said:
When I add RBD storage via the PVE GUI, the displayed storage size is not just the sum of my SSD or HDD space, but the sum of all OSDs.

This is a known issue and is being worked on.

Saiki said:
I have also noticed that performance became very bad once I included the HDD OSDs (average write speed of 80 MB/s). It feels like the Ceph integration in PVE cannot use different pools, based on custom replication rules, to target different drive types (SSD and SAS).

PVE can handle different pools with different rules. In your case, most likely all OSDs were used. The following shows the per-class (shadow) trees that class-based rules select from:

Code:

ceph osd crush tree --show-shadow
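If a pool was created with the default replicated_rule, which takes root default without any class, its data is spread over every OSD. A hedged sketch for checking and changing a pool's rule on Luminous; the function names are hypothetical, the pool and rule names are the ones used in this thread:

```shell
# Sketch only: hypothetical helpers around real Luminous ceph commands.

# Print the CRUSH rule a pool currently maps its data with.
show_pool_rule() {
    pool="$1"   # e.g. rbd-ssd
    ceph osd pool get "$pool" crush_rule
}

# Bind a pool to another CRUSH rule.
set_pool_rule() {
    pool="$1"   # e.g. rbd-ssd
    rule="$2"   # e.g. replicated-ssd
    # Ceph then moves the pool's data to the OSDs the new rule selects
    ceph osd pool set "$pool" crush_rule "$rule"
}
```

For example, show_pool_rule rbd-ssd, and if it does not report replicated-ssd, set_pool_rule rbd-ssd replicated-ssd. Switching the rule triggers data movement, so it is best done outside peak hours.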

Saiki said:
I deliberately left the SAS OSDs out of the new CRUSH map tree because of the performance issue.
The performance issue occurs with both CRUSH map trees.

See above.

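The CRUSH map can be tested before it is applied to the cluster. --show-statistics prints a summary; other --show-* options such as --show-mappings, --show-bad-mappings or --show-utilization print more detail.

Code:

crushtool -i crush.map --test --show-statistics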
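A sketch of the offline round trip with crushtool, assuming the decompiled map is edited as crush.txt; the function name check_crush_rule and the file names are assumptions:

```shell
# Sketch only: compile an edited text CRUSH map and test one rule offline.
# Export and decompile the live map first:
#   ceph osd getcrushmap -o crush.map
#   crushtool -d crush.map -o crush.txt
check_crush_rule() {
    src="$1"        # edited text map, e.g. crush.txt
    rule_id="$2"    # numeric rule id, e.g. 1 (replicated-ssd)
    replicas="$3"   # replica count to simulate, e.g. 3
    crushtool -c "$src" -o crush.new || return 1
    # Show which OSDs each test input maps to; nothing is changed on the cluster
    crushtool -i crush.new --test --rule "$rule_id" --num-rep "$replicas" --show-mappings
    # List inputs that could not get $replicas distinct OSDs
    crushtool -i crush.new --test --rule "$rule_id" --num-rep "$replicas" --show-bad-mappings
}
# Only when the output looks right, inject it: ceph osd setcrushmap -i crush.new
```

With the map posted above, rule 1 (replicated-ssd) uses "chooseleaf firstn 0 type datacenter" and there are only two datacenter buckets, so check_crush_rule crush.txt 1 3 should report bad mappings: that rule can place at most two replicas.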

Saiki · Nov 21, 2017

Hi Alwin,

Thank you for your help.

Alwin said:
Which article are you referring to?

I am referring to this article: http://cephnotes.ksperis.com/blog/2015/02/02/crushmap-example-of-a-hierarchical-cluster-map

Alwin said:
Is this still the case?

I eventually solved this issue. I chose to use device classes to separate HDD and SSD, and it works fine.
The issue probably occurred because I had changed the node names, and I guess PVE could not match them with the real node names.

Alwin said:
This is a known issue and is being worked on.

I understand that it is a known issue and the PVE team is currently working on a fix, am I correct?

Thanks again.

Best regards,
Saiki

Alwin · Nov 22, 2017

Saiki said:
I am referring to this article: http://cephnotes.ksperis.com/blog/2015/02/02/crushmap-example-of-a-hierarchical-cluster-map

Thought so.

This involves using virtual host names in the CRUSH map, which is not covered by the GUI.
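Assuming the reset after reboot comes from Ceph's default behaviour of OSDs moving themselves under their real host name at startup (the "osd crush update on start" option), a sketch for checking it; the function name is hypothetical, and the command has to run on the node that hosts the OSD because it uses the local admin socket:

```shell
# Sketch only: ask a running OSD whether it relocates itself in CRUSH on start.
crush_update_on_start() {
    osd_id="$1"   # numeric OSD id, e.g. 0
    ceph daemon "osd.$osd_id" config get osd_crush_update_on_start
}
# To keep a hand-built tree with virtual host buckets, the option can be
# disabled in the [osd] section of /etc/pve/ceph.conf, then the OSDs restarted:
#   [osd]
#   osd crush update on start = false
```

With device classes, as recommended in this thread, the option can stay at its default.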

Saiki said:
I eventually solved this issue. I chose to use device classes to separate HDD and SSD, and it works fine.
The issue probably occurred because I had changed the node names, and I guess PVE could not match them with the real node names.

This is also the recommended way, as it involves less configuration hassle.

Saiki said:
I understand that it is a known issue and the PVE team is currently working on a fix, am I correct?

Yep, exactly.

Saiki · Nov 23, 2017

Hi Alwin,

Thank you again for your help.

Best regards,
Saiki
