[SOLVED] ceph add new hosts and create completely new pool

liska_

Hi,
I now have a running three-node ceph cluster with two SSD storage nodes and one monitor.
What I would like to achieve is to add another two storage nodes with spinning drives, create a new pool on them, and keep the two pools separated.
I suppose I need to edit the crushmap, something like here http://www.sebastien-han.fr/blog/2012/12/07/ceph-2-speed-storage-with-crush/, to add the new hosts and drives, but I am not sure how this interacts with pveceph.

Should I install the new nodes with pveceph install and then create the OSDs via the GUI, or rather via the CLI with ceph-disk zap, or does it not matter?
Will it not trigger a rebalancing of the current ceph pool?

This is my current output of ceph osd tree:
Code:
# id    weight  type name       up/down reweight
-1      1.68    root default
-2      0.84            host cl2
0       0.21                    osd.0   up      1
1       0.21                    osd.1   up      1
2       0.21                    osd.2   up      1
3       0.21                    osd.3   up      1
-3      0.84            host cl1
4       0.21                    osd.4   up      1
5       0.21                    osd.5   up      1
6       0.21                    osd.6   up      1
7       0.21                    osd.7   up      1


Thank you for all the answers
 
Maybe this article by Mr. Han will help? http://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/

In that article he creates an SSD pool and a SATA pool on the same node, whereas you are putting the two pools on different sets of nodes. Basically you will have to create a new ruleset and then assign a pool to that ruleset. When you create a new OSD, start it with weight 0, assign it to the desired root, and then reweight it to the desired value.
Pay attention to step III in the article, where you have to change the [osd] portion of ceph.conf. This prevents the automatic crush update on OSD start and preserves the changes you make.

Besides the pool creation, everything else needs to be done from the CLI. You will have to maintain one of the pools entirely from the CLI, since the Proxmox GUI as of now only handles a single OSD hierarchy: any new OSD created through the GUI will always go to the default root.
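Roughly, the CLI side of it looks like this (the names are just examples, and it assumes a root called spinning already exists in your crushmap):
Code:
# rule that only picks hosts under the separate root
ceph osd crush rule create-simple spinning_ruleset spinning host

# pool that uses that rule
ceph osd pool create spinning 1024 1024 replicated spinning_ruleset

# new OSDs come in at weight 0 so nothing rebalances,
# then get their real weight once they sit in the right bucket
ceph osd crush reweight osd.X 0
ceph osd crush reweight osd.X <desired weight>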
 
Thank you very much for your reply Wasim.
I have some more newbie questions about this procedure. I did not realize I would have to make such "big" changes to add new nodes and a pool once this storage is in production, so I would like to be sure this way is correct.

1) I will add this line to the [osd] section of /etc/pve/ceph.conf:
osd crush update on start = false
2) I will install the two new nodes with pveceph install -version firefly
3) I will prepare every drive with
ceph-disk zap /dev/sdX
ceph osd create ----- here I get Y, right?
mkdir /var/lib/ceph/osd/ceph-Y
mkfs -t xfs /dev/sdX
mount -o user_xattr /dev/sdX /var/lib/ceph/osd/ceph-Y
ceph-osd -i Y --mkfs --mkkey
ceph auth ... ----- is this necessary in the default proxmox scenario??

4) My current crush map looks like this:
Code:
# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7

# buckets
host cl2 {
        id -2           # do not change unnecessarily
        # weight 0.840
        alg straw
        hash 0  # rjenkins1
        item osd.0 weight 0.210
        item osd.1 weight 0.210
        item osd.2 weight 0.210
        item osd.3 weight 0.210
}
host cl1 {
        id -3           # do not change unnecessarily
        # weight 0.840
        alg straw
        hash 0  # rjenkins1
        item osd.4 weight 0.210
        item osd.5 weight 0.210
        item osd.6 weight 0.210
        item osd.7 weight 0.210
}
root default {
        id -1           # do not change unnecessarily
        # weight 1.680
        alg straw
        hash 0  # rjenkins1
        item cl2 weight 0.840
        item cl1 weight 0.840
}

# rules
rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

and I will add the following lines (the extract/edit/recompile cycle I would use to apply this is sketched after step 5):

Code:
host cl3 {
        id -4           # do not change unnecessarily  ----- I suppose I have to increase this number every time??
        # weight 0.0  ----- Is this just a comment, or should I put here the sum of all the drives I want to use with ceph??
        alg straw
        hash 0  # rjenkins1
        item osd.10 weight 0.0
        item osd.11 weight 0.0
}

host cl4 {
        id -5           # do not change unnecessarily  
        # weight 0.0
        alg straw
        hash 0  # rjenkins1
        item osd.12 weight 0.0
        item osd.13 weight 0.0
}

root spinning {
        id -6           # do not change unnecessarily
        # weight 0.0
        alg straw
        hash 0  # rjenkins1
        item cl3 weight 0.0  ----- what weight should be here? Zero or the sum of all the drives?
        item cl4 weight 0.0
}

# rules
rule spinning {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take spinning
        step chooseleaf firstn 0 type host
        step emit
}

Is it necessary to add the new osds to the # devices part?

5) start the drives and reweight
/etc/init.d/ceph start osd.Y
ceph osd reweight Y 0.15
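If I understand the howtos correctly, the crushmap edit in step 4 would be applied with the usual extract/decompile/edit/recompile/inject cycle (file names are arbitrary):
Code:
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# edit crushmap.txt: add the cl3/cl4 buckets, root spinning and rule spinning
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new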

I am a little lost in this case, as I have to merge several howtos. Thanks a lot for your help and patience.
 
I have some more newbie questions about this procedure. I did not realize I would have to make such "big" changes to add new nodes and a pool once this storage is in production, so I would like to be sure this way is correct.
Once you are familiar with the crushmap and a few Ceph CLI commands, these are really not that big changes. There is just so much to Ceph that it takes time to get used to it.

1) I will add this line to the [osd] section of /etc/pve/ceph.conf:
osd crush update on start = false
2) I will install the two new nodes with pveceph install -version firefly
Up to this point you are fine.
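Just to be explicit about step 1, the [osd] section of /etc/pve/ceph.conf would end up looking roughly like this (whatever keys Proxmox already put there stay as they are; the keyring line is only shown for context and may differ on your setup):
Code:
[osd]
         keyring = /var/lib/ceph/osd/ceph-$id/keyring   # existing line, may differ
         osd crush update on start = false              # the line you add; stops OSDs from re-registering under root default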

3) I will prepare every drive with
ceph-disk zap /dev/sdX
ceph osd create ----- here I get Y, right?
mkdir /var/lib/ceph/osd/ceph-Y
mkfs -t xfs /dev/sdX
mount -o user_xattr /dev/sdX /var/lib/ceph/osd/ceph-Y
ceph-osd -i Y --mkfs --mkkey
ceph auth ... ----- is this necessary in the default proxmox scenario??
Since both your SSD and HDD pools live in the same cluster crushmap, you really do not need to do all that manual work. You can use the pveceph command all the way:
# ceph-disk zap /dev/sdX
# pveceph createosd /dev/sdX

Immediately after createosd, run # ceph osd crush reweight osd.X 0. This will prevent unnecessary rebalancing.
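So per disk the whole sequence is just this (device name and OSD id are placeholders):
Code:
ceph-disk zap /dev/sdX
pveceph createosd /dev/sdX
ceph osd tree                      # check which osd.X id the new disk received
ceph osd crush reweight osd.X 0    # park it at weight 0 so no data moves yet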
I am assuming hosts cl3 and cl4 are the 2 new nodes you are going to add. Is rule spinning what you are going to use for the new pool?

Code:
host cl3 {
        id -4           # do not change unnecessarily  ----- I suppose I have to increase this number every time??
Yes, you do need to increase that number every time you add a host manually.

        # weight 0.0  ----- Is this just a comment, or should I put here the sum of all the drives I want to use with ceph??
You start with weight 0. Then, as you reweight/add OSDs, the crushmap will update the value automatically based on the size of the OSDs.
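The convention, as far as I know, is that a CRUSH weight of 1.0 corresponds to roughly 1 TB of capacity, which is presumably where the 0.21 values in your current tree come from. So for example:
Code:
ceph osd crush reweight osd.10 0       # while you are still arranging the crushmap
ceph osd crush reweight osd.10 1.82    # later, e.g. for a 2 TB spinner (weight ~ size in TB)
Once you do that, the host and root weights in the decompiled map simply become the sums of their items.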

Is it necessary to add the new osds to the # devices part?
I am not sure about your question.

Before you proceed, I suggest you spend a significant amount of time understanding the crushmap used in this Ceph documentation:
http://ceph.com/docs/master/rados/operations/crush-map/

I think this is exactly what you are trying to accomplish. Learning and understanding this crushmap will go a very long way.
 
Thank you very much for your answer. I have got a few more questions.
I want to add spinning drives from the older servers cl3 and cl4 to the current SSD ones, cl1 and cl2.
I am glad that step three can be simpler. What if I set ceph osd set noin and ceph osd set noup, as you have written in your excellent book?
Do I still need to set the reweight to zero after creating an osd, or can I just put all the values in the new crushmap?
And what about the weight of each item in root spinning? Is that also changed automatically?
I am trying to go through the crushmap documentation, but these are questions to which I could not find answers anywhere. I must say setting up glusterfs was far easier, but the possibilities of ceph are much greater.
 
I am glad that step three can be simpler. What if I set ceph osd set noin and ceph osd set noup, as you have written in your excellent book?
Do I still need to set the reweight to zero after creating an osd, or can I just put all the values in the new crushmap?
And what about the weight of each item in root spinning? Is that also changed automatically?
I am trying to go through the crushmap documentation, but these are questions to which I could not find answers anywhere. I must say setting up glusterfs was far easier, but the possibilities of ceph are much greater.
I find it easier to work with OSDs once they are part of the cluster. With the noin option, new OSDs will not be added to the cluster. In the last several months, crushmap manipulation has become much easier through the Ceph CLI commands. At the time that book was written it was necessary to extract/inject crushmaps, but a lot of that can now be done live through commands. Once the OSDs are added to the cluster, in whatever nodes they happen to be, you can move them around with the # ceph osd crush commands. By reweighting to 0 you can move the OSDs around without triggering a rebalance.
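A minimal sketch of the kind of commands I mean, with the bucket names from your plan (the final weight is just an example):
Code:
ceph osd crush add-bucket spinning root       # new root for the spinners
ceph osd crush add-bucket cl3 host            # empty host bucket
ceph osd crush move cl3 root=spinning         # hang it under the new root
ceph osd crush add osd.10 0 host=cl3          # place an OSD there at weight 0 (no data movement)
ceph osd crush reweight osd.10 1.82           # finally give it its real weight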

Gluster is indeed easier to set up, but Ceph's features outweigh that benefit in my opinion. :)
 
I finally finished this setup. I will make a few notes about the procedure.
On the new nodes, after the command
pveceph install -version firefly
it was also necessary to run the following command without parameters, as mentioned on the wiki:

pveceph init

Then I prepared each drive with the zap command and created the osd:
ceph-disk zap /dev/sdb
pveceph createosd /dev/sdb

In this step the osd was added to ceph, but not into any root and with weight 0.0, so no rebalancing happened. I added the new buckets and created the ruleset spinning:

ceph osd crush add-bucket cl3 host
ceph osd crush add-bucket cl4 host
ceph osd crush add-bucket spinning root
ceph osd crush move cl3 root=spinning
ceph osd crush move cl4 root=spinning
ceph osd crush rule create-simple spinning_ruleset spinning host

I have not found a way to move the newly created osds into the correct root, so I removed them, stopped the osd daemons and added them again. I believe there is a better way, but I have not found anything that worked.

ceph osd out osd.25 ; /etc/init.d/ceph stop osd.25 ; ceph osd crush add osd.25 0.0 host=cl4 ; /etc/init.d/ceph start osd.25
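Possibly ceph osd crush set would have placed an existing osd in one step instead of the remove/add dance, but I did not try it, so take this as untested:
Code:
ceph osd crush set osd.25 0.0 root=spinning host=cl4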

After adding all the drives as osds, I created the new pool and set the correct size, min_size and other parameters:
ceph osd pool create spinning 1024 1024 replicated spinning_ruleset
ceph osd pool set spinning size 2
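The other parameters were along these lines (the min_size value is just what made sense for a 2-replica pool here):
Code:
ceph osd pool set spinning min_size 1
ceph osd dump | grep spinning          # verify size, min_size and crush_ruleset of the new pool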

Then I set the correct weight on each osd and activated it:
ceph osd crush reweight osd.25 <weight>
ceph osd in osd.25
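To check the result I just looked at the tree and the cluster status:
Code:
ceph osd tree     # now shows both root default and root spinning with their hosts
ceph -s           # back to HEALTH_OK once backfilling finished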

Now I have two healthy separated pools on different hosts and drives. Ceph is really impressive.
 
