Proxmox 6.4.14 - Creating Additional SSD Pool under default HELP!

orenr

New Member
Oct 26, 2021
Hello all,
We have a cluster of 9 servers with an HDD Ceph pool.

We recently purchased SSD disks to build an SSD pool.

The situation now is that we need to create a new CRUSH rule for SSDs:
ceph osd crush rule create-replicated replicated-ssd default host ssd

But we get the following error:

Error ENOENT: root item default-ssd does not exist


But once we created an OSD on one of the SSD disks, running the same "ceph osd crush rule create-replicated" command succeeded.
Only then could we create our SSD pool and attach it to the new CRUSH rule.

Yes, we had created an ssd device class before all of this.
In the meantime, Ceph started rebalancing data from the already existing HDD OSDs onto the new SSD OSD.

(screenshot attached)



The main issue is that we are failing to create a completely separate SSD pool.
How can we be really sure that we have full segregation between the SSD OSDs and the HDD OSDs?
 
What is the output of the following commands?
Code:
ceph osd crush rule ls
ceph osd df tree

The overall plan, if you only have the replicated_rule, is to create new rules, one for each device class that you want to use. Then assign all pools to one of the new rules, so that you don't have any pools left using the replicated_rule, which does not make any distinction between device classes. This is necessary because if you have pools that span multiple device classes, in combination with other pools that are limited to one device class, the autoscaler will have trouble deciding on the optimal pg_num for those pools.
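A quick way to see which rule each pool currently uses, and how the autoscaler judges them, is the following (assuming the pg_autoscaler module is enabled):
Code:
# List every pool together with its crush_rule id and pg_num
ceph osd pool ls detail
# Show the autoscaler's per-pool view (RATE, PG_NUM, NEW PG_NUM, ...)
ceph osd pool autoscale-status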
 
Hi aaron,

Thank you for your reply.

See the outputs below:

root@Proxmox4-APC:~# ceph osd crush rule ls
replicated_rule
replicated_ssd


(screenshot attached)


I'll explain a bit more about our environment.
We have 9 nodes, as you can see.

When we first created the Ceph storage, all OSDs were HDD disks and we created them under the default bucket, so everything came with the default replicated rule. We weren't planning to build an SSD Ceph storage at that point, so we stuck with the default configuration; we didn't know otherwise, were focused on just getting the Ceph storage up, and followed a simple guide. Bottom line:
We did not specify or create an additional rule or an additional hdd bucket before creating the Ceph pool.
We used what was there by default.

Then we decided to add an additional pool of SSD disks.
Now we understand that there is a different architecture we should have set up before creating our current pool if we wanted the segregation mentioned above. (Again, we didn't know otherwise at that time.)

I'm pasting what we did here again, along with the outcomes:
1. Inserted the SSD disks in HBA mode.
2. We tried to create a CRUSH rule for ssd.
3. The Proxmox/Ceph cluster didn't allow us to create it, since it didn't detect any SSD.
4. Only after we created the OSD on the SSD disk could we create the CRUSH rule.
5. Then, after creating the CRUSH rule, we created the ssd-pool and assigned it to the new rule.
6. We took the SSD OSD out and added it back in again (so far the ssd-pool was still associated with the ssd rule).
7. As you can see in the attachment, data from the default HDD Ceph pool is being replicated onto the SSD OSD.
8. The PGs in the Ceph dashboard are still yellow and are not being moved/spread to the SSD disk we just inserted (it seems like Ceph literally does not know what to do with it).
9. Regarding the PGs, this could only have happened because we inserted the SSD, am I right?


So those are the steps we took trying to create the Ceph SSD pool.

Now our questions:
1. Is there a way, from the point where we are standing right now, to create an ssd-pool that is completely separated from the hdd-pool (which is under the default bucket) and its replication?
2. Is there a way to create the ssd-pool with some simple steps?
In case there isn't a way:

3. Is there a way to create a new bucket, assign an hdd CRUSH rule, and then move the existing OSDs under it?
By the way, the current HDD OSDs host a lot of data, for around 200 VMs.
If yes, is this something risky that could result in data loss, or is it just a simple reassignment?

4. In case the option from question 3 is possible, would the ssd-pool then need to be arranged in the same way afterwards?

Hope we have given all the background and information necessary to help us out.
 
If that output is current, then you seem to only have one SSD OSD. If you did not change the defaults, Ceph wants to create 3 replicas on different nodes. You will need to add more SSDs into your nodes and create OSDs on them to give Ceph the chance to create the replicas.
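If you want to double-check what the replica settings for a pool currently are (the pool name below is a placeholder):
Code:
# Show the configured replica count and minimum replicas for a pool
ceph osd pool get <pool-name> size
ceph osd pool get <pool-name> min_size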

In order to separate the pools on HDDs and SSDs, you will also need a rule that limits to HDDs.
Code:
ceph osd crush rule create-replicated replicated-ssd default host ssd
ceph osd crush rule create-replicated replicated-hdd default host hdd

You should be able to create these rules, as their names use a dash instead of an underscore and therefore do not clash with your existing replicated_ssd rule.

Then assign the rules to your pools. Each pool should use either the SSD or the HDD rule; the replicated_rule should not be used anymore, as it does not make any distinction in device class and will place data on both SSDs and HDDs.
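For example, assigning a pool to one of the new rules can be done on the CLI roughly like this (the pool names are placeholders for your actual pools):
Code:
# Assign each pool to the rule matching the device class it should live on
ceph osd pool set <hdd-pool> crush_rule replicated-hdd
ceph osd pool set <ssd-pool> crush_rule replicated-ssd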

If you check the Crush Map data (in the GUI: <node> -> Ceph -> Configuration) you can see the CRUSH rules at the bottom. The newly created ones should look like this:
Code:
rule replicated-ssd {
    id 1
    type replicated
    step take default class ssd
    step chooseleaf firstn 0 type host
    step emit
}
rule replicated-hdd {
    id 2
    type replicated
    step take default class hdd
    step chooseleaf firstn 0 type host
    step emit
}

When you compare it with the replicated_rule, you will see that these rules have one additional line that defines which device class to choose: step take default class ssd
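If you prefer the CLI over the GUI, the decompiled CRUSH map can also be inspected with crushtool, assuming it is available on the node:
Code:
# Dump the binary CRUSH map and decompile it into readable text
ceph osd getcrushmap -o /tmp/crushmap.bin
crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt
less /tmp/crushmap.txt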

Once you have enough SSD OSDs in your cluster for Ceph to create the needed replicas, and have assigned all pools to either the SSD or the HDD rule, you should see Ceph moving data around.

There is no need to create new buckets in the CRUSH map to separate by device class.
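Device classes already create internal per-class "shadow" hierarchies, which is why separate buckets are unnecessary. One way to see them, without changing anything, is:
Code:
# Show the CRUSH tree including the per-device-class shadow roots
# (you should see entries like default~hdd and default~ssd)
ceph osd crush tree --show-shadow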

Once everything is working fine, you can clean up the initially created replicated_ssd rule by running:
Code:
ceph osd crush rule rm replicated_ssd
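Before removing it, it may be worth double-checking that no pool still references that rule, for example:
Code:
# Lists every pool together with the crush_rule id it uses
ceph osd pool ls detail
# Shows the id of the rule you are about to remove, for comparison
ceph osd crush rule dump replicated_ssd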
 
Hi aaron,

Thank you very much for your answers and prompt reply!

Highly appreciated.


I might be repeating some things we have both already mentioned in previous replies, but it is just to make sure I'm 100% clear on what I need to do (very sensitive production area, so I cannot make mistakes :) :cool: ).

Three Questions:
Q1 (hypothetically speaking):
At the moment I have the following rules configured in my CRUSH map (production environment; unfortunately we don't have a lab environment):

# rules
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
rule replicated_ssd {
    id 1
    type replicated
    min_size 1
    max_size 10
    step take default class ssd
    step chooseleaf firstn 0 type host
    step emit
}

(I will change from underscore to dash, but for the question itself I'll keep the same scheme.)
So as I understand it, I will need to create the hdd rule, meaning the following rule: replicated_hdd

ceph osd crush rule create-replicated replicated_hdd default host hdd


and I should get the following in the CRUSH map:

# rules
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
rule replicated_ssd {
    id 1
    type replicated
    min_size 1
    max_size 10
    step take default class ssd
    step chooseleaf firstn 0 type host
    step emit
}
rule replicated_hdd {
    id 2
    type replicated
    min_size 1
    max_size 10
    step take default class hdd
    step chooseleaf firstn 0 type host
    step emit
}


Currently our Ceph pool (the HDD pool, called "Data-pool-1") is on the replicated_rule.
(That is the pool we created at the start, when we didn't know otherwise. We cannot rename or delete it, since it holds production data.)

So after creating the replicated_hdd, I just need to edit the pool profile to use the replicated_hdd rule instead of replicated_rule to get the separation I want, right (as much as I understood from you)?
Or...
Will it automatically detect this rule and shift all the HDD OSDs under it?
In case neither of the above is the way, how do I shift the HDD OSDs under the new hdd rule?

Q2:
Should I take the SSD OSD out before creating the second CRUSH rule, or can I continue from the point where I'm standing now?

Q3:
You suggested at the end of your last response to delete the CRUSH rule for the SSD.
Why delete it? Shouldn't I keep it in the CRUSH map for future SSD disks and for the operation of the Ceph SSD pool (which will be called "Fast-Pool-1")?


Thanks in advance for your reply
 
(I will change from underscore to dash, but for the question itself I'll keep the same scheme.)
That doesn't seem to be necessary; the current rule looks okay and you can keep it.
So after creating the replicated_hdd, I just need to edit the pool profile to use the replicated_hdd rule instead of replicated_rule to get the separation I want, right (as much as I understood from you)?
Yes. For those pools, Ceph will move the data so that it is only stored on OSDs of the device class hdd.
Right now, with only one OSD of type ssd, there isn't too much that needs to be done. You might see a slight decrease in overall pool size (used + available) because the pool has overall less space available if it is only allowed to use OSDs of type hdd.
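Concretely, once the replicated_hdd rule exists, switching the existing pool over should be a single command along these lines (a sketch; double-check the pool and rule names against what you actually created):
Code:
# Move the existing HDD-backed pool from replicated_rule to the new HDD rule
ceph osd pool set Data-pool-1 crush_rule replicated_hdd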

Should I take the SSD OSD out before creating the second CRUSH rule, or can I continue from the point where I'm standing now?
No need. As I mentioned, in order to use ssd OSDs, you need more than just one in your cluster. Ideally, each node is set up identically regarding the OSDs: the same number and size of OSDs of each type. This way, the load is spread evenly.

You suggested at the end of your last response to delete the CRUSH rule for the SSD.
Why delete it? Shouldn't I keep it in the CRUSH map for future SSD disks and for the operation of the Ceph SSD pool (which will be called "Fast-Pool-1")?
You would only remove it if you were going to use a different naming scheme, in which case that rule would be unused.
 
Aaron, thank you very much.

As you understood from the above, we are on a production environment.

Just to make sure: when I reassign the rule of the current pool to replicated_hdd, should I expect downtime or data corruption due to this shifting?

Thanks again in advance for your help
 
There should not be any issue regarding data corruption. Depending on how much data needs to be moved, you can expect higher load on the cluster.
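If the additional load becomes an issue during the rebalance, one common mitigation is to throttle backfill/recovery while the data is being moved (a sketch; adjust the values to your cluster and Ceph release):
Code:
# Limit concurrent backfill and recovery operations per OSD while data moves
ceph config set osd osd_max_backfills 1
ceph config set osd osd_recovery_max_active 1
# Revert to the defaults once the rebalance has finished
ceph config rm osd osd_max_backfills
ceph config rm osd osd_recovery_max_active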

If possible, set up a test cluster that you can test such operations on before you do it on the production system. You could even do so virtualized with a few VMs to test functionality.
 
We are not able to reproduce it in a test environment, and the second issue is that we cannot create something at the same scale as we have in production.
The production data scale is 29T used out of 70T (29T/70T).

Do you have any suggestions on how to perform this as safely as possible?
Or is it just "ride or die" / "get rich or die trying"?
 
