Apologies for the length of this, but if I had come across this post when I was looking for answers to this problem, I would have FULLY understood the solution instead of spending almost 10 hours on useless documentation reading and internet searching and STILL not finding the answer.
Huge thanks to Toranaga above for leading me down the right path. And whoever is maintaining the docs on the Proxmox wiki might consider reading the semi-rant at the bottom of this post and deciding whether it merits a blurb or two in the way-over-linked
https://pve.proxmox.com/pve-docs/chapter-pveceph.html#pve_ceph_device_classes article.
------------
So following what I wrote above gives the following error:
# ceph osd crush rule create-replicated replicated_rule_hddpool2 default host hdd2
Error EINVAL: device class hdd2 does not exist
HOWEVER, that's because you apparently need at least one OSD assigned to the device class BEFORE you can create a rule for that device class... i.e. the "rm-device-class/set-device-class" and "crush rule create" lines above need to be swapped.
Code:
# Remove the current device class on the OSDs I want to move to the new pool.
$> ceph osd crush rm-device-class osd.$OSDNUM
# Add new device classes to the OSDs to move.
$> ceph osd crush set-device-class hdd2 osd.$OSDNUM
# Create a new crush rule for a new pool.
$> ceph osd crush rule create-replicated replicated_rule_hdd2 default host hdd2
# Point the Ceph pool at the new CRUSH rule. (Note: "ceph osd pool set" does not create the pool; create it first on the CLI or via the Proxmox GUI, see below.)
$> ceph osd pool set hddpool2 crush_rule replicated_rule_hdd2
So, because this was impossible for me to find "out there", here is the breakdown for anyone who runs across this post trying to find the same thing I was:
In the Code Above:
- $OSDNUM is the OSD identifier. When you run "ceph osd tree" it will show the OSDs on your hosts; each OSD is named "osd.#" where # is a consecutive identifier for the OSD. Probably didn't need to mention that, but let's call this "comprehensive" documentation.
- hdd2 is a user-defined label for a new device class. As noted below, this can be ANYTHING you'd like it to be. This value is arbitrary and carries NO significance within Ceph at all. (See below.)
- There must be AT LEAST one OSD known to Ceph on the new device class before running the "ceph osd crush rule" command. Otherwise you will get "Error EINVAL: device class <CLASSNAME> does not exist". This error DOES NOT mean that device class names are a list of known values; it means that Ceph couldn't find an OSD in the cluster that already carries that device class. Run "rm-device-class" and "set-device-class" first. (See the quick sanity checks just after this list.)
- replicated_rule_hdd2 is a user-defined name for a new CRUSH ruleset. Without modification, you will likely have the rule "replicated_rule" already defined in your crushmap... you can use anything you want in place of this text EXCEPT the name of any existing rule in your crushmap.
- hddpool2 is another arbitrarily defined name; this time it's the name of the new pool in Ceph which will be set to use the new crush rule.
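As a quick sanity check (referenced in the list above), these read-only commands can be used to confirm that the new device class and rule actually exist before you touch any pools. The names "hdd2" and "replicated_rule_hdd2" are just the example values from this post; substitute whatever you chose.
Code:
# List all device classes currently known to the cluster ("hdd2" should appear here
# once at least one OSD has been assigned to it).
$> ceph osd crush class ls
# List the OSDs that currently carry the "hdd2" class.
$> ceph osd crush class ls-osd hdd2
# List the CRUSH rules, which should now include "replicated_rule_hdd2".
$> ceph osd crush rule ls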
The first two commands simply remove the existing label and add a distinct new label on each OSD you want to put into the new pool.
The third command creates a Ceph "crushmap" rule that selects OSDs carrying the above "distinct label".
The fourth command tells the new pool to use the crushmap rule created by the third command (remember, the pool itself has to exist first; see the GUI note below).
Thus this boils down to:
- Create a Label
- Assign the Label to a new Rule
- Assign the Rule to a new Pool
Note that the 4th command can be replaced by using the Proxmox GUI for Ceph to create a new Pool. After running the "ceph osd crush rule" command the new rule will immediately show up in the Pool GUI's dropdown for selection when clicking the "create" button in the Ceph Pool interface.
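If you go the CLI route instead and want to double-check that the pool really picked up the new rule, this should confirm it (again using the example names from this post):
Code:
# Show which CRUSH rule the pool is currently using; it should report "replicated_rule_hdd2".
$> ceph osd pool get hddpool2 crush_rule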
And that's it.
The most important lesson I learned from this exercise:
The Device Class is NOT a sacrosanct value. It is nothing more than a text "tag" you can apply to and remove from OSD devices. I could have called my new device class "fred" or "new-device-class" or "I_hate_world_of_warcraft"; it has no meaning to Ceph whatsoever. The terms HDD, SSD and NVMe DO have meaning in the technical world and SOUND like they are important to "get right", but this is simply not the case. These tags DO NOT set some hidden tuning parameters within Ceph or cause Ceph to deal with the OSDs any differently.
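To illustrate the point, here is the same tagging procedure sketched with a deliberately silly class name; osd.7 is just a placeholder OSD number and "fred" is obviously made up, but Ceph will accept it all the same:
Code:
# Strip the auto-assigned class from a placeholder OSD and tag it with an arbitrary name.
$> ceph osd crush rm-device-class osd.7
$> ceph osd crush set-device-class fred osd.7
# Ceph happily builds a rule for the "fred" class; the name means nothing to it.
$> ceph osd crush rule create-replicated replicated_rule_fred default host fred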
The problem with ALL of the documentation on the net with regards to "device classes" is that it all talks about separating OSDs by speed, and thus makes the tags "hdd" and "ssd" sound like they have some importance or meaning... after all, an HDD and an SSD are two VERY different devices with VERY different performance profiles, so it must be important that one device's class is set to "hdd" and another device's class is set to "ssd", right? In fact, no, there is no importance to the device class AT ALL other than to group OSDs with class "$A" separately from another group of OSDs with class "$B"... the fact that "$A" in this case is named "hdd" is utterly irrelevant.
Even the term "device class" is a misnomer that creates confusion: it gives what actually amounts to a text tag a misleading importance by calling it a "device class" and then using tag names that have REAL technical implications, with very distinct and different performance profiles. Because the term "device class" and the tag text "hdd", "ssd" and "nvme" DO match industry definitions of distinct devices, AND we refer to these categories as "classes" of storage devices, one is incorrectly led to assume an importance for this text tag that simply does not exist.
The only reason to tag an HDD with the "hdd" device class or an SSD with the "ssd" device class is that Ceph does some automatic magic for you: when you bring an OSD online, it detects the ACTUAL type of the underlying device and assigns a device class tag whose text matches it. So by default your HDDs will get the tag "hdd" assigned to them by Ceph... however it is still important to understand that just because Ceph assigned the text label "hdd" to a disk that reports itself as a spinning disk with platters using magnetism to store data, this is simply correlation... in the same way that "all criminals drink water, so drinking water causes criminality" is a correlation. Ceph could have assigned the text "nvme" to this spinning disk with platters that uses magnetism to store data and it wouldn't matter in the slightest to how Ceph handles that OSD.
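If you're curious what Ceph actually detected for a given OSD, these commands should show it; as far as I can tell the automatic class assignment is keyed off hardware details like the "rotational" flag in the OSD metadata:
Code:
# The CLASS column in the OSD tree shows the tag each OSD currently carries.
$> ceph osd tree
# The OSD metadata (pass the bare OSD number) includes the detected hardware details.
$> ceph osd metadata $OSDNUM | grep -i rotational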
If you decide to assign "john" to your HDD devices and "sara" to your SSD devices, there's nothing stopping you from doing so; however, it's possible that upon reboot/restart of Ceph it will re-tag your devices back to "hdd" and "ssd"... NOT because it's important to Ceph that a spinning disk is assigned the label "hdd", but because it MIGHT be important and convenient to you, the Ceph user, that your spinning rust and your "storage on chip" disks end up in different pools... the fact that the labels used to separate your "fast" disks from your "slow" disks match some industry standard definitions is unfortunate and confusing.
To stop that, the following appears to be the defined method:
Modify your local /etc/ceph/ceph.conf to include an [osd] entry:
Code:
[osd]
osd_class_update_on_start = false
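For what it's worth, on more recent Ceph releases the same option can apparently also be stored in the monitor config database instead of (or in addition to) editing ceph.conf; I haven't tested this on every version, so treat it as a hint:
Code:
# Store the option cluster-wide in the mon config database; applies to all OSDs.
$> ceph config set osd osd_class_update_on_start false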