[SOLVED] Ceph - Setting existing pool device class

We have upgraded a cluster to PVE 5.1 with Ceph Luminous and have completed migrating all our OSDs to BlueStore. We will be adding dedicated SSD OSDs in the near future and would like to utilise the device class feature.

We currently have 3 pools defined:
  • rbd
  • cephfs_data
  • cephfs_metadata
PS: All pools are triple replicated with 4 hdd OSDs per host (6 hosts, so 24 OSDs)

Is there an easy way to update the existing pools, so that they don't start consuming SSDs?


Also, I understand that the cache tiering code is no longer actively maintained, or at least that Red Hat has advised that they intend to stop active development of it. Should I be reading up on a possible replacement?

I was planning on adding 12 x SSDs (2 per host) and using these as both a cache tier and a dedicated SSD pool. Any suggestions or warnings? (We've selected Intel DC S4600 devices, rated at 3.2 drive writes per day over 5 years and 65k random write IOPS.)


Lastly, I assume renaming the existing 'rbd' pool to 'rbd-hdd' should just require us to rename the actual pool and subsequently update /etc/pve/storage.cfg?
 
Is there an easy way to update the existing pools, so that they don't start consuming SSDs?
There is a patch for our docs on the way, but in the meantime here are the Ceph links:
http://docs.ceph.com/docs/master/rados/operations/crush-map/#device-classes
http://ceph.com/community/new-luminous-crush-device-classes/

With 'ceph osd crush tree --show-shadow' you should already see the different device classes. Then you need to add a crush rule and set the pool to use it (see the links above).
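
As a minimal sketch of those two steps (the rule name 'replicated_hdd' and the pool 'rbd' are just examples here):

Code:
# Create a replicated rule restricted to the hdd device class
ceph osd crush rule create-replicated replicated_hdd default host hdd;
# Point an existing pool at the new rule
ceph osd pool set rbd crush_rule replicated_hdd;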

Also, I understand that the cache tiering code is no longer actively maintained, or at least that Red Hat has advised that they intend to stop active development of it. Should I be reading up on a possible replacement?
AFAIK, there is nothing yet that could be considered a replacement. But yes, reading up on it is certainly a good idea.

I was planning on adding 12 x SSDs (2 per host) and using these as both a cache tier and a dedicated SSD pool. Any suggestions or warnings? (We've selected Intel DC S4600 devices, rated at 3.2 drive writes per day over 5 years and 65k random write IOPS.)
Never tried that, but two different workloads on the same storage might interfere with each other too much. It definitely needs testing.

Lastly, I assume renaming the existing 'rbd' pool to 'rbd-hdd' should just require us to rename the actual pool and subsequently update /etc/pve/storage.cfg?
You cannot rename a pool; you need to move your data into a new pool and delete the old one. If you only want to change the storage name in PVE, you can do so by editing storage.cfg, but all <vmid>.conf files need updating too.
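
If it is only the PVE-side storage name that changes, something along these lines should cover the config updates (the storage ID 'ceph-rbd' is only an example, and the paths assume a standard PVE cluster filesystem):

Code:
# Rename the storage entry in the cluster-wide config
vi /etc/pve/storage.cfg;
# Update every guest config that references the old storage ID (back up first)
sed -i 's/ceph-rbd:/ceph-rbd-hdd:/g' /etc/pve/nodes/*/qemu-server/*.conf;
sed -i 's/ceph-rbd:/ceph-rbd-hdd:/g' /etc/pve/nodes/*/lxc/*.conf;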
 
I tested this on a cluster with mixed hdd/ssd OSDs and subsequently prepared our primary cluster as well. We essentially create a new crush rule that exclusively uses hdd devices and update our existing pools to use it. You can remove the old default crush rule once the pools have been switched over, even whilst data is still being moved around.

I was initially concerned that it might invalidate data on the ssd devices, but the process turned out to be clever and reliable.

NB: Applying the new rule, even though it references the same OSDs (they were exclusively hdds), still resulted in 50% of the placement groups being misplaced. We initiated the process on a Friday afternoon to reduce the impact during primary production hours.
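
If the rebalance has to run during production hours, backfill can also be monitored and throttled; a rough sketch (the values shown are just examples, tune them for your hardware):

Code:
# Watch the recovery/backfill progress
ceph -s;
ceph osd pool stats;
# Temporarily reduce backfill pressure on the OSDs
ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1';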

Commands:
Code:
# Inspect the current state: device classes, pools and crush rules
ceph osd crush tree --show-shadow;
ceph osd lspools;
ceph osd crush rule ls;
ceph osd crush rule dump;

# New replicated rule restricted to the hdd device class
ceph osd crush rule create-replicated replicated_hdd default host hdd;
# Only if you already have ssd OSDs; otherwise run this after adding your first ssd-only OSD:
#ceph osd crush rule create-replicated replicated_ssd default host ssd;

# Switch the existing pools over to the new hdd-only rule
for f in rbd cephfs_data cephfs_metadata; do
  ceph osd pool set $f crush_rule replicated_hdd;
done

# Remove the old default rule once no pool references it any more
ceph osd crush rule rm replicated_ruleset;

# Rename the pool and adjust the PVE configuration to match
ceph osd pool rename rbd rbd_hdd;
mv /etc/pve/priv/ceph/rbd.keyring /etc/pve/priv/ceph/rbd_hdd.keyring;
vi /etc/pve/storage.cfg;
  # Change 'pool rbd' to 'pool rbd_hdd'
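
To verify that the pools actually reference the new rule and to watch the data movement afterwards, something like this should be enough:

Code:
ceph osd pool get rbd_hdd crush_rule;
ceph osd pool get cephfs_data crush_rule;
ceph osd pool get cephfs_metadata crush_rule;
ceph -s;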
 