[SOLVED] Ceph - Setting existing pool device class

We have upgraded a cluster to PVE 5.1 with Ceph Luminous and have completed migrating all our OSDs to BlueStore. We will be adding dedicated SSD OSDs in the near future and would like to utilise the device class feature.

We currently have 3 pools defined:
  • rbd
  • cephfs_data
  • cephfs_metadata
PS: All pools are triple replicated with 4 hdd OSDs per host (6 hosts, so 24 OSDs)

Is there an easy way to update the existing pools, so that they don't start consuming SSDs?


Also, I understand that the cache tiering code is no longer actively maintained, or at least that Red Hat has advised that they intend to stop active development of it. Should I be reading up on a possible replacement?

I was planning on adding 12 x SSDs (2 per host) and using these as both a cache tier and a dedicated SSD pool. Any suggestions or warnings? (We've selected Intel DC S4600 devices, rated at 3.2 drive writes per day over 5 years and 65k random write IOPS.)


Lastly, I assume renaming the existing 'rbd' pool to 'rbd-hdd' should just require us to rename the actual pool and subsequently update /etc/pve/storage.cfg?
 
Is there an easy way to update the existing pools, so that they don't start consuming SSDs?
There is a patch for our docs on the way, but in the meantime here are the Ceph links:
http://docs.ceph.com/docs/master/rados/operations/crush-map/#device-classes
http://ceph.com/community/new-luminous-crush-device-classes/

With 'ceph osd crush tree --show-shadow' you should already see the different device classes. Then you need to add a crush rule and set the pool to use it (see the links above).
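
As a minimal sketch of those two steps (the rule name 'replicated_hdd' and the pool 'rbd' are just examples here):

Code:
# Create a replicated rule restricted to the hdd device class
ceph osd crush rule create-replicated replicated_hdd default host hdd;
# Point an existing pool at the new rule
ceph osd pool set rbd crush_rule replicated_hdd;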

Also, I understand that the cache tiering code is no longer actively maintained, or at least that Red Hat has advised that they intend to stop active development of it. Should I be reading up on a possible replacement?
AFAIK, there is nothing yet that could be considered a replacement. But yes, reading up on it is certainly a good idea.

I was planning on adding 12 x SSDs (2 per host) and using these as both a cache tier and a dedicated SSD pool. Any suggestions or warnings? (We've selected Intel DC S4600 devices, rated at 3.2 drive writes per day over 5 years and 65k random write IOPS.)
Never tried that, but two different workloads on the same storage might interfere with each other too much. It definitely needs testing.

Lastly, I assume renaming the existing 'rbd' pool to 'rbd-hdd' should just require us to rename the actual pool and subsequently update /etc/pve/storage.cfg?
You cannot rename a pool; you need to move your data into a new pool and delete the old one. If you only want to change the storage name in PVE, you can do so by editing storage.cfg, but all <vmid>.conf files need updating too.
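
If it is only the PVE-side storage name that changes, something along these lines should cover the config updates (the storage ID 'ceph-rbd' is only an example, and the paths assume a standard PVE cluster filesystem):

Code:
# Rename the storage entry in the cluster-wide config
vi /etc/pve/storage.cfg;
# Update every guest config that references the old storage ID (back up first)
sed -i 's/ceph-rbd:/ceph-rbd-hdd:/g' /etc/pve/nodes/*/qemu-server/*.conf;
sed -i 's/ceph-rbd:/ceph-rbd-hdd:/g' /etc/pve/nodes/*/lxc/*.conf;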
 
I tested this on a cluster with mixed hdd/ssd OSDs and subsequently prepared our primary cluster as well. We essentially create a new crush rule that exclusively uses hdd devices and update our existing pools to use it. You can remove the old default crush rule once the pools have been switched over, even whilst data is still being moved around.

I was initially concerned that it might invalidate data on the ssd devices, but the process turned out to be clever and reliable.

NB: Applying the new rule, even though it references the same OSDs (they were exclusively hdds), still resulted in 50% of the placement groups being misplaced. We initiated the process on a Friday afternoon to reduce the impact during primary production hours.
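
If the rebalance has to run during production hours, backfill can also be monitored and throttled; a rough sketch (the values shown are just examples, tune them for your hardware):

Code:
# Watch the recovery/backfill progress
ceph -s;
ceph osd pool stats;
# Temporarily reduce backfill pressure on the OSDs
ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1';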

Commands:
Code:
# Inspect the current state: device classes, pools and crush rules
ceph osd crush tree --show-shadow;
ceph osd lspools;
ceph osd crush rule ls;
ceph osd crush rule dump;

# New replicated rule restricted to the hdd device class
ceph osd crush rule create-replicated replicated_hdd default host hdd;
# Only if you already have ssd OSDs; otherwise run this after adding your first ssd-only OSD:
#ceph osd crush rule create-replicated replicated_ssd default host ssd;

# Switch the existing pools over to the new hdd-only rule
for f in rbd cephfs_data cephfs_metadata; do
  ceph osd pool set $f crush_rule replicated_hdd;
done

# Remove the old default rule once no pool references it any more
ceph osd crush rule rm replicated_ruleset;

# Rename the pool and adjust the PVE configuration to match
ceph osd pool rename rbd rbd_hdd;
mv /etc/pve/priv/ceph/rbd.keyring /etc/pve/priv/ceph/rbd_hdd.keyring;
vi /etc/pve/storage.cfg;
  # Change 'pool rbd' to 'pool rbd_hdd'
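
To verify that the pools actually reference the new rule and to watch the data movement afterwards, something like this should be enough:

Code:
ceph osd pool get rbd_hdd crush_rule;
ceph osd pool get cephfs_data crush_rule;
ceph osd pool get cephfs_metadata crush_rule;
ceph -s;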
 