How to create custom CRUSH/Bucket on Proxmox Ceph Cluster

avbdev

New Member
Sep 7, 2024
Hello Everyone,

I am in the process of building a home lab with three nodes, utilizing various storage devices (NVMe, SSD, and HDD). To organize these, I created three custom CRUSH buckets (`nvme-crush`, `ssd-crush`, and `hdd-crush`) to segregate the storage devices accordingly. However, I encountered an issue: while the Ceph cluster remains healthy when the OSDs are part of the default CRUSH hierarchy, it begins to show warnings about inactive placement groups (`pg`) once I move the devices into the custom buckets.

Below are the details of the nodes and their respective storage capacities:

Node1:
  • 1.8 TB - NVMe
  • 3 x 2 TB - SSD

Node2:
  • 819 GB - NVMe
  • 3 x 4 TB - HDD

Node3:
  • 819 GB
My intention is as follows:
  • `nvme-crush`: to include all NVMe devices
  • `ssd-crush`: to include all SSD devices
  • `hdd-crush`: to include all HDD devices

The goal is to create storage pools based on these CRUSH buckets, using them for specific workloads based on application needs. For example, HDD storage would be ideal for log storage. The plan is to establish three distinct storage pools: `nvme-pool`, `ssd-pool`, and `hdd-pool`.

Does Proxmox support the use of custom CRUSH hierarchies or buckets? If so, could anyone provide guidance on how to configure this setup?

Thank you in advance for your assistance!
 
You can use the default Ceph tools to modify the CRUSH map, though if your intention is only to separate OSDs by device class, a custom CRUSH hierarchy is overkill.

Make sure you have the device classes assigned correctly. You can use custom device classes as well!
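If a class was detected wrong, or you want a custom class name, you can reassign it from any node. A rough sketch, with osd.3 only as a placeholder ID taken from the `ceph osd tree` output:
Code:
# list all OSDs together with their currently assigned device class
ceph osd tree

# an already assigned class has to be removed before a new one can be set
ceph osd crush rm-device-class osd.3
ceph osd crush set-device-class nvme osd.3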
Then you can create new rules that specify the device class: https://docs.ceph.com/en/latest/rados/operations/crush-map/#device-classes
Code:
ceph osd crush rule create-replicated <rule-name> <root> <failure-domain> <class>

ceph osd crush rule create-replicated replicated-nvme default host nvme
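Afterwards you can verify that the rule exists and picks up the intended class, for example:
Code:
ceph osd crush rule ls
ceph osd crush rule dump replicated-nvme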
Then you can assign the rules to the pools (GUI) or use them when you create new pools. Keep in mind that once you use device-specific rules, each pool should be assigned to a rule for a specific device class, and the default "replicated_rule" should not be used anymore!
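On the CLI that would look roughly like this (the pool name is only an example):
Code:
# point an existing pool at the device-class rule
ceph osd pool set nvme-pool crush_rule replicated-nvme

# or create a new pool with the rule right away via the Proxmox tooling
pveceph pool create nvme-pool --crush_rule replicated-nvme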

The nodes and the disks they have will be a bit of an issue. For Ceph, each node is ideally built the same way, or you have groups of nodes with the same storage layout, with 3 nodes per group as a minimum. Anything else ends up shoehorning Ceph into something it wasn't really designed for :)

In smaller clusters, ZFS pools (named the same across the nodes) + replication can be a good compromise to keep disk images available on the other nodes. The shortest possible interval for the asynchronous replication is one minute.
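For the ZFS route, a minimal sketch of such a replication job (the VM ID and target node name are placeholders):
Code:
# replicate VM 100 to node2 every minute (the shortest possible interval)
pvesr create-local-job 100-0 node2 --schedule "*/1"

# check the state of all replication jobs
pvesr status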
 
Thank you so much for your guidance. This made things a lot simpler!
 
