How to create multiple Ceph storage pools in Proxmox?

victorhooi

Active Member
Apr 3, 2018
We have three servers running a 3-node Proxmox/Ceph setup.

Each has a single 2.5" SATA drive for Proxmox, plus a single M.2 NVMe drive and a single Intel Optane PCIe NVMe drive.

I'd like to use both NVMe drives for *two* separate Ceph storage pools in Proxmox (i.e. one composed of the three M.2 drives, and a second composed of the three Optane drives).

Is this possible via the Proxmox Ceph GUI or the Proxmox Ceph commands/scripts?

Thanks,
Victor
 
Sorry, I'm a bit confused =(

To be clear - you're saying that the only way to do this is to use device classes, right?

I had tried creating OSDs on the first set of disks, then creating a Ceph Pool. Afterwards, I added OSDs on the other set of disks - but it seems to have simply integrated them into the first Ceph Pool. I had perhaps naively thought this would let me create two isolated Ceph Pools.

So at this point - I should create a new Ceph Pool, right?

Then write two different rules to somehow distribute the data? Any chance you could maybe provide a template, or some hint to get me started please?

I already have data in the pool, so I assume once I do that, it will start rebalancing all the data - so should probably schedule it after hours.

Interestingly - I can already see from the GUI that there is a difference in latency to the OSDs on the two types of drives:

[Screenshot: OSD latency as shown in the Proxmox GUI]
 
Also - if I list the OSD hierarchy - they're all class "ssd".

Code:
root@vwnode1:~# ceph osd crush tree --show-shadow
ID CLASS WEIGHT  TYPE NAME
-2   ssd 4.01990 root default~ssd
-4   ssd 1.33997     host vwnode1~ssd
 0   ssd 0.10840         osd.0
 1   ssd 0.10840         osd.1
 2   ssd 0.10840         osd.2
 3   ssd 0.10840         osd.3
12   ssd 0.22659         osd.12
13   ssd 0.22659         osd.13
14   ssd 0.22659         osd.14
17   ssd 0.22659         osd.17
-6   ssd 1.33997     host vwnode2~ssd
 4   ssd 0.10840         osd.4
 5   ssd 0.10840         osd.5
 6   ssd 0.10840         osd.6
 7   ssd 0.10840         osd.7
15   ssd 0.22659         osd.15
18   ssd 0.22659         osd.18
20   ssd 0.22659         osd.20
22   ssd 0.22659         osd.22
-8   ssd 1.33997     host vwnode3~ssd
 8   ssd 0.10840         osd.8
 9   ssd 0.10840         osd.9
10   ssd 0.10840         osd.10
11   ssd 0.10840         osd.11
16   ssd 0.22659         osd.16
19   ssd 0.22659         osd.19
21   ssd 0.22659         osd.21
23   ssd 0.22659         osd.23
-1       4.01990 root default
-3       1.33997     host vwnode1
 0   ssd 0.10840         osd.0
 1   ssd 0.10840         osd.1
 2   ssd 0.10840         osd.2
 3   ssd 0.10840         osd.3
12   ssd 0.22659         osd.12
13   ssd 0.22659         osd.13
14   ssd 0.22659         osd.14
17   ssd 0.22659         osd.17
-5       1.33997     host vwnode2
 4   ssd 0.10840         osd.4
 5   ssd 0.10840         osd.5
 6   ssd 0.10840         osd.6
 7   ssd 0.10840         osd.7
15   ssd 0.22659         osd.15
18   ssd 0.22659         osd.18
20   ssd 0.22659         osd.20
22   ssd 0.22659         osd.22
-7       1.33997     host vwnode3
 8   ssd 0.10840         osd.8
 9   ssd 0.10840         osd.9
10   ssd 0.10840         osd.10
11   ssd 0.10840         osd.11
16   ssd 0.22659         osd.16
19   ssd 0.22659         osd.19
21   ssd 0.22659         osd.21
23   ssd 0.22659         osd.23

However, half of them are on the Intel Optane drives (very low latency), and the other half are on the HP EX920 SSDs (higher latency).
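To check at a glance how many OSDs sit in each device class, the tree output can be summarised with a small helper. This is only a sketch: `count_osds_by_class` is a made-up name, and it assumes OSD rows have four fields (ID, CLASS, WEIGHT, NAME) when a class is set and three when it is not.

```shell
# Hypothetical helper: reads `ceph osd crush tree` output on stdin and
# prints one "<class> <count>" line per device class.
count_osds_by_class() {
    awk '
        # Only look at OSD rows, and count each OSD once even if the
        # tree lists it twice (as --show-shadow does).
        $NF ~ /^osd\.[0-9]+$/ && !seen[$NF]++ {
            # Assumption: 4 fields = ID CLASS WEIGHT NAME; 3 fields = no class set.
            cls = (NF == 4) ? $2 : "(none)"
            n[cls]++
        }
        END { for (c in n) print c, n[c] }
    ' | sort
}
```

Usage would be `ceph osd crush tree --show-shadow | count_osds_by_class`; for the tree above this should report a single line for class ssd covering all 24 OSDs. Ceph also has `ceph osd crush class ls-osd <class>` to list the OSDs of one class directly.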
 
> I had tried creating OSDs on the first set of disks, then creating a Ceph Pool. Afterwards, I added OSDs on the other set of disks - but it seems to have simply integrated them into the first Ceph Pool. I had perhaps naively thought this would let me create two isolated Ceph Pools.

Did you go through the link I provided earlier? I would like to know what might have been unclear in that section, so that we can improve the documentation. [0]

> So at this point - I should create a new Ceph Pool, right?
>
> Then write two different rules to somehow distribute the data? Any chance you could maybe provide a template, or some hint to get me started please?
>
> I already have data in the pool, so I assume once I do that, it will start rebalancing all the data - so should probably schedule it after hours.

Exactly. See the link [0].

> Also - if I list the OSD hierarchy - they're all class "ssd".

Ceph tries to detect the media type, but doesn't always get it right. In that case you can set the device class by hand:
Code:
$ ceph osd crush rm-device-class osd.ID osd.ID
$ ceph osd crush set-device-class nvme osd.ID osd.ID

[0] https://pve.proxmox.com/pve-docs/chapter-pveceph.html#pve_ceph_device_classes
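Since reclassifying usually touches many OSDs at once, it can help to generate the two commands and review them before running anything. A minimal sketch, assuming nothing beyond the two `ceph osd crush` subcommands shown above; `reclass_cmds` is a hypothetical helper name and it only prints, it does not change the cluster.

```shell
# Hypothetical helper: prints the rm-device-class / set-device-class pair
# for a target class and a list of numeric OSD IDs. Nothing is executed.
reclass_cmds() {
    if [ "$#" -lt 2 ]; then
        echo "usage: reclass_cmds CLASS OSD_ID..." >&2
        return 1
    fi
    class=$1
    shift
    osds=""
    for id in "$@"; do
        osds="$osds osd.$id"      # build " osd.0 osd.1 ..."
    done
    # The old class has to be removed before a new one can be set.
    echo "ceph osd crush rm-device-class$osds"
    echo "ceph osd crush set-device-class $class$osds"
}
```

For example, `reclass_cmds nvme 0 1 2` prints the two commands for osd.0 to osd.2; once they look right, they can be pasted into a root shell on one of the nodes.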
 
I just want to follow up that I was able to do this successfully!

The two types of disks I am using for my two Ceph pools are:
  • Intel Optane 900P (480GB)
  • Samsung 960 EVO (1TB)
To be honest - both disks are actually NVMe disks.

However, I am cheating a bit - I used Ceph to change the device class of the Optane disks to "nvme", and left the Samsung 960s as "ssd".

The commands are:

Code:
# ceph osd crush rm-device-class osd.11 osd.11 osd.9 osd.8 osd.7 osd.6 osd.5 osd.4 osd.3 osd.2 osd.1 osd.0
osd.11 belongs to no class, done removing class of osd(s): 0,1,2,3,4,5,6,7,8,9,11
# ceph osd crush rm-device-class osd.10
done removing class of osd(s): 10
# ceph osd crush set-device-class nvme osd.0 osd.1 osd.2 osd.3 osd.4 osd.5 osd.6 osd.7 osd.8 osd.9 osd.10 osd.11
set osd(s) 0,1,2,3,4,5,6,7,8,9,10,11 to class 'nvme'

Then we create the rules:
Code:
# ceph osd crush rule create-replicated optane-only default host nvme
# ceph osd lspools
1 vm_storage
# ceph osd pool set vm_storage crush_rule optane-only
set pool 1 crush_rule to optane-only
Ceph will then begin moving data around.
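To confirm that the pool really picked up the new rule, the current assignment can be read back. The sketch below assumes that `ceph osd pool get <pool> crush_rule` prints a line of the form `crush_rule: <name>`; `pool_uses_rule` is a hypothetical helper, not part of Ceph or Proxmox.

```shell
# Hypothetical helper: exits 0 if POOL is currently assigned CRUSH rule RULE.
pool_uses_rule() {
    pool=$1
    want=$2
    # Assumed output format: "crush_rule: optane-only"; take the second field.
    have=$(ceph osd pool get "$pool" crush_rule | awk '{ print $2 }')
    [ "$have" = "$want" ]
}
```

For the pool above that would be `pool_uses_rule vm_storage optane-only && echo "rule applied"`. The data movement itself can be followed with `ceph -s`.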

I also created a rule for SSD only:
Code:
# ceph osd crush rule create-replicated ssd-only default host ssd

I then used the Proxmox GUI to create a new Ceph pool, and for the Crush rule, I chose "ssd-only", instead of the default "replicated_rule".

My first question is - I cheated a bit to use device classes to divide up my Intel Optane vs Samsung disks. Is there another proper way that would allow me to write a Crush rule to configure particular Ceph pools to only use certain models or brands of disks?

Secondly - is it safe to rename a Ceph pool on Proxmox, using:
Code:
ceph osd pool rename {current-pool-name} {new-pool-name}
 
> My first question is - I cheated a bit to use device classes to divide up my Intel Optane vs Samsung disks. Is there another proper way that would allow me to write a Crush rule to configure particular Ceph pools to only use certain models or brands of disks?

Yes, but device classes are intended for exactly that use case. You also don't need to stick to the predefined class names; you can create a custom class instead.
 
Hi,

in my PVE 7 Cluster I have 5 nodes and each node has 4 NVMe disks and 2 SSD disks. Now I would like to create 2 different pools, one for NVMe and one for SSD. I have carefully read the doc "Ceph CRUSH & device classes" but some steps are not clear to me. Which are the steps to achieve my goal? Are the following steps ok?
  1. Install Ceph 16.2
  2. Create 2 crush rules, one for nvme osd and one for ssd osd. How can I do this?
  3. Create the first pool, say nvme, and then bind it to the crush rule nvme?
  4. Bind nvme osd to the pool nvme, how can I do this?
  5. Repeat points 2,3 for the ssd pool...
Are they correct? Which steps can be performed via the GUI and which via the CLI?

Thank you
 
Hi,

I did some tests in PVE7 and Ceph 16.2 and I managed to reach my goal, which is to create 2 pools, one for NVMe disks and one for SSD disks. These are the steps:
  1. Install Ceph 16.2 on all nodes;
  2. Create 2 rules, one for NVMe and one for SSD (rule name for NVMe: nvme_replicated; rule name for SSD: ssd_replicated):
    1. ceph osd crush rule create-replicated nvme_replicated default host nvme
    2. ceph osd crush rule create-replicated ssd_replicated default host ssd
  3. Create 2 pools via GUI and bind one pool to the rule nvme_replicated and the other to ssd_replicated;
  4. Create, via GUI on each node, the OSDs and bind them to the class nvme for the NVMe disks and to ssd for the SSD disks.
Bye
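For anyone who prefers the CLI over the GUI, the steps above can be sketched roughly as follows. This is a dry-run helper that only prints commands; `cli_plan`, the pool name and the device paths are placeholders. It lists OSD creation first on the assumption that a device class has to exist before a rule can reference it, so reorder it if your setup behaves differently. The `--crush-device-class` and `--crush_rule` options are taken from the `pveceph` man page; please check them against your PVE version.

```shell
# Hypothetical helper: prints a CLI plan for one device class.
#   cli_plan CLASS POOL DEVICE...
# The DEVICE arguments are the disks on the node where the OSD commands
# would be run; OSD creation has to be repeated on every node.
cli_plan() {
    if [ "$#" -lt 3 ]; then
        echo "usage: cli_plan CLASS POOL DEVICE..." >&2
        return 1
    fi
    class=$1
    pool=$2
    shift 2
    for dev in "$@"; do
        # Step 4: create the OSD and assign its device class
        printf 'pveceph osd create %s --crush-device-class %s\n' "$dev" "$class"
    done
    # Step 2: replicated rule restricted to that class, failure domain "host"
    printf 'ceph osd crush rule create-replicated %s_replicated default host %s\n' "$class" "$class"
    # Step 3: pool bound to the rule
    printf 'pveceph pool create %s --crush_rule %s_replicated\n' "$pool" "$class"
}
```

Calling `cli_plan nvme nvme_pool /dev/nvme0n1 /dev/nvme1n1` prints the commands for the NVMe side; a second call with `ssd` covers the SSD pool.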
 
The OSD type only shows ssd, hdd and nvme.

What if we want to combine hdd and ssd into one pool, i.e. different OSD types in a single pool? How do we achieve that?
 
> The OSD type only shows ssd, hdd and nvme.
>
> What if we want to combine hdd and ssd into one pool, i.e. different OSD types in a single pool? How do we achieve that?

Please don't do that. The differing latency of the OSDs will cause a huge increase in CPU and memory usage, and network load will increase significantly as well.
 
