[SOLVED] Ceph Pools

Salzi

Active Member
May 4, 2017
Hi,
I have a small cluster with 3 servers running Proxmox 5.1. Each Proxmox node is also used as a Ceph node, and each node has 2 OSDs installed.
There are two things which I don't understand:

1) Pools: How many pools should I set up? Should I create one pool per identical set of settings (size, min size, ...), which would mean just one pool for me? Or should I create a pool per type, say one pool for VMs, one for containers, one for data, ...? Or should I create a pool for every VM and container? What are the advantages/disadvantages of having just one pool versus multiple pools? What's best practice here?

2) What I need is simple failure safety: the cluster should still operate when one server is offline. Therefore I would select a size of 2 and a min. size of 1 when creating a new pool in Ceph. But from the Ceph documentation:
Resilience: You can set how many OSD are allowed to fail without losing data. For replicated pools, it is the desired number of copies/replicas of an object. A typical configuration stores an object and one additional copy (i.e., size = 2), but you can determine the number of copies/replicas.

They are talking about OSD failures. I have two OSDs in each server. Is it possible that the data is stored on two OSDs within the same node? That would mean that when that node is down, the cluster wouldn't be able to run anymore. Or will Ceph handle that correctly by not placing the copy on the same node?

Thanks for your help in advance
Salzi
 
IMHO two OSDs per node is a bit few, but it works. Ceph's CRUSH map defines a failure domain (set per CRUSH rule), which is usually the host, so each PG is replicated <size> times across different hosts.
But I strongly advise you against size=2. I have read about a lot of problems with size=2, from a second disk failing during the backfill of an already failed disk to problems determining which PG copy is the correct one. size=3, min_size=2 is the way to go; if you need more space, add more disks to each host.
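In case it helps, here is a rough command-line sketch, assuming the default replicated_rule and an already existing pool called rbd (both just examples on my side):

Code:
# Inspect the CRUSH rule; "chooseleaf ... type host" means the copies are
# spread across different hosts, so two replicas never end up on the same node
ceph osd crush rule dump replicated_rule

# Set the recommended replication on an existing pool
# ("rbd" is only an example pool name)
ceph osd pool set rbd size 3
ceph osd pool set rbd min_size 2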

Regarding pools, I think you can do as you prefer, but do the math to calculate the right number of PGs to allocate, because you can't reduce the PG count once a pool is created. There is a calculator: http://ceph.com/pgcalc/
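A minimal sketch for checking and growing the PG count on an existing pool (again, "rbd" is just an example name), keeping in mind it can only go up, never down:

Code:
# Check the current PG count of the pool
ceph osd pool get rbd pg_num

# You can only increase it; pgp_num should follow pg_num
ceph osd pool set rbd pg_num 256
ceph osd pool set rbd pgp_num 256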
 
Thanks mbaldini for your reply.
You are right: it's not that easy to set the pg_num correctly. I found differing information within the Ceph documentation:

According to the formula here: docs.ceph.com/docs/jewel/rados/configuration/pool-pg-config-ref/ it should be
(100 * 6) / 3 = 200
According to the recommendation here: docs.ceph.com/docs/master/rados/operations/placement-groups/ it should be 512
According to the calculator you recommended it should be 320

So the correct value is somewhere between 200 and 512 placement groups ...

PS: I'm not allowed to post links, therefore I removed the http:// from the links ...
 
I'm not a Ceph expert; I run a small Ceph cluster not so different from yours (3 nodes, 3x 1TB HDD + 1x 250GB SSD in each node). What I have learnt is that too many PGs per OSD is bad for performance and data distribution.
So if you plan a Ceph cluster with 2 OSDs per node, 3 nodes, replica=3, you don't plan to increase the OSD count, and you create only one pool for RBD, then according to the calculator your pool should be created with a PG count of 256.
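Something like this should work for the creation itself; the pool name "rbd" is just an example, and on Proxmox 5.x (Ceph Luminous) pools are expected to be tagged with an application:

Code:
# Create a replicated pool with 256 PGs
ceph osd pool create rbd 256 256 replicated

# Recommended replication for a 3-node setup
ceph osd pool set rbd size 3
ceph osd pool set rbd min_size 2

# On Luminous, tag the pool with its application (rbd here)
ceph osd pool application enable rbd rbd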
 
Also be aware that, as you have only three nodes, your cluster will not recover from a node failure; it will stay in a degraded state, because with a replica count of 3 and a failure domain of host there is no other server to put the PGs on. Your cluster will still accept writes, as you have a min_size of two (as long as no further OSD fails). Once you have recovered the third node, the cluster will be able to heal again and get back to HEALTH_OK. In your setup, assuming all disks are equal in size, you can only fill each disk to ~40% to be able to have an OSD fail and recover onto the remaining OSD in the same server.
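A quick way to keep an eye on that is the per-OSD utilisation, for example (standard Ceph commands, nothing specific to your setup assumed):

Code:
# Per-OSD utilisation and PG count; watch the %USE column so the remaining
# OSD in a node can still absorb its neighbour's data after a failure
ceph osd df tree

# Overall cluster and per-pool usage
ceph df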

Your calculation is correct: the target PG count is 100 per OSD, so you round your result up to the nearest power of two (256). See the calculator for more insight into the PG calculation: http://ceph.com/pgcalc/
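As a worked version of that arithmetic, using the numbers from this thread (6 OSDs, replica 3, target of ~100 PGs per OSD):

Code:
# (OSDs * target PGs per OSD) / replica count, then round up to a power of two
echo $(( 6 * 100 / 3 ))   # prints 200 -> round up to the next power of two: 256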
 
