[SOLVED] Ceph missing more than 50% of storage capacity

lifeboy

Renowned Member
I have 3 nodes with 2 x 1TB HDDs and 2 x 256GB SSDs each.

I have the following configuration:

1 SSD is used as the system drive (LVM-partitioned, so about a third is used for the system partition and the rest is split into 2 partitions for the 2 x HDDs' WALs).

The 2 x HDDs are in a pool (using the default "replicated_rule").
The remaining SSD is in a pool "fast" which has the following rule:

Code:
rule fast {
   id 1
   type replicated
   min_size 1
   max_size 10
   step take default class ssd
   step chooseleaf firstn 0 type host
   step emit
}
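
As a quick sanity check (not from the original post, just a sketch), the rule can be dumped from the live cluster and its mappings simulated offline; rule id 1 matches the rule above, and the crushmap filenames are only examples:

Code:
# dump the rule as the cluster sees it
ceph osd crush rule dump fast

# or simulate the compiled map: which OSDs does rule 1 pick for 3 replicas?
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt   # optional: decompile to read it
crushtool -i crushmap.bin --test --rule 1 --num-rep 3 --show-mappings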

Devices in the crush table are as follows:

Code:
# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
device 6 osd.6 class ssd
device 7 osd.7 class ssd
device 8 osd.8 class ssd
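
To confirm the classes are actually being applied, the class-specific shadow trees that CRUSH selects from can be listed (a sketch, assuming a Luminous-or-later cluster):

Code:
# shows the per-class shadow hierarchies (default~hdd, default~ssd)
ceph osd crush tree --show-shadow

# per-OSD class, size and utilisation
ceph osd df tree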

Furthermore:

Code:
osd pool default min size = 2
osd pool default size = 3

So according to the replication rules there should be roughly 256GB of usable storage in that pool: 3 x ~256GB of raw SSD capacity across the three hosts, divided by a replica size of 3.
But in reality I see:

(screenshot of the Proxmox pool overview: upload_2019-1-31_19-21-43.png)

The available space is much less than it should be.

the "fast" pool should have 256GB, but it doesn't even have 100GB ??

Can someone help me make sense of this please?
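
For reference, the numbers Ceph itself reports per pool can be read on the CLI; MAX AVAIL there already accounts for the replica count and the rule's device class (a sketch of standard commands, not taken from the screenshot):

Code:
# per-pool USED and MAX AVAIL, plus global raw capacity
ceph df detail

# raw size and use per OSD, including its device class
ceph osd df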
 
I just wondered: I have specified that the "fast" rule must use class SSD drives. However, since I don't restrict the "replicated_rule" rule to class HDD, is the system possibly using the SSD drives for that pool as well? Surely that wouldn't be the case, since it would cause all sorts of problems to have an OSD in more than one pool...
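
One way to test that theory (a sketch; the HDD-backed pool name is a placeholder, adjust it to the actual pool):

Code:
# which rule is the HDD-backed pool using? (rule 0 = replicated_rule)
ceph osd pool get <hdd-pool-name> crush_rule

# list that pool's PGs and the OSDs they map to; any 6, 7 or 8 in the
# acting set means the pool is also writing to the SSD OSDs
ceph pg ls-by-pool <hdd-pool-name>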
 
Ceph does use ALL OSDs for any pool that does not have a drive type limitation. Your theory is likely valid.
Also, FYI: the Total column is the amount of storage being used, not the total capacity of the pool.
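
For what it's worth, since Luminous a replicated rule can be restricted to a device class in a single command; something like this (the rule name is only an example) would create an HDD-only counterpart to the "fast" rule:

Code:
# replicated rule that only selects hdd-class OSDs,
# root = default, failure domain = host
ceph osd crush rule create-replicated replicated_hdd default host hdd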
 
So if I change the crush table to limit rule 0 to HDD types, would that fix itself automatically?

I realize that, but from the % and usage data it can be seen that the total storage is less than 100GB.
 
I think it will require a completely new pool that you migrate over to; however, I am not 100% positive.
 
Hmmm... easier said than done. I think what may work is the following:

  1. Move the content of the "fast" pool to the default pool.
  2. Remove the "fast" pool.
  3. Remove the SSD OSDs.
    - Wait for everything to settle and the cluster to be up to date after the OSD removals.
  4. Change the default pool to explicitly only use HDD-type OSDs.
  5. Add the SSD OSDs again.
  6. Create a new pool with SSDs only.

Problem should be solved.

Is there anything wrong with this plan / What am I missing / Comments?

thanks again

Roland
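
For reference, a rough sketch of the commands that steps 4 and 6 of the plan above might map to, assuming the default pool's CRUSH rule can be reassigned in place; pool names, rule names and the PG count are placeholders:

Code:
# step 4: point the default pool at an HDD-only rule
ceph osd crush rule create-replicated replicated_hdd default host hdd
ceph osd pool set <default-pool> crush_rule replicated_hdd

# step 6: recreate the SSD-only pool against the "fast" rule
ceph osd pool create fast 128 128 replicated fast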
 
Please clarify whether I understand your problem correctly: the 'general' pool is storing information on the SSDs?

The following depends on my understanding above being correct. Just create a new temporary pool and empty everything off the current problematic pool. Then delete the problem pool and recreate it with the correct device limitation, i.e. HDD only. To my knowledge, it is not possible to change the parameters of an already established pool such as the default pool.
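
A hedged sketch of what that could look like here (names are placeholders; the temporary pool has to be added as an RBD storage in Proxmox first, VM disks can also be moved in the GUI under Hardware -> Move disk, and deleting a pool requires 'mon allow pool delete = true'):

Code:
# temporary pool using an HDD-only rule
ceph osd pool create temp_hdd 128 128 replicated replicated_hdd

# move each VM disk off the problematic pool (repeat per VM / disk)
qm move_disk <vmid> <disk> <target-storage> --delete 1

# then drop and recreate the old pool with the correct rule
ceph osd pool delete <old-pool> <old-pool> --yes-i-really-really-mean-it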
 
I don't know that; I can only assume it does, since my SSD pool only shows about 100GB total, but it should be around 256GB in size.

I'll try that, but may run out of space. I'll report back once I have.
 