[SOLVED] Usable space on a Ceph cluster with LZ4 compression

lucaferr

Renowned Member
Jun 21, 2011
Hi! We have a 5-node Proxmox + Ceph cluster (we use the same nodes for computing and distributed storage). We have LZ4 compression enabled, which works pretty well (we're saving more than 16%). My 'ceph df detail' output looks like this:
[screenshot: ceph df detail.png]

As you can see, I have 15 TB of physical storage (each of the 5 nodes has 3 x 1TB NVMe SSDs) and I'm using 62.38% of the physical space with 3x replication. But the pool shows 80.27% usage (the difference is due to LZ4 compression, I guess). So it says I can only write about another 872 GB of data (MAX AVAIL column), but it should be much more thanks to compression.
Will writes be denied when my pool reaches 100% usage, even though I still have physical space (global usage would be around 80%), or is that just an estimate and I'll be able to get to 120% pool usage?
I need this information to plan my OSD upgrades...
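In case it helps, the pool's compression settings and the cluster-wide full ratios (which, as far as I understand, are what actually gate writes at the raw-OSD level) can be checked with something like this; 'mypool' is just a placeholder for your pool name:
Code:
# Cluster-wide full / nearfull thresholds (Luminous and later):
ceph osd dump | grep ratio

# Per-pool compression settings ('mypool' is a placeholder):
ceph osd pool get mypool compression_mode
ceph osd pool get mypool compression_algorithm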
Thank you very much!
 
Don't know if this is applicable, but I was running into Ceph 'pool too full' warnings for a long time, even though I had plenty of space...

So what I ended up doing was increasing the number of PGs for that pool, and I went from 80% usage to about 35% in one night. I had turned on compression right from the start (before the warnings), so it shouldn't have been that in my opinion, but YMMV...

RAM usage is considerably higher since I did that, BUT I have more available space now AND Ceph feels much faster.
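For reference, the change I'm describing boils down to raising the pool's PG count, roughly like this ('mypool' and the target of 512 are just placeholders for your own setup):
Code:
# Raise the PG count for a pool; this moves data and costs RAM/CPU.
ceph osd pool set mypool pg_num 512
# On older releases pgp_num has to be raised as well, or the data
# won't actually rebalance onto the new PGs:
ceph osd pool set mypool pgp_num 512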
 
I have 512 PGs, which is the recommended value for 15 OSDs with 3x replication (see the calculation below), so I don't think the PG count is the problem. Any other ideas?
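For context, that number comes from the usual rule of thumb of roughly 100 PGs per OSD, divided by the replica count and rounded to a power of two:
Code:
# (15 OSDs * 100) / 3 replicas = 500  ->  next power of two = 512
# The current value can be checked with ('mypool' is a placeholder):
ceph osd pool get mypool pg_num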
 
Do you have more than one pool? What does your 'ceph osd df tree' output look like?
 
Here's the output of 'ceph osd df tree':
The 'tree' part at the end of the command adds the crush hierarchy to the output. ;)

The PG distribution ranges from 86 to 126 PGs per OSD. This may reduce available disk space and performance. You could try the 'ceph osd reweight-by-*' commands to get a better distribution, but this will redistribute data.

As an example, my test cluster has a more even distribution, hence VAR / STDDEV is way lower.
Code:
root@p5c02:~# ceph osd df tree
ID  CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE VAR  PGS TYPE NAME     
 -1       0.31189        -  319GiB 13.6GiB  305GiB 4.28 1.00   - root default   
 -3       0.06238        - 63.8GiB 2.68GiB 61.1GiB 4.19 0.98   -     host p5c01 
  1   hdd 0.03119  1.00000 31.9GiB 1.27GiB 30.6GiB 3.98 0.93  82         osd.1 
  2   hdd 0.03119  1.00000 31.9GiB 1.41GiB 30.5GiB 4.41 1.03  83         osd.2 
 -5       0.06238        - 63.8GiB 2.68GiB 61.1GiB 4.21 0.98   -     host p5c02 
  0   hdd 0.03119  1.00000 31.9GiB 1.31GiB 30.6GiB 4.10 0.96  94         osd.0 
  3   hdd 0.03119  1.00000 31.9GiB 1.37GiB 30.5GiB 4.31 1.01  79         osd.3 
 -7       0.06238        - 63.8GiB 2.73GiB 61.1GiB 4.28 1.00   -     host p5c03 
  4   hdd 0.03119  1.00000 31.9GiB 1.35GiB 30.5GiB 4.25 0.99  89         osd.4 
  5   hdd 0.03119  1.00000 31.9GiB 1.38GiB 30.5GiB 4.31 1.01  92         osd.5 
 -9       0.06238        - 63.8GiB 2.76GiB 61.0GiB 4.33 1.01   -     host p5c04 
  6   hdd 0.03119  1.00000 31.9GiB 1.43GiB 30.5GiB 4.49 1.05  99         osd.6 
  7   hdd 0.03119  1.00000 31.9GiB 1.33GiB 30.6GiB 4.17 0.98  78         osd.7 
-11       0.06238        - 63.8GiB 2.79GiB 61.0GiB 4.37 1.02   -     host p5c05 
  8   hdd 0.03119  1.00000 31.9GiB 1.32GiB 30.6GiB 4.15 0.97  85         osd.8 
  9   hdd 0.03119  1.00000 31.9GiB 1.46GiB 30.4GiB 4.59 1.07  83         osd.9 
                     TOTAL  319GiB 13.6GiB  305GiB 4.28                         
MIN/MAX VAR: 0.93/1.07  STDDEV: 0.18
 
Thank you very much Alwin, you were right: my data distribution was not balanced. I ran
Code:
ceph osd reweight-by-utilization
preceded by
Code:
ceph osd test-reweight-by-utilization
just to make sure of what was about to happen. Ceph moved some data (not much, just a few gigabytes in a few minutes) and the MAX AVAIL space grew from 734G to 888G. Then I ran it again, lowering the threshold from 120% to 110% (the complete command is ceph osd reweight-by-utilization 110, preceded by ceph osd test-reweight-by-utilization 110 as a dry run), and now my MAX AVAIL is around 1000G :)
Probably everyone should run "ceph osd test-reweight-by-utilization" once in a while, followed by "ceph osd reweight-by-utilization" if everything looks good (it increases both cluster capacity and cluster performance).
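In practice, the periodic check boils down to this (110 is the threshold I used; the default is 120):
Code:
# Dry run: shows which OSDs would be reweighted and by how much
ceph osd test-reweight-by-utilization 110
# If the proposed changes look reasonable, apply them (this moves data)
ceph osd reweight-by-utilization 110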
 
