Ceph: actual used space?

godzilla

Hi,

I'm running a Proxmox 7.2-7 cluster with Ceph 16.2.9 "Pacific".

I can't tell the difference between Ceph > Usage and Ceph > Pools > Used (see screenshots).

Can someone please explain what the actual used space in my Ceph storage is? Do you think a 90% used pool is potentially dangerous, or is it normal? Consider that I currently have 74 TB of raw disk across the OSDs, but I'm going to expand it further in the next few days. In fact, the pool is rebalancing right now as I'm replacing some OSDs with bigger ones.
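
In case it helps, these should be the CLI counterparts of the two views I'm looking at (I'm not sure they map one-to-one, so please correct me):

Code:
# cluster-wide raw usage, roughly what Ceph > Usage shows
ceph df
# per-pool stored vs. used space (includes replication overhead), closer to Pools > Used
ceph df detail
# per-OSD breakdown
ceph osd df tree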

Thank you!
 

Attachments

  • Schermata 2022-09-26 alle 13.43.59.png
  • Schermata 2022-09-26 alle 13.44.18.png
How is the usage of each OSD? Please post the output of ceph osd df tree.
 
Hi @mira , thanks for your reply. Here's the df tree. You can safely ignore osd.8 as I just replaced the 1T drive with a 2T one.

Drives are all 2T, except osd.0 to osd.3 and osd.9, which are 1T and waiting to be replaced.

I can add another node with empty disks in 2-3 days if needed. What do you recommend?

Code:
# ceph osd df tree
ID   CLASS  WEIGHT    REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
 -1         68.22739         -   68 TiB   51 TiB   51 TiB  597 MiB  135 GiB   17 TiB  75.46  1.00    -          root default
 -3          3.63879         -  3.6 TiB  2.7 TiB  2.7 TiB  4.7 MiB  9.0 GiB  968 GiB  74.02  0.98    -              host proxnode01
  0    ssd   0.90970   0.95000  932 GiB  692 GiB  690 GiB  1.2 MiB  2.2 GiB  239 GiB  74.31  0.98   20      up          osd.0
  1    ssd   0.90970   1.00000  932 GiB  645 GiB  643 GiB  1.0 MiB  2.2 GiB  286 GiB  69.29  0.92   18      up          osd.1
  2    ssd   0.90970   0.95000  932 GiB  763 GiB  761 GiB  1.3 MiB  2.5 GiB  168 GiB  81.92  1.09   22      up          osd.2
  3    ssd   0.90970   1.00000  932 GiB  657 GiB  655 GiB  1.3 MiB  2.2 GiB  274 GiB  70.54  0.93   19      up          osd.3
 -5          7.27759         -  7.3 TiB  5.6 TiB  5.6 TiB  179 MiB   14 GiB  1.7 TiB  77.29  1.02    -              host proxnode02
  4    ssd   1.81940   0.89000  1.8 TiB  1.4 TiB  1.3 TiB  2.4 MiB  3.4 GiB  478 GiB  74.32  0.98   40      up          osd.4
  5    ssd   1.81940   0.95000  1.8 TiB  1.5 TiB  1.5 TiB  2.6 MiB  3.5 GiB  335 GiB  82.03  1.09   44      up          osd.5
  6    ssd   1.81940   1.00000  1.8 TiB  1.3 TiB  1.3 TiB  171 MiB  3.1 GiB  547 GiB  70.65  0.94   38      up          osd.6
  7    ssd   1.81940   0.95000  1.8 TiB  1.5 TiB  1.5 TiB  2.6 MiB  3.6 GiB  333 GiB  82.15  1.09   44      up          osd.7
 -7          6.36789         -  6.4 TiB  3.7 TiB  3.7 TiB  176 MiB  9.1 GiB  2.7 TiB  58.07  0.77    -              host proxnode03
  8    ssd   1.81940   1.00000  1.8 TiB   49 GiB   49 GiB      0 B  241 MiB  1.8 TiB   2.62  0.03    1      up          osd.8
  9    ssd   0.90970   1.00000  932 GiB  746 GiB  744 GiB  1.4 MiB  2.2 GiB  186 GiB  80.07  1.06   21      up          osd.9
 10    ssd   1.81940   0.95000  1.8 TiB  1.5 TiB  1.5 TiB  3.1 MiB  4.0 GiB  299 GiB  83.95  1.11   45      up          osd.10
 11    ssd   1.81940   1.00000  1.8 TiB  1.4 TiB  1.4 TiB  171 MiB  2.7 GiB  436 GiB  76.62  1.02   42      up          osd.11
 -9          7.27759         -  7.3 TiB  5.6 TiB  5.6 TiB  179 MiB   15 GiB  1.7 TiB  77.22  1.02    -              host proxnode04
 12    ssd   1.81940   1.00000  1.8 TiB  1.5 TiB  1.5 TiB  2.5 MiB  3.8 GiB  340 GiB  81.72  1.08   44      up          osd.12
 13    ssd   1.81940   1.00000  1.8 TiB  1.4 TiB  1.4 TiB  171 MiB  3.5 GiB  475 GiB  74.51  0.99   41      up          osd.13
 14    ssd   1.81940   0.95000  1.8 TiB  1.4 TiB  1.4 TiB  2.4 MiB  3.9 GiB  404 GiB  78.32  1.04   42      up          osd.14
 15    ssd   1.81940   1.00000  1.8 TiB  1.4 TiB  1.3 TiB  2.5 MiB  3.8 GiB  478 GiB  74.33  0.99   40      up          osd.15
-11          7.27759         -  7.3 TiB  5.6 TiB  5.6 TiB  9.8 MiB   14 GiB  1.7 TiB  77.06  1.02    -              host proxnode05
 16    ssd   1.81940   1.00000  1.8 TiB  1.3 TiB  1.3 TiB  2.3 MiB  3.3 GiB  547 GiB  70.62  0.94   38      up          osd.16
 17    ssd   1.81940   1.00000  1.8 TiB  1.4 TiB  1.4 TiB  2.5 MiB  3.4 GiB  410 GiB  78.01  1.03   42      up          osd.17
 18    ssd   1.81940   0.95000  1.8 TiB  1.5 TiB  1.4 TiB  2.5 MiB  3.8 GiB  375 GiB  79.86  1.06   43      up          osd.18
 19    ssd   1.81940   1.00000  1.8 TiB  1.5 TiB  1.4 TiB  2.6 MiB  3.5 GiB  377 GiB  79.76  1.06   43      up          osd.19
-13          7.27759         -  7.3 TiB  5.6 TiB  5.5 TiB  9.5 MiB   15 GiB  1.7 TiB  76.27  1.01    -              host proxnode06
 20    ssd   1.81940   1.00000  1.8 TiB  1.3 TiB  1.3 TiB  2.3 MiB  3.5 GiB  511 GiB  72.59  0.96   39      up          osd.20
 21    ssd   1.81940   1.00000  1.8 TiB  1.5 TiB  1.5 TiB  2.6 MiB  4.1 GiB  337 GiB  81.91  1.09   44      up          osd.21
 22    ssd   1.81940   1.00000  1.8 TiB  1.5 TiB  1.5 TiB  2.5 MiB  3.9 GiB  372 GiB  80.05  1.06   43      up          osd.22
 23    ssd   1.81940   1.00000  1.8 TiB  1.3 TiB  1.3 TiB  2.2 MiB  3.6 GiB  549 GiB  70.52  0.93   38      up          osd.23
-15          7.27759         -  7.3 TiB  5.6 TiB  5.6 TiB  9.9 MiB   15 GiB  1.7 TiB  77.12  1.02    -              host proxnode07
 24    ssd   1.81940   1.00000  1.8 TiB  1.4 TiB  1.4 TiB  2.5 MiB  4.0 GiB  411 GiB  77.92  1.03   42      up          osd.24
 25    ssd   1.81940   1.00000  1.8 TiB  1.3 TiB  1.3 TiB  2.3 MiB  3.5 GiB  511 GiB  72.58  0.96   39      up          osd.25
 26    ssd   1.81940   0.95000  1.8 TiB  1.5 TiB  1.4 TiB  2.6 MiB  4.0 GiB  374 GiB  79.90  1.06   43      up          osd.26
 27    ssd   1.81940   1.00000  1.8 TiB  1.4 TiB  1.4 TiB  2.5 MiB  3.5 GiB  408 GiB  78.09  1.03   42      up          osd.27
-17          7.27759         -  7.3 TiB  6.0 TiB  6.0 TiB   10 MiB   15 GiB  1.3 TiB  82.78  1.10    -              host proxnode08
 28    ssd   1.81940   1.00000  1.8 TiB  1.5 TiB  1.5 TiB  2.6 MiB  3.7 GiB  308 GiB  83.46  1.11   44      up          osd.28
 29    ssd   1.81940   1.00000  1.8 TiB  1.5 TiB  1.5 TiB  2.6 MiB  3.7 GiB  337 GiB  81.93  1.09   44      up          osd.29
 30    ssd   1.81940   1.00000  1.8 TiB  1.5 TiB  1.5 TiB  2.6 MiB  3.6 GiB  370 GiB  80.14  1.06   43      up          osd.30
 31    ssd   1.81940   1.00000  1.8 TiB  1.6 TiB  1.6 TiB  2.7 MiB  3.7 GiB  268 GiB  85.62  1.13   46      up          osd.31
-19          7.27759         -  7.3 TiB  5.5 TiB  5.5 TiB  9.6 MiB   15 GiB  1.8 TiB  75.46  1.00    -              host proxnode09
 32    ssd   1.81940   1.00000  1.8 TiB  1.4 TiB  1.4 TiB  2.5 MiB  3.6 GiB  444 GiB  76.19  1.01   41      up          osd.32
 33    ssd   1.81940   1.00000  1.8 TiB  1.5 TiB  1.4 TiB  2.5 MiB  3.9 GiB  378 GiB  79.73  1.06   43      up          osd.33
 34    ssd   1.81940   1.00000  1.8 TiB  1.3 TiB  1.3 TiB  2.2 MiB  3.6 GiB  530 GiB  71.56  0.95   38      up          osd.34
 35    ssd   1.81940   1.00000  1.8 TiB  1.4 TiB  1.3 TiB  2.3 MiB  3.6 GiB  478 GiB  74.34  0.99   40      up          osd.35
-21          7.27759         -  7.3 TiB  5.6 TiB  5.5 TiB  9.6 MiB   15 GiB  1.7 TiB  76.37  1.01    -              host proxnode10
 36    ssd   1.81940   0.95000  1.8 TiB  1.5 TiB  1.5 TiB  2.5 MiB  3.9 GiB  337 GiB  81.90  1.09   44      up          osd.36
 37    ssd   1.81940   1.00000  1.8 TiB  1.3 TiB  1.3 TiB  2.2 MiB  3.4 GiB  579 GiB  68.92  0.91   37      up          osd.37
 38    ssd   1.81940   1.00000  1.8 TiB  1.4 TiB  1.4 TiB  2.4 MiB  3.8 GiB  438 GiB  76.48  1.01   41      up          osd.38
 39    ssd   1.81940   1.00000  1.8 TiB  1.4 TiB  1.4 TiB  2.4 MiB  3.8 GiB  406 GiB  78.18  1.04   42      up          osd.39
                         TOTAL   68 TiB   51 TiB   51 TiB  597 MiB  135 GiB   17 TiB  75.46
MIN/MAX VAR: 0.03/1.13  STDDEV: 12.55
 
The listed cluster is pretty close to effectively full (clearly you're aware of this, or there wouldn't be any reweights in place; just bear in mind that with only 4 OSDs per node, reweights are of very limited use since there isn't much choice of where to move PGs). The risk of running a nearly full cluster is that an uncontrolled node failure can trip your cluster's full ratio limit.
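
If you want to see how much headroom is left before that trips, the ratios are in the OSD map; a quick check (the values in the comments are Ceph's defaults, yours may differ):

Code:
# show the configured nearfull / backfillfull / full ratios (defaults 0.85 / 0.90 / 0.95)
ceph osd dump | grep ratio
# in an emergency they can be raised temporarily, e.g.
# ceph osd set-nearfull-ratio 0.90
# ceph osd set-full-ratio 0.97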

What is your data growth rate? That's the first order of business before making any recommendations. The second is the amount of CPU/RAM each node has available for OSDs; with only 4 OSDs per node the core/RAM requirements are pretty modest (4 cores / 16 GB), and if you're not using the rest of the resources for something productive, it would probably be advisable to replace those nodes with denser ones (e.g., a 24-drive chassis instead of 4). The chassis cost is not likely to make a big difference, but it would be much more flexible for dealing with your dilemma.
 
Hi @alexskysilk ! Thanks for your advice. The growth rate is higher than usual lately because I'm moving VMs from the old virtualization system to Proxmox. It should stabilize soon.
Anyway, after the rebalance triggered by replacing osd.8 completed, the used percentage dropped to 87%. Now I've replaced osd.9 too and I expect it to drop even more.
I have spare room for more drives in the new nodes, so I'll try to fill them up with 2T drives and see what happens.
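
For what it's worth, this is roughly the per-drive replacement sequence I'm following (the OSD id and device path below are just examples):

Code:
# take the old OSD out and let the cluster rebalance (watch ceph -s)
ceph osd out 9
# once it's drained, stop the service and destroy the OSD
systemctl stop ceph-osd@9
pveceph osd destroy 9 --cleanup
# create a new OSD on the replacement drive
pveceph osd create /dev/sdX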

Thanks again!
 
Is the balancer module enabled? Does it run from time to time?
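
A quick way to check (upmap mode is usually what you want on a pure Pacific cluster):

Code:
# is the balancer on, and in which mode?
ceph balancer status
# if not, it can be enabled with
ceph balancer mode upmap
ceph balancer on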
 
Is the balancer module enabled? Does it run from time to time?

Yes, autorebalance and autofill are active, and I always see some backfilling running from time to time.

Also, I see that the optimal number of PGs for the pool has doubled (1024 vs. the current 512), so I think it should autoscale the PGs sometime soon, shouldn't it?
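
This is what I'm looking at, in case I'm misreading it:

Code:
# current vs. suggested PG count per pool, plus the autoscaler mode
ceph osd pool autoscale-status
# autoscaling only happens automatically if the pool's mode is "on"
ceph osd pool get <poolname> pg_autoscale_mode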
 
It ain't gonna do much of anything with 4 OSDs per node, all pushing 80% utilization ;)


You REALLY want to add more drives sooner rather than later. One per node is a good start.

I'd be happy to do it any time; the problem is that for some odd reason the nodes don't see newly hotplugged disks, no matter what I do :confused: So I'm forced to restart.

They're all HPE DL360gen9/gen10. Any advice on that?

Anyway, no big issue: I'll start restarting (no pun intended) as soon as possible.
 
They're all HPE DL360gen9/gen10. Any advice on that?
Assuming you're using the included controller, it would be a Pxx0 (probably a P840).

First order of business is to install ssacli. You can find instructions here: https://gist.github.com/mrpeardotnet/a9ce41da99936c0175600f484fa20d03
(just remember to replace whatever version of Debian is referenced with bullseye for Proxmox 7)
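
From memory it boils down to something like this, but double-check the repo line and keys against the gist, as they may have changed:

Code:
# HPE's MCP repository (bullseye for Proxmox 7)
echo "deb http://downloads.linux.hpe.com/SDR/repo/mcp bullseye/current non-free" > /etc/apt/sources.list.d/hpe-mcp.list
# import the HPE signing key(s) listed in the gist, then
apt update && apt install ssacli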

IF your controller is set to RAID mode, that would explain the behavior: you'll need to manually mark each new drive as JBOD (unmanaged) before it becomes available for use, which you can do with ssacli. HOWEVER, the next time you reboot you should probably go into the controller setup and change it to HBA mode so you won't have to do that any more.
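
Something along these lines (slot=0 is just an example, the first command will tell you the real slot; HBA mode needs a reasonably recent Smart Array firmware):

Code:
# list controllers, their mode and attached drives
ssacli ctrl all show config
# physical drive status on a given controller
ssacli ctrl slot=0 pd all show status
# switch the controller to HBA mode (takes effect after a reboot)
ssacli ctrl slot=0 modify hbamode=on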
 
