CEPH storage confusion

pawanshrestha

New Member
Apr 7, 2021
I did a round of searching on this forum and elsewhere but could not find a convincing answer. I have a four-node Ceph cluster prepared by our IT team. The person who did the installation installed Proxmox on three nodes and set up Ceph from the Proxmox GUI. After a few weeks he added the fourth node.

The Ceph version is Nautilus:
Code:
$ ceph --version
ceph version 14.2.22 (877fa256043e4743620f4677e72dee5e738d1226) nautilus (stable)

My issue is the total size reported as available for the CephFS storage:

Code:
$ ceph df detail

RAW STORAGE:
    CLASS     SIZE        AVAIL       USED       RAW USED     %RAW USED
    hdd       240 TiB     145 TiB     95 TiB       95 TiB         39.50
    TOTAL     240 TiB     145 TiB     95 TiB       95 TiB         39.50

POOLS:
    POOL                ID     PGS     STORED     OBJECTS     USED       %USED     MAX AVAIL     QUOTA OBJECTS     QUOTA BYTES     DIRTY      USED COMPR     UNDER COMPR
    cephfs_data         11     128     45 TiB      62.51M     95 TiB     81.12        11 TiB     N/A               N/A             62.51M            0 B             0 B
    cephfs_metadata     12      32     26 GiB       7.56M     26 GiB      0.12        11 TiB     N/A               N/A              7.56M            0 B             0 B
As seen above, the AVAIL size is 145 TiB and USED is 95 TiB. I cannot figure out why %USED is 81.12 and MAX AVAIL is only 11 TiB.

A few other details:
Code:
$ ceph osd pool ls detail

pool 11 'cephfs_data' replicated size 2 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 1340 flags hashpspool,nearfull stripe_width 0 application cephfs

pool 12 'cephfs_metadata' replicated size 2 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 1340 flags hashpspool,nearfull stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs

Code:
$ ceph osd df tree
ID CLASS WEIGHT    REWEIGHT SIZE    RAW USE DATA    OMAP     META    AVAIL   %USE  VAR  PGS STATUS TYPE NAME
-1       238.49013        - 240 TiB  95 TiB  95 TiB   65 GiB 256 GiB 145 TiB 39.49 1.00   -        root default
-3        65.49591        -  65 TiB  23 TiB  23 TiB   21 GiB  67 GiB  43 TiB 34.49 0.87   -            host ceph-node1
 0   hdd   5.45799  1.00000 5.5 TiB 1.8 TiB 1.8 TiB  2.1 GiB 5.3 GiB 3.6 TiB 33.34 0.84   7     up         osd.0
 1   hdd   5.45799  1.00000 5.5 TiB 1.1 TiB 1.1 TiB  977 MiB 3.3 GiB 4.4 TiB 19.99 0.51   4     up         osd.1
 2   hdd   5.45799  1.00000 5.5 TiB 2.9 TiB 2.9 TiB  2.0 GiB 8.2 GiB 2.5 TiB 53.46 1.35  10     up         osd.2
 3   hdd   5.45799  1.00000 5.5 TiB 1.8 TiB 1.8 TiB  2.3 GiB 5.5 GiB 3.6 TiB 33.41 0.85   7     up         osd.3
 4   hdd   5.45799  1.00000 5.5 TiB 1.1 TiB 1.1 TiB  275 MiB 3.1 GiB 4.4 TiB 19.97 0.51   3     up         osd.4
 5   hdd   5.45799  0.95001 5.5 TiB 2.9 TiB 2.9 TiB  2.1 GiB 8.0 GiB 2.5 TiB 53.46 1.35  10     up         osd.5
 6   hdd   5.45799  1.00000 5.5 TiB 2.9 TiB 2.9 TiB  4.2 GiB 8.2 GiB 2.5 TiB 53.36 1.35  12     up         osd.6
 7   hdd   5.45799  1.00000 5.5 TiB 1.8 TiB 1.8 TiB   52 KiB 5.6 GiB 3.6 TiB 33.44 0.85   5     up         osd.7
 8   hdd   5.45799  1.00000 5.5 TiB 1.1 TiB 1.1 TiB  2.0 GiB 4.1 GiB 4.4 TiB 20.04 0.51   5     up         osd.8
 9   hdd   5.45799  1.00000 5.5 TiB 1.5 TiB 1.4 TiB  979 MiB 4.4 GiB 4.0 TiB 26.66 0.68   5     up         osd.9
22   hdd   5.45799  1.00000 5.5 TiB 2.2 TiB 2.2 TiB  2.0 GiB 6.2 GiB 3.3 TiB 40.01 1.01   8     up         osd.22
23   hdd   5.45799  1.00000 5.5 TiB 1.5 TiB 1.5 TiB  2.0 GiB 5.1 GiB 4.0 TiB 26.72 0.68   6     up         osd.23
-5        76.40039        -  76 TiB  33 TiB  33 TiB   19 GiB  84 GiB  44 TiB 42.97 1.09   -            host ceph-node2
10   hdd  12.73340  0.90002  13 TiB 6.9 TiB 6.9 TiB  2.0 GiB  17 GiB 5.8 TiB 54.38 1.38  21     up         osd.10
11   hdd  12.73340  1.00000  13 TiB 4.7 TiB 4.7 TiB  2.1 GiB  12 GiB 8.1 TiB 36.65 0.93  14     up         osd.11
12   hdd  12.73340  1.00000  13 TiB 5.8 TiB 5.8 TiB  2.9 GiB  15 GiB 6.9 TiB 45.72 1.16  19     up         osd.12
13   hdd  12.73340  1.00000  13 TiB 5.8 TiB 5.8 TiB  3.1 GiB  15 GiB 6.9 TiB 45.78 1.16  19     up         osd.13
14   hdd  12.73340  1.00000  13 TiB 5.2 TiB 5.1 TiB  5.0 GiB  14 GiB 7.6 TiB 40.50 1.03  19     up         osd.14
15   hdd  12.73340  1.00000  13 TiB 4.4 TiB 4.4 TiB  4.1 GiB  12 GiB 8.3 TiB 34.77 0.88  16     up         osd.15
-7        76.40039        -  76 TiB  28 TiB  28 TiB   20 GiB  72 GiB  49 TiB 36.23 0.92   -            host ceph-node3
16   hdd  12.73340  1.00000  13 TiB 5.1 TiB 5.1 TiB  5.1 GiB  13 GiB 7.6 TiB 40.12 1.02  19     up         osd.16
17   hdd  12.73340  1.00000  13 TiB 4.4 TiB 4.4 TiB  2.0 GiB  11 GiB 8.4 TiB 34.35 0.87  14     up         osd.17
18   hdd  12.73340  1.00000  13 TiB 5.8 TiB 5.8 TiB  4.0 GiB  15 GiB 6.9 TiB 45.66 1.16  20     up         osd.18
19   hdd  12.73340  1.00000  13 TiB 4.4 TiB 4.4 TiB  1.1 GiB  11 GiB 8.3 TiB 34.82 0.88  13     up         osd.19
20   hdd  12.73340  1.00000  13 TiB 3.2 TiB 3.2 TiB  3.2 GiB 8.9 GiB 9.5 TiB 25.23 0.64  11     up         osd.20
21   hdd  12.73340  1.00000  13 TiB 4.7 TiB 4.7 TiB  4.1 GiB  12 GiB 8.0 TiB 37.21 0.94  17     up         osd.21
-9        20.19344        -  22 TiB  12 TiB  12 TiB  5.9 GiB  33 GiB  10 TiB 53.69 1.36   -            host ceph-node4
24   hdd   3.63869  1.00000 3.6 TiB 1.1 TiB 1.1 TiB  993 MiB 4.0 GiB 2.5 TiB 30.13 0.76   4     up         osd.24
25   hdd   3.63869  1.00000 3.6 TiB 1.4 TiB 1.4 TiB  2.0 GiB 4.3 GiB 2.3 TiB 37.39 0.95   5     up         osd.25
26   hdd   3.63869  0.95001 3.6 TiB 2.1 TiB 2.1 TiB 1012 MiB 5.8 GiB 1.5 TiB 57.50 1.46   6     up         osd.26
27   hdd   3.63869  0.90002 3.6 TiB 2.4 TiB 2.4 TiB   72 KiB 6.2 GiB 1.2 TiB 67.09 1.70   6     up         osd.27
28   hdd   3.63869  1.00000 3.6 TiB 1.5 TiB 1.5 TiB 1004 MiB 4.2 GiB 2.2 TiB 40.11 1.02   5     up         osd.28
29   hdd   2.00000  0.90002 3.6 TiB 3.3 TiB 3.3 TiB 1018 MiB 8.5 GiB 375 GiB 89.93 2.28  10     up         osd.29
                      TOTAL 240 TiB  95 TiB  95 TiB   65 GiB 256 GiB 145 TiB 39.49
MIN/MAX VAR: 0.51/2.28  STDDEV: 14.39

Code:
$ ceph -s
  cluster:
    id:     07a39b3f-0cf1-40bd-aef2-cd8b48a29aa7
    health: HEALTH_WARN
            1 nearfull osd(s)
            2 pool(s) nearfull

  services:
    mon: 4 daemons, quorum ceph-node1,ceph-node2,ceph-node3,ceph-node4 (age 8w)
    mgr: ceph-node1(active, since 2M), standbys: ceph-node2
    mds: cephfs:1 {0=ceph-node1=up:active} 1 up:standby
    osd: 30 osds: 30 up (since 8w), 30 in (since 8w); 9 remapped pgs

  data:
    pools:   2 pools, 160 pgs
    objects: 70.07M objects, 44 TiB
    usage:   95 TiB used, 145 TiB / 240 TiB avail
    pgs:     1679962/140137540 objects misplaced (1.199%)
             150 active+clean
             9   active+remapped+backfilling
             1   active+clean+scrubbing+deep

  io:
    client:   4.5 MiB/s rd, 9.8 MiB/s wr, 2 op/s rd, 49 op/s wr
    recovery: 34 MiB/s, 90 keys/s, 58 objects/s

Code:
$ rados df
POOL_NAME         USED  OBJECTS CLONES    COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED   RD_OPS      RD    WR_OPS      WR USED COMPR UNDER COMPR
cephfs_data     95 TiB 62512320      0 125024640                  0       0        0 24253675  21 TiB 266863440  98 TiB        0 B         0 B
cephfs_metadata 26 GiB  7556410      0  15112820                  0       0        0 88173716 329 GiB 120358798 737 GiB        0 B         0 B

total_objects    70068730
total_used       95 TiB
total_avail      145 TiB
total_space      240 TiB

Any help is highly appreciated.
 
<snip>
As seen above, the AVAIL size is 145 TiB and USED is 95 TiB. I cannot figure out why %USED is 81.12 and MAX AVAIL is only 11 TiB.
<snip>

Code:
pool 11 'cephfs_data' replicated size 2 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 1340 flags hashpspool,nearfull stripe_width 0 application cephfs

pool 12 'cephfs_metadata' replicated size 2 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 1340 flags hashpspool,nearfull stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs

Code:
$ ceph osd df tree
ID CLASS WEIGHT    REWEIGHT SIZE    RAW USE DATA    OMAP     META    AVAIL   %USE  VAR  PGS STATUS TYPE NAME
<snip>
29   hdd   2.00000  0.90002 3.6 TiB 3.3 TiB 3.3 TiB 1018 MiB 8.5 GiB 375 GiB 89.93 2.28  10     up         osd.29
                      TOTAL 240 TiB  95 TiB  95 TiB   65 GiB 256 GiB 145 TiB 39.49
MIN/MAX VAR: 0.51/2.28  STDDEV: 14.39

<snip>

Any help is highly appreciated.
The calculations are not well documented except in the source code, IMO. AFAIK, the following are true:

%USED ~= STORED / (STORED + MAX AVAIL)
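
Plugging in the cephfs_data numbers as a rough sanity check (the TiB figures above are rounded, so it will not match 81.12 to the decimal):
Code:
STORED    = 45 TiB
MAX AVAIL = 11 TiB
%USED    ~= 45 / (45 + 11) ~= 0.80   (reported: 81.12)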

MAX AVAIL is calculated from the OSD full ratio (95%), the most-full OSD in the pool, the OSD WEIGHTs, the total weight of the set of OSDs serving the pool, and the pool's replication factor. As seen above, osd.29 is 89.93% full!
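
A rough back-of-the-envelope plug-in (simplified -- it ignores REWEIGHT and the mgr's exact byte-level accounting, so treat it as an illustration, not the real formula): the pool can only grow until its most constrained OSD hits the 95% full ratio, scaled by that OSD's share of the CRUSH weight and divided by the replica count.
Code:
osd.29: ~3.64 TiB device, 89.93% used, CRUSH weight 2.0; total weight 238.49
headroom until full ratio  ~= 3.64 * (0.95 - 0.8993)  ~= 0.18 TiB
osd.29's share of weight   ~= 2.0 / 238.49            ~= 0.0084
raw data the pool can add  ~= 0.18 / 0.0084           ~= 22 TiB
MAX AVAIL                  ~= 22 TiB / 2 replicas     ~= 11 TiB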

Also, your replication factor is 2 -- that is a bad idea if you care about your data. See Surviving a Ceph cluster outage: the hard way
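
If you do decide to move to 3 replicas (the usual recommendation, together with min_size 2), the commands would be along these lines. Note that it stores a third copy of everything, so raw usage for the pool grows by roughly 50%, and with osd.29 already nearfull you would want to rebalance or add capacity first:
Code:
$ ceph osd pool set cephfs_data size 3
$ ceph osd pool set cephfs_data min_size 2
$ ceph osd pool set cephfs_metadata size 3
$ ceph osd pool set cephfs_metadata min_size 2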
 
