Ceph Mismatch in OSD size

vivekdelhi

Sep 13, 2019
Dear All,
I have a running Proxmox cluster with three nodes. One hard drive of each node is part of a Ceph pool.
I configured an HP blade with a Smart Array P244br RAID controller. The RAID controller was put in HBA mode and Proxmox was then installed with ZFS (RAID0) on one 600 GB hard drive. The installation was updated and joined to the main cluster. The blade also has a second 600 GB HDD (/dev/sdb).
Then I ran
Code:
ceph-volume lvm zap /dev/sdb --destroy
followed by
Code:
# pveceph osd create /dev/sdb

# ceph osd crush tree --show-shadow
ID  CLASS WEIGHT  TYPE NAME                           
 -2   hdd 1.63507 root default~hdd                    
 -4   hdd 0.27249     host dell0104blade01~hdd        
  0   hdd 0.27249         osd.0                       
 -8   hdd 0.27280     host dell0104blade02~hdd        
  2   hdd 0.27280         osd.2                       
 -6   hdd 0.54489     host dell0104blade10~hdd        
  1   hdd 0.54489         osd.1                       
-10   hdd 0.54489     host hp0105blade07duplicate~hdd 
  3   hdd 0.54489         osd.3                       
 -1       1.63507 root default                        
 -3       0.27249     host dell0104blade01            
  0   hdd 0.27249         osd.0                       
 -7       0.27280     host dell0104blade02            
  2   hdd 0.27280         osd.2                       
 -5       0.54489     host dell0104blade10            
  1   hdd 0.54489         osd.1                       
 -9       0.54489     host hp0105blade07duplicate     
  3   hdd 0.54489         osd.3       
                
# ceph -s
  cluster:
    id:     09fc106c-d4cf-4edc-867f-db170301f857
    health: HEALTH_OK
 
  services:
    mon: 2 daemons, quorum dell0104blade01,dell0104blade10 (age 12d)
    mgr: dell0104blade01(active, since 12d), standbys: dell0104blade02, dell0104blade10
    osd: 4 osds: 4 up (since 27m), 4 in (since 27m); 100 remapped pgs
 
  data:
    pools:   1 pools, 128 pgs
    objects: 72.14k objects, 276 GiB
    usage:   696 GiB used, 978 GiB / 1.6 TiB avail
    pgs:     56346/216423 objects misplaced (26.035%)
             100 active+remapped+backfill_wait
             27  active+clean
             1   active+remapped+backfilling
 
  io:
    client:   50 KiB/s wr, 0 op/s rd, 2 op/s wr
    recovery: 15 MiB/s, 4 objects/s

There is one pool, cephpool1.
The pool has four hard drive OSDs: two of 300 GB each and two of 600 GB each, 1800 GB in total.
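
As a quick aside, the raw drive sizes line up with the GiB figures the GUI reports once the decimal GB on the drive labels are converted to binary GiB; this is a plain unit conversion, nothing Ceph-specific. A small check, assuming the drives are exactly 600 GB and 300 GB:
Code:
# Unit-conversion check (assumes the drives are exactly 600 GB / 300 GB)
GIB = 1024 ** 3
for gb in (600, 300):
    print(f"{gb} GB = {gb * 10**9 / GIB:.1f} GiB")    # 558.8 GiB and 279.4 GiB
total_bytes = 2 * 600e9 + 2 * 300e9
print(f"total = {total_bytes / 1024 ** 4:.2f} TiB")   # ~1.64 TiB, as shown under Ceph > Summary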

Now in the web GUI, under
Ceph > Summary, I can see Health OK, with a total size of 1.64 TB.
Ceph > OSD, I can see the OSDs with sizes 558 GB, 558 GB, 279.3 GB and 279 GB.
However, when I click on cephpool1 (hostname) under each host, I get conflicting sizes:

Code:
Hostname          Size in Ceph > OSD    Size in cephpool1 (hostname) > Summary
hp0105blade07     558 GB                774.14 GB
dell0104blade01   279 GB                312 GB
dell0104blade02   279.3 GB              312 GB
dell0104blade10   558 GB                312 GB

Can any of the experienced members help me understand the issue, and should I be worried about this discrepancy?

The CRUSH map is listed below:
Code:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
host dell0104blade01 {
    id -3        # do not change unnecessarily
    id -4 class hdd        # do not change unnecessarily
    # weight 0.272
    alg straw2
    hash 0    # rjenkins1
    item osd.0 weight 0.272
}
host dell0104blade10 {
    id -5        # do not change unnecessarily
    id -6 class hdd        # do not change unnecessarily
    # weight 0.545
    alg straw2
    hash 0    # rjenkins1
    item osd.1 weight 0.545
}
host dell0104blade02 {
    id -7        # do not change unnecessarily
    id -8 class hdd        # do not change unnecessarily
    # weight 0.273
    alg straw2
    hash 0    # rjenkins1
    item osd.2 weight 0.273
}
host hp0105blade07duplicate {
    id -9        # do not change unnecessarily
    id -10 class hdd        # do not change unnecessarily
    # weight 0.545
    alg straw2
    hash 0    # rjenkins1
    item osd.3 weight 0.545
}
root default {
    id -1        # do not change unnecessarily
    id -2 class hdd        # do not change unnecessarily
    # weight 1.635
    alg straw2
    hash 0    # rjenkins1
    item dell0104blade01 weight 0.272
    item dell0104blade10 weight 0.545
    item dell0104blade02 weight 0.273
    item hp0105blade07duplicate weight 0.545
}

# rules
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}

# end crush map
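
As a side note, the CRUSH weights in the map above are just the OSD sizes expressed in TiB (the default weight when an OSD is created), so the 558 GiB and 279 GiB disks come out at roughly 0.545 and 0.272. A quick check; the small differences from the exact values in the tree come from the precise device sizes:
Code:
# CRUSH weight defaults to the OSD size in TiB
for size_gib in (558, 279):
    print(f"{size_gib} GiB -> weight {size_gib / 1024:.5f}")   # 0.54492 and 0.27246, close to the 0.54489 / 0.27249 in the tree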
 
Please also post the output of ceph osd df tree and ceph df.
 
Thanks for your reply. Here is the relevant information:
Code:
# ceph osd df tree
ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP    META     AVAIL   %USE  VAR  PGS STATUS TYPE NAME                     
-1       1.63507        - 1.6 TiB 697 GiB 693 GiB  87 MiB  4.0 GiB 977 GiB 41.65 1.00   -        root default                 
-3       0.27249        - 279 GiB 138 GiB 137 GiB 406 KiB  1.1 GiB 141 GiB 49.50 1.19   -            host dell0104blade01     
0   hdd 0.27249  1.00000 279 GiB 138 GiB 137 GiB 406 KiB  1.1 GiB 141 GiB 49.50 1.19  76     up         osd.0                 
-7       0.27280        - 279 GiB 162 GiB 161 GiB  85 MiB  939 MiB 118 GiB 57.82 1.39   -            host dell0104blade02     
2   hdd 0.27280  1.00000 279 GiB 162 GiB 161 GiB  85 MiB  939 MiB 118 GiB 57.82 1.39  89     up         osd.2                 
-5       0.54489        - 558 GiB 194 GiB 193 GiB 641 KiB 1023 MiB 364 GiB 34.71 0.83   -            host dell0104blade10     
1   hdd 0.54489  1.00000 558 GiB 194 GiB 193 GiB 641 KiB 1023 MiB 364 GiB 34.71 0.83 107     up         osd.1                 
-9       0.54489        - 558 GiB 204 GiB 203 GiB 684 KiB 1023 MiB 354 GiB 36.56 0.88   -            host hp0105blade07duplicate
3   hdd 0.54489  1.00000 558 GiB 204 GiB 203 GiB 684 KiB 1023 MiB 354 GiB 36.56 0.88 112     up         osd.3                 
                    TOTAL 1.6 TiB 697 GiB 693 GiB  87 MiB  4.0 GiB 977 GiB 41.65                                               
MIN/MAX VAR: 0.83/1.39  STDDEV: 9.96

# ceph df
RAW STORAGE:
    CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED
    hdd       1.6 TiB     977 GiB     693 GiB      697 GiB         41.65
    TOTAL     1.6 TiB     977 GiB     693 GiB      697 GiB         41.65

POOLS:
    POOL          ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
    cephpool1      2     230 GiB      72.26k     693 GiB     52.67       208 GiB

# ceph -s
  cluster:
    id:     09fc106c-d4cf-4edc-867f-db170301f857
    health: HEALTH_OK

  services:
    mon: 2 daemons, quorum dell0104blade01,dell0104blade10 (age 12d)
    mgr: dell0104blade01(active, since 12d), standbys: dell0104blade02, dell0104blade10
    osd: 4 osds: 4 up (since 14h), 4 in (since 14h)

  data:
    pools:   1 pools, 128 pgs
    objects: 72.26k objects, 276 GiB
    usage:   697 GiB used, 977 GiB / 1.6 TiB avail
    pgs:     128 active+clean

  io:
    client:   120 KiB/s wr, 0 op/s rd, 14 op/s wr

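One thing to note in the ceph df output above (assuming cephpool1 is a 3-replica pool, which the replicated_rule suggests but which is not stated explicitly in the thread): USED is raw usage across all replicas, roughly three times STORED.
Code:
# Check on the ceph df figures above, assuming a 3-replica pool
stored_gib = 230   # STORED for cephpool1
used_gib = 693     # USED (raw, across all replicas)
print(used_gib / stored_gib)   # ~3.0, i.e. every object is written three times
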
I think that was an issue with rebalancing, since I had just added a new node. The sizes now visible are below; the mismatch has changed:
Code:
Hostname          Size in Ceph > OSD    Size in cephpool1 (hostname) > Summary
hp0105blade07     558 GB                900 GB
dell0104blade01   279 GB                437 GB
dell0104blade02   279 GB                437 GB
dell0104blade10   558 GB                437 GB
However, there is still a discrepancy that I would like to understand better.
 
Ceph will show a "worst case" size. The size is not fixed; it depends greatly on the distribution and fill level of the OSDs.
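
For anyone wondering where the pool-summary numbers come from, here is a rough sketch of that "worst case" idea, assuming MAX AVAIL is limited by whichever OSD would hit the full ratio first given its CRUSH weight, with an assumed pool size of 3 and full ratio of 0.95 (neither is confirmed in this thread). Plugging in the figures from the ceph osd df tree output above:
Code:
# Rough sketch of why MAX AVAIL is a "worst case" figure.
# Assumptions: size=3, full_ratio=0.95, data spread strictly proportional to CRUSH weight.
replicas, full_ratio, total_weight = 3, 0.95, 1.63507

# (crush weight, size GiB, raw used GiB) per OSD, from `ceph osd df tree` above
osds = [(0.27249, 279, 138),   # osd.0
        (0.27280, 279, 162),   # osd.2 - proportionally the fullest
        (0.54489, 558, 194),   # osd.1
        (0.54489, 558, 204)]   # osd.3

candidates = []
for weight, size, used in osds:
    share = weight / total_weight         # fraction of raw data that lands on this OSD
    headroom = full_ratio * size - used   # raw GiB it can still take before it is "full"
    candidates.append(headroom / (share * replicas))

print(f"approx MAX AVAIL: {min(candidates):.0f} GiB")   # ~206 GiB vs the 208 GiB reported
The per-host pool summary then appears to be roughly stored data plus this moving MAX AVAIL figure (230 GiB + ~208 GiB ≈ 438 GiB, close to the ~437 GB most hosts now show), which would explain why it changes as the OSDs fill up and rebalance rather than being a fixed capacity.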
 