[SOLVED] Ceph health_warn

Nenes71
Jun 27, 2022
Hello
Yesterday I replaced the 4 x 1 TB disks with 4 x 2 TB (one replacement per node). After 24 hours of rebalancing, Ceph seems stuck at 99.81% and is still in warning.
Can you help me resolve these errors?

#------------------
# ceph status
#------------------
cluster:
id: e7fc1497-5889-4aba-abc7-e0e1115d70ef
health: HEALTH_WARN
1 nearfull osd(s)
Low space hindering backfill (add storage if this doesn't resolve itself): 5 pgs backfill_toofull
Degraded data redundancy: 5554/1885612 objects degraded (0.295%), 5 pgs degraded, 5 pgs undersized
2 pool(s) nearfull

services:
mon: 4 daemons, quorum pve-01,pve-02,pve-03,pve-04 (age 22h)
mgr: pve-01(active, since 22h), standbys: pve-02, pve-04, pve-03
osd: 23 osds: 23 up (since 21h), 23 in (since 21h); 8 remapped pgs

data:
pools: 2 pools, 385 pgs
objects: 471.40k objects, 1.8 TiB
usage: 6.9 TiB used, 9.6 TiB / 16 TiB avail
pgs: 5554/1885612 objects degraded (0.295%)
4454/1885612 objects misplaced (0.236%)
377 active+clean
5 active+undersized+degraded+remapped+backfill_toofull
2 active+remapped+backfilling
1 active+remapped+backfill_wait

io:
client: 783 KiB/s wr, 0 op/s rd, 51 op/s wr
recovery: 124 MiB/s, 32 objects/s


#------------------
ceph osd df tree
#------------------
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS TYPE NAME
-1 11.69846 - 16 TiB 6.9 TiB 6.8 TiB 167 MiB 23 GiB 9.6 TiB 41.53 1.00 - root default
-3 2.73254 - 4.1 TiB 1.7 TiB 1.7 TiB 53 MiB 4.7 GiB 2.4 TiB 41.58 1.00 - host pve-01
1 ssd 0.23219 1.00000 238 GiB 124 GiB 124 GiB 2.9 MiB 293 MiB 113 GiB 52.28 1.26 24 up osd.1
2 ssd 0.23219 0.90002 238 GiB 185 GiB 185 GiB 8.7 MiB 403 MiB 53 GiB 77.84 1.87 38 up osd.2
3 ssd 0.46519 1.00000 476 GiB 261 GiB 260 GiB 35 KiB 780 MiB 216 GiB 54.69 1.32 55 up osd.3
4 ssd 0.46509 1.00000 476 GiB 277 GiB 276 GiB 41 MiB 736 MiB 200 GiB 58.08 1.40 67 up osd.4
5 ssd 0.46509 1.00000 931 GiB 300 GiB 299 GiB 67 KiB 1.1 GiB 631 GiB 32.27 0.78 69 up osd.5
19 ssd 0.87279 1.00000 1.8 TiB 609 GiB 607 GiB 36 KiB 1.4 GiB 1.2 TiB 32.67 0.79 132 up osd.19
-5 2.53595 - 4.1 TiB 1.7 TiB 1.7 TiB 44 MiB 5.7 GiB 2.4 TiB 41.68 1.00 - host pve-02
6 ssd 0.23219 1.00000 1.8 TiB 138 GiB 137 GiB 6 KiB 1.1 GiB 1.7 TiB 7.42 0.18 30 up osd.6
7 ssd 0.23219 1.00000 238 GiB 173 GiB 172 GiB 5.1 MiB 541 MiB 65 GiB 72.71 1.75 43 up osd.7
8 ssd 0.23219 0.95001 238 GiB 171 GiB 170 GiB 2.5 MiB 509 MiB 67 GiB 71.85 1.73 40 up osd.8
9 ssd 0.46509 1.00000 476 GiB 344 GiB 343 GiB 5.0 MiB 695 MiB 132 GiB 72.24 1.74 69 up osd.9
10 ssd 0.46509 1.00000 476 GiB 298 GiB 298 GiB 7.6 MiB 736 MiB 178 GiB 62.63 1.51 61 up osd.10
18 ssd 0.90919 1.00000 931 GiB 635 GiB 633 GiB 23 MiB 2.2 GiB 296 GiB 68.21 1.64 142 up osd.18
-7 2.09248 - 4.1 TiB 1.7 TiB 1.7 TiB 47 MiB 5.3 GiB 2.4 TiB 41.69 1.00 - host pve-03
12 ssd 0.23230 1.00000 1.8 TiB 185 GiB 184 GiB 7 KiB 1.3 GiB 1.6 TiB 9.93 0.24 43 up osd.12
13 ssd 0.23230 1.00000 238 GiB 180 GiB 180 GiB 3.6 MiB 370 MiB 57 GiB 75.88 1.83 39 up osd.13
14 ssd 0.23230 0.95001 238 GiB 180 GiB 179 GiB 3.8 MiB 644 MiB 58 GiB 75.68 1.82 34 up osd.14
15 ssd 0.46519 0.95001 476 GiB 372 GiB 371 GiB 29 MiB 678 MiB 104 GiB 78.11 1.88 77 up osd.15
16 ssd 0.46519 0.90002 476 GiB 411 GiB 410 GiB 11 MiB 877 MiB 65 GiB 86.26 2.08 92 up osd.16
17 ssd 0.46519 1.00000 931 GiB 432 GiB 430 GiB 54 KiB 1.6 GiB 499 GiB 46.38 1.12 100 up osd.17
-13 4.33748 - 4.1 TiB 1.7 TiB 1.7 TiB 23 MiB 7.1 GiB 2.4 TiB 41.17 0.99 - host pve-04
21 ssd 0.23289 1.00000 477 GiB 152 GiB 151 GiB 0 B 938 MiB 325 GiB 31.77 0.76 30 up osd.21
22 ssd 0.46579 1.00000 477 GiB 206 GiB 205 GiB 42 KiB 1.2 GiB 271 GiB 43.23 1.04 48 up osd.22
23 ssd 0.90970 0.95001 477 GiB 432 GiB 431 GiB 73 KiB 1.3 GiB 45 GiB 90.55 2.18 91 up osd.23
24 ssd 0.90970 1.00000 932 GiB 383 GiB 381 GiB 23 MiB 1.7 GiB 549 GiB 41.10 0.99 80 up osd.24
25 ssd 1.81940 1.00000 1.8 TiB 567 GiB 565 GiB 36 KiB 2.0 GiB 1.3 TiB 30.44 0.73 133 up osd.25
TOTAL 16 TiB 6.9 TiB 6.8 TiB 167 MiB 23 GiB 9.6 TiB 41.53
MIN/MAX VAR: 0.18/2.18 STDDEV: 26.53

#------------------
ceph osd pool get pool_vm size
#------------------
size: 4
#------------------
ceph osd pool get pool_vm min_size
#------------------
min_size: 2
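For reference, all of these per-pool settings (size, min_size, pg_num, crush rule, ...) can also be read in one go with a standard command:

Code:
ceph osd pool ls detail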

thank you in advance
 
#------------------
# ceph status
#------------------
health: HEALTH_WARN
1 nearfull osd(s)
Low space hindering backfill (add storage if this doesn't resolve itself): 5 pgs backfill_toofull
Degraded data redundancy: 5554/1885612 objects degraded (0.295%), 5 pgs degraded, 5 pgs undersized
2 pool(s) nearfull

It's because you have low space, so backfilling won't work ("Low space hindering backfill").

5 active+undersized+degraded+remapped+backfill_toofull

backfill_toofull --> check your OSDs.
#------------------
ceph osd df tree
#------------------
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS TYPE NAME
16 ssd 0.46519 0.90002 476 GiB 411 GiB 410 GiB 11 MiB 877 MiB 65 GiB 86.26 2.08 92 up osd.16
23 ssd 0.90970 0.95001 477 GiB 432 GiB 431 GiB 73 KiB 1.3 GiB 45 GiB 90.55 2.18 91 up osd.23

Those are the OSDs that are critical. You should never reach 90% on an OSD in Proxmox Ceph, because your pool won't work anymore. You might try reweighting osd.23 to a lower value, see https://swamireddy.wordpress.com/2016/06/17/ceph-diff-ceph-osd-reweight-ceph-osd-crush-reweight/
Sounds like this could help you, but I haven't tried it, to be honest.

"(For instance, if one of your OSDs is at 90% and the others are at 50%, you could reduce this weight to try and compensate for it."

Additional information: if you ever lose the 2 TB disk on one of your nodes, you will be in big trouble, because the cluster will break again as OSDs reach 90% due to backfilling.
 
My problem changed a little after I took osd.23 out and then back in; see my ceph status below.
I don't understand why my pools are nearly full after adding 4 TB (replacing 4 x 1 TB drives with 4 x 2 TB, one disk per node).
I'm afraid of reweighting my OSDs like jsterr said.
I post my crush map and my global configuration below and attach my ceph osd list.
What can I do safely?
I really need your help.
Thanks in advance.


-- ceph status
Code:
  cluster:
    id:     e7fc1497-5889-4aba-abc7-e0e1115d70ef
    health: HEALTH_WARN
            1 nearfull osd(s)
            Low space hindering backfill (add storage if this doesn't resolve itself): 1 pg backfill_toofull
            2 pool(s) nearfull

  services:
    mon: 4 daemons, quorum pve-01,pve-02,pve-03,pve-04 (age 46h)
    mgr: pve-01(active, since 46h), standbys: pve-02, pve-04, pve-03
    osd: 23 osds: 23 up (since 46h), 23 in (since 15h); 1 remapped pgs

  data:
    pools:   2 pools, 513 pgs
    objects: 471.38k objects, 1.8 TiB
    usage:   6.9 TiB used, 9.6 TiB / 16 TiB avail
    pgs:     2769/1885536 objects misplaced (0.147%)
             512 active+clean
             1   active+remapped+backfill_toofull

  io:
    client:   340 KiB/s wr, 0 op/s rd, 33 op/s wr

  progress:
    Global Recovery Event (23h)
      [===========================.] (remaining: 2m)

-- Crush map

Code:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 1 osd.1 class ssd
device 2 osd.2 class ssd
device 3 osd.3 class ssd
device 4 osd.4 class ssd
device 5 osd.5 class ssd
device 6 osd.6 class ssd
device 7 osd.7 class ssd
device 8 osd.8 class ssd
device 9 osd.9 class ssd
device 10 osd.10 class ssd
device 12 osd.12 class ssd
device 13 osd.13 class ssd
device 14 osd.14 class ssd
device 15 osd.15 class ssd
device 16 osd.16 class ssd
device 17 osd.17 class ssd
device 18 osd.18 class ssd
device 19 osd.19 class ssd
device 21 osd.21 class ssd
device 22 osd.22 class ssd
device 23 osd.23 class ssd
device 24 osd.24 class ssd
device 25 osd.25 class ssd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host pve-01 {
    id -3        # do not change unnecessarily
    id -9 class ssd        # do not change unnecessarily
    # weight 2.733
    alg straw2
    hash 0    # rjenkins1
    item osd.1 weight 0.232
    item osd.2 weight 0.232
    item osd.4 weight 0.465
    item osd.5 weight 0.465
    item osd.19 weight 0.873
    item osd.3 weight 0.465
}
host pve-02 {
    id -5        # do not change unnecessarily
    id -10 class ssd        # do not change unnecessarily
    # weight 2.536
    alg straw2
    hash 0    # rjenkins1
    item osd.6 weight 0.232
    item osd.7 weight 0.232
    item osd.8 weight 0.232
    item osd.9 weight 0.465
    item osd.10 weight 0.465
    item osd.18 weight 0.909
}
host pve-03 {
    id -7        # do not change unnecessarily
    id -11 class ssd        # do not change unnecessarily
    # weight 2.092
    alg straw2
    hash 0    # rjenkins1
    item osd.12 weight 0.232
    item osd.13 weight 0.232
    item osd.14 weight 0.232
    item osd.15 weight 0.465
    item osd.16 weight 0.465
    item osd.17 weight 0.465
}
host pve-04 {
    id -13        # do not change unnecessarily
    id -15 class ssd        # do not change unnecessarily
    # weight 4.337
    alg straw2
    hash 0    # rjenkins1
    item osd.21 weight 0.233
    item osd.22 weight 0.466
    item osd.23 weight 0.910
    item osd.24 weight 0.910
    item osd.25 weight 1.819
}
root default {
    id -1        # do not change unnecessarily
    id -12 class ssd        # do not change unnecessarily
    # weight 11.698
    alg straw2
    hash 0    # rjenkins1
    item pve-01 weight 2.733
    item pve-02 weight 2.536
    item pve-03 weight 2.092
    item pve-04 weight 4.337
}

# rules
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}

# end crush map
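For reference, a decompiled map like this is normally obtained and re-injected with the standard getcrushmap/crushtool round-trip (a sketch; editing the map by hand should not be necessary here, reweighting via the CLI is simpler):

Code:
ceph osd getcrushmap -o crush.bin      # dump the compiled CRUSH map
crushtool -d crush.bin -o crush.txt    # decompile to text (what is shown above)
crushtool -c crush.txt -o crush-new.bin
ceph osd setcrushmap -i crush-new.bin  # inject the edited map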

-- global configuration
Code:
[global]
     auth_client_required = cephx
     auth_cluster_required = cephx
     auth_service_required = cephx
     cluster_network = 10.0.0.0/24
     fsid = e7fc1497-5889-4aba-abc7-e0e1115d70ef
     mon_allow_pool_delete = true
     mon_host = 10.0.0.1 10.0.0.2 10.0.0.3 10.0.0.4
     ms_bind_ipv4 = true
     osd_journal_size = 5120
     osd_pool_default_min_size = 2
     osd_pool_default_size = 3
     public_network = 10.0.0.0/24

[client]
     keyring = /etc/pve/priv/$cluster.$name.keyring

[mds]
     keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.pve-01-ceph]
     host = pve-01-ceph
     mds_standby_for_name = pve

[mds.pve-02]
     host = pve-02
     mds_standby_for_name = pve

[mds.pve-03]
     host = pve-03
     mds_standby_for_name = pve

[mds.pve-04]
     host = pve-04
     mds_standby_for_name = pve

[mon.pve-01-ceph]
     cluster_addr = 10.0.0.1
     public_addr = 10.0.0.1

[mon.pve-02]
     cluster_addr = 10.0.0.2
     public_addr = 10.0.0.2

[mon.pve-03]
     cluster_addr = 10.0.0.3
     public_addr = 10.0.0.3

[mon.pve-04]
     cluster_addr = 10.0.0.4
     public_addr = 10.0.0.4
 

Attachments: ceph.png
Disclaimer: I have absolutely no knowledge of and experience with Ceph; so take this with caution and get it confirmed for sure from someone with knowledge!

Simply from my logical understanding, it seems to me that the weight of OSDs 5, 6, 12, 17, 19, 21, 23 (confirm those for yourself!) is wrong for their corresponding sizes.
But how you change this the proper way, or whether my assumption is even right at all, I do not know, sorry.
 

You can't set the override weight higher than 1.0, so it's the opposite: all the other OSDs need a lower reweight.
 

My assumption was, since the weight is in terabytes from what I quickly read, that the weight has to be slightly lower than the corresponding disk size, which is true for all but those OSDs I mentioned.
Is my assumption wrong? And/or can one not modify the weight manually in some way? Is reweight the only way?
 
Thanks for your replies.
If I understand your replies,
I must set the reweight to:
2 for my 2 TB disks (osd 19, 6, 12, 25)
1 for my 1 TB disks (osd 5, 18, 17, 24)
0.5 for my 512 GB disks (osd 3, 4, 10, 9, 15, 16, 21, 22, 23)
0.25 for my 256 GB disks (osd 1, 2, 7, 8, 13, 14)

And I do this with, for example:
# ceph osd crush reweight osd.2 2
 

As mentioned here: https://swamireddy.wordpress.com/2016/06/17/ceph-diff-ceph-osd-reweight-ceph-osd-crush-reweight/ you can't go higher than 1.0. To fix your cluster I would try to just change the reweight of osd.14 (the one that has 90% usage) with the following command:

ceph osd crush reweight osd.14 0.25
Then watch the rebalancing and wait for Ceph to rebalance.
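To watch the rebalancing, the usual commands should do (both standard Ceph CLI):

Code:
ceph -w              # stream cluster events, including recovery/backfill progress
watch -n 10 ceph -s  # or poll the status summary every 10 seconds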
 
Sorry, I read this doc, but not correctly... As mentioned in the doc, I understand the value is in TB for ceph osd crush reweight:
> ceph osd crush reweight

sets the CRUSH weight of the OSD. This weight is an arbitrary value (generally the size of the disk in TB or something) and controls how much data the system tries to allocate to the OSD.

I ran:
ceph osd crush reweight osd.14 0.25

and now the status is:
Code:
cluster:
    id:     e7fc1497-5889-4aba-abc7-e0e1115d70ef
    health: HEALTH_WARN
            Low space hindering backfill (add storage if this doesn't resolve itself): 9 pgs backfill_toofull

  services:
    mon: 4 daemons, quorum pve-01,pve-02,pve-03,pve-04 (age 2d)
    mgr: pve-01(active, since 2d), standbys: pve-02, pve-04, pve-03
    osd: 23 osds: 23 up (since 2d), 23 in (since 18h); 102 remapped pgs

  data:
    pools:   2 pools, 513 pgs
    objects: 471.46k objects, 1.8 TiB
    usage:   6.9 TiB used, 9.6 TiB / 16 TiB avail
    pgs:     106972/1885844 objects misplaced (5.672%)
             411 active+clean
             92  active+remapped+backfill_wait
             9   active+remapped+backfill_wait+backfill_toofull
             1   active+remapped+backfilling

  io:
    client:   0 B/s rd, 262 KiB/s wr, 0 op/s rd, 30 op/s wr
    recovery: 21 MiB/s, 5 objects/s

  progress:
    Global Recovery Event (2h)
      [======================......] (remaining: 6h)

I'm waiting for the rebalancing to finish...

If the maximum value is 1 for ceph osd crush reweight:

reweight 1 for my 2 TB disks (osd 19, 6, 12, 25)
reweight 0.5 for my 1 TB disks (osd 5, 18, 17, 24)
reweight 0.25 for my 512 GB disks (osd 3, 4, 10, 9, 15, 16, 21, 22, 23)
reweight 0.125 for my 256 GB disks (osd 1, 2, 7, 8, 13, 14)

Do these values seem correct to you?

And can you explain why ceph status shows usage: 6.9 TiB used, 9.6 TiB / 16 TiB avail,
while Proxmox says my pool is 6.86 TiB (89.23%) used?
 
Rebalancing done
Status is OK

Code:
  cluster:
    id:     e7fc1497-5889-4aba-abc7-e0e1115d70ef
    health: HEALTH_OK

  services:
    mon: 4 daemons, quorum pve-01,pve-02,pve-03,pve-04 (age 2d)
    mgr: pve-01(active, since 2d), standbys: pve-02, pve-04, pve-03
    osd: 23 osds: 23 up (since 2d), 23 in (since 21h)

  data:
    pools:   2 pools, 513 pgs
    objects: 471.50k objects, 1.8 TiB
    usage:   6.9 TiB used, 9.6 TiB / 16 TiB avail
    pgs:     513 active+clean

  io:
    client:   22 KiB/s rd, 516 KiB/s wr, 0 op/s rd, 45 op/s wr

Do you think I should apply these values?
reweight 1 for my 2 TB disks (osd 19, 6, 12, 25)
reweight 0.5 for my 1 TB disks (osd 5, 18, 17, 24)
reweight 0.25 for my 512 GB disks (osd 3, 4, 10, 9, 15, 16, 21, 22, 23)
reweight 0.125 for my 256 GB disks (osd 1, 2, 7, 8, 13, 14)

Code:
ID   CLASS  WEIGHT    REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
 -1         12.25096         -   16 TiB  6.9 TiB  6.8 TiB  155 MiB   27 GiB  9.6 TiB  41.64  1.00    -          root default
 -3          2.73254         -  4.1 TiB  1.7 TiB  1.7 TiB   47 MiB  6.3 GiB  2.4 TiB  41.64  1.00    -              host pve-01
  1    ssd   0.23219   1.00000  238 GiB  156 GiB  155 GiB  1.2 MiB  593 MiB   82 GiB  65.59  1.58   45      up          osd.1
  2    ssd   0.23219   0.90002  238 GiB  141 GiB  140 GiB  7.8 MiB  482 MiB   97 GiB  59.15  1.42   41      up          osd.2
  3    ssd   0.46519   1.00000  476 GiB  284 GiB  282 GiB   10 KiB  1.5 GiB  193 GiB  59.58  1.43   83      up          osd.3
  4    ssd   0.46509   1.00000  476 GiB  313 GiB  313 GiB   38 MiB  643 MiB  163 GiB  65.79  1.58   92      up          osd.4
  5    ssd   0.46509   1.00000  931 GiB  284 GiB  282 GiB   46 KiB  1.3 GiB  647 GiB  30.47  0.73   83      up          osd.5
 19    ssd   0.87279   1.00000  1.8 TiB  580 GiB  578 GiB   22 KiB  1.8 GiB  1.3 TiB  31.16  0.75  169      up          osd.19
 -5          2.53595         -  4.1 TiB  1.7 TiB  1.7 TiB   36 MiB  5.7 GiB  2.4 TiB  41.69  1.00    -              host pve-02
  6    ssd   0.23219   1.00000  1.8 TiB  157 GiB  157 GiB   11 KiB  558 MiB  1.7 TiB   8.45  0.20   46      up          osd.6
  7    ssd   0.23219   1.00000  238 GiB  185 GiB  185 GiB  1.5 MiB  570 MiB   52 GiB  78.01  1.87   54      up          osd.7
  8    ssd   0.23219   0.95001  238 GiB  175 GiB  174 GiB  1.9 MiB  621 MiB   63 GiB  73.59  1.77   51      up          osd.8
  9    ssd   0.46509   1.00000  476 GiB  300 GiB  300 GiB  3.8 MiB  742 MiB  176 GiB  63.04  1.51   87      up          osd.9
 10    ssd   0.46509   1.00000  476 GiB  322 GiB  321 GiB  5.1 MiB  904 MiB  154 GiB  67.56  1.62   93      up          osd.10
 18    ssd   0.90919   1.00000  931 GiB  620 GiB  618 GiB   24 MiB  2.4 GiB  311 GiB  66.59  1.60  182      up          osd.18
 -7          2.64499         -  4.1 TiB  1.7 TiB  1.7 TiB   48 MiB  6.3 GiB  2.4 TiB  41.68  1.00    -              host pve-03
 12    ssd   0.23230   1.00000  1.8 TiB  173 GiB  171 GiB   12 KiB  1.4 GiB  1.7 TiB   9.28  0.22   50      up          osd.12
 13    ssd   0.23230   1.00000  238 GiB  134 GiB  134 GiB  6.7 MiB  479 MiB  104 GiB  56.32  1.35   39      up          osd.13
 14    ssd   0.25000   0.95001  238 GiB  187 GiB  187 GiB  4.1 MiB  564 MiB   50 GiB  78.82  1.89   54      up          osd.14
 15    ssd   0.46519   0.95001  476 GiB  296 GiB  295 GiB   29 MiB  831 MiB  180 GiB  62.12  1.49   87      up          osd.15
 16    ssd   0.46519   0.90002  476 GiB  251 GiB  250 GiB  8.5 MiB  1.0 GiB  225 GiB  52.75  1.27   73      up          osd.16
 17    ssd   1.00000   1.00000  931 GiB  718 GiB  716 GiB   58 KiB  2.1 GiB  213 GiB  77.15  1.85  210      up          osd.17
-13          4.33748         -  4.1 TiB  1.7 TiB  1.7 TiB   24 MiB  8.6 GiB  2.4 TiB  41.54  1.00    -              host pve-04
 21    ssd   0.23289   1.00000  477 GiB  107 GiB  106 GiB    8 KiB  1.1 GiB  370 GiB  22.51  0.54   31      up          osd.21
 22    ssd   0.46579   1.00000  477 GiB  197 GiB  195 GiB   54 KiB  1.4 GiB  280 GiB  41.27  0.99   57      up          osd.22
 23    ssd   0.90970   0.95001  477 GiB  391 GiB  389 GiB   50 KiB  1.6 GiB   86 GiB  81.90  1.97  114      up          osd.23
 24    ssd   0.90970   1.00000  932 GiB  355 GiB  353 GiB   23 MiB  1.9 GiB  576 GiB  38.14  0.92  105      up          osd.24
 25    ssd   1.81940   1.00000  1.8 TiB  705 GiB  702 GiB   25 KiB  2.7 GiB  1.1 TiB  37.84  0.91  206      up          osd.25
                         TOTAL   16 TiB  6.9 TiB  6.8 TiB  155 MiB   27 GiB  9.6 TiB  41.64
MIN/MAX VAR: 0.20/1.97  STDDEV: 24.37

I followed this post:
https://forum.proxmox.com/threads/ceph-storage-fully-although-storage-is-still-available.119050/
but I don't understand why ceph status shows usage: 6.9 TiB used, 9.6 TiB / 16 TiB avail,
while Proxmox says my pool is 6.86 TiB (89.23%) used.
 
Rebalancing done
Status is OK
Nice. You should keep all disks below 85%; if one reaches 90% again, you'll have the same problem again. It's also bad that you're mixing disk sizes that much. Imagine losing osd.19: you can be sure it will fill up the other disks, because if it fails, all of its data will be recreated on the remaining disks of pve-01.
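The thresholds Ceph actually enforces can be read from the OSD map (standard command; the defaults are nearfull 0.85, backfillfull 0.90, full 0.95):

Code:
ceph osd dump | grep -E 'full_ratio|backfillfull_ratio|nearfull_ratio'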

Do you think I should apply these values?
reweight 1 for my 2 TB disks (osd 19, 6, 12, 25)
reweight 0.5 for my 1 TB disks (osd 5, 18, 17, 24)
reweight 0.25 for my 512 GB disks (osd 3, 4, 10, 9, 15, 16, 21, 22, 23)
reweight 0.125 for my 256 GB disks (osd 1, 2, 7, 8, 13, 14)
Yeah, that could work out, but do it one after another. And it is still a bad idea to mix disk sizes (see my comment above).

I don't understand why ceph status shows usage: 6.9 TiB used, 9.6 TiB / 16 TiB avail, while Proxmox says my pool is 6.86 TiB (89.23%) used.

Pool usage is based on the OSD with the highest usage in %, I guess.
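A rough sketch of where that figure comes from (simplified, not the exact formula): the pool percentage is derived from the pool's STORED/USED and MAX AVAIL values, and MAX AVAIL is limited by the fullest OSD the pool can write to, so a single nearly full OSD drags the whole pool's figure down even though the cluster as a whole still has free space:

Code:
ceph df detail   # per-pool STORED, USED, %USED and MAX AVAIL
# pool %USED is roughly STORED / (STORED + MAX AVAIL)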
 
Thanks a lot

No problem. But keep in mind: this is not a persistent setting! For a permanent setting you should check ceph osd crush reweight: https://ceph.io/en/news/blog/2014/difference-between-ceph-osd-reweight-and-ceph-osd-crush-reweight/

"This 'ceph osd reweight' is not a persistent setting. When an OSD gets marked out, the osd weight will be set to 0. When it gets marked in again, the weight will be changed to 1. Because of this 'ceph osd reweight' is a temporary solution. You should only use it to keep your cluster running while you're ordering more hardware."
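To contrast the two commands (a sketch; 0.85 is just an example override, and 1.81940 is simply the size of a 2 TB disk in TiB):

Code:
ceph osd reweight osd.X 0.85            # temporary override (0.0-1.0), reset when the OSD is marked out/in
ceph osd crush reweight osd.X 1.81940   # persistent CRUSH weight, conventionally the disk size in TiB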

Edit: and to make it clear, your setup is still very, very insecure (see the comment below from alexskysilk).
 
Your pool is inherently unbalanced, as one node has double the OSD capacity of the other 3. As long as that remains the case, you will continue to run into OSDs getting full on the 3 nodes with lower capacity, rendering the extra 2 TB of raw storage on node 4 effectively unusable. If you force it to take more capacity by forced reweight, you'll create a SPOF (single point of failure), where the loss of node 4 would leave the cluster without adequate space to rebalance. I would suggest moving OSDs around to make a more even distribution.
 
Hello
I don't understand your answer; my 4 nodes each have 4.1 TiB, see my ceph osd df tree:

node 1 with 2 x 256 GB + 2 x 512 GB + 1 x 1 TB + 1 x 2 TB
node 2 with 2 x 256 GB + 2 x 512 GB + 1 x 1 TB + 1 x 2 TB
node 3 with 2 x 256 GB + 2 x 512 GB + 1 x 1 TB + 1 x 2 TB
node 4 with 3 x 512 GB + 1 x 1 TB + 1 x 2 TB

Can you give me an example of what you mean by "I would suggest moving OSDs around to make a more even distribution"?
Thanks

Code:
ID   CLASS  WEIGHT    REWEIGHT  SIZE     RAW USE  DATA     OMAP      META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME     
 -1         11.59126         -   16 TiB  6.9 TiB  6.8 TiB   158 MiB   23 GiB  9.6 TiB  41.60  1.00    -          root default   
 -3          2.73254         -  4.1 TiB  1.7 TiB  1.7 TiB    46 MiB  5.2 GiB  2.4 TiB  41.59  1.00    -              host pve-01
  1    ssd   0.23219   1.00000  238 GiB  152 GiB  152 GiB   1.5 MiB  554 MiB   85 GiB  64.07  1.54   44      up          osd.1 
  2    ssd   0.23219   0.90002  238 GiB  144 GiB  144 GiB   4.7 MiB  427 MiB   94 GiB  60.56  1.46   42      up          osd.2 
  3    ssd   0.46519   1.00000  476 GiB  287 GiB  286 GiB    14 KiB  685 MiB  190 GiB  60.18  1.45   84      up          osd.3 
  4    ssd   0.46509   1.00000  476 GiB  296 GiB  296 GiB    40 MiB  657 MiB  180 GiB  62.19  1.49   87      up          osd.4 
  5    ssd   0.46509   1.00000  931 GiB  297 GiB  296 GiB    63 KiB  1.5 GiB  634 GiB  31.93  0.77   87      up          osd.5 
 19    ssd   0.87279   1.00000  1.8 TiB  579 GiB  578 GiB    26 KiB  1.4 GiB  1.3 TiB  31.11  0.75  169      up          osd.19
 -5          2.53595         -  4.1 TiB  1.7 TiB  1.7 TiB    41 MiB  5.4 GiB  2.4 TiB  41.67  1.00    -              host pve-02
  6    ssd   0.23219   1.00000  1.8 TiB  144 GiB  143 GiB    11 KiB  601 MiB  1.7 TiB   7.71  0.19   42      up          osd.6 
  7    ssd   0.23219   1.00000  238 GiB  178 GiB  178 GiB   2.7 MiB  493 MiB   59 GiB  75.04  1.80   52      up          osd.7 
  8    ssd   0.23219   0.95001  238 GiB  189 GiB  188 GiB  1005 KiB  515 MiB   49 GiB  79.40  1.91   55      up          osd.8 
  9    ssd   0.46509   1.00000  476 GiB  314 GiB  313 GiB   5.7 MiB  776 MiB  163 GiB  65.83  1.58   91      up          osd.9 
 10    ssd   0.46509   1.00000  476 GiB  319 GiB  318 GiB   8.1 MiB  719 MiB  158 GiB  66.92  1.61   92      up          osd.10
 18    ssd   0.90919   1.00000  931 GiB  616 GiB  614 GiB    24 MiB  2.3 GiB  315 GiB  66.17  1.59  181      up          osd.18
 -7          2.64499         -  4.1 TiB  1.7 TiB  1.7 TiB    47 MiB  5.1 GiB  2.4 TiB  41.64  1.00    -              host pve-03
 12    ssd   0.23230   1.00000  1.8 TiB  162 GiB  161 GiB    12 KiB  370 MiB  1.7 TiB   8.67  0.21   47      up          osd.12
 13    ssd   0.23230   1.00000  238 GiB  145 GiB  144 GiB   4.3 MiB  436 MiB   93 GiB  60.82  1.46   42      up          osd.13
 14    ssd   0.25000   0.95001  238 GiB  194 GiB  194 GiB   2.6 MiB  637 MiB   44 GiB  81.61  1.96   56      up          osd.14
 15    ssd   0.46519   0.95001  476 GiB  289 GiB  288 GiB    31 MiB  863 MiB  187 GiB  60.73  1.46   85      up          osd.15
 16    ssd   0.46519   0.90002  476 GiB  265 GiB  264 GiB   8.5 MiB  739 MiB  212 GiB  55.53  1.33   77      up          osd.16
 17    ssd   1.00000   1.00000  931 GiB  704 GiB  702 GiB    38 KiB  2.2 GiB  227 GiB  75.62  1.82  206      up          osd.17
-13          3.67778         -  4.1 TiB  1.7 TiB  1.7 TiB    24 MiB  7.2 GiB  2.4 TiB  41.50  1.00    -              host pve-04
 21    ssd   0.23289   1.00000  477 GiB  124 GiB  124 GiB    15 KiB  654 MiB  353 GiB  26.06  0.63   36      up          osd.21
 22    ssd   0.46579   1.00000  477 GiB  259 GiB  257 GiB    58 KiB  1.8 GiB  218 GiB  54.26  1.30   75      up          osd.22
 23    ssd   0.25000   0.95001  477 GiB  127 GiB  126 GiB    70 KiB  1.5 GiB  350 GiB  26.70  0.64   37      up          osd.23
 24    ssd   0.90970   1.00000  932 GiB  426 GiB  425 GiB    24 MiB  1.1 GiB  505 GiB  45.73  1.10  126      up          osd.24
 25    ssd   1.81940   1.00000  1.8 TiB  817 GiB  815 GiB    61 KiB  2.2 GiB  1.0 TiB  43.85  1.05  239      up          osd.25
                         TOTAL   16 TiB  6.9 TiB  6.8 TiB   158 MiB   23 GiB  9.6 TiB  41.60                                   
MIN/MAX VAR: 0.19/1.96  STDDEV: 23.56
 
The weights of your hosts are the relevant factor here. Here is the list of hosts extracted from your tree:

-3 2.73254 - 4.1 TiB 1.7 TiB 1.7 TiB 46 MiB 5.2 GiB 2.4 TiB 41.59 1.00 - host pve-01
-5 2.53595 - 4.1 TiB 1.7 TiB 1.7 TiB 41 MiB 5.4 GiB 2.4 TiB 41.67 1.00 - host pve-02
-7 2.64499 - 4.1 TiB 1.7 TiB 1.7 TiB 47 MiB 5.1 GiB 2.4 TiB 41.64 1.00 - host pve-03
-13 3.67778 - 4.1 TiB 1.7 TiB 1.7 TiB 24 MiB 7.2 GiB 2.4 TiB 41.50 1.00 - host pve-04


But there are other issues at play, namely:
19 ssd 0.87279 1.00000 1.8 TiB 579 GiB 578 GiB 26 KiB 1.4 GiB 1.3 TiB 31.11 0.75 169 up osd.19

Why is this OSD weighted so drastically low? Other drives are also not weighted consistently (e.g. 17, 23; probably more, but I'm getting lazy reading ;)). Looking at these, it may be that it's not the drive distribution that's the problem, but that someone really messed with the drive weightings.

As to how to rebalance the disks, it's actually not that bad now that you made me look at it closer :) I think the issue is just the weighting above. When you're attempting to (re)balance your cluster, you change the reweight value, not your OSD weight.
 
The solution:

Code:
ceph osd set-require-min-compat-client luminous

ceph osd crush reweight osd.X 1.81940   # for all 2 TB disks
ceph osd crush reweight osd.X 0.90919   # for all 1 TB disks
ceph osd crush reweight osd.X 0.46519   # for all 512 GB disks
ceph osd crush reweight osd.X 0.23230   # for all 256 GB disks

and now everything works fine.
# ceph status

Code:
cluster:
    id:     e7fc1497-5889-4aba-abc7-e0e1115d70ef
    health: HEALTH_OK

  services:
    mon: 4 daemons, quorum pve-01,pve-02,pve-03,pve-04 (age 7d)
    mgr: pve-01(active, since 7d), standbys: pve-02, pve-04, pve-03
    osd: 23 osds: 23 up (since 15h), 23 in (since 15h)

  data:
    pools:   2 pools, 513 pgs
    objects: 471.48k objects, 1.8 TiB
    usage:   6.9 TiB used, 9.6 TiB / 16 TiB avail
    pgs:     513 active+clean

  io:
    client:   0 B/s rd, 304 KiB/s wr, 0 op/s rd, 27 op/s wr

ceph df

Code:
--- RAW STORAGE ---
CLASS    SIZE    AVAIL     USED  RAW USED  %RAW USED
ssd    16 TiB  9.6 TiB  6.9 TiB   6.9 TiB      41.53
TOTAL  16 TiB  9.6 TiB  6.9 TiB   6.9 TiB      41.53

--- POOLS ---
POOL                   ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
pool_vm                 1  512  1.7 TiB  471.47k  6.8 TiB  47.18    1.9 TiB
device_health_metrics   2    1   31 MiB        8  122 MiB      0    1.9 TiB

ceph osd df tree
Code:
ID   CLASS  WEIGHT    REWEIGHT  SIZE     RAW USE  DATA     OMAP    META      AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
 -1         16.35547         -   16 TiB  6.9 TiB  6.8 TiB  98 MiB    30 GiB  9.6 TiB  41.53  1.00    -          root default
 -3          4.12357         -  4.1 TiB  1.7 TiB  1.7 TiB  25 MiB   7.2 GiB  2.4 TiB  41.54  1.00    -              host pve-01
  1    ssd   0.23230   1.00000  238 GiB  104 GiB  102 GiB  22 KiB   1.0 GiB  134 GiB  43.52  1.05   30      up          osd.1
  2    ssd   0.23230   1.00000  238 GiB  102 GiB  101 GiB   7 KiB   416 MiB  136 GiB  42.76  1.03   30      up          osd.2
  3    ssd   0.46519   1.00000  476 GiB  185 GiB  183 GiB   8 KiB   1.1 GiB  292 GiB  38.74  0.93   54      up          osd.3
  4    ssd   0.46519   1.00000  476 GiB  199 GiB  198 GiB  25 MiB   1.0 GiB  277 GiB  41.84  1.01   59      up          osd.4
  5    ssd   0.90918   1.00000  931 GiB  368 GiB  367 GiB  22 KiB   1.2 GiB  563 GiB  39.50  0.95  108      up          osd.5
 19    ssd   1.81940   1.00000  1.8 TiB  797 GiB  794 GiB  31 KiB   2.4 GiB  1.0 TiB  42.78  1.03  232      up          osd.19
 -5          4.12358         -  4.1 TiB  1.7 TiB  1.7 TiB  25 MiB   7.2 GiB  2.4 TiB  41.54  1.00    -              host pve-02
  6    ssd   1.81940   1.00000  1.8 TiB  778 GiB  775 GiB  25 MiB   2.5 GiB  1.1 TiB  41.75  1.01  228      up          osd.6
  7    ssd   0.23230   1.00000  238 GiB   96 GiB   95 GiB  21 KiB   943 MiB  142 GiB  40.28  0.97   28      up          osd.7
  8    ssd   0.23230   1.00000  238 GiB  116 GiB  116 GiB   1 KiB   462 MiB  122 GiB  48.81  1.18   34      up          osd.8
  9    ssd   0.46519   1.00000  476 GiB  184 GiB  183 GiB  10 KiB   677 MiB  293 GiB  38.54  0.93   54      up          osd.9
 10    ssd   0.46519   1.00000  476 GiB  213 GiB  212 GiB   9 KiB   1.1 GiB  264 GiB  44.64  1.07   62      up          osd.10
 18    ssd   0.90919   1.00000  931 GiB  368 GiB  366 GiB  14 KiB   1.5 GiB  563 GiB  39.51  0.95  107      up          osd.18
 -7          4.21475         -  4.1 TiB  1.7 TiB  1.7 TiB  25 MiB   8.1 GiB  2.4 TiB  41.55  1.00    -              host pve-03
 12    ssd   1.81940   1.00000  1.8 TiB  765 GiB  762 GiB  25 MiB   2.4 GiB  1.1 TiB  41.06  0.99  224      up          osd.12
 13    ssd   0.23239   1.00000  238 GiB   79 GiB   78 GiB   2 KiB   807 MiB  159 GiB  33.11  0.80   23      up          osd.13
 14    ssd   0.23239   1.00000  238 GiB  117 GiB  116 GiB   4 KiB  1012 MiB  121 GiB  49.31  1.19   34      up          osd.14
 15    ssd   0.46529   1.00000  476 GiB  199 GiB  198 GiB  12 KiB   614 MiB  278 GiB  41.74  1.00   58      up          osd.15
 16    ssd   0.46529   1.00000  476 GiB  181 GiB  179 GiB  10 KiB   1.3 GiB  296 GiB  37.95  0.91   53      up          osd.16
 17    ssd   1.00000   1.00000  931 GiB  414 GiB  412 GiB  17 KiB   2.0 GiB  517 GiB  44.47  1.07  121      up          osd.17
-13          3.89357         -  4.1 TiB  1.7 TiB  1.7 TiB  25 MiB   7.5 GiB  2.4 TiB  41.51  1.00    -              host pve-04
 21    ssd   0.23289   1.00000  477 GiB  102 GiB  102 GiB  11 KiB   799 MiB  374 GiB  21.48  0.52   30      up          osd.21
 22    ssd   0.46579   1.00000  477 GiB  218 GiB  217 GiB  24 KiB   1.4 GiB  259 GiB  45.75  1.10   63      up          osd.22
 23    ssd   0.46579   1.00000  477 GiB  201 GiB  200 GiB   3 KiB   868 MiB  276 GiB  42.16  1.01   59      up          osd.23
 24    ssd   0.90970   1.00000  932 GiB  424 GiB  422 GiB  24 MiB   1.8 GiB  508 GiB  45.50  1.10  125      up          osd.24
 25    ssd   1.81940   1.00000  1.8 TiB  808 GiB  806 GiB  28 KiB   2.7 GiB  1.0 TiB  43.40  1.04  236      up          osd.25
                         TOTAL   16 TiB  6.9 TiB  6.8 TiB  98 MiB    30 GiB  9.6 TiB  41.53
MIN/MAX VAR: 0.52/1.19  STDDEV: 5.47

Thanks everybody!
 
Shouldn't the weight of OSD 17 also be 0.9xxxx, and that of OSD 21 be 0.4xxxx?

At least, my inner monk is triggered... :p
 
