[SOLVED] Ceph not backfilling or scrubbing by the looks of it

Donovan Hoare

Good day all.
As far as I can tell, my Ceph cluster has not been scrubbing or backfilling since I added new OSDs to the cluster; the recovery rate has been 0 bytes for 2 days.
Code:
ceph -s
  cluster:
    id:     32e62262-67a6-4129-9464-773375643266
    health: HEALTH_WARN
            61 pgs not deep-scrubbed in time
            44 pgs not scrubbed in time
 
  services:
    mon: 3 daemons, quorum atsho2p1,atsho2p2,atsho2p3 (age 2d)
    mgr: atsho2p1(active, since 8d), standbys: atsho2p2, atsho2p3
    mds: 1/1 daemons up, 2 standby
    osd: 47 osds: 47 up (since 3d), 47 in (since 8d); 19 remapped pgs
 
  data:
    volumes: 1/1 healthy
    pools:   5 pools, 417 pgs
    objects: 2.01M objects, 7.6 TiB
    usage:   25 TiB used, 90 TiB / 115 TiB avail
    pgs:     240616/6029865 objects misplaced (3.990%)
             358 active+clean
             39  active+clean+scrubbing+deep
             16  active+remapped+backfill_wait
             3   active+remapped+backfilling
             1   active+clean+scrubbing
 
  io:
    client:   82 MiB/s rd, 9.1 MiB/s wr, 87 op/s rd, 823 op/s wr
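
When backfill is actually moving, the io: section of ceph -s also shows a recovery: line with a bytes-per-second rate; mine never does. As a quick sketch (a plain watch loop, nothing Ceph-specific), I keep an eye on it with:
Code:
# Re-run ceph -s every 5 seconds and watch for a "recovery:" line under "io:"
watch -n 5 "ceph -s | grep -A 3 'io:'"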

Output from some other commands:
Code:
ceph pg dump_stuck degraded
ok
ceph pg dump_stuck undersized
ok
ceph pg dump_stuck stale
ok
ceph pg dump_stuck inactive
ok
ceph pg dump_stuck unclean

PG_STAT  STATE                          UP          UP_PRIMARY  ACTING      ACTING_PRIMARY
4.7d     active+remapped+backfill_wait  [43,33,35]          43  [43,35,16]              43
4.75     active+remapped+backfill_wait  [36,14,42]          36   [36,42,4]              36
4.61     active+remapped+backfill_wait  [43,12,35]          43   [43,35,4]              43
4.66     active+remapped+backfill_wait  [43,38,14]          43  [43,38,18]              43
4.6f     active+remapped+backfill_wait  [43,17,36]          43  [43,36,31]              43
4.35     active+remapped+backfill_wait   [16,43,7]          16   [43,7,31]              43
4.2a       active+remapped+backfilling  [43,35,13]          43   [43,35,7]              43
4.2d     active+remapped+backfill_wait   [43,35,2]          43    [43,2,6]              43
4.23       active+remapped+backfilling   [43,14,7]          43   [43,7,35]              43
4.4      active+remapped+backfill_wait  [43,12,35]          43  [43,35,14]              43
4.64     active+remapped+backfill_wait   [43,35,7]          43   [43,7,15]              43
4.22     active+remapped+backfill_wait  [14,36,44]          14  [36,44,30]              36
4.1d     active+remapped+backfill_wait  [43,12,36]          43   [43,36,0]              43
4.44     active+remapped+backfill_wait  [43,36,16]          43   [43,36,3]              43
4.49     active+remapped+backfill_wait  [43,19,33]          43   [43,6,36]              43
4.5b     active+remapped+backfill_wait  [43,14,36]          43  [43,36,32]              43
4.5c       active+remapped+backfilling   [43,36,2]          43  [43,36,15]              43
4.1a     active+remapped+backfill_wait  [43,36,17]          43   [43,36,5]              43
1.0      active+remapped+backfill_wait   [38,20,2]          38  [38,20,44]              38
ok
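
If it helps, a single PG can also be queried directly; ceph pg <pgid> query dumps its full state, and the recovery_state section near the end shows what the PG is waiting on (4.7d below is just the first PG from the list above):
Code:
# Inspect one of the backfill_wait PGs; see the "recovery_state" section
ceph pg 4.7d query | less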

Code:
ceph df
--- RAW STORAGE ---
CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
hdd    103 TiB   80 TiB   22 TiB    22 TiB      21.93
ssd     13 TiB  9.9 TiB  2.9 TiB   2.9 TiB      22.86
TOTAL  115 TiB   90 TiB   25 TiB    25 TiB      22.03
 
--- POOLS ---
POOL                    ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
.mgr                     1    1   25 MiB        8   76 MiB      0     21 TiB
atsho2-ssd               3  128  993 GiB  255.33k  2.9 TiB  24.92    2.9 TiB
atsho2-hdd               4  128  6.7 TiB    1.75M   20 TiB  26.26     19 TiB

Code:
ceph osd tree
ID   CLASS  WEIGHT     TYPE NAME          STATUS  REWEIGHT  PRI-AFF
 -1         115.34235  root default                              
 -3          13.46262      host atsho2p1                          
  0    hdd    1.00060          osd.0          up   1.00000  1.00000
  1    hdd    1.00060          osd.1          up   1.00000  1.00000
 10    hdd    1.00060          osd.10         up   1.00000  1.00000
 11    hdd    1.00060          osd.11         up   1.00000  1.00000
 12    hdd    1.00060          osd.12         up   1.00000  1.00000
 13    hdd    1.00060          osd.13         up   1.00000  1.00000
 14    hdd    1.00060          osd.14         up   1.00000  1.00000
 15    hdd    1.00060          osd.15         up   1.00000  1.00000
 16    hdd    1.00060          osd.16         up   1.00000  1.00000
 17    hdd    1.00060          osd.17         up   1.00000  1.00000
 18    hdd    1.00060          osd.18         up   1.00000  1.00000
 19    hdd    1.00060          osd.19         up   1.00000  1.00000
 24    ssd    0.66429          osd.24         up   1.00000  1.00000
 25    ssd    0.66429          osd.25         up   1.00000  1.00000
 34    ssd    0.12689          osd.34         up   1.00000  1.00000
 -7          15.88333      host atsho2p2                          
  4    hdd    3.63869          osd.4          up   1.00000  1.00000
  5    hdd    3.63869          osd.5          up   1.00000  1.00000
  6    hdd    3.63869          osd.6          up   1.00000  1.00000
  7    hdd    3.63869          osd.7          up   1.00000  1.00000
 20    ssd    0.66429          osd.20         up   1.00000  1.00000
 21    ssd    0.66429          osd.21         up   1.00000  1.00000
 -5           5.69498      host atsho2p3                          
  2    hdd    1.09160          osd.2          up   1.00000  1.00000
  3    hdd    1.09160          osd.3          up   1.00000  1.00000
  8    hdd    1.09160          osd.8          up   1.00000  1.00000
  9    hdd    1.09160          osd.9          up   1.00000  1.00000
 22    ssd    0.66429          osd.22         up   1.00000  1.00000
 23    ssd    0.66429          osd.23         up   1.00000  1.00000
-13           9.91577      host atsho2p4                          
 26    hdd    1.09160          osd.26         up   1.00000  1.00000
 27    hdd    1.09160          osd.27         up   1.00000  1.00000
 30    hdd    1.09160          osd.30         up   1.00000  1.00000
 31    hdd    1.09160          osd.31         up   1.00000  1.00000
 32    hdd    1.09160          osd.32         up   1.00000  1.00000
 33    hdd    1.09160          osd.33         up   1.00000  1.00000
 28    ssd    1.68309          osd.28         up   1.00000  1.00000
 29    ssd    1.68309          osd.29         up   1.00000  1.00000
-16          30.64233      host atsho2p5                          
 35    hdd   11.01169          osd.35         up   1.00000  1.00000
 36    hdd   11.01169          osd.36         up   1.00000  1.00000
 37    hdd    2.82669          osd.37         up   1.00000  1.00000
 38    hdd    2.82669          osd.38         up   1.00000  1.00000
 39    ssd    1.48279          osd.39         up   1.00000  1.00000
 40    ssd    1.48279          osd.40         up   1.00000  1.00000
-19          39.74332      host atsho2p6                          
 41    hdd    3.78519          osd.41         up   1.00000  1.00000
 42    hdd    3.78519          osd.42         up   1.00000  1.00000
 43    hdd   14.89449          osd.43         up   1.00000  1.00000
 44    hdd   14.89449          osd.44         up   1.00000  1.00000
 45    ssd    1.19199          osd.45         up   1.00000  1.00000
 46    ssd    1.19199          osd.46         up   1.00000  1.00000

I can't seem to find the command that lists usage percentages and PGs per disk like the UI does.
EDIT: Found it.

Code:
ceph osd df
ID  CLASS  WEIGHT    REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
 0    hdd   1.00060   1.00000  1.0 TiB  412 GiB  319 GiB   20 KiB  2.6 GiB  612 GiB  40.25  1.83   10      up
 1    hdd   1.00060   1.00000  1.0 TiB  359 GiB  266 GiB   28 KiB  1.0 GiB  666 GiB  35.04  1.59   10      up
10    hdd   1.00060   1.00000  1.0 TiB  360 GiB  266 GiB   21 KiB  1.9 GiB  665 GiB  35.09  1.59   10      up
11    hdd   1.00060   1.00000  1.0 TiB  307 GiB  214 GiB   22 KiB  2.1 GiB  718 GiB  29.94  1.36   13      up
12    hdd   1.00060   1.00000  1.0 TiB  306 GiB  213 GiB   26 KiB  1.9 GiB  719 GiB  29.84  1.35   11      up
13    hdd   1.00060   1.00000  1.0 TiB  305 GiB  212 GiB   28 KiB  1.0 GiB  719 GiB  29.81  1.35    8      up
14    hdd   1.00060   1.00000  1.0 TiB  228 GiB  135 GiB   28 KiB  1.6 GiB  796 GiB  22.27  1.01    6      up
15    hdd   1.00060   1.00000  1.0 TiB  412 GiB  319 GiB   28 KiB  1.9 GiB  613 GiB  40.21  1.83   11      up
16    hdd   1.00060   1.00000  1.0 TiB  255 GiB  162 GiB   23 KiB  1.5 GiB  770 GiB  24.86  1.13   10      up
17    hdd   1.00060   1.00000  1.0 TiB  253 GiB  160 GiB   25 KiB  1.7 GiB  771 GiB  24.71  1.12   10      up
18    hdd   1.00060   1.00000  1.0 TiB  357 GiB  264 GiB   18 KiB  2.6 GiB  667 GiB  34.86  1.58   12      up
19    hdd   1.00060   1.00000  1.0 TiB  330 GiB  237 GiB   25 KiB  1.8 GiB  695 GiB  32.19  1.46   10      up
24    ssd   0.66429   1.00000  680 GiB  179 GiB  177 GiB   49 KiB  2.1 GiB  501 GiB  26.35  1.20   23      up
25    ssd   0.66429   1.00000  680 GiB  182 GiB  180 GiB   52 KiB  1.7 GiB  499 GiB  26.69  1.21   23      up
34    ssd   0.12689   1.00000  130 GiB   33 GiB   32 GiB   24 KiB  1.5 GiB   97 GiB  25.71  1.17    4      up
 4    hdd   3.63869   1.00000  3.6 TiB  903 GiB  900 GiB   35 KiB  2.6 GiB  2.8 TiB  24.23  1.10   34      up
 5    hdd   3.63869   1.00000  3.6 TiB  797 GiB  794 GiB   29 KiB  3.2 GiB  2.9 TiB  21.40  0.97   34      up
 6    hdd   3.63869   1.00000  3.6 TiB  907 GiB  904 GiB   39 KiB  2.8 GiB  2.8 TiB  24.35  1.11   37      up
 7    hdd   3.63869   1.00000  3.6 TiB  905 GiB  902 GiB   38 KiB  3.5 GiB  2.8 TiB  24.30  1.10   38      up
20    ssd   0.66429   1.00000  680 GiB  172 GiB  170 GiB   31 KiB  1.9 GiB  508 GiB  25.34  1.15   23      up
21    ssd   0.66429   1.00000  680 GiB  157 GiB  155 GiB   36 KiB  2.0 GiB  523 GiB  23.14  1.05   20      up
 2    hdd   1.09160   1.00000  1.1 TiB  214 GiB  213 GiB   29 KiB  1.2 GiB  903 GiB  19.18  0.87   14      up
 3    hdd   1.09160   1.00000  1.1 TiB  268 GiB  266 GiB   27 KiB  1.5 GiB  850 GiB  23.95  1.09   10      up
 8    hdd   1.09160   1.00000  1.1 TiB  212 GiB  211 GiB   21 KiB  852 MiB  906 GiB  18.97  0.86   11      up
 9    hdd   1.09160   1.00000  1.1 TiB  321 GiB  319 GiB   27 KiB  1.8 GiB  797 GiB  28.69  1.30   13      up
22    ssd   0.66429   1.00000  680 GiB  149 GiB  148 GiB   43 KiB  1.6 GiB  531 GiB  21.96  1.00   19      up
23    ssd   0.66429   1.00000  680 GiB  163 GiB  162 GiB   25 KiB  984 MiB  517 GiB  23.97  1.09   21      up
26    hdd   1.09160   1.00000  1.1 TiB  268 GiB  266 GiB   19 KiB  2.3 GiB  849 GiB  24.01  1.09    9      up
27    hdd   1.09160   1.00000  1.1 TiB  267 GiB  264 GiB   16 KiB  2.6 GiB  851 GiB  23.87  1.08   10      up
30    hdd   1.09160   1.00000  1.1 TiB  320 GiB  318 GiB   33 KiB  2.7 GiB  798 GiB  28.65  1.30   12      up
31    hdd   1.09160   1.00000  1.1 TiB  372 GiB  369 GiB   28 KiB  2.6 GiB  746 GiB  33.27  1.51   12      up
32    hdd   1.09160   1.00000  1.1 TiB  375 GiB  372 GiB   21 KiB  2.6 GiB  743 GiB  33.53  1.52   12      up
33    hdd   1.09160   1.00000  1.1 TiB  132 GiB  131 GiB   25 KiB  1.1 GiB  986 GiB  11.83  0.54    5      up
28    ssd   1.68309   1.00000  1.7 TiB  356 GiB  355 GiB   70 KiB  1.1 GiB  1.3 TiB  20.68  0.94   46      up
29    ssd   1.68309   1.00000  1.7 TiB  375 GiB  373 GiB   72 KiB  2.1 GiB  1.3 TiB  21.75  0.99   48      up
35    hdd  11.01169   1.00000   11 TiB  2.0 TiB  1.9 TiB   55 KiB  4.6 GiB  9.0 TiB  17.87  0.81   90      up
36    hdd  11.01169   1.00000   11 TiB  2.2 TiB  2.1 TiB   47 KiB  5.3 GiB  8.8 TiB  19.72  0.90   89      up
37    hdd   2.82669   1.00000  2.8 TiB  578 GiB  478 GiB   38 KiB  1.9 GiB  2.3 TiB  19.96  0.91   21      up
38    hdd   2.82669   1.00000  2.8 TiB  525 GiB  425 GiB   20 KiB  2.8 GiB  2.3 TiB  18.12  0.82   19      up
39    ssd   1.48279   1.00000  1.5 TiB  319 GiB  317 GiB   59 KiB  1.7 GiB  1.2 TiB  21.01  0.95   41      up
40    ssd   1.48279   1.00000  1.5 TiB  344 GiB  342 GiB   51 KiB  2.5 GiB  1.1 TiB  22.68  1.03   44      up
41    hdd   3.78519   1.00000  3.8 TiB  681 GiB  531 GiB   22 KiB  1.6 GiB  3.1 TiB  17.57  0.80   23      up
42    hdd   3.78519   1.00000  3.8 TiB  680 GiB  530 GiB   36 KiB  2.2 GiB  3.1 TiB  17.54  0.80   23      up
43    hdd  14.89449   1.00000   15 TiB  3.0 TiB  2.7 TiB   59 KiB  6.1 GiB   12 TiB  20.40  0.93  114      up
44    hdd  14.89449   1.00000   15 TiB  3.0 TiB  2.6 TiB   69 KiB  6.1 GiB   12 TiB  20.06  0.91  115      up
45    ssd   1.19199   1.00000  1.2 TiB  281 GiB  280 GiB   43 KiB  1.2 GiB  940 GiB  23.02  1.05   36      up
46    ssd   1.19199   1.00000  1.2 TiB  289 GiB  286 GiB   53 KiB  2.5 GiB  932 GiB  23.67  1.07   37      up
                        TOTAL  115 TiB   25 TiB   23 TiB  1.6 MiB  107 GiB   90 TiB  22.03                   
MIN/MAX VAR: 0.54/1.83  STDDEV: 6.73
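
There is also a variant grouped under the CRUSH tree, which is even closer to the UI view:
Code:
# Same per-OSD usage and PG counts as "ceph osd df", grouped per host
ceph osd df tree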

Let me know if there is anything else you need from me to help; thank you.
 
It's not scrubbing because the cluster is still backfilling.
Thanks, but the backfilling doesn't seem to be happening either.
It's been hours and there is no progress since my first post.


Code:
ceph -s
  cluster:
    id:     32e62262-67a6-4129-9464-773375643266
    health: HEALTH_WARN
            61 pgs not deep-scrubbed in time
            42 pgs not scrubbed in time
 
  services:
    mon: 3 daemons, quorum atsho2p1,atsho2p2,atsho2p3 (age 2d)
    mgr: atsho2p1(active, since 8d), standbys: atsho2p2, atsho2p3
    mds: 1/1 daemons up, 2 standby
    osd: 47 osds: 47 up (since 3d), 47 in (since 8d); 19 remapped pgs
 
  data:
    volumes: 1/1 healthy
    pools:   5 pools, 417 pgs
    objects: 1.92M objects, 7.3 TiB
    usage:   24 TiB used, 91 TiB / 115 TiB avail
    pgs:     227777/5747562 objects misplaced (3.963%)
             355 active+clean
             42  active+clean+scrubbing+deep
             16  active+remapped+backfill_wait
             3   active+remapped+backfilling
             1   active+clean+scrubbing
 
  io:
    client:   5.3 KiB/s rd, 5.4 MiB/s wr, 0 op/s rd, 711 op/s wr

How can I check the progress and speed?
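From other threads, these look like reasonable ways to watch it (a sketch assuming the mgr progress module is enabled, which it is by default on recent releases):
Code:
# Progress bars for long-running recovery/rebalance events (mgr progress module)
ceph progress
# Per-OSD commit/apply latency; a stuck OSD often stands out here
ceph osd perf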

Also, as per other posts, I have now tried this:
Code:
sudo ceph tell 'osd.*' injectargs '--osd_recovery_sleep_hdd=0'
sudo ceph tell 'osd.*' injectargs --osd-max-backfills=16 --osd-recovery-max-active=4
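
To confirm the injected values actually took effect, the running config of a daemon can be read back (osd.0 below is just an example ID):
Code:
# Show the values the daemon is actually running with
ceph config show osd.0 | grep -E 'osd_max_backfills|osd_recovery_sleep_hdd|osd_recovery_max_active'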
 
Maybe one of the OSDs is stuck.

Check which OSDs host the PGs that should currently be backfilling and restart those OSDs.
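For example, osd.43 is the acting primary for most of the backfill_wait PGs in your dump_stuck output, and it sits on atsho2p6 per your tree. A sketch of how that restart looks (run the systemctl command on the node that hosts the OSD):
Code:
# Confirm which host the OSD lives on, then restart its unit on that node
ceph osd find 43
systemctl restart ceph-osd@43.service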
Thanks very much, that was it.
I restarted the OSD that showed up in the post above as waiting for backfill.
Backfilling then continued and is now complete.
 
