Ceph pgs inactive or incomplete

Fred Saunier

Hello,

My Ceph cluster suffered a severe crash caused by electrical problems. I have managed to recover most of the data, but some of it seems unrecoverable:

Code:
:~$ sudo ceph health
HEALTH_WARN 1 MDSs report slow metadata IOs; Reduced data availability: 27 pgs inactive, 23 pgs incomplete; 23 pgs not deep-scrubbed in time; 23 pgs not scrubbed in time; 568 slow ops, oldest one blocked for 5433 sec, daemons [osd.16,osd.17,osd.18,osd.19,osd.22,osd.24,osd.3,osd.4,osd.7] have slow ops.

For instance, if I map and query pg 12.22 I get contradictory results:

Code:
:~$ sudo ceph pg map 12.22
osdmap e39939 pg 12.22 (12.22) -> up [24,19] acting [24,19]

:~$ sudo ceph pg 12.22 query
Error ENOENT: i don't have pgid 12.22

My objective at this point is to get back to a healthy Ceph cluster. How can I get rid of the pgs that are beyond repair? I have tried ceph pg repair {pg.id} on the 27 inactive pgs, to no effect.

Thanks,
Fred
 
All OSDs have been restarted, but I still cannot get rid of this pg 12.22:
Code:
:~# ceph pg 12.22 mark_unfound_lost revert
Error ENOENT: i don't have pgid 12.22

:~# ceph pg 12.22 mark_unfound_lost delete
Error ENOENT: i don't have pgid 12.22

I found a suggestion on O'Reilly to delete the pg's head folder (https://www.oreilly.com/library/vie...05/42d80c67-10aa-4cf2-8812-e38c861cdc5d.xhtml), but the directory /var/lib/ceph/osd/ceph-X/current/xx.x_head/ no longer seems to exist, and I cannot find a '*_head' folder anywhere in the /var/lib/ceph folder structure.
 
Code:
:~$ sudo ceph health detail
HEALTH_WARN 1 MDSs report slow metadata IOs; Reduced data availability: 27 pgs inactive, 23 pgs incomplete; 23 pgs not deep-scrubbed in time; 23 pgs not scrubbed in time; 5271 slow ops, oldest one blocked for 94976 sec, daemons [osd.16,osd.17,osd.18,osd.19,osd.22,osd.24,osd.3,osd.4,osd.7] have slow ops.
MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
    mds.prox4(mds.0): 1 slow metadata IOs are blocked > 30 secs, oldest blocked for 93429 secs
PG_AVAILABILITY Reduced data availability: 27 pgs inactive, 23 pgs incomplete
    pg 12.22 is stuck inactive for 95447.179554, current state unknown, last acting []
    pg 12.38 is incomplete, acting [4,19]
    pg 12.41 is incomplete, acting [3,18]
    pg 12.53 is incomplete, acting [7,3]
    pg 12.5e is incomplete, acting [22,18]
    pg 12.6c is incomplete, acting [18,24]
    pg 12.7d is stuck inactive for 95447.179554, current state unknown, last acting []
    pg 12.8e is incomplete, acting [24,10]
    pg 12.bd is incomplete, acting [19,20]
    pg 12.10d is incomplete, acting [19,22]
    pg 12.119 is incomplete, acting [22,18]
    pg 12.126 is stuck inactive for 95447.179554, current state unknown, last acting []
    pg 12.145 is incomplete, acting [7,3]
    pg 12.147 is incomplete, acting [18,22]
    pg 12.14b is incomplete, acting [18,3]
    pg 12.158 is incomplete, acting [18,3]
    pg 12.173 is incomplete, acting [18,24]
    pg 12.177 is incomplete, acting [7,3]
    pg 12.17f is stuck inactive for 95447.179554, current state unknown, last acting []
    pg 12.1a6 is incomplete, acting [24,18]
    pg 12.1a7 is incomplete, acting [18,3]
    pg 12.1c8 is incomplete, acting [24,18]
    pg 12.1d4 is incomplete, acting [24,9]
    pg 12.1d7 is incomplete, acting [16,20]
    pg 13.0 is incomplete, acting [20,19]
    pg 13.79 is incomplete, acting [3,5]
    pg 13.7c is incomplete, acting [18,3]
PG_NOT_DEEP_SCRUBBED 23 pgs not deep-scrubbed in time
    pg 12.bd not deep-scrubbed since 2021-07-06 11:36:40.273738
    pg 12.8e not deep-scrubbed since 2021-07-08 22:32:23.913393
    pg 12.41 not deep-scrubbed since 2021-07-11 21:31:35.204799
    pg 12.38 not deep-scrubbed since 2021-07-07 14:15:40.238958
    pg 13.0 not deep-scrubbed since 2021-07-10 00:24:17.206661
    pg 12.5e not deep-scrubbed since 2021-07-08 22:03:30.503820
    pg 12.53 not deep-scrubbed since 2021-07-06 10:14:04.080825
    pg 12.6c not deep-scrubbed since 2021-07-11 02:31:54.058387
    pg 13.79 not deep-scrubbed since 2021-07-10 15:55:16.005669
    pg 13.7c not deep-scrubbed since 2021-07-05 13:13:00.947506
    pg 12.10d not deep-scrubbed since 2021-07-09 18:32:56.421736
    pg 12.119 not deep-scrubbed since 2021-07-07 03:50:59.876078
    pg 12.14b not deep-scrubbed since 2021-07-08 02:45:17.555106
    pg 12.145 not deep-scrubbed since 2021-07-05 18:19:03.487633
    pg 12.147 not deep-scrubbed since 2021-07-06 04:25:04.063377
    pg 12.158 not deep-scrubbed since 2021-07-06 09:49:50.201250
    pg 12.173 not deep-scrubbed since 2021-07-09 16:00:08.924595
    pg 12.177 not deep-scrubbed since 2021-07-10 18:40:24.283839
    pg 12.1a7 not deep-scrubbed since 2021-07-10 09:48:20.775911
    pg 12.1a6 not deep-scrubbed since 2021-07-10 03:59:34.386266
    pg 12.1c8 not deep-scrubbed since 2021-07-09 22:52:04.955715
    pg 12.1d4 not deep-scrubbed since 2021-07-12 18:13:49.188494
    pg 12.1d7 not deep-scrubbed since 2021-07-06 05:26:41.732594
PG_NOT_SCRUBBED 23 pgs not scrubbed in time
    pg 12.bd not scrubbed since 2021-07-11 19:11:22.548364
    pg 12.8e not scrubbed since 2021-07-12 01:00:18.019265
    pg 12.41 not scrubbed since 2021-07-13 00:16:08.695677
    pg 12.38 not scrubbed since 2021-07-11 19:39:28.086485
    pg 13.0 not scrubbed since 2021-07-11 22:22:53.102255
    pg 12.5e not scrubbed since 2021-07-12 20:54:29.096296
    pg 12.53 not scrubbed since 2021-07-12 15:53:44.537069
    pg 12.6c not scrubbed since 2021-07-12 11:53:06.049167
    pg 13.79 not scrubbed since 2021-07-11 21:51:35.606499
    pg 13.7c not scrubbed since 2021-07-12 02:52:48.832582
    pg 12.10d not scrubbed since 2021-07-12 03:20:09.959440
    pg 12.119 not scrubbed since 2021-07-11 20:38:40.838518
    pg 12.14b not scrubbed since 2021-07-12 00:42:07.083203
    pg 12.145 not scrubbed since 2021-07-12 05:09:53.358642
    pg 12.147 not scrubbed since 2021-07-12 16:12:51.036545
    pg 12.158 not scrubbed since 2021-07-11 23:59:31.491733
    pg 12.173 not scrubbed since 2021-07-12 10:33:16.835784
    pg 12.177 not scrubbed since 2021-07-12 03:52:27.861605
    pg 12.1a7 not scrubbed since 2021-07-12 02:26:00.284785
    pg 12.1a6 not scrubbed since 2021-07-12 23:55:08.291654
    pg 12.1c8 not scrubbed since 2021-07-12 14:11:56.571619
    pg 12.1d4 not scrubbed since 2021-07-12 18:13:49.188494
    pg 12.1d7 not scrubbed since 2021-07-12 05:36:23.677295
SLOW_OPS 5271 slow ops, oldest one blocked for 94976 sec, daemons [osd.16,osd.17,osd.18,osd.19,osd.22,osd.24,osd.3,osd.4,osd.7] have slow ops.
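
To make the output above easier to work with, the affected pgs can be sorted by state and the acting OSDs tallied. This is only a minimal Python sketch, assuming the text of `ceph health detail` has been saved to a string; `classify_pgs` and `SAMPLE` are names made up for this illustration, and the regular expressions only cover the two line shapes that appear in the output above.

```python
import re
from collections import Counter

# Matches lines such as: "pg 12.38 is incomplete, acting [4,19]"
INCOMPLETE_RE = re.compile(r"pg (\S+) is incomplete, acting \[([\d,]*)\]")
# Matches lines such as:
# "pg 12.22 is stuck inactive for 95447.179554, current state unknown, last acting []"
UNKNOWN_RE = re.compile(r"pg (\S+) is stuck inactive for [\d.]+, current state unknown")


def classify_pgs(health_detail):
    """Return ({pgid: [acting osd ids]}, [pgids in unknown state]).

    Hypothetical helper: it only understands the two line formats above.
    """
    incomplete = {}
    unknown = []
    for line in health_detail.splitlines():
        match = INCOMPLETE_RE.search(line)
        if match:
            osds = [int(x) for x in match.group(2).split(",") if x]
            incomplete[match.group(1)] = osds
            continue
        match = UNKNOWN_RE.search(line)
        if match:
            unknown.append(match.group(1))
    return incomplete, unknown


# A few lines copied from the output above, used as sample input.
SAMPLE = """\
    pg 12.22 is stuck inactive for 95447.179554, current state unknown, last acting []
    pg 12.38 is incomplete, acting [4,19]
    pg 12.41 is incomplete, acting [3,18]
    pg 13.7c is incomplete, acting [18,3]
"""

incomplete, unknown = classify_pgs(SAMPLE)
# Count how often each OSD shows up in the acting set of an incomplete pg.
counts = Counter(osd for osds in incomplete.values() for osd in osds)

print("unknown:", unknown)
print("incomplete:", incomplete)
print("acting OSD counts:", dict(counts))
```

Fed the full output instead of `SAMPLE`, the tally shows which OSDs appear most often in the acting sets of the incomplete pgs, which may help decide where to look first.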

Code:
:~$ sudo ceph osd df tree
ID  CLASS WEIGHT    REWEIGHT SIZE    RAW USE  DATA     OMAP    META    AVAIL    %USE  VAR  PGS STATUS TYPE NAME     
 -1       147.36133        - 147 TiB   91 TiB   91 TiB 477 MiB 152 GiB   56 TiB 61.91 1.00   -        root default   
-13        27.29024        -  27 TiB   18 TiB   18 TiB  94 MiB  32 GiB  9.4 TiB 65.68 1.06   -            host prox1
 12   hdd   3.63869  1.00000 3.6 TiB  2.4 TiB  2.4 TiB 9.7 MiB 4.2 GiB  1.3 TiB 64.81 1.05  75     up         osd.12
 13   hdd   3.63869  1.00000 3.6 TiB  2.5 TiB  2.5 TiB 8.1 MiB 4.5 GiB  1.1 TiB 70.01 1.13  84     up         osd.13
 14   hdd   3.63869  1.00000 3.6 TiB  2.4 TiB  2.4 TiB 9.0 MiB 4.4 GiB  1.3 TiB 65.40 1.06  76     up         osd.14
 25   hdd   7.27739  1.00000 7.3 TiB  4.7 TiB  4.7 TiB  16 MiB 7.8 GiB  2.6 TiB 64.55 1.04 150     up         osd.25
 26   hdd   7.27739  1.00000 7.3 TiB  5.0 TiB  4.9 TiB  17 MiB 8.3 GiB  2.3 TiB 68.10 1.10 158     up         osd.26
 11   ssd   1.81940  1.00000 1.8 TiB 1009 GiB 1007 GiB  34 MiB 2.5 GiB  854 GiB 54.16 0.87  78     up         osd.11
-16        27.29024        -  27 TiB   17 TiB   17 TiB 100 MiB  28 GiB   10 TiB 62.18 1.00   -            host prox2
  7   hdd   3.63869  1.00000 3.6 TiB  2.3 TiB  2.3 TiB 9.3 MiB 3.1 GiB  1.4 TiB 62.25 1.01  72     up         osd.7 
 16   hdd   3.63869  1.00000 3.6 TiB  2.7 TiB  2.7 TiB  11 MiB 4.6 GiB 1007 GiB 72.98 1.18  85     up         osd.16
 17   hdd   3.63869  1.00000 3.6 TiB  2.3 TiB  2.3 TiB 9.7 MiB 4.1 GiB  1.4 TiB 62.72 1.01  75     up         osd.17
 18   hdd   7.27739  1.00000 7.3 TiB  4.1 TiB  4.1 TiB  14 MiB 5.7 GiB  3.2 TiB 56.09 0.91 142     up         osd.18
 19   hdd   7.27739  1.00000 7.3 TiB  4.7 TiB  4.7 TiB  21 MiB 8.1 GiB  2.6 TiB 64.73 1.05 154     up         osd.19
 15   ssd   1.81940  1.00000 1.8 TiB  997 GiB  994 GiB  34 MiB 2.6 GiB  866 GiB 53.50 0.86  77     up         osd.15
 -3        23.64957        -  24 TiB   15 TiB   15 TiB  81 MiB  27 GiB  8.3 TiB 65.09 1.05   -            host prox3
  9   hdd  10.91409  1.00000  11 TiB  7.0 TiB  7.0 TiB  21 MiB  11 GiB  3.9 TiB 64.59 1.04 221     up         osd.9 
 10   hdd   7.27739  1.00000 7.3 TiB  4.6 TiB  4.6 TiB  18 MiB 8.0 GiB  2.7 TiB 62.79 1.01 152     up         osd.10
 21   hdd   3.63869  1.00000 3.6 TiB  2.9 TiB  2.8 TiB  11 MiB 5.0 GiB  807 GiB 78.34 1.27  89     up         osd.21
  2   ssd   1.81940  1.00000 1.8 TiB  946 GiB  944 GiB  32 MiB 2.4 GiB  917 GiB 50.78 0.82  73     up         osd.2 
 -5        27.29024        -  27 TiB   18 TiB   18 TiB  98 MiB  31 GiB  9.0 TiB 66.97 1.08   -            host prox4
  5   hdd   3.63869  1.00000 3.6 TiB  2.4 TiB  2.4 TiB 9.7 MiB 4.2 GiB  1.3 TiB 65.47 1.06  85     up         osd.5 
  6   hdd   3.63869  1.00000 3.6 TiB  2.3 TiB  2.3 TiB 8.2 MiB 3.8 GiB  1.3 TiB 64.24 1.04  73     up         osd.6 
  8   hdd   3.63869  1.00000 3.6 TiB  2.6 TiB  2.6 TiB  10 MiB 4.6 GiB  1.1 TiB 70.56 1.14  83     up         osd.8 
 23   hdd   7.27739  1.00000 7.3 TiB  5.0 TiB  5.0 TiB  19 MiB 8.1 GiB  2.3 TiB 68.48 1.11 162     up         osd.23
 27   hdd   7.27739  1.00000 7.3 TiB  5.0 TiB  5.0 TiB  16 MiB 8.0 GiB  2.3 TiB 68.30 1.10 158     up         osd.27
  1   ssd   1.81940  1.00000 1.8 TiB  1.0 TiB  1.0 TiB  36 MiB 2.7 GiB  804 GiB 56.82 0.92  82     up         osd.1 
 -7        41.84105        -  42 TiB   23 TiB   23 TiB 104 MiB  34 GiB   19 TiB 54.19 0.88   -            host prox5
  3   hdd  10.91409  1.00000  11 TiB  5.7 TiB  5.7 TiB  16 MiB 8.9 GiB  5.2 TiB 52.15 0.84 187     up         osd.3 
  4   hdd   3.63869  1.00000 3.6 TiB  2.0 TiB  2.0 TiB 5.8 MiB 2.7 GiB  1.6 TiB 54.86 0.89  64     up         osd.4 
 20   hdd   7.27739  1.00000 7.3 TiB  3.8 TiB  3.8 TiB  13 MiB 5.9 GiB  3.4 TiB 52.77 0.85 124     up         osd.20
 22   hdd   7.27739  1.00000 7.3 TiB  4.2 TiB  4.2 TiB  18 MiB 6.3 GiB  3.1 TiB 57.12 0.92 142     up         osd.22
 24   hdd  10.91409  1.00000  11 TiB  6.1 TiB  6.0 TiB  18 MiB 7.7 GiB  4.9 TiB 55.46 0.90 197     up         osd.24
  0   ssd   1.81940  1.00000 1.8 TiB  957 GiB  954 GiB  32 MiB 2.3 GiB  906 GiB 51.35 0.83  74     up         osd.0 
                       TOTAL 147 TiB   91 TiB   91 TiB 477 MiB 152 GiB   56 TiB 61.91                               
MIN/MAX VAR: 0.82/1.27  STDDEV: 7.16
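
The tree above also tells which host each OSD lives on, so the acting sets from `ceph health detail` can be translated into host pairs. Again just a rough Python sketch, assuming the plain-text table has been saved to a string; `osd_to_host`, `TREE_SAMPLE` and `acting` are hypothetical names, and the parsing relies on the column layout shown above (the TYPE and NAME columns being last).

```python
def osd_to_host(osd_df_tree):
    """Map OSD id -> host name from the text of `ceph osd df tree`.

    Assumes host rows end in "host <name>" and OSD rows end in "osd.<id>",
    as in the table above. Other rows (header, root, TOTAL) are skipped.
    """
    mapping = {}
    host = None
    for line in osd_df_tree.splitlines():
        tokens = line.split()
        if len(tokens) >= 2 and tokens[-2] == "host":
            host = tokens[-1]
        elif tokens and tokens[-1].startswith("osd.") and host is not None:
            mapping[int(tokens[-1][len("osd."):])] = host
    return mapping


# A few rows copied from the table above, used as sample input.
TREE_SAMPLE = """\
-16        27.29024        -  27 TiB   17 TiB   17 TiB 100 MiB  28 GiB   10 TiB 62.18 1.00   -            host prox2
 18   hdd   7.27739  1.00000 7.3 TiB  4.1 TiB  4.1 TiB  14 MiB 5.7 GiB  3.2 TiB 56.09 0.91 142     up         osd.18
 -7        41.84105        -  42 TiB   23 TiB   23 TiB 104 MiB  34 GiB   19 TiB 54.19 0.88   -            host prox5
  3   hdd  10.91409  1.00000  11 TiB  5.7 TiB  5.7 TiB  16 MiB 8.9 GiB  5.2 TiB 52.15 0.84 187     up         osd.3
 24   hdd  10.91409  1.00000  11 TiB  6.1 TiB  6.0 TiB  18 MiB 7.7 GiB  4.9 TiB 55.46 0.90 197     up         osd.24
"""

# Acting sets of two incomplete pgs, taken from the health detail output.
acting = {"12.41": [3, 18], "12.6c": [18, 24]}

mapping = osd_to_host(TREE_SAMPLE)
# For each pg, list the distinct hosts that hold its acting OSDs.
hosts_per_pg = {
    pg: sorted({mapping[osd] for osd in osds}) for pg, osds in acting.items()
}

print(mapping)
print(hosts_per_pg)
```

With the full table and all acting sets, this shows at a glance whether the incomplete pgs are concentrated on particular hosts.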
 
