Hey Folks,
Stashing this here as it's the only solution that worked for me and I will undoubtedly need it again.
Given,
Code:
$> ceph health detail
...
[WRN] PG_AVAILABILITY: Reduced data availability: 1 pg inactive, 1 pg incomplete
pg 7.188 is incomplete, acting [5,10,43] (reducing pool ceph_pool min_size from 2 may help; search ceph.com/docs for 'incomplete')
...
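In my case it was just the one PG, but if several are incomplete you can list them all with the state filter on pg ls:
Code:
ceph pg ls incomplete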
Code:
$> ceph pg 7.188 query
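The query dumps a lot of JSON; a plain grep over it (adjust the -B/-A context to taste) narrows things down to the blocked-by detail described next:
Code:
ceph pg 7.188 query | grep -B2 -A4 peering_blocked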
Check the recovery_info stanza of the query output for "peering_blocked_by_history_les_bound". If that is what's blocking peering, also check whether this OSD option is currently false:
Code:
ceph config get osd osd_find_best_info_ignore_history_les
If it is, you can potentially get the incomplete PG to rebuild.
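ceph config get shows the value from the mon config database. If you want to confirm what a specific running daemon is actually using, you can also ask it over its admin socket (osd.5 is just the example primary from the health output above; run this on the host carrying that OSD):
Code:
ceph daemon osd.5 config get osd_find_best_info_ignore_history_les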
First, kick off a repair of the PG,
Code:
ceph pg 7.188 repair
and make a note of the OSD it names in its output.
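If you want to double-check which OSD that is, the PG mapping shows the up and acting sets; the first OSD in the acting set is the primary (here acting is [5,10,43], so osd.5):
Code:
ceph pg map 7.188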
Next, flip the OSD option to true,
Code:
ceph config set osd osd_find_best_info_ignore_history_les true
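That sets the option for the whole osd class. If you'd rather limit the blast radius, the same option can be scoped to a single daemon (osd.5 again being only the example primary from above):
Code:
ceph config set osd.5 osd_find_best_info_ignore_history_les true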
then restart the OSD named in the repair output. I actually restarted the whole host (after migrating, as I had another maintenance task anyway). On power-up Ceph started rebuilding correctly, and within 15 minutes the PG was in good shape again. The flag won't affect OSDs that don't have that recovery blocked-by condition set, but it's best to toggle it back to false once the cluster has rebuilt the PG. TBH there was speculation that this might not restore data up to the exact last write that failed, so perhaps you should export/back up the data first (never a bad idea).
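For reference, roughly what those last steps look like on a systemd-based install. osd.5 and the file paths are just examples, and the optional PG export has to run on the OSD's host with that OSD stopped:
Code:
# optional safety net first: export the PG from a stopped OSD
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 --pgid 7.188 --op export --file /root/pg-7.188.export

# restart the affected OSD (or reboot the host, as I did)
systemctl restart ceph-osd@5

# once the PG is healthy again, put the option back
ceph config set osd osd_find_best_info_ignore_history_les false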
Thx for being my notepad.