Ceph Reef: pgs not deep-scrubbed in time and pgs not scrubbed in time

kellogs

These PGs seem to have been stuck in this state forever. (I have manually started the scrubbing, roughly as shown after the output below, and it did not clear these messages.)
root@stor-200-21:~# ceph health detail
HEALTH_WARN 15 pgs not deep-scrubbed in time; 15 pgs not scrubbed in time
[WRN] PG_NOT_DEEP_SCRUBBED: 15 pgs not deep-scrubbed in time
pg 8.14 not deep-scrubbed since 2024-09-27T19:42:47.463766+0700
pg 7.1b not deep-scrubbed since 2024-09-30T21:54:16.584697+0700
pg 8.1b not deep-scrubbed since 2024-09-29T23:22:37.812579+0700
pg 6.13 not deep-scrubbed since 2024-09-26T07:16:31.387575+0700
pg 6.12 not deep-scrubbed since 2024-09-27T04:55:34.699188+0700
pg 8.3 not deep-scrubbed since 2024-09-29T20:57:40.973759+0700
pg 7.c not deep-scrubbed since 2024-10-01T22:57:28.955052+0700
pg 8.b not deep-scrubbed since 2024-10-01T09:15:55.223268+0700
pg 7.3 not deep-scrubbed since 2024-10-02T05:12:50.743544+0700
pg 7.18 not deep-scrubbed since 2024-09-30T17:57:34.550318+0700
pg 8.a not deep-scrubbed since 2024-09-30T10:37:06.429820+0700
pg 8.5 not deep-scrubbed since 2024-09-25T19:23:57.422280+0700
pg 8.9 not deep-scrubbed since 2024-09-30T17:57:33.493494+0700
pg 8.4 not deep-scrubbed since 2024-09-30T21:54:13.122287+0700
pg 8.10 not deep-scrubbed since 2024-10-02T20:07:42.793192+0700
[WRN] PG_NOT_SCRUBBED: 15 pgs not scrubbed in time
pg 8.14 not scrubbed since 2024-10-02T08:04:26.363903+0700
pg 7.1b not scrubbed since 2024-10-02T03:14:53.854041+0700
pg 8.1b not scrubbed since 2024-10-02T15:06:12.351794+0700
pg 6.13 not scrubbed since 2024-10-02T01:48:49.382133+0700
pg 6.12 not scrubbed since 2024-10-02T08:19:53.026422+0700
pg 8.3 not scrubbed since 2024-10-02T04:29:55.426251+0700
pg 7.c not scrubbed since 2024-10-01T22:57:28.955052+0700
pg 8.b not scrubbed since 2024-10-02T09:36:03.568661+0700
pg 7.3 not scrubbed since 2024-10-02T05:12:50.743544+0700
pg 7.18 not scrubbed since 2024-10-01T22:17:53.504919+0700
pg 8.a not scrubbed since 2024-10-01T19:42:53.953996+0700
pg 8.5 not scrubbed since 2024-10-02T10:04:33.412894+0700
pg 8.9 not scrubbed since 2024-10-01T20:22:40.181880+0700
pg 8.4 not scrubbed since 2024-10-01T21:56:08.596293+0700
pg 8.10 not scrubbed since 2024-10-02T20:07:42.793192+0700
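
For reference, this is roughly how I triggered the manual scrubs, using the PG IDs from the output above (repeated for each listed PG):
Code:
ceph pg scrub 8.14        # queue a regular scrub for one of the listed PGs
ceph pg deep-scrub 8.14   # queue a deep scrub for the same PG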


My next plan is to restart the OSDs associated with these PGs; see the restart sketch after the maps below.

root@stor-200-21:~# ceph pg map 8.14
osdmap e20431 pg 8.14 (8.14) -> up [13,56,34] acting [13,56,34]
root@stor-200-21:~# ceph pg map 7.1b
osdmap e20431 pg 7.1b (7.1b) -> up [47,56,31] acting [47,56,31]
root@stor-200-21:~# ceph pg map 8.1b
osdmap e20431 pg 8.1b (8.1b) -> up [56,38,10] acting [56,38,10]
root@stor-200-21:~# ceph pg map 6.13
osdmap e20431 pg 6.13 (6.13) -> up [30,51,56] acting [30,51,56]
root@stor-200-21:~# ceph pg map 6.12
osdmap e20431 pg 6.12 (6.12) -> up [18,56,10] acting [18,56,10]
root@stor-200-21:~# ceph pg map 8.3
osdmap e20431 pg 8.3 (8.3) -> up [49,1,67] acting [49,67,2]
root@stor-200-21:~# ceph pg map 7.c
osdmap e20431 pg 7.c (7.c) -> up [15,56,31] acting [15,56,31]
root@stor-200-21:~# ceph pg map 8.b
osdmap e20431 pg 8.b (8.b) -> up [54,14,49] acting [54,14,49]
root@stor-200-21:~# ceph pg map 7.3
osdmap e20431 pg 7.3 (7.3) -> up [56,5,13] acting [56,5,13]
root@stor-200-21:~# ceph pg map 7.18
osdmap e20431 pg 7.18 (7.18) -> up [4,30,25] acting [4,30,25]
root@stor-200-21:~# ceph pg map 8.a
osdmap e20431 pg 8.a (8.a) -> up [55,4,38] acting [55,4,38]
root@stor-200-21:~# ceph pg map 8.5
osdmap e20431 pg 8.5 (8.5) -> up [19,30,49] acting [19,30,49]
root@stor-200-21:~# ceph pg map 8.9
osdmap e20431 pg 8.9 (8.9) -> up [49,56,4] acting [49,56,4]
root@stor-200-21:~# ceph pg map 8.4
osdmap e20431 pg 8.4 (8.4) -> up [26,67,56] acting [26,67,56]
root@stor-200-21:~# ceph pg map 8.10
osdmap e20431 pg 8.10 (8.10) -> up [16,18,49] acting [16,18,49]
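
For the restart itself, the rough plan is a per-OSD systemd restart on whichever node hosts the daemon (assuming the standard packaged Ceph that Proxmox installs, not cephadm); osd.56 here is just one of the OSDs from the maps above:
Code:
systemctl restart ceph-osd@56   # run on the node that hosts osd.56
ceph -s                         # watch that the cluster settles before moving to the next OSD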



If anyone has experienced this, any suggestions are welcome :)
 
Seems like I am hitting a weird bug ... so what I did was restart the listed OSDs, and the stuck "pgs not deep-scrubbed in time" and "pgs not scrubbed in time" warnings started to clear up!
 
It may not necessarily be due to a bug.

Ceph scrubbing requires active participation from all OSDs that host a replica of the PG. If one or more OSDs are unresponsive due to high I/O, CPU load, memory pressure, or any number of other reasons, scrubbing can stall. Scrub and deep-scrub operations are queued internally within the OSDs; if these queues get stuck for any reason, scrubbing can also stall.
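
To see whether any scrubs are actually in flight (and not just queued), something along these lines works:
Code:
ceph status                           # PG summary shows states like active+clean+scrubbing+deep
ceph pg dump pgs_brief | grep scrub   # list PGs whose current state includes scrubbing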

Restarting the OSDs clears the queue, resets their state, and clears any transient issues, allowing scrubbing to proceed. If you face the issue again, it may help to adjust the scrub settings to balance load and performance. This is especially helpful when HDDs are used as OSDs, which I am assuming is the case for your deployment:
Code:
ceph config set osd osd_scrub_load_threshold 0.5  # Reduce impact on busy OSDs
ceph config set osd osd_max_scrubs 2             # Limit concurrent scrubs
Reduce the load threshold to 0.4 or 0.2 for heavily loaded or resource-constrained clusters.
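
To confirm what the OSDs are actually running with, you can query the configuration afterwards (osd.56 here is just one of the OSDs from your pg maps):
Code:
ceph config get osd osd_scrub_load_threshold   # value stored for the osd section
ceph config show osd.56 osd_max_scrubs         # effective value on a specific running OSD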

If you want to apply the values without restarting the OSDs or nodes, simply use the injectargs Ceph command to apply them at runtime.
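
For example, something like this (adjust the value to whatever fits your load):
Code:
ceph tell osd.* injectargs '--osd_scrub_load_threshold=0.4'   # push the new threshold to all running OSDs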
 
Dear Wasim,

Thank you for the information. I suspect one or more of the OSDs are not working properly (about to fail), which caused the failed scrubbing. The Ceph nodes are all dedicated to Ceph only, with compute on separate nodes, and this cluster barely has any VMs on it. You are correct that they are HDD OSDs, which I set up earlier in the year to "play" with Proxmox and Ceph, since we were a VMware shop.

How I came to this conclusion: 3 days ago I replaced 1 OSD which showed errors in the dmesg output, and immediately the forever-stuck 16 PGs dropped to 12. That gave me the idea to restart the OSDs behind those PGs, and one by one they started to clear up.
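
For anyone else chasing a similar suspect disk, the checks were along these lines (/dev/sdX is a placeholder for the actual device behind the OSD):
Code:
dmesg -T | grep -iE 'ata|sd[a-z]|error'   # kernel I/O errors pointing at the failing disk
smartctl -a /dev/sdX                      # SMART status of the suspect drive (placeholder device name)
ceph device ls                            # map OSD daemons to physical devices across the cluster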
 
