Ceph PGs not scrubbing for a long time

alexskysilk

Oct 16, 2015
Chatsworth, CA
www.skysilk.com
I've been wrestling with this issue for over a month now, and I can't seem to get past it.

I have two PGs that haven't been scrubbed since June:
Code:
$ ceph health detail | grep "not scrubbed since 2024-06"
    pg 17.3dc not scrubbed since 2024-06-01T20:46:29.042727-0700
    pg 17.137 not scrubbed since 2024-06-23T02:50:12.630983-0700
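With more than a couple of stale PGs it can help to rank them by age. The sketch below is a hypothetical helper (stale_pgs, SAMPLE and the regex are illustrative names, not part of any Ceph tooling). It assumes the health lines look exactly like the output above, or the "not deep-scrubbed since" variant, and that timestamps carry a numeric UTC offset such as -0700. Paste real ceph health detail output into SAMPLE to use it.

```python
import re
from datetime import datetime, timezone

# Matches health lines such as:
#   pg 17.3dc not scrubbed since 2024-06-01T20:46:29.042727-0700
#   pg 1.a not deep-scrubbed since 2024-08-31T00:00:00.000000+0000
STALE_RE = re.compile(r"pg\s+(\S+)\s+not (deep-)?scrubbed since\s+(\S+)")


def _parse_stamp(stamp):
    """Parse a timestamp with a numeric UTC offset, with or without microseconds."""
    for fmt in ("%Y-%m-%dT%H:%M:%S.%f%z", "%Y-%m-%dT%H:%M:%S%z"):
        try:
            return datetime.strptime(stamp, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp: {stamp}")


def stale_pgs(health_text, now=None):
    """Return (pgid, kind, days_since) tuples, oldest first."""
    now = now or datetime.now(timezone.utc)
    out = []
    for m in STALE_RE.finditer(health_text):
        pgid, deep, stamp = m.groups()
        days = (now - _parse_stamp(stamp)).days  # both datetimes are tz-aware
        out.append((pgid, "deep-scrub" if deep else "scrub", days))
    return sorted(out, key=lambda t: t[2], reverse=True)


# Hypothetical sample: the two lines from the post above.
SAMPLE = """
    pg 17.3dc not scrubbed since 2024-06-01T20:46:29.042727-0700
    pg 17.137 not scrubbed since 2024-06-23T02:50:12.630983-0700
"""

if __name__ == "__main__":
    for pgid, kind, days in stale_pgs(SAMPLE):
        print(f"{pgid}\t{kind}\t{days} days")
```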

Running ceph pg query on them shows:

Code:
    "scrubber": {
        "active": false,
        "must_scrub": true,
        "must_deep_scrub": false,
        "must_repair": false,
        "need_auto": false,
        "scrub_reg_stamp": "1.000000",
        "schedule": "queued for deep scrub"
    },

The scrub shows as queued, but it never actually runs. My Google-fu has clearly failed me: I haven't been able to get past this, and the time just keeps ticking by. Any ideas?
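To check the same fields across several PGs, the scrubber section can be pulled out of the JSON that ceph pg <pgid> query prints (for example, saved with ceph pg 17.3dc query > pg.json). This is a minimal sketch with hypothetical helper names. Because the nesting of "scrubber" may differ between Ceph releases (an assumption), it searches the document recursively rather than hard-coding a path. The summary only restates the field names shown above; it does not model how Ceph sets them.

```python
import json
from typing import Any, Optional


def find_key(obj: Any, key: str) -> Optional[dict]:
    """Depth-first search for the first dict stored under `key`."""
    if isinstance(obj, dict):
        if isinstance(obj.get(key), dict):
            return obj[key]
        children = list(obj.values())
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def describe_scrubber(query: dict) -> str:
    """Summarize the scrubber fields of one `ceph pg <pgid> query` result."""
    s = find_key(query, "scrubber")
    if s is None:
        return "no scrubber section found"
    if s.get("active"):
        return "scrub in progress"
    flags = [k for k in ("must_scrub", "must_deep_scrub", "must_repair") if s.get(k)]
    if flags:
        return f"requested ({', '.join(flags)}) but not running; schedule: {s.get('schedule', '?')}"
    return f"idle; schedule: {s.get('schedule', '?')}"


# Hypothetical sample: the scrubber fragment from the post, wrapped in an object.
SAMPLE = """
{
  "scrubber": {
    "active": false,
    "must_scrub": true,
    "must_deep_scrub": false,
    "must_repair": false,
    "need_auto": false,
    "scrub_reg_stamp": "1.000000",
    "schedule": "queued for deep scrub"
  }
}
"""

if __name__ == "__main__":
    print(describe_scrubber(json.loads(SAMPLE)))
```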
 
Do you run Ceph on the same hardware as your VMs? The load may be too high for scrubbing to start:

osd_scrub_load_threshold

Description: The maximum load. Ceph will not scrub when the system load (as defined by the getloadavg() function) is higher than this number. Default is 0.5.

Type: Float

Default: 0.5



You may need to adjust that threshold via "ceph config set".
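The gate described above can be modeled in a few lines to see whether a node's current load would block scrubbing. This is a simplified sketch, not Ceph's implementation: it assumes the 1-minute load average is the value compared, and it optionally divides by the CPU count because newer Ceph documentation describes the threshold as normalized per online CPU (check the docs for your release). Any other conditions the OSD applies are ignored. If the check fails on your OSD nodes, the threshold can be raised with ceph config set osd osd_scrub_load_threshold <value>.

```python
import os
from typing import Optional


def scrub_load_ok(load_1min: float, threshold: float = 0.5,
                  cpus: Optional[int] = None) -> bool:
    """Simplified model of the osd_scrub_load_threshold gate.

    Per the quoted description, scrubbing is skipped when the load is higher
    than the threshold. If `cpus` is given, the load is first divided by the
    CPU count (assumption: per-CPU normalization as in newer Ceph docs).
    """
    load = load_1min / cpus if cpus else load_1min
    return load <= threshold


if __name__ == "__main__":
    load1, _, _ = os.getloadavg()  # Unix only; 1-, 5- and 15-minute averages
    ncpu = os.cpu_count() or 1
    for thr in (0.5, 3.5):
        print(f"threshold {thr}: raw={scrub_load_ok(load1, thr)} "
              f"per-cpu={scrub_load_ok(load1, thr, ncpu)} (load1={load1:.2f})")
```

Run on each OSD node, this shows quickly whether the raw or per-CPU reading of the threshold would let a scrub start at the current load.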
 
Good thought. No compute workloads share nodes with the OSDs, and I had already raised osd_scrub_load_threshold to 3.5, but your comment prompted me to walk through my OSD nodes to see what's happening there.

Lo and behold, they're all busy, mostly with OSD load.

Now to figure out how to quiesce them.