Ceph warning: PGs not deep-scrubbed in time

powersupport

We’ve been receiving warnings about 621 PGs not being deep-scrubbed and 621 PGs not being scrubbed in time for a while now, and the issue doesn't seem to be resolving. Is there a command that can address this problem? There are over 600 PGs showing this error—will manually scrubbing each PG resolve the issue?

Thank you
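For reference, a minimal sketch of how the flagged PGs could be listed and deep-scrubbed by hand (assuming a reasonably recent Ceph release; the exact wording and field positions of the health output may differ):

# list the PGs the warning refers to
ceph health detail | grep 'not deep-scrubbed since'

# trigger a deep scrub for a single PG (2.1f is just an example id)
ceph pg deep-scrub 2.1f

# or loop over every flagged PG
ceph health detail | awk '/not deep-scrubbed since/ {print $2}' | while read pg; do ceph pg deep-scrub "$pg"; done

Note that manually requested deep scrubs are still subject to the per-OSD scrub limits, so if the cluster cannot keep up with scrubbing in general this only postpones the warning.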
 

Attachments

  • 397a7ba1-47a1-42b9-8269-09f948f99b59.png
    397a7ba1-47a1-42b9-8269-09f948f99b59.png
    21.1 KB · Views: 17
I couldn't understand the recommended configuration. Can anyone advise on it? We have a 7-node cluster with 35 OSDs.

Thank you
 
The relevant part of the Ceph documentation: https://docs.ceph.com/en/reef/rados/configuration/osd-config-ref/#scrubbing

The config variable is called osd_scrub_load_threshold and has a default value of 0.5. This is the load per CPU.
If you have 12 CPUs, scrubbing will only start if the total load is below 6.
When running VMs on the same nodes the load may be significantly higher, so you need to increase this config setting.

ceph config set global osd_scrub_load_threshold 2.0
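A quick way to sanity-check this on a node (a sketch, nothing cluster-specific assumed):

# current 1/5/15-minute load averages
cat /proc/loadavg

# number of CPUs on this node
nproc

# the threshold the OSDs actually use (inherits the global setting)
ceph config get osd osd_scrub_load_threshold

If the 1-minute load divided by the CPU count is regularly above the threshold, the OSDs will keep deferring scrubs.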
 
Despite altering the values and seeing scrubbing processes in progress, the warning about 621 unscrubbed and not deep-scrubbed PGs persists. I cannot determine the cause of this discrepancy yet.
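To see whether the backlog is actually shrinking, one can compare how many PGs are scrubbing right now with how many are still flagged (a sketch; output formats vary slightly between releases):

# PGs currently in a scrubbing or deep-scrubbing state
ceph pg dump pgs_brief 2>/dev/null | grep -c scrubbing

# PGs still overdue for a deep scrub
ceph health detail | grep -c 'not deep-scrubbed since'

If only one or two PGs scrub at a time, working through a backlog of 600+ PGs on HDDs can take days, so the warning may persist for a while even when the configuration is already correct.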
 

I had such an issue which ended up being due to PGs on disks that should have been failed out. Ceph by default is not smart enough to fail OSDs if the underlying disk has not failed completely; ZFS would have kicked out a drive with multiple read faults, but Ceph doesn't.

SMART test all your disks.
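A sketch of how that could be done with smartmontools (device names are examples; NVMe and some SAS devices need different flags):

# SMART health, attributes and error counters for one disk
smartctl -a /dev/sdb

# start a long self-test on every /dev/sd? device
for dev in /dev/sd?; do smartctl -t long "$dev"; done

# read the self-test results once they have finished
smartctl -l selftest /dev/sdb

Also keep an eye on pending/reallocated sector counts, and on OSDs whose latencies stand out in ceph osd perf.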
 
You have some scrubs active, so everything seems to be OK with the configuration for now.
The most common causes are either disks that are too slow or too high a load on the nodes.
Of course, it can also be defective disks, as the previous member wrote.

Can you provide some information on which disks are used as OSDs and what the network setup looks like?
 
Hi,

All of the OSD disks are HDDs and the network ports are 10 Gbps. The issue only appeared recently; there was no issue before. I know HDDs are slower, and we use this cluster for storage purposes only, but it seems unrelated to the issue.

There is a total of 1025 PGs, and the issue only ever shows for 621 of them.
 
Then this is completely normal behavior.
I hope you have configured fast SSDs as BlueStore WAL/DB devices with sufficient capacity? The WAL/DB needs at least 1% of the storage capacity; according to Red Hat documentation, 4% is recommended.
If the DB device is too small, the WAL/DB spills over onto the HDDs and everything becomes very slow. The first thing you notice is that scrubs are no longer finishing.
Unfortunately, I have seen this happen several times.
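As a rough worked example of those percentages (illustrative numbers only): a 4 TB HDD OSD would need about 40 GB of SSD-backed DB/WAL at 1%, and about 160 GB at 4%. Whether an OSD actually has a dedicated DB device, and whether it has spilled over onto the HDD, can be checked roughly like this (a sketch; field names depend on the Ceph release, and osd.0 is just an example id):

# does this OSD have a dedicated DB/WAL device?
ceph osd metadata 0 | grep -E 'bluefs_dedicated_(db|wal)'

# bluefs usage, including bytes that have spilled onto the slow device
# (run on the node hosting the OSD)
ceph daemon osd.0 perf dump | grep -E '"(db|slow)_(total|used)_bytes"'

Recent releases also raise a BLUEFS_SPILLOVER health warning when the DB overflows onto the main device.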
 
Strange, I managed to fix it by running the command "reweight-by-utilization". Not sure why, but it fixed the issue.
 
Please monitor the usage on your OSDs. This option can result in single OSDs filling up, and bigger problems once one OSD is full.
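A more cautious way to use it (a sketch, using the standard commands in current releases): dry-run first, then watch the per-OSD fill levels.

# show what would be changed, without applying anything
ceph osd test-reweight-by-utilization

# apply only if the proposed changes look sane
ceph osd reweight-by-utilization

# watch per-OSD utilization afterwards
ceph osd df tree

By default the change per OSD is capped (see the max_change discussion below), but on a cluster that is already very full even small shifts can push individual OSDs towards the full ratio.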
 
Uhh... that's not what reweight-by-utilization does. The default behavior is a max change of +0.05 weight, and that's only to the LESS utilized OSDs.
Unfortunately, I have seen it differently. But I also don't know whether other parameters were changed, as it was a setup from another service provider at the time.
The customer's pool was already over 80% full and they wanted better rebalancing. After reweight-by-utilization was run, some OSDs filled up to 90% and we had a lot of trouble.

With a sensible setup and moderate pool utilization this is not a problem, but practice always shows you something different.
 
Unfortunately, I have seen it differently. But I also don't know whether other parameters were changed, as it was a setup from another service provider at the time.
That's the key, really. Unless the user deliberately created an untenable situation, reweighting can only push relative weight DOWN for overloaded OSDs. And if the situation is untenable... well, they're not going to fix the problem this way ;) I'm going to go out on a limb and say that wasn't the cause of the issues you observed.
 
I don't suspect it was the root cause either, but it made the situation worse instead of better. Since then I have been very careful with these options.
 
