Ceph warning: PGs not deep-scrubbed in time

powersupport

Active Member
Jan 18, 2020
We’ve been receiving warnings about 621 PGs not being deep-scrubbed and 621 PGs not being scrubbed in time for a while now, and the issue doesn't seem to be resolving. Is there a command that can address this problem? There are over 600 PGs showing this error—will manually scrubbing each PG resolve the issue?

Thank you
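For reference, a minimal sketch of how to list the flagged PGs and ask for a manual deep scrub. The PG ID 2.1f is only a placeholder taken from the health output, and looping over all flagged PGs (last line) can put noticeable extra load on HDD-backed OSDs:

# show which PGs are flagged
ceph health detail | grep 'not deep-scrubbed'
# ask the OSD to schedule a deep scrub of a single PG (2.1f is a placeholder)
ceph pg deep-scrub 2.1f
# loop over all flagged PGs; the PG ID is the second field of each "pg ..." line
ceph health detail | awk '$1 == "pg" && /not deep-scrubbed since/ {print $2}' | while read pg; do ceph pg deep-scrub "$pg"; done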
 

Scrubbing will not run if the load on the OSD nodes is above a certain threshold.

If you run a hyperconverged cluster, you should adjust this threshold.

Look in the Ceph documentation for the name of the configuration variable.
 
I couldn't quite follow the recommended configuration. Can anyone advise? We have a 7-node cluster with 35 OSDs.

Thank you
 
The relevant part of the Ceph documentation: https://docs.ceph.com/en/reef/rados/configuration/osd-config-ref/#scrubbing

The config variable is called osd_scrub_load_threshold and has a default value of 0.5. This is the load per CPU.
If you have 12 CPUs, scrubbing will only start if the total load is below 6.
When running VMs on the same nodes, the load may be significantly higher, so you need to increase this config setting.

ceph config set global osd_scrub_load_threshold 2.0
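A quick way to sanity-check the change; osd.0 is just an example daemon, and the last command has to be run on the node that hosts it:

# CPU count and current load average on an OSD node
nproc
cat /proc/loadavg
# confirm the new value is stored and picked up by a running OSD
ceph config dump | grep osd_scrub_load_threshold
ceph daemon osd.0 config show | grep osd_scrub_load_threshold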
 
Despite changing the values and seeing scrub processes in progress, the warning about 621 unscrubbed and not deep-scrubbed PGs persists. I cannot determine the cause of this discrepancy yet; no idea.
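One way to see whether the backlog is at least shrinking is to look at the oldest deep-scrub timestamps. A sketch; the JSON layout of ceph pg dump differs slightly between releases, hence the two alternative paths in the jq filter:

# count the flagged PGs
ceph health detail | grep -c 'not deep-scrubbed since'
# PGs with the oldest last deep scrub first
ceph pg dump -f json 2>/dev/null | jq -r '(.pg_map.pg_stats // .pg_stats)[] | [.pgid, .last_deep_scrub_stamp] | @tsv' | sort -k2 | head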
 

I had such an issue that ended up being due to PGs on disks that should have been failed out. By default, Ceph is not smart enough to fail OSDs if the underlying disk has not failed completely; ZFS would have kicked out a drive with multiple read faults, but Ceph doesn't.

SMART-test all your disks.
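A sketch of what that could look like. Replace /dev/sdX with each OSD's device; the ceph device commands only return useful data if device health monitoring is enabled, and <devid> is a placeholder taken from the ceph device ls output:

# long SMART self-test, then inspect the result and error counters
smartctl -t long /dev/sdX
smartctl -a /dev/sdX | grep -Ei 'result|reallocated|pending|uncorrect'
# Ceph's own view of the disks, if device monitoring is on
ceph device ls
ceph device get-health-metrics <devid>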
 
You have some scrubs active, so everything seems to be OK with the configuration for now.
The most common causes are either disks that are too slow or too high a load on the nodes.
Of course, the cause can also be defective disks, as the previous poster wrote.

Can you provide some information on what disks are used as OSD and what the network setup looks like?
 
Hi,

All of the OSD disks are HDDs and the network ports are 10 Gbps. The issue only appeared recently; there was no problem before. I know HDDs are slower, and we use this cluster for storage purposes only, but that seems unrelated to the issue.

There are 1025 PGs in total, and the warning always shows for the same 621.
 
If all of the OSDs are HDDs, then this is completely normal behavior.
I hope you have configured fast SSDs with sufficient capacity for the BlueStore WAL/DB? The WAL/DB needs at least 1% of the storage capacity; according to Red Hat documentation, 4% is recommended.
If there is not enough space on the dedicated device, the WAL/DB spills over onto the HDDs and everything becomes very slow. The first place you notice this is scrubs that never finish.
Unfortunately, I have seen this happen several times.
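A rough way to check for this; the exact metadata key names can vary a bit between Ceph releases, and osd.0 is just an example ID:

# Ceph raises a BLUEFS_SPILLOVER warning when the DB spills onto the slow device
ceph health detail | grep -i spillover
# does this OSD have a dedicated (SSD) DB device at all?
ceph osd metadata 0 | grep -Ei 'bluefs_dedicated_db|bluefs_db|bluestore_bdev_type'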
 
Strange, but I managed to fix it by running the command "reweight-by-utilization". Not sure why, but it fixed the issue.
 
Please monitor the usage of your OSDs. This option can fill up single OSDs and cause bigger problems when one OSD is full.
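A sketch of how to keep an eye on that, assuming a reasonably recent Ceph release:

# per-OSD utilization, before and after any reweight
ceph osd df tree
# dry run: shows what reweight-by-utilization would change without applying it
ceph osd test-reweight-by-utilization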
 
Uhh... that's not what reweight-by-utilization does. The default behavior is a max change of 0.05 weight, and that's only to the LESS utilized OSDs.
Unfortunately, I have seen it behave differently. But I also don't know whether other parameters had been changed, as it was a setup from another service provider at the time.
The customer's pool was already over 80% full and they wanted better rebalancing. After reweight-by-utilization was run, some OSDs filled up to 90% and we had a lot of trouble.

With a sensible setup and moderate pool utilization this is not a problem, but practice always shows you something different.
 
That's the key, really. Unless the user deliberately created an untenable situation, reweighting can only push relative weight DOWN for overloaded OSDs. And if the situation is untenable... well, they're not going to fix the problem this way ;) I'm going to go out on a limb and say that wasn't the cause of the issues you observed.
 
I don't think it was the root cause either, but it made the situation worse instead of better. Since then I have been very careful with these options.
 
