Ceph 19.2.1: 2 OSD(s) experiencing slow operations in BlueStore

I upgraded to 19.2.2 before the weekend, but no luck:

HEALTH_WARN: 2 OSD(s) experiencing slow operations in BlueStore
osd.9 observed slow operation indications in BlueStore
osd.15 observed slow operation indications in BlueStore
 
I'm seeing all of the mentioned BlueStore warnings... Additionally, taking snapshots now takes forever.
This used to be a matter of seconds; now it takes minutes, and the snaptrim process loads the CPU for a very long time.

I feel this was introduced in Ceph 19.2.2. I will try to gather proper data on this. Just sharing for now; maybe others are experiencing this as well.
 
After data recovery and upgrading to 19.2.2, I am now getting this on one of my pure SSD-class pools (no separate WAL or DB, basic replication only):


Code:
[WRN] BLUESTORE_SLOW_OP_ALERT: 1 OSD(s) experiencing slow operations in BlueStore
     osd.9 observed slow operation indications in BlueStore
[WRN] DB_DEVICE_STALLED_READ_ALERT: 1 OSD(s) experiencing stalled read in db device of BlueFS
     osd.9 observed stalled read indications in DB device

As best as I can tell, this disk is perfectly healthy. :/
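
For anyone who wants to double-check the drive behind an OSD, something along these lines should work (assuming smartmontools is installed on the OSD host; /dev/sdX is a placeholder for the device reported by the first command):
Code:
# Map the OSD to its physical device (the device ID includes model and serial)
ceph device ls-by-daemon osd.9
# Then, on that OSD's host, check SMART health for the reported device
smartctl -a /dev/sdX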
What is your SSD model?
 
The problem still exists and is not solved. I can only look at the error message and pray that it won't crash. Haha.
 
A little info from here:

After changing a Crucial CT240BX500SSD1 to a WD Blue, the problem is gone.
 
Hi there!

As far as I can tell from the docs, it's not a disk failure; it's not even an error condition.

This feature was introduced in Reef with 18.2.5 and Squid with 19.2.1.

You can find the documentation here.
There is also a German-language blog post here.

You can adjust the two variables to your needs:
Code:
ceph config set global bluestore_slow_ops_warn_lifetime 21600
ceph config set global bluestore_slow_ops_warn_threshold 5
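
To double-check what an OSD has actually picked up afterwards, something like this should work (osd.9 is just the example OSD from the output above):
Code:
ceph config get osd.9 bluestore_slow_ops_warn_lifetime
ceph config get osd.9 bluestore_slow_ops_warn_threshold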

I would be very careful about changing both of them or setting the thresholds too low, but you're the expert in your environment.
For my staging cluster it gets rid of the unnecessary noise, but I'm also not dealing with performance issues there, so who knows...
Now I can move forward with prod.

Regards and happy hacking,
Marianne
 
I have to say that I also found this documentation, and after setting bluestore_slow_ops_warn_threshold for the problematic OSD, the warning is gone!

So it really does seem to be a feature...
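
For reference, the per-OSD variant would look something like this (osd.9 just as an example; pick a threshold that fits your environment):
Code:
ceph config set osd.9 bluestore_slow_ops_warn_threshold 5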
 
Ya, I still think something is up... I have three SSD OSDs across hosts, with different types, brands and controller types, all reporting this. I mean MAYBE all three are being bogged down enough to delay I/O for more than a second, but it's not that busy... Maybe it's some sort of round-trip time that includes processing outside of the actual reads/writes...

I wish the docs said what these values are measured in. "bluestore_slow_ops_warn_threshold" seems to default to 1, so I assume 1 second. It looks like the default for "bluestore_slow_ops_warn_lifetime" is only 600 (10 minutes? hours?).

Will experiment here.
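
One way to get the type, default and description straight from the cluster itself, instead of guessing at the units, should be ceph config help, e.g.:
Code:
ceph config help bluestore_slow_ops_warn_threshold
ceph config help bluestore_slow_ops_warn_lifetime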
 
I've observed this for quite some time; it only happens on SSD OSDs, never on NVMe or HDD OSDs.
I'm still using ceph 17.2.8-pve2.

When it happens, I run this in the CLI:

ceph config set osd.x bluestore_slow_ops_warn_threshold 120

and the error goes away.

However, it comes back again randomly on different SSD OSDs, even after plugging in a new one.
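
If you later want to drop the per-OSD override again so the OSD falls back to the global default, this should do it (osd.x as the placeholder above):
Code:
ceph config rm osd.x bluestore_slow_ops_warn_threshold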