Ceph 19.2.1 2 OSD(s) experiencing slow operations in BlueStore

There have been several updates since, but none of them has fixed this problem. It may have been caused by a Linux kernel upgrade or by the Ceph upgrade. I hope it can be fixed in the next update.
 
Just to chime in, I am also seeing this on my homelab. Everything was fine until I upgraded to 19 (from 18). And I think there is a real issue, not just alerting: when it gets bad enough, the MDS servers start getting "cranky" (slow ops) and won't become healthy again until I restart the OSDs that are running slow.
 
I also encountered this problem. I would like to ask: if I restart the OSD after it occurs, will the alarm stay away, or come back after a few days? If this happens on an OSD again in the future, restarting that OSD seems to be a temporary workaround.
 
Sure enough, the alarm disappeared after I restarted the OSD.
 
What does:
ceph daemon osd.<id> dump_historic_ops

tell you?
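For anyone not familiar with that command: it returns JSON describing the slowest recent ops on that OSD. A quick sketch for pulling out just the slow ones, assuming you have piped the JSON somewhere (the sample below is made-up data for illustration, not from this cluster):

```python
import json

# Sample shaped like `ceph daemon osd.<id> dump_historic_ops` output
# (hypothetical values, for illustration only).
sample = '''
{
  "size": 2,
  "duration": 600,
  "ops": [
    {"description": "osd_op(client.1 1.2s0 object1 [write])", "duration": 0.120},
    {"description": "osd_op(client.2 1.3s0 object2 [read])",  "duration": 0.004}
  ]
}
'''

def slow_ops(dump_json, threshold_s=0.05):
    """Return (description, duration) for ops slower than threshold_s seconds."""
    data = json.loads(dump_json)
    return [(op["description"], op["duration"])
            for op in data.get("ops", [])
            if op["duration"] > threshold_s]

for desc, dur in slow_ops(sample):
    print(f"{dur * 1000:.0f} ms  {desc}")
```

The 50 ms threshold is just an example; pick whatever latency budget makes sense for your drives.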

This error occurs when your OSDs cannot write out their queue for 30s. That typically means serious hardware or network issues. E.g. after a full cluster reboot/recovery this may happen while OSDs are still starting or slow to start, or when just one lagging drive causes the others not to start.
 
Looks like you may have a bad drive: these ops complete in 120ms, which is long even for spinning hard drives; with a spinning drive you would expect <20ms, plus network latency of ~1-2ms. Are you using SMR drives? These seem to occur mostly around the time you are rebuilding an OSD, which can indeed put very high load on both drives and network.

Again, either you are severely overloading your network leading to packet drops at this time, or your drive is failing causing some operations to take very long, you won’t notice in most cases as Ceph will redirect operations that don’t complete in time, but rebuilding does require the drive to be functional.

Given this is generally around commit time to disk, I would suspect the disk.
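To see which disks are slow at commit time, `ceph osd perf` lists per-OSD commit and apply latencies. A small sketch that flags OSDs over a latency budget; the table and the 20 ms budget are hypothetical examples, not real cluster numbers:

```python
# Parse `ceph osd perf`-style output and flag OSDs above a commit-latency budget.
# Sample text mimicking the command's table (made-up values).
sample = """\
osd  commit_latency(ms)  apply_latency(ms)
 17                 121                121
  3                   2                  2
  0                   1                  1
"""

def slow_osds(text, budget_ms=20):
    """Return (osd_id, commit_latency_ms) for OSDs whose commit latency exceeds budget_ms."""
    slow = []
    for line in text.splitlines()[1:]:          # skip the header row
        osd_id, commit_ms, _apply_ms = line.split()
        if int(commit_ms) > budget_ms:
            slow.append((int(osd_id), int(commit_ms)))
    return slow

print(slow_osds(sample))
```

An SSD sitting well above single-digit milliseconds here for sustained periods is a good candidate for closer inspection.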
 
Hello,

As I said in the first post, there was no problem with Ceph 19.2.0; I get the messages with Ceph 19.2.1.

I have 6 clusters of 6 nodes: 3 clusters on one site (production) and 3 clusters on another site (rescue).

No messages on the rescue site, only on the production site (where I have more I/O).

All the clusters have the "same" configuration.

System storage: 2 SCSI 10k rpm disks (MD)
Ceph storage: N SATA SSD disks
Ceph network: 4 clusters with 2x40Gb (Mellanox, shared with VMs), 2 clusters with a dedicated 2x10Gb Ceph network.

At the moment, on one production 40Gb cluster with 3 SSDs per node for Ceph, I have 7 OSDs with the "slow..." message.
(I also have "slow..." messages on the other production 40Gb and 10Gb clusters.)

# ceph -s
  cluster:
    id:     XXXX
    health: HEALTH_WARN
            7 OSD(s) experiencing slow operations in BlueStore

  services:
    mon: 3 daemons, quorum XXX1,XXX3,XXX5 (age 9d)
    mgr: XXX2(active, since 6w), standbys: XXX4, XXX6
    osd: 18 osds: 18 up (since 9h), 18 in (since 8w)

  data:
    pools:   2 pools, 513 pgs
    objects: 1.15M objects, 4.2 TiB
    usage:   12 TiB used, 7.8 TiB / 20 TiB avail
    pgs:     513 active+clean

  io:
    client: 807 MiB/s rd, 6.6 MiB/s wr, 18.18k op/s rd, 623 op/s wr

Best regards.
Francis
 
Ohmmm... I know you don't want to hear it, but this is an SSD drive, and before 18.2.6 it worked without issue.
And nope, no SMR here; I think SMR only applies to HDDs anyway.
 
What brand and model? 120ms is a really long time; for an SSD you would expect <2ms. I don’t think it was ever ‘without issue’; you just never noticed, or the new versions have a slightly different load pattern that triggers it, or you added more load. You can upgrade to Squid and see if it improves anything, but the logs are pretty clear.
 
Hello,

I have the "slow..." messages also with Squid 19.2.1 (no messages with Squid 19.2.0).

For my 3 clusters with the messages, at the moment all the disks behind slow OSDs are Crucial "CT1000MX500SSD1" drives...

For the 3 other clusters with no messages, I have only Intel and Samsung disks (but much less I/O).

Best regards.

Francis
 
This is a common issue with the (nearly decade-old) MX500s. They are not very good in general, even for desktop use; they have major firmware issues and glitch out even in desktops. You can see if updating the firmware resolves the issue, but also check your SMART values (smartctl -a /dev/xxx). I would guess you have tons of pending sectors, and they are probably 'worn out' to some extent as well. One of the recommendations is to power the machine on but not start using the drives for about a minute, so the firmware can boot up properly.
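If you want to screen many drives at once, a sketch that scans `smartctl -a`-style attribute lines for pending sectors, reallocations, and wear. The sample text is made up, and the attribute names shown are the common SATA ones; real names vary by vendor, so adjust the watch list:

```python
# Made-up excerpt mimicking the SMART attribute table from `smartctl -a`.
sample = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       24
202 Percent_Lifetime_Remain 0x0030   062   062   001    Old_age   Offline      -       38
"""

def worrying_attributes(smart_text):
    """Return raw values of SMART attributes that suggest a failing or worn drive."""
    watch = {"Current_Pending_Sector", "Reallocated_Sector_Ct",
             "Percent_Lifetime_Remain"}
    found = {}
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows have: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        if len(fields) >= 10 and fields[1] in watch:
            found[fields[1]] = int(fields[9])
    return found

print(worrying_attributes(sample))
```

Any nonzero pending-sector count on an SSD in a Ceph cluster is worth acting on before it shows up as slow ops.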
 
Thank you, we have planned to update the firmware on most of the disks from M3CR043 to M3CR046.
 
In my case the issue was with a Crucial CT240BX500SSD1.

I have replaced it with a WD Blue.

I'm keeping an eye on it.