Ceph 19.2.1 2 OSD(s) experiencing slow operations in BlueStore

There have been several updates since, but none of them has fixed this problem. It may have been caused by a Linux kernel upgrade or by the Ceph upgrade. I hope it can be fixed in the next update.
 
Just to chime in, I am also seeing this on my homelab. Everything was fine until I upgraded to 19 (from 18). And I think there is a real issue, not just alerting: when it gets bad enough, the MDS servers start getting "cranky" (slow ops) and won't become healthy again until I restart the OSDs that are running slow.
 
I also encountered this problem. I would like to ask: if I restart the OSD after it occurs, will the alarm stay away, or come back after a few days? If this happens on an OSD again in the future, restarting that OSD seems to be a temporary workaround.
 
Sure enough, the alarm disappeared after I restarted the OSD.
 
What does:
ceph daemon osd.<id> dump_historic_ops

tell you?
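For anyone not familiar with that command: it returns JSON describing the slowest recent ops on that OSD. A quick sketch for pulling out just the slow ones, assuming you have piped the JSON somewhere (the sample below is made-up data for illustration, not from this cluster):

```python
import json

# Sample shaped like `ceph daemon osd.<id> dump_historic_ops` output
# (hypothetical values, for illustration only).
sample = '''
{
  "size": 2,
  "duration": 600,
  "ops": [
    {"description": "osd_op(client.1 1.2s0 object1 [write])", "duration": 0.120},
    {"description": "osd_op(client.2 1.3s0 object2 [read])",  "duration": 0.004}
  ]
}
'''

def slow_ops(dump_json, threshold_s=0.05):
    """Return (description, duration) for ops slower than threshold_s seconds."""
    data = json.loads(dump_json)
    return [(op["description"], op["duration"])
            for op in data.get("ops", [])
            if op["duration"] > threshold_s]

for desc, dur in slow_ops(sample):
    print(f"{dur * 1000:.0f} ms  {desc}")
```

The 50 ms threshold is just an example; pick whatever latency budget makes sense for your drives.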

This error occurs when your OSDs cannot write out their queue for 30s. That typically means serious hardware or network issues. E.g. after a full cluster reboot/recovery this may happen while OSDs are still starting or slow to start, or when just one lagging drive causes the others not to start.
 
Looks like you may have a bad drive: these ops complete in 120ms, which is long even for spinning hard drives; with a spinning drive you would expect <20ms, plus network latency of ~1-2ms. Are you using SMR drives? These seem to occur mostly around the time you are rebuilding an OSD, which can indeed put very high load on both drives and network.

Again, either you are severely overloading your network leading to packet drops at this time, or your drive is failing causing some operations to take very long, you won’t notice in most cases as Ceph will redirect operations that don’t complete in time, but rebuilding does require the drive to be functional.

Given this is generally around commit time to disk, I would suspect the disk.
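To see which disks are slow at commit time, `ceph osd perf` lists per-OSD commit and apply latencies. A small sketch that flags OSDs over a latency budget; the table and the 20 ms budget are hypothetical examples, not real cluster numbers:

```python
# Parse `ceph osd perf`-style output and flag OSDs above a commit-latency budget.
# Sample text mimicking the command's table (made-up values).
sample = """\
osd  commit_latency(ms)  apply_latency(ms)
 17                 121                121
  3                   2                  2
  0                   1                  1
"""

def slow_osds(text, budget_ms=20):
    """Return (osd_id, commit_latency_ms) for OSDs whose commit latency exceeds budget_ms."""
    slow = []
    for line in text.splitlines()[1:]:          # skip the header row
        osd_id, commit_ms, _apply_ms = line.split()
        if int(commit_ms) > budget_ms:
            slow.append((int(osd_id), int(commit_ms)))
    return slow

print(slow_osds(sample))
```

An SSD sitting well above single-digit milliseconds here for sustained periods is a good candidate for closer inspection.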
 
Hello,

As I said in the first post, there was no problem with Ceph 19.2.0; I get the messages with Ceph 19.2.1.

I have 6 clusters of 6 nodes: 3 clusters on one site (production) and 3 clusters on another site (rescue).

No messages on the rescue site, only on the production site (where I have more I/O).

All the clusters have the "same" configuration.

System storage: 2 SCSI 10k rpm disks (MD)
Ceph storage: N SATA SSD disks
Ceph network: 4 clusters with 2x40Gb (Mellanox, shared with VMs), 2 clusters with a dedicated 2x10Gb Ceph network.

At the moment, on one production 40Gb cluster with 3 SSDs per node for Ceph, I have 7 OSDs with the "slow..." message.
(I also have "slow..." messages on the other production 40Gb and 10Gb clusters.)

# ceph -s
  cluster:
    id:     XXXX
    health: HEALTH_WARN
            7 OSD(s) experiencing slow operations in BlueStore

  services:
    mon: 3 daemons, quorum XXX1,XXX3,XXX5 (age 9d)
    mgr: XXX2(active, since 6w), standbys: XXX4, XXX6
    osd: 18 osds: 18 up (since 9h), 18 in (since 8w)

  data:
    pools:   2 pools, 513 pgs
    objects: 1.15M objects, 4.2 TiB
    usage:   12 TiB used, 7.8 TiB / 20 TiB avail
    pgs:     513 active+clean

  io:
    client: 807 MiB/s rd, 6.6 MiB/s wr, 18.18k op/s rd, 623 op/s wr

Best regards.
Francis
 
Ohmmm... I know you don't want to hear it, but this is an SSD drive, and before 18.2.6 it worked without issue.
And nope, no SMR here; I think SMR only applies to HDDs anyway.
 
What brand and model? 120ms is a really long time; for an SSD you would expect <2ms. I don’t think it was ever ‘without issue’; you just never noticed, or the new versions have a slightly different load pattern that triggers it, or you added more load. You can upgrade to Squid and see if it improves anything, but the logs are pretty clear.
 
Hello,

I have the "slow..." messages also with Squid 19.2.1 (no messages with Squid 19.2.0).

For my 3 clusters with the messages, at the moment all the disks behind slow OSDs are Crucial "CT1000MX500SSD1" drives...

For the 3 other clusters with no messages, I have only Intel and Samsung disks (but much less I/O).

Best regards.

Francis
 
This is a common issue with the (nearly decade-old) MX500s. They are not very good in general, even for desktop use; they have major firmware issues and glitch out even in desktops. You can see if updating the firmware resolves the issue, but also check your SMART values (smartctl -a /dev/xxx). I would guess you have tons of pending sectors, and they are probably 'worn out' to some extent as well. One of the recommendations is to power the machine on but not start using the drives for about a minute, so the firmware can boot up properly.
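If you want to screen many drives at once, a sketch that scans `smartctl -a`-style attribute lines for pending sectors, reallocations, and wear. The sample text is made up, and the attribute names shown are the common SATA ones; real names vary by vendor, so adjust the watch list:

```python
# Made-up excerpt mimicking the SMART attribute table from `smartctl -a`.
sample = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       24
202 Percent_Lifetime_Remain 0x0030   062   062   001    Old_age   Offline      -       38
"""

def worrying_attributes(smart_text):
    """Return raw values of SMART attributes that suggest a failing or worn drive."""
    watch = {"Current_Pending_Sector", "Reallocated_Sector_Ct",
             "Percent_Lifetime_Remain"}
    found = {}
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows have: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        if len(fields) >= 10 and fields[1] in watch:
            found[fields[1]] = int(fields[9])
    return found

print(worrying_attributes(sample))
```

Any nonzero pending-sector count on an SSD in a Ceph cluster is worth acting on before it shows up as slow ops.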
 
Thank you, we have planned to update the firmware on most of the disks from M3CR043 to M3CR046.
 
In my case the issue was with a Crucial CT240BX500SSD1.

I have replaced it with a WD Blue.

I'm keeping an eye on it.