Ceph 19.2.1: 2 OSD(s) experiencing slow operations in BlueStore

Hello,

I did the same, but for me the "ceph config set ..." commands cause I/O errors on Samsung and Intel SSDs; I do not see I/O errors on Crucial SSDs.


[172265.244864] critical target error, dev sdd, sector 34601544 op 0x3:(DISCARD) flags 0x800 phys_seg 1 prio class 0
[172265.320785] sd 0:0:8:0: [sda] tag#109 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[172265.320792] sd 0:0:8:0: [sda] tag#109 Sense Key : Illegal Request [current]
[172265.320795] sd 0:0:8:0: [sda] tag#109 Add. Sense: Invalid field in parameter list
[172265.320798] sd 0:0:8:0: [sda] tag#109 CDB: Unmap/Read sub-channel 42 00 00 00 00 00 00 00 18 00
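
For anyone seeing the same DISCARD errors: before enabling bdev_enable_discard it may be worth checking whether the drive actually advertises discard/unmap support. A minimal check, assuming standard util-linux and sg3_utils tools and using /dev/sdX as a placeholder for the affected disk (zeros in the DISC-GRAN/DISC-MAX columns mean discard is not supported):

# Discard capabilities as reported by the kernel (0 = not supported)
lsblk --discard /dev/sdX

# Optional: query the SCSI logical block provisioning VPD page directly
sg_vpd --page=lbpv /dev/sdX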

Best regards.
Francis
 
Hello
Facing the same issue, I think the problem is well described here:
https://www.spinics.net/lists/ceph-users/msg86138.html
In 19.2.1, code was added that watches for OSD slow ops and stalled reads and raises health alarms.
I changed this: ceph config set class:hdd bdev_stalled_read_warn_lifetime 3600
The warning is triggered by the backup process, and an hour later the warning disappears.
But I would like to increase bdev_stalled_read_warn_threshold to avoid alerts.
How can I know which value to choose?
I don't want to mask real problems with a value that is too high.
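
For reference, what I am looking at so far, with osd.8 and <count> as placeholders:

# Effective values on one of the affected OSDs (the daemon must be running)
ceph config show osd.8 | grep bdev_stalled_read_warn

# Raise the threshold for HDDs only, keeping the lifetime change from above
ceph config set class:hdd bdev_stalled_read_warn_threshold <count>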
 
I am doing a test. Steps:

1. First use ceph config dump to check the current settings.

2. Then enter the following commands:
ceph config set global bdev_async_discard_threads 1
ceph config set global bdev_enable_discard true

3. Use ceph config dump again to check that the settings took effect (see the check below).
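
For example, a quick way to confirm both options are stored and picked up by a running OSD (osd.8 is just an example id; depending on the release, an OSD restart may still be needed for the discard settings to take effect):

# What is stored in the cluster configuration database
ceph config dump | grep -E 'bdev_enable_discard|bdev_async_discard_threads'

# What a running OSD is actually using
ceph config show osd.8 | grep -E 'bdev_enable_discard|bdev_async_discard_threads'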

I have waited 30 minutes now and the warning has not cleared on its own.

I am now restarting a node, which temporarily clears the problem; usually the error message comes back the next day. I will need some time before I can report back to you.


If this does not solve the problem, you can use the following commands to remove the two added settings and restore the original state.

ceph config rm global bdev_async_discard_threads
ceph config rm global bdev_enable_discard

Happy Monday. I can report that after the weekend the error message has disappeared. These commands were effective on my small cluster, and I did not install any patches over the weekend.
A healthy Ceph cluster is back again.
 
We upgraded to 19.2.1 on Friday night and rebooted all servers. Saturday morning two out of three HDD OSDs (with DB on SSD) had this warning. Without doing anything, when I looked on Sunday (early and again late in the day) the error was gone. No SSD OSDs had the warning.
 
After 2 days I came back and the error had appeared again. Strangely, first there were 7 OSD errors; this morning it was down to 2. It seems the problem still exists. I look forward to the next update patch to solve it.


2 OSD(s) experiencing slow operations in BlueStore
osd.8 observed slow operation indications in BlueStore
osd.17 observed slow operation indications in BlueStore
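
When the warning is active it can also help to query the flagged OSDs directly before restarting anything. Two admin-socket commands that may show what the slow operations actually are, run on the node hosting the OSD (osd.8 taken from the output above):

# In-flight and recently recorded slow operations as seen by the OSD itself
ceph daemon osd.8 dump_ops_in_flight
ceph daemon osd.8 dump_historic_slow_ops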
 
Same issue here with:

ceph: 17.2.8-pve2
proxmox-ve: 8.4.0
pve-manager: 8.4.1

The slow warning appears at least 2-3 times a day on random SSD OSDs.
Restarting the OSD temporarily removes the warning.
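
For reference, the restart is per OSD rather than the whole node, assuming systemd-managed OSDs as on a stock Proxmox/Ceph install (replace <id> with the affected OSD id):

# Restart just the affected OSD, then watch the warning clear
systemctl restart ceph-osd@<id>.service
ceph -s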
 
Yes, usually after 1 day new warnings slowly appear again. I just installed a kernel update and was reminded that I need to reboot for the new kernel to take effect. I have just finished that, so there are currently 0 errors. I will check again tomorrow and the day after to see if it stays normal.
 
The changelog has one entry, and that's got nothing to do with this warning.
Hi Yaga,

Are you sure? In the 19.2.2 changelog there is only one (critical) change, and it is not related to the slow warning:
  • squid: rgw: keep the tails when copying object to itself (pr#62711, cbodley)
 
@aychprox I thought the warning text was new in 19.2.1? Or did 17 get it also?
Also interesting you saw it on SSDs.
Yes, no plan to upgrade to 19.2.1 yet.
I have only seen this on SSDs so far. Initially I thought it was caused by bluestore_cache_size and bluestore_cache_kv_ratio, but no luck even after adjusting them.
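
For what it's worth, the adjustments were along these lines; the values here are placeholders only, and as said above they did not make the warning go away:

# Example only: pin the BlueStore cache size (bytes) and the share given to the KV store
ceph config set class:ssd bluestore_cache_size 4294967296
ceph config set class:ssd bluestore_cache_kv_ratio 0.2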
 

4 OSD(s) experiencing slow operations in BlueStore
osd.7 observed slow operation indications in BlueStore
osd.8 observed slow operation indications in BlueStore
osd.9 observed slow operation indications in BlueStore
osd.15 observed slow operation indications in BlueStore

I continue to follow this topic. Yesterday I installed all the latest patches and restarted all the nodes in the cluster, and naturally all the OSD errors were cleared. This morning when I checked, 4 OSD errors had appeared again, so the problem clearly still exists. I don't know whether it affects data safety. If you haven't upgraded yet, you may want to hold off on 19.2.1 and wait and see.