Ceph 19.2.1 2 OSD(s) experiencing slow operations in BlueStore

FrancisS

Well-Known Member
Apr 26, 2019
34
0
46
59
Hello,

On our 8.4.0 clusters, since the upgrade of Ceph from 19.2.0 to 19.2.1 (pve2/pve3), I have been getting warning messages.

X OSD(s) experiencing slow operations in BlueStore

osd.x observed slow operation indications in BlueStore
osd.y observed slow operation indications in BlueStore
...
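
For reference, the per-OSD messages above come from the health detail output:

Bash:
ceph health detail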

I applied the "solution" suggested at https://github.com/rook/rook/discussions/15403:

ceph config set global bdev_async_discard_threads 1
ceph config set global bdev_enable_discard true

But this did not resolve the "problem".
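
To confirm the two settings were at least applied, one way to check (osd.0 is just an example id) is:

Bash:
ceph config dump | grep -i discard
ceph config get osd.0 bdev_enable_discard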

Best regards.
Francis
 
Hello,

Finally I removed "bdev_async_discard_threads" and "bdev_enable_discard" because I think they were causing I/O errors on some SSD disks.

[172265.244864] critical target error, dev sdd, sector 34601544 op 0x3:(DISCARD) flags 0x800 phys_seg 1 prio class 0
[172265.320785] sd 0:0:8:0: [sda] tag#109 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[172265.320792] sd 0:0:8:0: [sda] tag#109 Sense Key : Illegal Request [current]
[172265.320795] sd 0:0:8:0: [sda] tag#109 Add. Sense: Invalid field in parameter list
[172265.320798] sd 0:0:8:0: [sda] tag#109 CDB: Unmap/Read sub-channel 42 00 00 00 00 00 00 00 18 00
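
It looks like these SSDs reject the UNMAP/DISCARD commands. Before re-enabling discard I would first check what the device actually advertises, e.g. (a generic check, nothing Ceph-specific):

Bash:
lsblk --discard /dev/sdd                       # non-zero DISC-GRAN / DISC-MAX means discard is advertised
cat /sys/block/sdd/queue/discard_granularity   # 0 means the device does not support discard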

Best regards.
Francis
 
define "remove" please
what exactly did you do ?

this?
Bash:
 ceph config rm global bdev_async_discard_threads
 ceph config rm global bdev_enable_discard

viewable via:
ceph config dump


with or without this, I still have those OSD performance errors
------------------------------------------
Edit No2:
What helped:
- adding "bdev_async_discard_threads" and "bdev_enable_discard" as suggested did not work
- removing "bdev_async_discard_threads" and "bdev_enable_discard" also did nothing
BUT after removing them, I freed every host (8x2 stretch cluster) of its VMs and then, on each host:
- shutdown -h now
- waited a whole minute
- started the host up again
!!! a simple host reboot did not work !!!
The rough per-node sequence is sketched below.
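
Something like this, per node (the noout flag is just my usual habit so Ceph does not start rebalancing while the node is down; adjust to your own setup):

Bash:
ceph osd set noout        # optional: keep Ceph from rebalancing while the node is down
# migrate or shut down all VMs on this host first
shutdown -h now
# wait a full minute, then power the node back on (a plain reboot was not enough here)
ceph osd unset noout      # once the node and its OSDs are back up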
 
Define "remove", please.
What exactly did you do?

this?
Bash:
 ceph config rm global bdev_async_discard_threads
 ceph config rm global bdev_enable_discard
Yes, this is what I did.
viewable via:
ceph config dump


with or without this, I still have those OSD performance errors
------------------------------------------
Edit No2:
What helped:
- adding "bdev_async_discard_threads" and "bdev_enable_discard" as suggested did not work
This did not work for me either, and it caused new I/O DISCARD errors.
- removing "bdev_async_discard_threads" and "bdev_enable_discard" also did nothing
This did not work for me either.
BUT after removing them, I freed every host (8x2 stretch cluster) of its VMs and then, on each host:
- shutdown -h now
- waited a whole minute
- started the host up again
!!! a simple host reboot did not work !!!
Yes, for the moment there are no more I/O DISCARD errors, but rebooting does not change the OSD "slow" problem.

The OSD "slow" warning appeared after the Ceph upgrade from 19.2.0 to 19.2.1.
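
One thing worth checking after the upgrade, to rule out a partial update, is that every daemon is actually running 19.2.1:

Bash:
ceph versions    # all mon/mgr/osd entries should report 19.2.1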

Best regards.
Francis
 
Thank you Yaga,

Already done. When I restart the OSD processes, the slow warning disappears, but after a short time the message comes back.
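
For reference, restarting a single OSD daemon on the node that hosts it can be done with (replace <id> with the OSD number):

Bash:
systemctl restart ceph-osd@<id>.service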

Best regards.
Francis
 
Hello,

I installed PVE 8.4.1 and restarted all the "slow" OSDs; for the moment there is no "slow" warning.

Best regards.
Francis
 
Hello,

I encounter this warning as well:

[screenshot of the warning]

Has any of you resolved this issue? I encountered it after updating and upgrading PVE to 8.3.5; my Ceph version is now 17.2.8.

Best regards,

Kim
 
Same here (osd.0 and osd.1)
osd.0 observed slow operation indications in BlueStore
and
osd.0 observed stalled read indications in DB device

I'm on pve-manager 8.4.1, kernel 6.8.12-9-pve and ceph version 17.2.8
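
In case it is useful for the stalled-read warning: to see which physical device backs a given OSD and its DB, the OSD metadata can be inspected (the exact field names may differ a bit between versions; I look for the *_dev_node entries), and then the health of that device checked:

Bash:
ceph osd metadata 0 | grep -Ei 'dev_node|devices|rotational'
smartctl -a /dev/nvme0n1    # device name is just an example, use the one reported above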
 
Yes, I have the same problem.

Environment (3 nodes):
proxmox-ve: 8.4.0 (running kernel: 6.8.12-9-pve)
pve-manager: 8.4.1 (running version: 8.4.1/2a5fa54a8503f96d)
ceph: 19.2.1-pve3
ceph-fuse: 19.2.1-pve3

Latest patches have been applied.

Error message:
HEALTH_WARN: 3 OSD(s) experiencing slow operations in BlueStore
osd.8 observed slow operation indications in BlueStore
osd.14 observed slow operation indications in BlueStore
osd.15 observed slow operation indications in BlueStore

Before this, there were 6 OSDs with the error. I restarted the cluster nodes one by one and the cluster went back to healthy, but the next day the warning appeared again.
 
Hi guys, same issue here after upgrading to Ceph 19.2.1...

Environment (3 nodes):
proxmox-ve: 8.4.0 (running kernel: 6.8.12-9-pve)
pve-manager: 8.4.1
ceph: 19.2.1-pve3

HEALTH_WARN: 2 OSD(s) experiencing slow operations in BlueStore
osd.9 observed slow operation indications in BlueStore
osd.15 observed slow operation indications in BlueStore

Any progress? Thanks.
 
Erasure Coding or Mirror?
 
CLI-> pveceph pool get POOLNAME
root@pve1:~# pveceph pool get cephfs_data
┌────────────────────────┬─────────────────┐
│ key │ value │
╞════════════════════════╪═════════════════╡
│ crush_rule │ replicated_rule │
├────────────────────────┼─────────────────┤
│ fast_read │ 0 │
├────────────────────────┼─────────────────┤
│ hashpspool │ 1 │
├────────────────────────┼─────────────────┤
│ id │ 1 │
├────────────────────────┼─────────────────┤
│ min_size │ 2 │
├────────────────────────┼─────────────────┤
│ name │ cephfs_data │
├────────────────────────┼─────────────────┤
│ nodeep-scrub │ 0 │
├────────────────────────┼─────────────────┤
│ nodelete │ 0 │
├────────────────────────┼─────────────────┤
│ nopgchange │ 0 │
├────────────────────────┼─────────────────┤
│ noscrub │ 0 │
├────────────────────────┼─────────────────┤
│ nosizechange │ 0 │
├────────────────────────┼─────────────────┤
│ pg_autoscale_mode │ on │
├────────────────────────┼─────────────────┤
│ pg_num │ 512 │
├────────────────────────┼─────────────────┤
│ pg_num_min │ 128 │
├────────────────────────┼─────────────────┤
│ pgp_num │ 512 │
├────────────────────────┼─────────────────┤
│ size │ 3 │
├────────────────────────┼─────────────────┤
│ use_gmt_hitset │ 1 │
├────────────────────────┼─────────────────┤
│ write_fadvise_dontneed │ 0 │
└────────────────────────┴─────────────────┘
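
So this pool uses crush_rule "replicated_rule" with size 3, i.e. replication (mirror), not erasure coding. To double-check what a given rule actually does, it can be dumped:

Bash:
ceph osd crush rule dump replicated_rule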
 
Hello,
Ceph bug? https://github.com/rook/rook/discussions/15403

The suggested solution does not solve the problem.

# pveceph pool get pool_vm
┌────────────────────────┬─────────────────┐
│ key │ value │
╞════════════════════════╪═════════════════╡
│ crush_rule │ replicated_rule │
├────────────────────────┼─────────────────┤
│ fast_read │ 0 │
├────────────────────────┼─────────────────┤
│ hashpspool │ 1 │
├────────────────────────┼─────────────────┤
│ id │ 2 │
├────────────────────────┼─────────────────┤
│ min_size │ 2 │
├────────────────────────┼─────────────────┤
│ name │ pool_vm │
├────────────────────────┼─────────────────┤
│ nodeep-scrub │ 0 │
├────────────────────────┼─────────────────┤
│ nodelete │ 0 │
├────────────────────────┼─────────────────┤
│ nopgchange │ 0 │
├────────────────────────┼─────────────────┤
│ noscrub │ 0 │
├────────────────────────┼─────────────────┤
│ nosizechange │ 0 │
├────────────────────────┼─────────────────┤
│ pg_autoscale_mode │ on │
├────────────────────────┼─────────────────┤
│ pg_num │ 512 │
├────────────────────────┼─────────────────┤
│ pgp_num │ 512 │
├────────────────────────┼─────────────────┤
│ size │ 3 │
├────────────────────────┼─────────────────┤
│ target_size_ratio │ 1 │
├────────────────────────┼─────────────────┤
│ use_gmt_hitset │ 1 │
├────────────────────────┼─────────────────┤
│ write_fadvise_dontneed │ 0 │
└────────────────────────┴─────────────────┘

Best regards.
Francis
 
I am doing a test. Steps:

1. Use ceph config dump first to check.

2. Enter the commands again:
ceph config set global bdev_async_discard_threads 1
ceph config set global bdev_enable_discard true

3. Use ceph config dump again to confirm the settings took effect.

I have waited 30 minutes now; the warning has not cleared on its own.

Now I am restarting a node, which temporarily clears the problem; usually the warning appears again the next day. I need some time before I can report back to you.


If these commands do not solve the problem, you can use the following commands to remove the two added settings and restore the original state.

ceph config rm global bdev_async_discard_threads

ceph config rm global bdev_enable_discard
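
As a last resort, 19.2.1 also seems to ship options to tune how this particular health alert is raised. I have not verified the exact option names on my build, so treat the names below as an assumption and check them with ceph config help before setting anything:

Bash:
# option names are an assumption - verify they exist in your build first
ceph config help bluestore_slow_ops_warn_threshold
ceph config help bluestore_slow_ops_warn_lifetime
# if present, raising the threshold makes the warning less sensitive
ceph config set osd bluestore_slow_ops_warn_threshold 5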