Ceph 19.2.1 2 OSD(s) experiencing slow operations in BlueStore

FrancisS

Well-Known Member
Apr 26, 2019
34
0
46
59
Hello,

On our 8.4.0 clusters, since the upgrade of Ceph from 19.2.0 to 19.2.1 (pve2/pve3), I have been getting warning messages.

X OSD(s) experiencing slow operations in BlueStore

osd.x observed slow operation indications in BlueStore
osd.y observed slow operation indications in BlueStore
...
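
For reference, the per-OSD messages above come from the health detail output:

Bash:
ceph health detail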

I applied the "solution" suggested at https://github.com/rook/rook/discussions/15403:

ceph config set global bdev_async_discard_threads 1
ceph config set global bdev_enable_discard true

But this did not resolve the "problem".
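
To confirm the two settings were at least applied, one way to check (osd.0 is just an example id) is:

Bash:
ceph config dump | grep -i discard
ceph config get osd.0 bdev_enable_discard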

Best regards.
Francis
 
Hello,

Finally I removed "bdev_async_discard_threads" and "bdev_enable_discard" because I think they were causing I/O errors on some SSD disks.

[172265.244864] critical target error, dev sdd, sector 34601544 op 0x3:(DISCARD) flags 0x800 phys_seg 1 prio class 0
[172265.320785] sd 0:0:8:0: [sda] tag#109 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[172265.320792] sd 0:0:8:0: [sda] tag#109 Sense Key : Illegal Request [current]
[172265.320795] sd 0:0:8:0: [sda] tag#109 Add. Sense: Invalid field in parameter list
[172265.320798] sd 0:0:8:0: [sda] tag#109 CDB: Unmap/Read sub-channel 42 00 00 00 00 00 00 00 18 00
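
It looks like these SSDs reject the UNMAP/DISCARD commands. Before re-enabling discard I would first check what the device actually advertises, e.g. (a generic check, nothing Ceph-specific):

Bash:
lsblk --discard /dev/sdd                       # non-zero DISC-GRAN / DISC-MAX means discard is advertised
cat /sys/block/sdd/queue/discard_granularity   # 0 means the device does not support discard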

Best regards.
Francis
 
define "remove" please
what exactly did you do ?

this?
Bash:
 ceph config rm global bdev_async_discard_threads
 ceph config rm global bdev_enable_discard

viewable via:
ceph config dump


with or without this, I still have those OSD performance errors
------------------------------------------
Edit No2:
What helped:
- adding "bdev_async_discard_threads" and "bdev_enable_discard" as suggested did not work
- removing "bdev_async_discard_threads" and "bdev_enable_discard" also did nothing
BUT after removing them, I freed every host (8x2 stretch cluster) of its VMs and then, on each host:
- shutdown -h now
- waited a whole minute
- started the host up again
!!! a simple host reboot did not work !!!
The rough per-node sequence is sketched below.
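
Something like this, per node (the noout flag is just my usual habit so Ceph does not start rebalancing while the node is down; adjust to your own setup):

Bash:
ceph osd set noout        # optional: keep Ceph from rebalancing while the node is down
# migrate or shut down all VMs on this host first
shutdown -h now
# wait a full minute, then power the node back on (a plain reboot was not enough here)
ceph osd unset noout      # once the node and its OSDs are back up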
 
Define "remove", please.
What exactly did you do?

this?
Bash:
 ceph config rm global bdev_async_discard_threads
 ceph config rm global bdev_enable_discard
Yes, this is what I did.
viewable via:
ceph config dump


with or without this, I still have those OSD performance errors
------------------------------------------
Edit No2:
What helped:
- adding "bdev_async_discard_threads" and "bdev_enable_discard" as suggested did not work
This did not work for me either, and it caused new I/O DISCARD errors.
- removing "bdev_async_discard_threads" and "bdev_enable_discard" also did nothing
This did not work for me either.
BUT after removing them, I freed every host (8x2 stretch cluster) of its VMs and then, on each host:
- shutdown -h now
- waited a whole minute
- started the host up again
!!! a simple host reboot did not work !!!
Yes, for the moment there are no more I/O DISCARD errors, but rebooting does not change the OSD "slow" problem.

The OSD "slow" warning appeared after the Ceph upgrade from 19.2.0 to 19.2.1.
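
One thing worth checking after the upgrade, to rule out a partial update, is that every daemon is actually running 19.2.1:

Bash:
ceph versions    # all mon/mgr/osd entries should report 19.2.1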

Best regards.
Francis
 
Thank you Yaga,

Already done. When I restart the OSD processes, the slow warning disappears, but after a short time the message comes back.
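
For reference, restarting a single OSD daemon on the node that hosts it can be done with (replace <id> with the OSD number):

Bash:
systemctl restart ceph-osd@<id>.service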

Best regards.
Francis
 
Hello,

I installed PVE 8.4.1 and restarted all the "slow" OSDs; for the moment there is no "slow" warning.

Best regards.
Francis
 
Hello,

I encounter this warning as well:

[screenshot of the warning]

Has any of you resolved this issue? I encountered it after updating and upgrading PVE to 8.3.5; my Ceph version is now 17.2.8.

Best regards,

Kim
 
Same here (osd.0 and osd.1)
osd.0 observed slow operation indications in BlueStore
and
osd.0 observed stalled read indications in DB device

I'm on pve-manager 8.4.1, kernel 6.8.12-9-pve and ceph version 17.2.8
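
In case it is useful for the stalled-read warning: to see which physical device backs a given OSD and its DB, the OSD metadata can be inspected (the exact field names may differ a bit between versions; I look for the *_dev_node entries), and then the health of that device checked:

Bash:
ceph osd metadata 0 | grep -Ei 'dev_node|devices|rotational'
smartctl -a /dev/nvme0n1    # device name is just an example, use the one reported above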
 
Yes, I have the same problem.

Environment (3 nodes):
proxmox-ve: 8.4.0 (running kernel: 6.8.12-9-pve)
pve-manager: 8.4.1 (running version: 8.4.1/2a5fa54a8503f96d)
ceph: 19.2.1-pve3
ceph-fuse: 19.2.1-pve3

Latest patches have been applied.

Error message:
HEALTH_WARN: 3 OSD(s) experiencing slow operations in BlueStore
osd.8 observed slow operation indications in BlueStore
osd.14 observed slow operation indications in BlueStore
osd.15 observed slow operation indications in BlueStore

Before this, there were 6 OSDs with the error. I restarted the cluster nodes one by one and the cluster went back to healthy, but the next day the warning appeared again.
 
Hi guys, same issue here after upgrading to Ceph 19.2.1...

Environment (3 nodes):
proxmox-ve: 8.4.0 (running kernel: 6.8.12-9-pve)
pve-manager: 8.4.1
ceph: 19.2.1-pve3

HEALTH_WARN: 2 OSD(s) experiencing slow operations in BlueStore
osd.9 observed slow operation indications in BlueStore
osd.15 observed slow operation indications in BlueStore

Any progress? Thanks.
 
Erasure Coding or Mirror?
 
CLI-> pveceph pool get POOLNAME
root@pve1:~# pveceph pool get cephfs_data
┌────────────────────────┬─────────────────┐
│ key │ value │
╞════════════════════════╪═════════════════╡
│ crush_rule │ replicated_rule │
├────────────────────────┼─────────────────┤
│ fast_read │ 0 │
├────────────────────────┼─────────────────┤
│ hashpspool │ 1 │
├────────────────────────┼─────────────────┤
│ id │ 1 │
├────────────────────────┼─────────────────┤
│ min_size │ 2 │
├────────────────────────┼─────────────────┤
│ name │ cephfs_data │
├────────────────────────┼─────────────────┤
│ nodeep-scrub │ 0 │
├────────────────────────┼─────────────────┤
│ nodelete │ 0 │
├────────────────────────┼─────────────────┤
│ nopgchange │ 0 │
├────────────────────────┼─────────────────┤
│ noscrub │ 0 │
├────────────────────────┼─────────────────┤
│ nosizechange │ 0 │
├────────────────────────┼─────────────────┤
│ pg_autoscale_mode │ on │
├────────────────────────┼─────────────────┤
│ pg_num │ 512 │
├────────────────────────┼─────────────────┤
│ pg_num_min │ 128 │
├────────────────────────┼─────────────────┤
│ pgp_num │ 512 │
├────────────────────────┼─────────────────┤
│ size │ 3 │
├────────────────────────┼─────────────────┤
│ use_gmt_hitset │ 1 │
├────────────────────────┼─────────────────┤
│ write_fadvise_dontneed │ 0 │
└────────────────────────┴─────────────────┘
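
So this pool uses crush_rule "replicated_rule" with size 3, i.e. replication (mirror), not erasure coding. To double-check what a given rule actually does, it can be dumped:

Bash:
ceph osd crush rule dump replicated_rule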
 
Hello,
Ceph bug? https://github.com/rook/rook/discussions/15403

The suggested solution does not solve the problem.

# pveceph pool get pool_vm
┌────────────────────────┬─────────────────┐
│ key │ value │
╞════════════════════════╪═════════════════╡
│ crush_rule │ replicated_rule │
├────────────────────────┼─────────────────┤
│ fast_read │ 0 │
├────────────────────────┼─────────────────┤
│ hashpspool │ 1 │
├────────────────────────┼─────────────────┤
│ id │ 2 │
├────────────────────────┼─────────────────┤
│ min_size │ 2 │
├────────────────────────┼─────────────────┤
│ name │ pool_vm │
├────────────────────────┼─────────────────┤
│ nodeep-scrub │ 0 │
├────────────────────────┼─────────────────┤
│ nodelete │ 0 │
├────────────────────────┼─────────────────┤
│ nopgchange │ 0 │
├────────────────────────┼─────────────────┤
│ noscrub │ 0 │
├────────────────────────┼─────────────────┤
│ nosizechange │ 0 │
├────────────────────────┼─────────────────┤
│ pg_autoscale_mode │ on │
├────────────────────────┼─────────────────┤
│ pg_num │ 512 │
├────────────────────────┼─────────────────┤
│ pgp_num │ 512 │
├────────────────────────┼─────────────────┤
│ size │ 3 │
├────────────────────────┼─────────────────┤
│ target_size_ratio │ 1 │
├────────────────────────┼─────────────────┤
│ use_gmt_hitset │ 1 │
├────────────────────────┼─────────────────┤
│ write_fadvise_dontneed │ 0 │
└────────────────────────┴─────────────────┘

Best regards.
Francis
 
I am doing a test. Steps:

1. Use ceph config dump first to check.

2. Enter the commands again:
ceph config set global bdev_async_discard_threads 1
ceph config set global bdev_enable_discard true

3. Use ceph config dump again to confirm the settings took effect.

I have waited 30 minutes now; the warning has not cleared on its own.

Now I am restarting a node, which temporarily clears the problem; usually the warning appears again the next day. I need some time before I can report back to you.


If these commands do not solve the problem, you can use the following commands to remove the two added settings and restore the original state.

ceph config rm global bdev_async_discard_threads

ceph config rm global bdev_enable_discard
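
As a last resort, 19.2.1 also seems to ship options to tune how this particular health alert is raised. I have not verified the exact option names on my build, so treat the names below as an assumption and check them with ceph config help before setting anything:

Bash:
# option names are an assumption - verify they exist in your build first
ceph config help bluestore_slow_ops_warn_threshold
ceph config help bluestore_slow_ops_warn_lifetime
# if present, raising the threshold makes the warning less sensitive
ceph config set osd bluestore_slow_ops_warn_threshold 5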