OSD errors

Nov 10, 2025
Hello,

We are experiencing a large number of OSD errors. These errors all occurred during backup of the VMs on the cluster. We managed to get the OSDs working again and the errors stopped appearing, but we want to troubleshoot why this is happening.


Warning messages:
osd.3 observed slow operation indications in BlueStore (the same message appears for multiple OSDs)
osd.4 observed stalled read indications in DB device
osd.19 crashed on host

root@testname1:~# ceph health detail
HEALTH_WARN 3 OSD(s) experiencing slow operations in BlueStore; 1 OSD(s) experiencing stalled read in db device of BlueFS
[WRN] BLUESTORE_SLOW_OP_ALERT: 3 OSD(s) experiencing slow operations in BlueStore
osd.4 observed slow operation indications in BlueStore
osd.6 observed slow operation indications in BlueStore
osd.8 observed slow operation indications in BlueStore
[WRN] DB_DEVICE_STALLED_READ_ALERT: 1 OSD(s) experiencing stalled read in db device of BlueFS
osd.4 observed stalled read indications in DB device


Troubleshooting steps:
- I checked that all the OSDs are up and working correctly (the checks are sketched below)
- I have confirmed that we do not have the bluestore_elastic_shared_blob feature enabled: https://docs.clyso.com/docs/kb/known-bugs/squid/
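For reference, a minimal sketch of those checks, assuming the option name used in the linked KB article; verify the exact name against your Ceph release:

# confirm all OSDs are up and in, and see which host each one runs on
ceph osd stat
ceph osd tree
# confirm the option from the KB article is not enabled (name assumed from the article)
ceph config get osd bluestore_elastic_shared_blob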

Environment:
Proxmox 9.0.6
Ceph 19.2.23
 
We see the same issue after upgrading to 9.x on both of our clusters.

- observed stalled read indications in DB device

This usually affects all OSDs on the same host.

If you need specially filtered logs, we can provide them.
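For reference, one way to pull only the relevant lines from an OSD's journal on a Proxmox/systemd host (the OSD id, date, and search strings are examples):

# filter one OSD's log for the slow-op / stalled-read messages
journalctl -u ceph-osd@17 --since "2025-12-13" | grep -Ei "slow operation|stalled read"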



We also see this in the kernel log:
[Sat Dec 13 09:24:11 2025] libceph: osd17 (1)XXXXXX:6815 socket error on write
[Sat Dec 13 09:24:11 2025] libceph: mds0 (1)XXXXXX:6801 socket error on write
[Sat Dec 13 09:24:11 2025] libceph: mon4 (1)XXXXXXX:6789 session established
[Sat Dec 13 09:24:11 2025] libceph: mds0 (1)XXXXXXXX:6801 session reset
[Sat Dec 13 09:24:11 2025] ceph: mds0 closed our session
[Sat Dec 13 09:24:11 2025] ceph: mds0 reconnect start
[Sat Dec 13 09:24:12 2025] systemd[1]: systemd-journald.service: Failed with result 'timeout'.
[Sat Dec 13 09:24:12 2025] systemd[1]: Failed to start Journal Service.
[Sat Dec 13 09:24:12 2025] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 52.
[Sat Dec 13 09:24:12 2025] systemd[1]: Stopped Journal Service.
[Sat Dec 13 09:24:13 2025] ceph: mds0 reconnect denied
[Sat Dec 13 09:24:13 2025] ceph: dropping dirty Fx state for 000000003ab648c8 1099527656419
[Sat Dec 13 09:24:13 2025] ceph: dropping dirty Fx state for 00000000bf26d167 1099527656842
[Sat Dec 13 09:24:13 2025] ceph: dropping dirty Fx state for 000000008921adb9 1099527656893
[Sat Dec 13 09:24:13 2025] ceph: dropping dirty Fx state for 00000000a8a72f05 1099527671953
[Sat Dec 13 09:24:13 2025] ceph: dropping dirty Fw state for 00000000a7844000 1099527717007
[Sat Dec 13 09:24:13 2025] libceph: mds0 (1)XXXXXXXX:6801 socket closed (con state V1_CONNECT_MSG)
[Sat Dec 13 09:24:13 2025] ceph: check_quota_exceeded: ino (10000f5808f.fffffffffffffffe) null i_snap_realm
[Sat Dec 13 09:26:18 2025] systemd[1]: Starting Journal Service...
[Sat Dec 13 09:26:18 2025] systemd[1]: systemd-journald.service: start operation timed out. Terminating.
[Sat Dec 13 09:26:18 2025] systemd[1]: systemd-journald.service: Failed with result 'timeout'.
[Sat Dec 13 09:26:18 2025] systemd[1]: Failed to start Journal Service.



Environment 1:
Ceph stretch cluster, NVMe, 32x OSD

Environment 2:
Ceph cluster, SAS, 36x OSD
 
We are experiencing a large number of OSD errors.
What you're seeing aren't actually errors. They are, however, an indication of OSD resource starvation; if osd.4, osd.6 and osd.8 are all on the same host, check the overall host load; you might need to make some hardware adjustments. (A quick way to check the OSD-to-host mapping and host load is sketched below.)
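A minimal sketch of that check (the OSD ids come from the health output above; iostat requires the sysstat package):

# show which host each OSD runs on
ceph osd tree
# or locate a single OSD explicitly
ceph osd find 4
# on that host, watch device utilisation and iowait during the backup window
iostat -x 5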

Alternatively, you can lower the priority of some OSD garbage-collection procedures such as osd_compaction_prio (a way to sanity-check that option is shown below).
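Before changing it, you can confirm the option exists on your release and see its current value; the option name here is taken from the suggestion above, not verified:

# show the built-in documentation for the option, if it exists on this release
ceph config help osd_compaction_prio
# show the value currently in effect
ceph config get osd osd_compaction_prio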

Lastly, if cluster performance doesn't actually seem to be affected (watch your iowait times and OSD latency figures), you can simply adjust the threshold at which the subsystem complains, like so:

ceph config set osd osd_op_complaint_time [value]

The default is 30 seconds; raise it until the warnings go away.
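For example (the value is illustrative, not a recommendation):

# check the value currently in effect
ceph config get osd osd_op_complaint_time
# raise it, e.g. to 60 seconds
ceph config set osd osd_op_complaint_time 60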

As for your disk with stalled reads: check the health of the drive backing osd.4 (run a smartctl test). If it passes, ignore it and keep an eye out for repetition. If it does repeat, destroy and recreate the OSD; that will clear out any metadata fragmentation and hopefully stop it from tripping the heartbeat grace value. A sketch of the SMART check is below.
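A minimal sketch of that check; the device path is a placeholder, use whatever device Ceph reports for osd.4:

# map osd.4 to its backing device
ceph device ls-by-daemon osd.4
# run a long SMART self-test on that device, then review the results once it completes
smartctl -t long /dev/sdX
smartctl -a /dev/sdX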