We are experiencing a large number of OSD errors. These errors all occurred during backup of the VMs on the cluster. We managed to get the OSDs working again and the errors no longer show, but we want to troubleshoot why this is happening.
Warning messages:
osd.3 observed slow operation indications in BlueStore (same message for multiple OSDs)
osd.4 observed stalled read indications in DB device
osd.19 crashed on host
root@testname1:~# ceph health detail
HEALTH_WARN 3 OSD(s) experiencing slow operations in BlueStore; 1 OSD(s) experiencing stalled read in db device of BlueFS
[WRN] BLUESTORE_SLOW_OP_ALERT: 3 OSD(s) experiencing slow operations in BlueStore
osd.4 observed slow operation indications in BlueStore
osd.6 observed slow operation indications in BlueStore
osd.8 observed slow operation indications in BlueStore
[WRN] DB_DEVICE_STALLED_READ_ALERT: 1 OSD(s) experiencing stalled read in db device of BlueFS
osd.4 observed stalled read indications in DB device
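To look at each affected OSD individually, the IDs can be pulled out of the health output. This is only a minimal sketch, assuming the "osd.N observed ..." line format shown above; the function name `affected_osds` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Ceph): read `ceph health detail` text on
# stdin and print the unique IDs of OSDs named in "osd.N observed ..." lines,
# sorted numerically.
affected_osds() {
    grep -oE 'osd\.[0-9]+ observed' | grep -oE '[0-9]+' | sort -un
}

# Usage on a live cluster:
#   ceph health detail | affected_osds
```

An OSD that appears under more than one warning (osd.4 here) is listed only once.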
Troubleshooting steps:
- I checked that all the OSDs are working correctly
- I have confirmed that we do not have the bluestore_elastic_shared_blob feature enabled: https://docs.clyso.com/docs/kb/known-bugs/squid/
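The bluestore_elastic_shared_blob check can be repeated per OSD. This is a rough sketch, assuming it runs on a node with admin access to the cluster; `check_elastic_shared_blob` is a hypothetical helper name, and it only reports the value returned by `ceph config get` for each OSD ID passed in, nothing more.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Ceph): print the configured value of
# bluestore_elastic_shared_blob for each OSD ID given as an argument.
check_elastic_shared_blob() {
    local id value
    for id in "$@"; do
        # `ceph config get <who> <option>` prints the option's value for that daemon.
        value=$(ceph config get "osd.${id}" bluestore_elastic_shared_blob) || return 1
        printf 'osd.%s bluestore_elastic_shared_blob=%s\n' "$id" "$value"
    done
}

# Usage (OSD IDs taken from the warnings above):
#   check_elastic_shared_blob 3 4 6 8 19
```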
Environment:
Proxmox 9.0.6
Ceph 19.2.23