Just to ask: are we sure this is a problem? They added a warning for slowness, but has something actually gotten slower, or are they just alerting on the same behavior now?
Thank you for your attention and reply. I have used this (bdev_async_discard_threads > 1) and it did not solve the problem. Maybe I need to wait a while longer. The new hotfix will be coming:
https://github.com/rook/rook/discussions/15403#discussioncomment-12423878
For now, it seems all we can do is wait.
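For anyone who wants to try the same workaround, this is roughly how such an override is applied; the thread count and OSD id below are only placeholders, not a recommendation:

Bash:
# Apply the async-discard thread override for all OSDs (the value 2 is just an example)
ceph config set osd bdev_async_discard_threads 2
# Check what a specific OSD actually resolved (osd.0 is a placeholder id)
ceph config get osd.0 bdev_async_discard_threads
# bdev_* options are generally read when the OSD starts, so a restart may be needed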
Hi,
Same issue here after updating from 8.3.5 to 8.4.1 and Ceph from 19.2.0 to 19.2.1.
Any help?
Thanks
# Only warn when 10 or more slow ops are seen within a 60-second window on HDD-class OSDs
ceph config set class:hdd bluestore_slow_ops_warn_lifetime 60
ceph config set class:hdd bluestore_slow_ops_warn_threshold 10
# Show the OSD tree (including device classes), then keep osd.12 from being picked as a primary
ceph osd tree
ceph osd primary-affinity osd.12 0
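If you want to do the same for every HDD-class OSD instead of one at a time, something like this should work; it is only a sketch, so review the list printed by the first command before running the loop:

Bash:
# List the OSD ids that belong to the hdd device class
ceph osd crush class ls-osd hdd
# Set primary-affinity to 0 on each of them so primary reads land on SSD OSDs
for id in $(ceph osd crush class ls-osd hdd); do
  ceph osd primary-affinity "osd.$id" 0
done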
I don't think we've seen this alert for a few weeks now.
FWIW I saw this post, but did not change any of our (19.2.1) settings:
Try following the Ceph documentation on BLUESTORE_SLOW_OP_ALERT (https://docs.ceph.com/en/reef/rados/operations/health-checks/#bluestore-slow-op-alert).
The defaults are 86400 seconds and 1 slow op, so a warning is triggered if more than one slow op occurs within 24 hours, which is quite common for old HDDs.
This worked for my cluster:
Bash:
ceph config set class:hdd bluestore_slow_ops_warn_lifetime 60
ceph config set class:hdd bluestore_slow_ops_warn_threshold 10
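To confirm that the overrides actually took effect, something like this should work; osd.0 is just an example id:

Bash:
# Show the per-class overrides stored in the monitor config database
ceph config dump | grep bluestore_slow_ops_warn
# Ask one OSD which values it resolved (replace osd.0 with one of your HDD OSDs)
ceph config get osd.0 bluestore_slow_ops_warn_lifetime
ceph config get osd.0 bluestore_slow_ops_warn_threshold
# If the warning is currently raised, this lists the affected OSDs
ceph health detail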
You have mixed SSD and HDD in the same pool?

One possibly related note, especially for those with multiple OSD classes: we set our few remaining HDDs to primary-affinity 0, so the primary read would always be from an SSD.
We have some remaining SAS 10k drives. On the prior platform they had a read/write cache SSD, which we're using for DB/WAL. They'll get replaced eventually.
The warning didn't exist until recently.
For this specific case, I think it is normal to see random slow op errors, as your PGs and replicas can sit on storage of different speeds (so a primary write on a fast SSD will always wait for the replica on a slow HDD), and for reads it's really Russian roulette.
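If a pool really does span both classes, one common fix is to pin each pool to a single device class through its CRUSH rule. A sketch, where the rule and pool names are placeholders and changing a pool's rule will trigger data movement:

Bash:
# Create rules restricted to one device class each (root=default, failure domain=host)
ceph osd crush rule create-replicated replicated-ssd default host ssd
ceph osd crush rule create-replicated replicated-hdd default host hdd
# Point a pool at the SSD-only rule ("mypool" is a placeholder; expect rebalancing)
ceph osd pool set mypool crush_rule replicated-ssd
# Inspect existing rules to see whether they are already class-restricted
ceph osd crush rule dump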