Ceph 19.2.1: 2 OSD(s) experiencing slow operations in BlueStore

I got these messages shortly after upgrading Ceph to Squid, and they did not go away, neither by setting the bluestore_slow_ops_warn (or other) parameters nor with the Ceph 19.2.3 update that came with PVE 9.
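(For reference, the parameters in question are the BlueStore slow-op warning knobs; the option names below are assumed from the Squid release and the values are only examples — raising them mostly hides the symptom rather than fixing it.)

Code:
# Assumed Squid option names; verify with `ceph config help <option>`.
# How many slow ops flag an OSD, and how long a slow op stays counted.
ceph config set osd bluestore_slow_ops_warn_threshold 10
ceph config set osd bluestore_slow_ops_warn_lifetime 600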

But what I noticed after the last PVE upgrade: only one specific OSD triggers the alarms. That happened to be one of the SSDs that came with one of the mini PCs (see https://www.igorslab.de/nipogi-am06-pro-mini-pc-im-test/3/) that run my Proxmox cluster.

After replacing it with a (possibly slightly better) WD Blue, the messages are gone.
 
Got this message just today for ONE OSD with PVE 8.4 and Ceph 19.2.2 on a 3-node cluster with 3 NVMe disks per node.
Faulty disk, or should I just adjust the thresholds?
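(One way to tell the two apart before touching thresholds is to look at the OSD latencies and the drive's own health data; the device name below is just a placeholder for the suspect NVMe.)

Code:
# Which OSDs are flagged and what their latencies look like
ceph health detail
ceph osd perf
# SMART data of the suspect disk (placeholder device name)
smartctl -a /dev/nvme0n1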

Also, I get these messages on one host with osd.8:
[4780261.461045] print_bio(,4353): Unexpected write bio=00000000a7cfc028, dev=fc00002, sector=1792539680, bi_flags=181 bi_rw=8801 bi_size=4096 bi_vcnt=1 bi_io_vec=000000002ef4619e bi_max_vecs=4
[4780261.461062] print_bio(,4353): Unexpected write bio=000000007f7bfd61, dev=fc00002, sector=2228658560, bi_flags=181 bi_rw=8801 bi_size=4096 bi_vcnt=1 bi_io_vec=00000000c5fe085b bi_max_vecs=4
[4780261.461077] print_bio(,4353): Unexpected write bio=0000000072393dc1, dev=fc00002, sector=2527454680, bi_flags=181 bi_rw=8801 bi_size=4096 bi_vcnt=1 bi_io_vec=000000001a38a917 bi_max_vecs=4
[4780261.461092] print_bio(,4353): Unexpected write bio=00000000538ae896, dev=fc00002, sector=2261353144, bi_flags=181 bi_rw=8801 bi_size=4096 bi_vcnt=1 bi_io_vec=00000000df6f4479 bi_max_vecs=4
[4780261.461107] print_bio(,4353): Unexpected write bio=0000000000509db8, dev=fc00002, sector=2495446344, bi_flags=181 bi_rw=8801 bi_size=4096 bi_vcnt=1 bi_io_vec=000000009a579e86 bi_max_vecs=4
[4780261.461123] print_bio(,4353): Unexpected write bio=00000000ec88e1ca, dev=fc00002, sector=2594239672, bi_flags=181 bi_rw=8801 bi_size=4096 bi_vcnt=1 bi_io_vec=00000000381dabfa bi_max_vecs=4
[4780261.495169] print_bio(,4591): Unexpected write bio=000000002f50053c, dev=fc00002, sector=2282422104, bi_flags=181 bi_rw=8801 bi_size=4096 bi_vcnt=1 bi_io_vec=0000000007f2c056 bi_max_vecs=4
[4780261.854248] print_bio(,4587): Unexpected write bio=00000000ae4c167f, dev=fc00002, sector=2282422112, bi_flags=181 bi_rw=8801 bi_size=4096 bi_vcnt=1 bi_io_vec=0000000032b76841 bi_max_vecs=4
[4780261.854509] print_bio(,4589): Unexpected write bio=00000000cc94fe10, dev=fc00002, sector=2534489608, bi_flags=181 bi_rw=8801 bi_size=36864 bi_vcnt=9 bi_io_vec=000000004d62b2e7 bi_max_vecs=16
[4780261.854790] print_bio(,4582): Unexpected write bio=000000007ac972ce, dev=fc00002, sector=3578032512, bi_flags=181 bi_rw=8801 bi_size=16384 bi_vcnt=4 bi_io_vec=000000008db74c87 bi_max_vecs=4
[4780261.854820] print_bio(,4582): Unexpected write bio=00000000bc262934, dev=fc00002, sector=3578032544, bi_flags=181 bi_rw=8801 bi_size=16384 bi_vcnt=4 bi_io_vec=00000000ed91e3bc bi_max_vecs=4
[4780261.854991] print_bio(,4591): Unexpected write bio=000000001547878a, dev=fc00002, sector=1702058648, bi_flags=181 bi_rw=8801 bi_size=16384 bi_vcnt=4 bi_io_vec=000000000cd87865 bi_max_vecs=4
[4780261.971673] print_bio(,4585): Unexpected write bio=000000007055e670, dev=fc00002, sector=2534489680, bi_flags=181 bi_rw=8801 bi_size=20480 bi_vcnt=5 bi_io_vec=00000000be8e3625 bi_max_vecs=16
[4780261.971716] print_bio(,4584): Unexpected write bio=000000003d19261d, dev=fc00002, sector=1702058680, bi_flags=181 bi_rw=8801 bi_size=16384 bi_vcnt=4 bi_io_vec=000000008146da23 bi_max_vecs=4
[4780262.192973] print_bio(,4583): Unexpected write bio=00000000d7b1227c, dev=fc00002, sector=1702058712, bi_flags=181 bi_rw=8801 bi_size=16384 bi_vcnt=4 bi_io_vec=00000000e32670a2 bi_max_vecs=4
[4780262.205130] print_bio(,4583): Unexpected write bio=0000000091f794d9, dev=fc00002, sector=1702058744, bi_flags=181 bi_rw=8801 bi_size=49152 bi_vcnt=12 bi_io_vec=0000000027031723 bi_max_vecs=16
[4780262.205187] print_bio(,4583): Unexpected write bio=00000000b7cda450, dev=fc00002, sector=1702058840, bi_flags=181 bi_rw=8801 bi_size=32768 bi_vcnt=8 bi_io_vec=000000007943a597 bi_max_vecs=16
[4780262.205742] print_bio(,4591): Unexpected write bio=0000000046b1c47d, dev=fc00002, sector=1702058904, bi_flags=181 bi_rw=8801 bi_size=16384 bi_vcnt=4 bi_io_vec=000000006c3d5b9c bi_max_vecs=4
[4780262.217809] print_bio(,4583): Unexpected write bio=00000000c11c86fa, dev=fc00002, sector=1702058936, bi_flags=181 bi_rw=8801 bi_size=16384 bi_vcnt=4 bi_io_vec=000000009cb65552 bi_max_vecs=4
[4780262.217840] print_bio(,4583): Unexpected write bio=000000006dd6828d, dev=fc00002, sector=1702058968, bi_flags=181 bi_rw=8801 bi_size=65536 bi_vcnt=16 bi_io_vec=000000007559477d bi_max_vecs=16
[4780262.217854] print_bio(,4583): Unexpected write bio=0000000004e63d49, dev=fc00002, sector=1702059096, bi_flags=181 bi_rw=8801 bi_size=32768 bi_vcnt=8 bi_io_vec=000000001f2b8b97 bi_max_vecs=16
[4780262.220043] print_bio(,4583): Unexpected write bio=00000000524e66e3, dev=fc00002, sector=3578032576, bi_flags=181 bi_rw=8801 bi_size=32768 bi_vcnt=8 bi_io_vec=00000000402090f9 bi_max_vecs=16
[4780262.230420] print_bio(,4583): Unexpected write bio=00000000c205d0ba, dev=fc00002, sector=3578032640, bi_flags=181 bi_rw=8801 bi_size=65536 bi_vcnt=16 bi_io_vec=0000000065d50da5 bi_max_vecs=16
[4780262.230449] print_bio(,4583): Unexpected write bio=0000000003e06da7, dev=fc00002, sector=3578032768, bi_flags=181 bi_rw=8801 bi_size=32768 bi_vcnt=8 bi_io_vec=000000005d8616c0 bi_max_vecs=16
[4780262.231061] print_bio(,4591): Unexpected write bio=000000003541177c, dev=fc00002, sector=3578032832, bi_flags=181 bi_rw=8801 bi_size=32768 bi_vcnt=8 bi_io_vec=00000000eafb616f bi_max_vecs=16
[4780262.242086] print_bio(,4583): Unexpected write bio=000000000f946a03, dev=fc00002, sector=3578032896, bi_flags=181 bi_rw=8801 bi_size=65536 bi_vcnt=16 bi_io_vec=0000000060f4848b bi_max_vecs=16
[4780262.242116] print_bio(,4583): Unexpected write bio=0000000028fe7dd6, dev=fc00002, sector=3578033024, bi_flags=181 bi_rw=8801 bi_size=32768 bi_vcnt=8 bi_io_vec=0000000036442b6e bi_max_vecs=16
[4780262.242711] print_bio(,4591): Unexpected write bio=00000000d7722585, dev=fc00002, sector=1702059160, bi_flags=181 bi_rw=8801 bi_size=16384 bi_vcnt=4 bi_io_vec=00000000cae471fa bi_max_vecs=4
[4780262.254579] print_bio(,4591): Unexpected write bio=00000000c3079716, dev=fc00002, sector=3578033088, bi_flags=181 bi_rw=8801 bi_size=16384 bi_vcnt=4 bi_io_vec=00000000219b2692 bi_max_vecs=4
[4780262.254611] print_bio(,4591): Unexpected write bio=0000000006154e00, dev=fc00002, sector=3578033120, bi_flags=181 bi_rw=8801 bi_size=65536 bi_vcnt=16 bi_io_vec=00000000bf978928 bi_max_vecs=16
[4780262.254625] print_bio(,4591): Unexpected write bio=000000002c656d19, dev=fc00002, sector=3578033248, bi_flags=181 bi_rw=8801 bi_size=16384 bi_vcnt=4 bi_io_vec=000000000f083b35 bi_max_vecs=4
[4780262.255256] print_bio(,4583): Unexpected write bio=00000000044c34c5, dev=fc00002, sector=1702059192, bi_flags=181 bi_rw=8801 bi_size=16384 bi_vcnt=4 bi_io_vec=000000008c8823bd bi_max_vecs=4
[4780270.518908] session_init(mms,4020790): OK. kdev=fc:3, bs=4096.

On this host I can find these messages for all devices fc00000, fc00001 and fc00002, but I only see the warning for osd.8 on the status page:

====== osd.6 =======
[block] /dev/ceph-18080be7-ec84-4caf-8e3a-abfcdc880263/osd-block-8b30c3cf-fa1d-4436-83ea-b50b4efdc077
block device /dev/ceph-18080be7-ec84-4caf-8e3a-abfcdc880263/osd-block-8b30c3cf-fa1d-4436-83ea-b50b4efdc077
osd fsid 8b30c3cf-fa1d-4436-83ea-b50b4efdc077
osd id 6
osdspec affinity
====== osd.7 =======
[block] /dev/ceph-b117a6ee-01b3-41cb-9dee-ae854ef9157f/osd-block-e99ecfec-c7b4-4bd6-bbaa-701347960213
block device /dev/ceph-b117a6ee-01b3-41cb-9dee-ae854ef9157f/osd-block-e99ecfec-c7b4-4bd6-bbaa-701347960213
osd fsid e99ecfec-c7b4-4bd6-bbaa-701347960213
osd id 7
osdspec affinity
====== osd.8 =======
[block] /dev/ceph-0308b793-b25d-4300-9d62-f013eb0b1269/osd-block-f8590460-6bf0-4c80-9956-8d29ac64845d
block device /dev/ceph-0308b793-b25d-4300-9d62-f013eb0b1269/osd-block-f8590460-6bf0-4c80-9956-8d29ac64845d
osd fsid f8590460-6bf0-4c80-9956-8d29ac64845d
osd id 8
osdspec affinity
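(To tie the dev=fcXXXXX numbers from the kernel log back to an OSD: major 0xfc = 252 appears to be a device-mapper major here, so fc00002 should correspond to /dev/dm-2; the commands below are only a sketch, and the exact names depend on the host.)

Code:
# Confirm which dm device carries major 252, minor 2
ls -l /dev/dm-2
lsblk -o NAME,MAJ:MIN,TYPE /dev/dm-2
dmsetup info -c /dev/dm-2
# Then match the resulting LV name against the ceph-volume listing above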
 
Upgraded to PVE 9, but same results on 2 servers during the week. Replaced 2 HDDs and no more warnings.
Put the 'old' HDDs (WD Blue, 4 TB) in a QNAS and no issues on that NAS either...

Strange but solved!
 
Hi everyone,

Got the same issue after upgrading to 19.2.1 (pve-enterprise). I only get this on rotating disks. The warning started for 3 disks and has grown over time; now I have the warning for 12 disks. All disks are new and came with the Dell nodes. The disks are Seagate ST2000NM012B-2TD (enterprise).

Has anyone discovered what the issue is, or a workaround? It doesn't seem very safe to increase bluestore_slow_ops_warn_lifetime.

Thank you.
 
This issue has been going on for months, and apparently no update has resolved it. As of today, September 1, 2025, it remains unresolved.
 
Same here. Started the cluster from cold today; I hadn't even un-paused it and I was already getting the alarm from SSDs that are perfectly healthy...
 
I got rid of these warnings with the following settings:
ceph config set global bdev_async_discard_threads 1
ceph config set global bdev_enable_discard true
ceph config set osd bluestore_slow_ops_warn_lifetime 600
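(If anyone copies this: as far as I can tell the bdev_* options are only picked up when the OSD starts, so the OSDs presumably need a restart after setting them — the example below restarts all OSDs on one node.)

Code:
# Restart every OSD on this node (or restart them one at a time via ceph-osd@<id>)
systemctl restart ceph-osd.target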
 
Rebuilding a node right now... But will try that soon!

EDIT: Just checked, I already have these set. :/
 
ceph config set osd bluestore_slow_ops_warn_lifetime 600

Does this line basically mean: I won't read, I won't listen, and you don't say anything?
 
Possible source of persistent issues!

Sounds like my situation COULD be related to a firmware issue on LSI 9305 controllers (though I have multiple types, these ARE in the mix for me).

Basically, discard commands cause later read commands to be blocked. Disabling discard (I have high write endurance disks) MIGHT alleviate it!

Code:
ceph config set osd.9 bdev_enable_discard 0
systemctl restart ceph-osd@9.service

Will report back if that fixes it....

FYI: DO NOT DO THIS until I can confirm it works; it WILL cause your SSDs to wear more quickly. Long term, I need to replace all my controllers with something else...
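(A way to double-check that the per-OSD override actually reached the running daemon — same OSD id as above; the second command has to be run on the node that hosts osd.9.)

Code:
# Value stored in the monitor config database
ceph config get osd.9 bdev_enable_discard
# Value the running osd.9 process is actually using (via its admin socket)
ceph daemon osd.9 config get bdev_enable_discard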
 
It's been several days, and still no alarms after disabling discard/trim... I'm going to assume this was always an issue, but Ceph just handled it silently in the background. So, cool? Guess it's time to get the credit card out.
 