I get 15k-16k random-write IOPS per BlueStore OSD with 16 threads and 4k blocks on a 40G IB network, and that is a good result; even 10k per OSD is not bad. With io-thread=1 I can get only 1400-1600 random-write IOPS.
The problem here is the latency of the OSD code and the write amplification (WA) produced on every operation. The OSD itself can take 700 μs (0.7 ms) just to execute one I/O operation, so even on a RAM disk, where a kernel I/O operation takes <10 μs, you can barely reach 3k IOPS with the best high-frequency CPU.
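As a rough sanity check of those numbers, here is a minimal sketch of the latency-to-IOPS arithmetic. It assumes a simplified model that is mine, not something measured: each thread keeps exactly one operation in flight and the per-operation latency is constant. The function names are hypothetical.

```python
def max_iops(latency_us: float, threads: int = 1) -> float:
    """Upper bound on IOPS when each thread has one op in flight
    and every op takes latency_us microseconds (assumed model)."""
    return threads * 1_000_000 / latency_us


def latency_for_iops(iops: float, threads: int = 1) -> float:
    """Per-op latency in microseconds needed to sustain the given IOPS."""
    return threads * 1_000_000 / iops


# One thread at 700 us per op: close to the observed 1400-1600 IOPS.
print(round(max_iops(700)))            # 1429

# 16 threads with ideal scaling (an assumption, not a measurement).
print(round(max_iops(700, threads=16)))  # 22857

# Per-op latency a single thread would need to reach 3k IOPS.
print(round(latency_for_iops(3000)))   # 333
```

Under this model a single thread at 700 μs per operation tops out around 1.4k IOPS, which lines up with the io-thread=1 result. The ideal 16-thread figure is higher than the 15k-16k I actually see, so scaling across threads is clearly not perfect.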
P.S. Random read is about 35-50k IOPS on the same system, but all of that is just a measure of OSD performance (the data was read from the OSD cache, so no disk I/O was done during testing).