The problem comes from network latency plus Ceph latency. If you copy a single file sequentially with small blocks, you are effectively running at iodepth=1 (the same is true of the dd command, for example).
Each block has to wait for a full round trip before the next one is sent. With a network latency of 0.1 ms, for example, you can do at most 1 / 0.0001 s = 10000 IOPS.
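
To make the arithmetic above concrete, here is a rough sketch in Python. It assumes an idealized model where every I/O costs exactly one fixed latency and nothing else limits throughput. The function name `max_iops` and its parameters are just illustrative, not part of Ceph or any tool.

```python
def max_iops(latency_ms: float, iodepth: int = 1) -> float:
    """Idealized upper bound on IOPS.

    Assumption: each I/O takes exactly `latency_ms` to complete and
    `iodepth` I/Os are kept in flight at all times (Little's law).
    Real clusters add Ceph/OSD latency on top of the network latency.
    """
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    if iodepth < 1:
        raise ValueError("iodepth must be at least 1")
    # 1000 ms per second divided by the per-I/O latency gives I/Os per second
    # for a single outstanding request; more requests in flight scale it up.
    return iodepth * 1000.0 / latency_ms


if __name__ == "__main__":
    # Single sequential copy (iodepth=1) with 0.1 ms latency.
    print(f"iodepth=1:  {max_iops(0.1):.0f} IOPS")
    # Same latency, but 16 requests kept in flight (hypothetical value).
    print(f"iodepth=16: {max_iops(0.1, iodepth=16):.0f} IOPS")
```

The point is that at iodepth=1 the latency alone caps the IOPS, regardless of how fast the disks are. Only more parallel requests (or lower latency) raise that ceiling.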
if you do it with 4k...