Actually, it is almost the same when using 16k, 32k or 64k blocks, almost as if it has been capped at around 11MB/s:
iodepth=1, 16k: READ: bw=10.3MiB/s (10.8MB/s), 10.3MiB/s-10.3MiB/s (10.8MB/s-10.8MB/s), io=616MiB (646MB), run=60003-60003msec
iodepth=1, 32k: READ: bw=10.1MiB/s (10.6MB/s)...
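
To put these numbers in perspective, here is a rough sketch that converts the reported bandwidth into IOPS and the time each I/O takes. It assumes fio's "k" block sizes are KiB and that with iodepth=1 only one request is in flight at a time, so the time per I/O is roughly 1/IOPS. The helper name `iops_and_latency` is just something made up for this illustration, not part of fio.

```python
def iops_and_latency(bw_mib_s: float, block_kib: int, iodepth: int = 1):
    """Return (IOPS, approx. time per I/O in ms) implied by a bandwidth figure.

    Assumption: `iodepth` requests are in flight at all times, so the
    mean time per I/O is iodepth / IOPS (Little's law).
    """
    iops = bw_mib_s * 1024 / block_kib      # MiB/s -> KiB/s -> I/Os per second
    latency_ms = iodepth / iops * 1000      # seconds per I/O -> milliseconds
    return iops, latency_ms


# Bandwidth figures taken from the fio lines above.
for block_kib, bw in [(16, 10.3), (32, 10.1)]:
    iops, lat = iops_and_latency(bw, block_kib)
    print(f"{block_kib}k: {iops:.0f} IOPS, ~{lat:.2f} ms per I/O")
```

With the figures above this comes out to about 659 IOPS (~1.5 ms per I/O) at 16k and about 323 IOPS (~3.1 ms per I/O) at 32k. So the IOPS halve when the block size doubles, which is what you would expect if throughput rather than per-request latency is the limit.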