CEPH Performance tuning with small block size

DynFi User
Renowned Member · Apr 18, 2016
Hello,

We are looking for information on how to properly tune Ceph (or any other part of the system) to get the best possible performance out of a 4-node Ceph cluster equipped with NVMe disks. We see very poor performance when using small block sizes, so we crafted a script to illustrate the performance ∆ between small-block and large-block writes:

Code:
Testing different block sizes with a file size of 1024 MB.
Block size (bytes) : Write Rate
        64 :     7 MB/s
       128 :   164 MB/s
       256 :   307 MB/s
       512 :   539 MB/s
      1024 :   882 MB/s
      2048 :     4 GB/s
      4096 :     9 GB/s
      8192 :     1 GB/s
     16384 :     2 GB/s
     32768 :     2 GB/s
     65536 :     2 GB/s
    131072 :     1 GB/s
    262144 :     1 GB/s
    524288 :     1 GB/s
   1048576 :     1 GB/s
Tests completed.
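The script is essentially a dd loop along these lines (a sketch, not the exact script — it assumes GNU coreutils dd and writes a small test file in the current directory; block sizes are in bytes, matching the table above):

```shell
#!/bin/sh
# Sweep a set of block sizes and report dd's transfer rate for each.
TARGET=./bs-test.dat
TOTAL=$((8 * 1024 * 1024))   # 8 MB per run keeps the sweep quick
for BS in 64 512 4096 65536 1048576; do
    COUNT=$((TOTAL / BS))
    # conv=fdatasync flushes to disk before dd reports its rate,
    # so the page cache does not hide the real write cost entirely
    RATE=$(dd if=/dev/zero of="$TARGET" bs="$BS" count="$COUNT" conv=fdatasync 2>&1 \
           | awk '/copied/ {print $(NF-1), $NF}')
    printf '%10d : %s\n' "$BS" "$RATE"
done
rm -f "$TARGET"
```

The per-write syscall overhead dominates at tiny block sizes, which is a large part of why 64-byte writes look so slow even before Ceph gets involved.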

As you can see, we have a ∆ of more than 1000x if we write using a 4096k block size compared to writing with 64k.

Any idea will be welcome.
 
As you can see, we have a ∆ of more than 1000x if we write using a 4096k block size compared to writing with 64k.
That is normal if your NVMe namespaces are formatted with 4096-byte LBAs, which is usually also the drive's internal "best performance" size. If every layer (NVMe internals, namespace format, Ceph's default of 4096) uses the same block size, that already gives you the best performance.
So you are already optimized, right in the sweet spot.
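You can check how the namespace is currently formatted with nvme-cli, and reformat it if the drive supports another LBA size. The device name and LBA format index below are examples; note that nvme format destroys all data on the namespace:

```shell
# Show the supported LBA formats and which one is in use
nvme id-ns -H /dev/nvme0n1 | grep "LBA Format"
# Typical output (drive dependent):
#   LBA Format  0 : Metadata Size: 0 bytes - Data Size: 512 bytes ...
#   LBA Format  1 : Metadata Size: 0 bytes - Data Size: 4096 bytes ... (in use)

# Reformat the namespace to LBA format 1 (4096-byte) - DESTROYS ALL DATA
nvme format /dev/nvme0n1 --lbaf=1
```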

Sure, in theory you could format the namespace to 64 (I don't know if the NVMe would even accept it) and force Ceph down to 64 as well... then you would have the best throughput for 64, but only for 64.
Also keep in mind that a smaller block size means more overhead across the whole stack. Usually you don't want that, and 4096 is really the sweet spot at the moment.

You cannot optimize for every block size. The best performance is always achieved by taking the native block size of the lowest device/layer and configuring every layer built on top of it the same way: namespace 4k, filesystem 4k (Ceph/ZFS), guest 4k (NTFS in a Windows VM).
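To verify the layers line up, you can query each one. The commands below are a sketch and assume a running Ceph cluster, root access, and example device names; the values shown are what you would want to see, not guaranteed:

```shell
# Device layer: logical block size the kernel sees for the namespace
cat /sys/block/nvme0n1/queue/logical_block_size      # ideally 4096

# Ceph layer: BlueStore's minimum allocation size for SSDs (bytes)
ceph config get osd bluestore_min_alloc_size_ssd     # default 4096

# Guest layer (inside a Windows VM): NTFS cluster size
#   fsutil fsinfo ntfsinfo C:    ->  look for "Bytes Per Cluster"
```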

As you can see, we have a ∆ of more than 1000x if we write using a 4096k block size compared to writing with 64k.
A note on units: your script's sizes are in bytes, so 4096 bytes = 4k, and 64 means 64 bytes, not 64k.
 
