Ceph NVMe SSD slower than spinning disks: 16-node 40 GbE Ceph cluster

Nathan Stratton

Dec 28, 2018
I am running the latest version of Proxmox on a 16-node 40 GbE cluster. Each node has two Samsung 960 EVO 250 GB NVMe SSDs and three Hitachi 2 TB 7200 RPM Ultrastar disks. I am using BlueStore for all disks with two CRUSH rules: a fast one for the NVMe drives and a slow one for the HDDs.

I have tested bandwidth between all hosts using iperf -c {host} -P 8 -i 1 -t 30 and get:

[SUM] 0.0-30.0 sec 138 GBytes 39.6 Gbits/sec
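To compare the network figure with the rados bench results that follow, it helps to put both in the same unit. The sketch below converts the iperf rate to MB/sec. It assumes iperf reports decimal gigabits (10^9 bits) and that the rados bench "MB" is 2^20 bytes, which is consistent with the write size and totals shown in this post. The function name is made up for illustration.

```python
MIB = 1024 * 1024  # assumption: rados bench "MB" means 2**20 bytes


def gbits_to_mib_per_sec(gbits_per_sec: float) -> float:
    """Convert a link rate in decimal Gbit/s to MiB/s."""
    bytes_per_sec = gbits_per_sec * 1e9 / 8
    return bytes_per_sec / MIB


# 39.6 Gbit/s is the [SUM] rate reported by iperf above.
link_mib_s = gbits_to_mib_per_sec(39.6)
print(f"network: about {link_mib_s:.0f} MB/sec per host pair")
```

That works out to several thousand MB/sec between any two hosts, so the raw link speed is unlikely to be the limiting factor for results in the hundreds of MB/sec.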

However, when I test with rados bench -p bench_slow 100 write --no-cleanup -t=256, I get:

Total time run: 100.131
Total writes made: 12328
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 492.475
Stddev Bandwidth: 120.179
Max bandwidth (MB/sec): 1428
Min bandwidth (MB/sec): 0
Average IOPS: 123
Stddev IOPS: 30.0448
Max IOPS: 357
Min IOPS: 0
Average Latency(s): 2.05843
Stddev Latency(s): 0.20674
Max latency(s): 2.66947
Min latency(s): 0.137308

I am OK with almost 500 MB/sec on spinning disks. However, if I test with rados bench -p bench-fast 100 write --no-cleanup -t=256, I get:

Total time run: 104.242
Total writes made: 5529
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 212.16
Stddev Bandwidth: 228.632
Max bandwidth (MB/sec): 976
Min bandwidth (MB/sec): 0
Average IOPS: 53
Stddev IOPS: 57.1618
Max IOPS: 244
Min IOPS: 0
Average Latency(s): 4.77283
Stddev Latency(s): 3.33254
Max latency(s): 13.8286
Min latency(s): 0.867378

Now, I know this pool has 16 fewer disks (only 2 NVMe * 250 GB per host vs. 3 SATA * 2 TB per host), but I was expecting way, way more than 212 MB/sec, and certainly not a max latency of 13 seconds!
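The two result sets above can be sanity-checked from their own numbers. The sketch below recomputes bandwidth as writes * write size / time, then uses Little's law (operations in flight = throughput * latency) to estimate the average latency you would expect with 256 writes outstanding. It assumes rados bench kept all 256 operations in flight for the whole run. The class and method names are hypothetical.

```python
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass
class BenchRun:
    name: str
    seconds: float    # "Total time run"
    writes: int       # "Total writes made"
    write_size: int   # "Write size" in bytes
    in_flight: int    # value passed with -t

    def bandwidth_mib_s(self) -> float:
        return self.writes * self.write_size / MIB / self.seconds

    def ops_per_sec(self) -> float:
        return self.writes / self.seconds

    def expected_latency_s(self) -> float:
        # Little's law: in-flight ops = throughput * latency
        return self.in_flight / self.ops_per_sec()


runs = [
    BenchRun("bench_slow (HDD)", 100.131, 12328, 4194304, 256),
    BenchRun("bench-fast (NVMe)", 104.242, 5529, 4194304, 256),
]

for run in runs:
    print(
        f"{run.name}: {run.bandwidth_mib_s():.2f} MB/sec, "
        f"{run.ops_per_sec():.1f} writes/sec, "
        f"expected avg latency {run.expected_latency_s():.2f} s"
    )
```

The estimates come out near 2.08 s and 4.83 s, close to the reported averages of 2.06 s and 4.77 s. In other words, the multi-second latencies mostly follow from queueing 256 large writes against low throughput, so the low throughput of the NVMe pool is the thing that needs explaining.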
 
Yes, that is true, but my SATA drives don't support them either. I find it hard to believe that the fact that they are not Pro makes them slower than spinning disks.
 

Ceph uses a very particular I/O pattern when writing to disks, so certain non-enterprise drives really suffer.

https://www.sebastien-han.fr/blog/2...-if-your-ssd-is-suitable-as-a-journal-device/

It is fairly old, but it gives a good example of the large difference in performance between enterprise and non-enterprise drives.
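A rough way to see this kind of difference yourself is to time small writes that must each reach stable storage before the next one starts. The sketch below is only an illustration of that idea, not the procedure from the linked post and not a substitute for a proper benchmark tool. It assumes a POSIX system, writes to a regular file rather than a raw device, and does not bypass the page cache. The function name, block size, and count are arbitrary choices.

```python
import os
import tempfile
import time

HAS_DSYNC = hasattr(os, "O_DSYNC")


def sync_write_iops(path: str, block_size: int = 4096, count: int = 200) -> float:
    """Write `count` blocks one at a time, each forced to stable storage."""
    flags = os.O_WRONLY | os.O_CREAT
    if HAS_DSYNC:
        flags |= os.O_DSYNC  # each write returns only after the data is durable
    block = b"\0" * block_size
    fd = os.open(path, flags, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, block)
            if not HAS_DSYNC:
                os.fsync(fd)  # fallback where O_DSYNC is unavailable
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return count / max(elapsed, 1e-9)


# The temporary directory here is only a placeholder; point the path at a
# filesystem that lives on the drive you actually want to test.
with tempfile.TemporaryDirectory() as tmp:
    iops = sync_write_iops(os.path.join(tmp, "probe.bin"))
print(f"{iops:.0f} synchronous 4 KiB writes/sec")
```

Drives that look fast in ordinary buffered tests can report a far lower figure here, which is the sort of gap between enterprise and consumer drives the reply is pointing at.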