[SOLVED] Ceph Realistic Expectations

skywavecomm

May 30, 2019
I have set up a small Ceph cluster with the following specs:
Three identical nodes:
- HP DL380p G8
- Intel Xeon E5-2697-v2
- 128GB DDR3 RAM (16GB 2RX4 PC3-14900R)
- OS Drive: Intel DC S4500 240GB
- OSD Drives: 2x Intel DC S3500 800GB
- NIC: Intel X520
Configuration:
Code:
[global]
    auth client required = cephx
    auth cluster required = cephx
    auth service required = cephx
    cluster network = 10.20.192.0/24
    fsid = e79b56b5-656a-4259-b2bf-68f53e1faee9
    keyring = /etc/pve/priv/$cluster.$name.keyring
    mon allow pool delete = true
    osd journal size = 5120
    osd pool default min size = 2
    osd pool default size = 3
    public network = 10.20.192.0/24

[mds]
    keyring = /var/lib/ceph/mds/ceph-$id/keyring

[osd]
    keyring = /var/lib/ceph/osd/ceph-$id/keyring

[mon.roc-server03]
    host = roc-server03
    mon addr = 10.20.192.12:6789

[mon.roc-server01]
    host = roc-server01
    mon addr = 10.20.192.10:6789

[mon.roc-server02]
    host = roc-server02
    mon addr = 10.20.192.11:6789
Each node runs a monitor and has two OSDs. Usable space for the Ceph cluster is 1.8 TB.
iperf shows close to 10 Gbps between all nodes. I have also enabled jumbo frames on the X520 interfaces.
The benchmarks show the following:
Code:
rados bench -p scbench 10 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_roc-server01_229534
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16       130       114   455.974       456   0.0489869    0.135184
    2      16       253       237   473.955       492   0.0439613    0.123906
    3      16       363       347   462.623       440   0.0773927    0.134225
    4      16       494       478    477.95       524   0.0857855    0.130516
    5      16       574       558   446.353       320   0.0236249    0.126669
    6      16       649       633   421.955       300   0.0814599    0.148254
    7      16       769       753   430.241       480    0.156152    0.148414
    8      16       891       875   437.455       488   0.0937066    0.144579
    9      16      1004       988   439.066       452    0.070477    0.145127
   10      16      1106      1090   435.955       408   0.0571002    0.145583
Total time run: 10.320690
Total writes made: 1107
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 429.041
Stddev Bandwidth: 73.7322
Max bandwidth (MB/sec): 524
Min bandwidth (MB/sec): 300
Average IOPS: 107
Stddev IOPS: 18
Max IOPS: 131
Min IOPS: 75
Average Latency(s): 0.148274
Stddev Latency(s): 0.150493
Max latency(s): 1.2687
Min latency(s): 0.0213823
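The summary figures can be cross-checked from the totals. A minimal sketch, assuming rados bench prints "MB/sec" as bytes / 2^20 per second (i.e. MiB/s) and truncates Average IOPS to an integer; `summarize_run` is a hypothetical helper name:

```python
def summarize_run(total_ops: int, op_size_bytes: int, elapsed_s: float) -> tuple[float, int]:
    """Return (bandwidth in MiB/s, average IOPS) for a finished rados bench run.

    Assumes rados bench divides by 2**20 for its "MB/sec" figure and
    truncates the average IOPS to an integer.
    """
    mib_total = total_ops * op_size_bytes / 2**20
    bandwidth_mib_s = mib_total / elapsed_s
    avg_iops = int(total_ops / elapsed_s)
    return bandwidth_mib_s, avg_iops


if __name__ == "__main__":
    # Totals copied from the write run above.
    bw, iops = summarize_run(total_ops=1107, op_size_bytes=4194304, elapsed_s=10.320690)
    print(f"Bandwidth (MB/sec): {bw:.3f}")  # should print 429.041
    print(f"Average IOPS: {iops}")          # should print 107
```

Plugging in the seq totals (1107 reads in 2.955225 s) and the rand totals (4020 reads in 10.057262 s) gives about 1498.36 MiB/s at 374 IOPS and 1598.84 MiB/s at 399 IOPS, matching those summaries as well.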
Code:
rados bench -p scbench 10 seq
hints = 1
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16       397       381   1523.78      1524   0.0131171   0.0404196
    2      16       795       779   1557.78      1592   0.0136142    0.039872
Total time run: 2.955225
Total reads made: 1107
Read size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 1498.36
Average IOPS: 374
Stddev IOPS: 12
Max IOPS: 398
Min IOPS: 381
Average Latency(s): 0.0419944
Max latency(s): 0.301681
Min latency(s): 0.0117499
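The seq run stopped after about 3 s instead of 10 s. That is expected if seq only reads back the objects left by the earlier `write --no-cleanup` run and ends once they are exhausted (my understanding of rados bench, worth verifying). A rough sketch of that arithmetic; both helper names are hypothetical:

```python
import math


def seq_read_seconds(objects_written: int, read_iops: float, time_limit_s: float) -> float:
    """Approximate duration of a `rados bench <pool> <secs> seq` run.

    Assumption: seq only reads objects left behind by an earlier
    `write --no-cleanup` run and stops early once they are all read.
    """
    return min(time_limit_s, objects_written / read_iops)


def write_seconds_for_full_seq(read_iops: float, time_limit_s: float, write_iops: float) -> int:
    """Seconds of write benchmark needed so a seq run can last the whole time limit."""
    objects_needed = read_iops * time_limit_s
    return math.ceil(objects_needed / write_iops)


if __name__ == "__main__":
    # 1107 objects were written at ~107 IOPS; seq read them at ~374 IOPS.
    print(f"expected seq duration: {seq_read_seconds(1107, 374, 10):.2f} s")
    print(f"write time needed for a full 10 s seq run: {write_seconds_for_full_seq(374, 10, 107)} s")
```

With these rates the write run would need to last about 35 s (for example `rados bench -p scbench 35 write --no-cleanup`) for a seq run to fill the full 10 s.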
Code:
rados bench -p scbench 10 rand
hints = 1
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16       445       429   1715.72      1716   0.0706885   0.0356332
    2      15       842       827   1653.71      1592    0.014284   0.0373693
    3      16      1231      1215   1619.75      1552    0.134321   0.0384631
    4      16      1635      1619   1618.78      1616   0.0589793   0.0387607
    5      16      2036      2020   1615.79      1604    0.101172   0.0387121
    6      16      2422      2406   1603.79      1544  0.00206484   0.0390605
    7      16      2837      2821    1611.8      1660  0.00258349   0.0389393
    8      16      3223      3207   1603.31      1544  0.00390366   0.0391965
    9      16      3612      3596   1598.03      1556   0.0186623   0.0393629
   10      16      4020      4004   1601.41      1632    0.064532   0.0393168
Total time run: 10.057262
Total reads made: 4020
Read size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 1598.84
Average IOPS: 399
Stddev IOPS: 14
Max IOPS: 429
Min IOPS: 386
Average Latency(s): 0.0395143
Max latency(s): 0.222441
Min latency(s): 0.00170083
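All three runs kept 16 operations in flight ("Maintaining 16 concurrent writes"), so by Little's law the average IOPS should be close to 16 divided by the average latency. A minimal sketch using the reported latencies; `littles_law_iops` is a hypothetical name:

```python
def littles_law_iops(in_flight: int, avg_latency_s: float) -> float:
    """Little's law: throughput = concurrency / latency (operations per second)."""
    return in_flight / avg_latency_s


if __name__ == "__main__":
    # (mode, reported Average IOPS, reported Average Latency(s)) from the runs above.
    runs = [
        ("write", 107, 0.148274),
        ("seq", 374, 0.0419944),
        ("rand", 399, 0.0395143),
    ]
    for mode, reported, latency in runs:
        predicted = littles_law_iops(16, latency)
        print(f"{mode:>5}: predicted {predicted:6.1f} IOPS, reported {reported}")
```

The predictions come out at roughly 108, 381 and 405 IOPS, within about 2% of the reported averages. That suggests the runs are latency-bound at this queue depth: more throughput needs either lower per-operation latency or more operations in flight (rados bench's `-t` option sets the number of concurrent operations).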
Is this what I should realistically expect, or is there room for improvement through Ceph configuration? Or do I need to add or change hardware to get further speed improvements?
Thanks
 
It's hard to compare, but ~600 Mbps seems roughly comparable, although I have fewer OSDs per node than in the benchmarks. What configuration changes were made in the benchmark to achieve those results?
Stock Proxmox VE and stock Ceph, MTU 9000. Nothing more, except that for the first graph the switch used was the 100 GbE one. The bandwidth is in MB/s; in bits per second it would be 6.6 Gbps.
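Converting between MB/s and Gbit/s is a factor of 8, plus the question of whether "MB" means 10^6 or 2^20 bytes. A small sketch; the function name is hypothetical, and the binary option reflects the assumption that rados bench's "MB/sec" is really MiB/s:

```python
def mbytes_per_s_to_gbits_per_s(mb_per_s: float, binary: bool = False) -> float:
    """Convert a MB/s figure to Gbit/s.

    binary=False treats 1 MB as 10**6 bytes; binary=True treats it as
    2**20 bytes (MiB).
    """
    bytes_per_mb = 2**20 if binary else 10**6
    return mb_per_s * bytes_per_mb * 8 / 10**9


if __name__ == "__main__":
    print(f"{mbytes_per_s_to_gbits_per_s(825):.1f} Gbps")                   # 6.6 Gbps
    print(f"{mbytes_per_s_to_gbits_per_s(429.041, binary=True):.2f} Gbps")  # 3.60 Gbps
```

In decimal units, 6.6 Gbps corresponds to roughly 825 MB/s; the 429.041 "MB/sec" write result in the first post, read as MiB/s, is about 3.6 Gbps.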

The main point for improvement would be to use more OSDs. Ceph writes each object to its primary OSD only, and that OSD then distributes the object to the other two OSDs. Only after the object has been written successfully to all involved OSDs does the acknowledgement come back to the writing process. Read performance will always be higher, as reads can be served from the OSDs in parallel and do not have to wait for replication.
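A back-of-the-envelope sketch of that write path, assuming a replicated pool with size 3 (as in the config in the first post), that the primary forwards every client write to size - 1 replicas, and that data spreads evenly across OSDs. It ignores journal/WAL overhead, so real per-disk load will be higher; all names here are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class ReplicatedWriteLoad:
    replication_traffic_mb_s: float  # primary -> replica traffic between nodes
    per_osd_write_mb_s: float        # data each OSD has to persist


def replicated_write_load(client_mb_s: float, pool_size: int, num_osds: int) -> ReplicatedWriteLoad:
    """Estimate backend load for a replicated pool under client writes.

    Assumes every byte lands on one primary, which forwards it to
    pool_size - 1 replicas, and that placement is even across OSDs.
    """
    replication = client_mb_s * (pool_size - 1)
    per_osd = client_mb_s * pool_size / num_osds
    return ReplicatedWriteLoad(replication, per_osd)


def acked_write_latency(primary_ms: float, replica_ms: list[float]) -> float:
    """The client is acknowledged only after the primary and every replica
    have committed, so the slowest participant sets the latency
    (replica writes are modelled as running in parallel)."""
    return max([primary_ms, *replica_ms])


if __name__ == "__main__":
    load = replicated_write_load(429.041, pool_size=3, num_osds=6)
    print(f"replication traffic: {load.replication_traffic_mb_s:.0f} MB/s")
    print(f"per-OSD writes: {load.per_osd_write_mb_s:.0f} MB/s")
    # Illustrative, made-up latencies in ms: one slow replica dominates.
    print(acked_write_latency(2.0, [2.5, 9.0]))  # 9.0
```

For the write run in the first post (about 429 MB/s, pool size 3, six OSDs) this gives roughly 858 MB/s of primary-to-replica traffic and about 215 MB/s of data per OSD. Doubling the OSD count halves the per-OSD share in this model, which is the intuition behind adding OSDs.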
 
