Ceph Performance Tuning?

lastb0isct

Member
Dec 29, 2015
Need help tweaking the write performance... I hope it might just be my settings.

[attached screenshot: rK5VyTt.png]


I'm currently getting these performance numbers:

Code:
root@b:~# rados -p ceph bench 60 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 60 seconds or 0 objects
Object prefix: benchmark_data_b_26978
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16        32        16   63.9961        64    0.742284    0.741389
    2      16        48        32   63.9913        64    0.744329    0.738303
    3      16        80        64   85.3206       128    0.732874    0.732742
    4      16        96        80   79.9876        64    0.739055     0.73361
    5      16       112        96   76.7869        64    0.710905    0.730787
    6      16       144       128   85.3186       128    0.709124     0.72821
    7      16       160       144   82.2713        64    0.715961    0.726585
    8      16       176       160   79.9857        64    0.736764    0.727324
    9      16       208       192   85.3183       128    0.710421    0.727327
   10      16       224       208   83.1852        64    0.721571    0.726753
   11      16       240       224   81.4399        64      1.1033    0.753151
   12      16       256       240   79.9856        64    0.766299    0.753224
   13      16       288       272   83.6774       128    0.746685    0.750316
   14      16       304       288    82.271        64    0.708056    0.748651
   15      16       336       320   85.3182       128    0.738596    0.746393
   16      16       352       336   83.9851        64    0.729581    0.745565
   17      16       368       352   82.8086        64     0.72453    0.744487
   18      16       400       384   85.3177       128    0.867381    0.749343
   19      16       416       400   84.1952        64    0.749326    0.749293
2018-01-19 15:53:20.708550 min lat: 0.703258 max lat: 1.1049 avg lat: 0.749212
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   20      16       432       416   83.1849        64    0.746651    0.749212
   21      16       464       448   85.3178       128     0.71229    0.747226
   22      16       480       464   84.3484        64    0.739092    0.746627
   23      16       496       480   83.4632        64    0.748121     0.74651
   24      16       528       512   85.3178       128    0.722808    0.744759
   25      16       544       528   84.4647        64    0.734928    0.744589
   26      16       560       544   83.6773        64    0.747629    0.744323
   27      16       592       576   85.3183       128    0.715299     0.74294
   28      16       608       592   84.5565        64    0.745908    0.742835
   29      16       640       624   86.0537       128    0.722302    0.741677
   30      16       656       640   85.3185        64    0.854054    0.744586
   31      16       672       656   84.6305        64    0.720943     0.74382
   32      16       704       688   85.9851       128    0.718778    0.742854
   33      16       720       704   85.3186        64    0.711601    0.742019
   34      16       736       720   84.6911        64    0.713723    0.741602
   35      16       768       752   85.9279       128     0.70926    0.740532
   36      16       784       768   85.3186        64    0.703707    0.739824
   37      16       800       784   84.7421        64    0.740217    0.739762
   38      16       832       816   85.8799       128    0.735073    0.742288
   39      16       848       832   85.3183        64    0.729904    0.741968
2018-01-19 15:53:40.711955 min lat: 0.696736 max lat: 1.1049 avg lat: 0.741717
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   40      16       864       848   84.7851        64    0.732268    0.741717
   41      16       896       880   85.8385       128    0.880577    0.743501
   42      16       912       896   85.3183        64     0.72953    0.742986
   43      16       928       912   84.8222        64    0.753257    0.743072
   44      16       960       944    85.803       128    0.742051    0.742632
   45      16       976       960   85.3183        64    0.740055    0.742515
   46      16       992       976   84.8546        64    0.779094    0.743074
   47      16      1024      1008   85.7721       128    0.712664    0.741781
   48      16      1040      1024   85.3183        64    0.745199    0.741766
   49      16      1072      1056   86.1889       128    0.744478    0.741593
   50      16      1088      1072   85.7446        64    0.727967    0.741474
   51      16      1104      1088   85.3181        64    0.724968    0.741211
   52      16      1120      1104   84.9079        64     0.90392    0.743401
   53      16      1152      1136   85.7206       128    0.751372    0.743221
   54      16      1168      1152   85.3182        64     0.71858    0.742909
   55      16      1200      1184   86.0939       128    0.721085    0.742282
   56      16      1216      1200   85.6992        64    0.738273    0.742133
   57      16      1232      1216   85.3183        64    0.739722    0.742057
   58      16      1264      1248   86.0538       128    0.734007    0.741703
   59      16      1280      1264   85.6798        64    0.721673    0.741487
2018-01-19 15:54:00.715493 min lat: 0.687348 max lat: 1.1049 avg lat: 0.741079
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   60      16      1297      1281   85.3849        68    0.727328    0.741079
Total time run:         60.246708
Total writes made:      1297
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     86.1126
Stddev Bandwidth:       30.3813
Max bandwidth (MB/sec): 128
Min bandwidth (MB/sec): 64
Average IOPS:           21
Stddev IOPS:            7
Max IOPS:               32
Min IOPS:               16
Average Latency(s):     0.742483
Stddev Latency(s):      0.0600846
Max latency(s):         1.1049
Min latency(s):         0.236145
root@b:~# rados -p ceph bench 60 seq
hints = 1
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16       179       163   651.862       652    0.093421   0.0818938
    2      16       358       342   683.876       716    0.066012   0.0911231
    3      16       521       505   673.226       652   0.0274761    0.092056
    4      16       681       665   664.902       640   0.0229427   0.0923282
    5      16       842       826   660.708       644   0.0248527    0.094545
    6      16       973       957   637.916       524    0.231265     0.09789
    7      16      1109      1093   624.492       544   0.0225874    0.100473
    8      16      1268      1252   625.923       636    0.210718    0.100501
Total time run:       8.407530
Total reads made:     1297
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   617.066
Average IOPS:         154
Stddev IOPS:          15
Max IOPS:             179
Min IOPS:             131
Average Latency(s):   0.102411
Max latency(s):       0.81993
Min latency(s):       0.0209651

Anything look suspicious in my settings? Would those writes be manageable for ~15 LXCs, only a few of which are under heavy utilization?
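
For reference, this is roughly how the pool and cluster settings in question can be dumped (pool name 'ceph' taken from the bench command above; if the replication size is the default of 3 and there is one OSD per node, every 4 MB write has to be persisted on all three nodes before it returns):

Code:
root@b:~# ceph -s                          # overall health, number of OSDs in/up
root@b:~# ceph osd pool get ceph size      # replication factor of the benched pool
root@b:~# ceph osd pool get ceph min_size  # minimum replicas required to accept I/O
root@b:~# ceph osd pool get ceph pg_num    # placement group count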
 
Your Ceph is slow; it seems you do not have the right disks and/or network.

But as you provide no details about your network, hardware, or configuration, this is just a guess.
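
For example, something along these lines would already tell a lot about the disks, the OSD layout, and the network (iperf3 and smartctl are assumed to be installed; device names and IPs are placeholders to adjust):

Code:
lsblk -o NAME,MODEL,SIZE,ROTA      # disk models, and whether they are rotational
smartctl -i /dev/sda               # exact SSD model of an OSD disk (adjust device)
ceph osd tree                      # which OSDs sit on which host
ceph osd df                        # per-OSD utilisation
iperf3 -s                          # on one node
iperf3 -c <other-node-ip> -t 30    # on another node: raw 10GbE throughput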
 
I am running 3 nodes, all with a 10GbE network. Only one OSD per machine right now, but the OSDs are SSDs. The filesystem is also on SSDs; benchmarking them puts them at around 450 MB/s.
 
Which SSD model do you use? It looks like you are NOT using enterprise-class SSDs?
 
I am using Samsung 850 Pros. It is definitely not an ideal setup, but I think I should be getting better throughput than that. You can see I'm able to get good speeds with sequential reads.
 
The 850 Pro is not suited for this, and your limited performance is expected. You can read a lot about it in the Ceph communities (mailing lists), and also here on the forum.
I suggest you replace your consumer SSDs with a Samsung SM863a (or similar). We will publish performance benchmarks with such SSDs at the end of January; I assume this can help others design their own deployments.

See also:
https://www.sebastien-han.fr/blog/2...-if-your-ssd-is-suitable-as-a-journal-device/

The 850 Pro does not perform well in his tests/blog.
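
For reference, the test in that blog post is roughly a queue-depth-1, 4k O_DSYNC write with fio, i.e. the access pattern of a Ceph journal. A sketch along these lines (the device name is a placeholder; running it against a raw device destroys the data on it, so use a spare disk or a test file):

Code:
fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
    --numjobs=1 --iodepth=1 --runtime=60 --time_based \
    --group_reporting --name=journal-test

Consumer SSDs like the 850 Pro tend to collapse to a few MB/s under this pattern because they lack a power-loss-protected cache, while enterprise SSDs (SM863a and similar) sustain it.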
 
Thank you for the reply and the information. I'll look into it. I might be able to get away with this level of performance, as I'm only doing this for homelab stuff.

Do you know if the Ceph read/write performance will really impact the actual speed of the running LXCs/VMs, or is it only replication to the other nodes that is affected?
 
Yes, this will influence VM performance. And do not forget about endurance: an 850 Pro can fail soon, depending on your write volume.
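
A quick way to see that effect from inside a container or VM is a small sync-write fio run against a scratch file on its root disk (the file name and size below are just examples):

Code:
fio --filename=/root/fio-test --size=1G --direct=1 --sync=1 --rw=write \
    --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based \
    --group_reporting --name=guest-sync-write
rm -f /root/fio-test

Small sync writes like this are what databases and many services inside the LXCs actually do, so this number is usually more telling than the 4 MB rados bench figure.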
 
