I just set up my first Ceph pool. I have 4 identical nodes, each with an identical 16TB HDD; in isolation, each HDD gets about 270MB/s for both reads and writes. The nodes are connected by a 10Gb network dedicated to Ceph. I set up the pool with 2 replicas and PG autoscaling enabled (though changing the number of PGs doesn't seem to significantly impact performance), and I disabled write caching on the drives themselves.
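For reference, the pool setup was roughly along these lines (a sketch from memory rather than a copy-paste; the pool name HDD matches the benchmarks below):

Code:
ceph osd pool create HDD 32                  # initial PG count; the autoscaler adjusts this later
ceph osd pool set HDD size 2                 # 2 replicas
ceph osd pool set HDD pg_autoscale_mode on   # enable PG autoscaling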
Ideally, I would expect double the performance of a single drive for writes: with 2 replicas, every object is written to two drives, so each pair of drives gets half the data. For reads, I would expect quadruple the performance of a single drive, since a quarter of the data could be read from each of the four drives in parallel.
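To put numbers on that expectation (idealized, ignoring network hops and any Ceph overhead):

Code:
# expected aggregate throughput, assuming perfect load balancing and zero overhead
# writes: 4 drives x 270 MB/s / 2 replicas = 540 MB/s of client bandwidth
# reads:  4 drives x 270 MB/s              = 1080 MB/s of client bandwidth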
In practice, I'm seeing the following:
Pool writes of ~370MB/s
Pool reads of ~430MB/s
Interestingly, the network traffic correlates with these speeds: on the node where I run the command, I see ~370MB/s of traffic going out, with roughly a third of that arriving at each of the other three nodes. Reads show the same pattern in reverse. Overall, though, the speeds are quite a bit lower than I expected in both cases, while CPU and memory usage stay low.
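In case it's relevant, the raw links can be sanity-checked between two nodes with iperf3 (a generic sketch, not one of my captured outputs; substitute a real hostname):

Code:
iperf3 -s                # on one node
iperf3 -c <peer-node>    # from another node; a healthy 10Gb link should show roughly 9.4Gbit/s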
If I create a block device following the Ceph wiki, I see:
Block writes of ~170MiB/s
Block reads of ~15MiB/s
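The block device itself was created along these lines (a sketch; the image size is a placeholder, and image01 is the name used in the benchmarks below):

Code:
rbd create --size 100G HDD/image01   # size is a placeholder
rbd map HDD/image01                  # only needed for kernel-client access; rbd bench talks to the cluster directly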
I'm guessing there might be something wrong with the commands I'm using here, because I'm not sure why else the speeds would be so poor.
I've copied the full commands and outputs below. If anyone has feedback on how to further improve performance, I'd greatly appreciate it!
Code:
root@cortana01:~# rados bench -p HDD 10 write -b 16M --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 16777216 bytes to objects of size 16777216 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_cortana01_39329
  sec  Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
    0        0         0         0         0         0            -           0
    1       15        26        11   175.921       176      0.61204    0.666382
    2       15        49        34   271.923       368     0.595901    0.686431
    3       15        72        57   303.932       368     0.702953    0.686895
    4       15        94        79   315.934       352     0.792737    0.692411
    5       15       120       105   335.936       416     0.638328    0.687194
    6       15       140       125   333.274       320      0.65639    0.696982
    7       15       165       150   342.799       400     0.619978    0.693383
    8       15       186       171   341.947       336     0.790205    0.686658
    9       15       212       197    350.17       416      0.61829    0.689477
   10       15       234       219    350.35       352     0.748165    0.687685
Total time run: 10.1302
Total writes made: 235
Write size: 16777216
Object size: 16777216
Bandwidth (MB/sec): 371.168
Stddev Bandwidth: 69.3128
Max bandwidth (MB/sec): 416
Min bandwidth (MB/sec): 176
Average IOPS: 23
Stddev IOPS: 4.33205
Max IOPS: 26
Min IOPS: 11
Average Latency(s): 0.668439
Stddev Latency(s): 0.105804
Max latency(s): 0.849491
Min latency(s): 0.128145
Code:
root@cortana01:~# rados bench -p HDD 10 seq
hints = 1
  sec  Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
    0        0         0         0         0         0            -           0
    1       15        31        16    255.91       256     0.598025    0.610392
    2       15        58        43   343.921       432     0.521511     0.59645
    3       15        87        72   383.924       464     0.559089     0.56917
    4       15       119       104   415.926       512     0.507587    0.542226
    5       15       144       129   412.732       400     0.714784    0.553088
    6       15       169       154   410.605       400     0.719124    0.569878
    7       15       195       180   411.371       416     0.669092    0.578182
    8       15       222       207    413.94       432     0.685254    0.580606
Total time run: 8.74796
Total reads made: 238
Read size: 16777216
Object size: 16777216
Bandwidth (MB/sec): 435.301
Average IOPS: 27
Stddev IOPS: 4.61171
Max IOPS: 32
Min IOPS: 16
Average Latency(s): 0.564066
Max latency(s): 1.23279
Min latency(s): 0.143175
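Since the write benchmark ran with --no-cleanup (so that the seq test has objects to read), the leftover benchmark objects can be removed afterwards:

Code:
rados -p HDD cleanup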
Code:
root@cortana01:/# rbd bench --io-type read -p HDD image01
bench type read io_size 4096 io_threads 16 bytes 1073741824 pattern sequential
  SEC       OPS   OPS/SEC   BYTES/SEC
    1      4656   4681.34    18 MiB/s
    2     11984   5990.98    23 MiB/s
    3     12304   4102.54    16 MiB/s
    4     12624   3157.61    12 MiB/s
    5     17024   3409.34    13 MiB/s
    6     24048   3861.39    15 MiB/s
    7     29312   3455.21    13 MiB/s
    8     29952   3516.92    14 MiB/s
    9     36032   4683.45    18 MiB/s
   10     36592   3893.33    15 MiB/s
   11     42272   3652.82    14 MiB/s
   12     42896   2726.05    11 MiB/s
   13     44528   2925.13    11 MiB/s
   14     45008   1794.47   7.0 MiB/s
   15     52560   3210.28    13 MiB/s
   16     55952   2735.98    11 MiB/s
   17     56432   2702.86    11 MiB/s
   18     62784   3642.44    14 MiB/s
   19     63168   3622.56    14 MiB/s
   20     69936   3475.18    14 MiB/s
   21     85296   5873.47    23 MiB/s
   22     87152   6157.51    24 MiB/s
   23     93376   6136.78    24 MiB/s
   24     94016   6178.21    24 MiB/s
   25    100416   6093.53    24 MiB/s
   26    103072   3560.16    14 MiB/s
   27    109616   4488.29    18 MiB/s
   28    114640   4254.48    17 MiB/s
   29    122608   5716.08    22 MiB/s
   30    123248   4556.35    18 MiB/s
   31    130432   5459.96    21 MiB/s
   32    135952   5271.39    21 MiB/s
   33    138064   4681.03    18 MiB/s
   34    142096   3907.74    15 MiB/s
   35    154496   6265.86    24 MiB/s
   36    155696   5029.64    20 MiB/s
   37    158864   4576.88    18 MiB/s
   38    159344   4253.42    17 MiB/s
   39    159824   3525.84    14 MiB/s
   40    167184   2528.99   9.9 MiB/s
   41    167648   2406.27   9.4 MiB/s
   42    168480   1925.11   7.5 MiB/s
   43    170688   2265.16   8.8 MiB/s
   44    171168   2274.25   8.9 MiB/s
   45    172800   1122.97   4.4 MiB/s
   46    174064   1278.84   5.0 MiB/s
   47    174464    1190.6   4.7 MiB/s
   48    174848   826.869   3.2 MiB/s
   49    180800   1931.41   7.5 MiB/s
   50    182240   1891.77   7.4 MiB/s
   51    184000   1993.57   7.8 MiB/s
   52    186624   2443.72   9.5 MiB/s
   53    187104   2474.45   9.7 MiB/s
   54    194672   2771.06    11 MiB/s
   55    201872   3927.95    15 MiB/s
   56    202512   3700.16    14 MiB/s
   57    209824   4642.76    18 MiB/s
   58    220592   6697.56    26 MiB/s
   59    224816   6028.77    24 MiB/s
   60    225296   4681.96    18 MiB/s
   61    226624   4817.56    19 MiB/s
   62    229744   3956.28    15 MiB/s
   63    232192   2316.74   9.0 MiB/s
   64    246880   4418.96    17 MiB/s
   65    250240    4980.8    19 MiB/s
   66    260832   6851.15    27 MiB/s
elapsed: 66 ops: 262144 ops/sec: 3949.6 bytes/sec: 15 MiB/s
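One thing I notice in the header above: this read test used the default io_size of 4096, while the write test below uses --io-size 16K. For a more comparable read test, a larger request size could be passed the same way (sketch only; no output captured for this):

Code:
rbd bench --io-type read --io-size 4M -p HDD image01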
Code:
root@cortana01:~# rbd bench --io-type write -p HDD --io-size 16K image01
bench type write io_size 16384 io_threads 16 bytes 1073741824 pattern sequential
  SEC       OPS   OPS/SEC   BYTES/SEC
    1     12128   12143.9   190 MiB/s
    2     23360   11664.6   182 MiB/s
    3     34720     11521   180 MiB/s
    4     45008   11188.8   175 MiB/s
    5     55648   11106.1   174 MiB/s
elapsed: 6 ops: 65536 ops/sec: 10810.9 bytes/sec: 169 MiB/s