I have been upgrading a customer's cluster node by node from 10 gig to 40 gig, and now that all nodes are on 40 gig I am seeing very slow Ceph performance. The setup is dual E5-2690v2 per node with a dual 40 gig bond into a Cisco 3132Q. The OSDs are Samsung 960 EVO 256G NVMe, two in each server.

Writes are terrible:
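The numbers below are from a plain rados bench write against the same pool as the seq test further down (exact flags assumed here; --no-cleanup is needed in any case so the seq test has objects to read back):

root@virt0:/home# rados bench -p fast 60 write --no-cleanup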
Total time run: 92.179118
Total writes made: 130
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 5.64119
Stddev Bandwidth: 32.0926
Max bandwidth (MB/sec): 304
Min bandwidth (MB/sec): 0
Average IOPS: 1
Stddev IOPS: 8
Max IOPS: 76
Min IOPS: 0
Average Latency(s): 11.3449
Stddev Latency(s): 29.3919
Max latency(s): 92.002
Min latency(s): 0.0157386
Reads look fine:
root@virt0:/home# rados bench -p fast 60 seq
hints = 1
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
Total time run: 0.152010
Total reads made: 130
Read size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 3420.81
Average IOPS: 855
Stddev IOPS: 0
Max IOPS: 0
Min IOPS: 2147483647
Average Latency(s): 0.0175385
Max latency(s): 0.0434704
Min latency(s): 0.00729327
Network I/O looks fine:
root@virt0:/home# iperf -c virt4 -P 4
------------------------------------------------------------
Client connecting to virt4, TCP port 5001
TCP window size: 325 KByte (default)
------------------------------------------------------------
[ 6] local 10.88.64.120 port 38350 connected with 10.88.64.124 port 5001
[ 3] local 10.88.64.120 port 38344 connected with 10.88.64.124 port 5001
[ 5] local 10.88.64.120 port 38348 connected with 10.88.64.124 port 5001
[ 4] local 10.88.64.120 port 38346 connected with 10.88.64.124 port 5001
[ ID] Interval Transfer Bandwidth
[ 6] 0.0-10.0 sec 12.0 GBytes 10.3 Gbits/sec
[ 3] 0.0-10.0 sec 12.0 GBytes 10.3 Gbits/sec
[ 5] 0.0-10.0 sec 12.0 GBytes 10.3 Gbits/sec
[ 4] 0.0-10.0 sec 10.2 GBytes 8.73 Gbits/sec
[SUM] 0.0-10.0 sec 46.1 GBytes 39.6 Gbits/sec
Latency looks OK:
root@virt4:~# ceph osd perf | sort -n
osd commit_latency(ms) apply_latency(ms)
0 3 3
1 2 2
2 4 4
3 2 2
4 3 3
5 2 2
6 2 2
7 2 2
8 3 3
9 6 6
10 1 1
11 6 6
12 3 3
13 4 4
14 2 2
15 2 2
16 3 3
17 3 3
18 3 3
19 4 4
20 2 2
21 2 2
22 3 3
23 3 3