Ceph performance and latency

udo

Hi,
I have run some rados benchmarks and I see a high max latency (rados -p test bench -b 4194304 60 write -t 32 --no-cleanup).

Does anybody know how to find and isolate the reason for the high latency?
Code:
                        4M write (-b 4194304)   4k write (-b 4096)
Total time run:         61.186561               60.008194
Total writes made:      7045                    100939
Write size:             4194304                 4096
Bandwidth (MB/sec):     460.559                 6.571

Stddev Bandwidth:       98.5743                 5.50003
Max bandwidth (MB/sec): 568                     15.4883
Min bandwidth (MB/sec): 0                       0
Average Latency:        0.277103                0.0190213
Stddev Latency:         0.227484                0.120132
Max latency:            4.81041                 4.09047
Min latency:            0.067239                0.001231
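One starting point for isolating a slow disk (assuming a Ceph release that already provides it) is the per-OSD latency view:
Code:
# commit/apply latency per OSD - a single outlier often explains a high max latency
ceph osd perf
# map OSDs to hosts, to see whether the slow ones share a node
ceph osd tree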
One thing I found is the number of threads: throughput is better with more threads. Even on a host with only 8 cores, performance with -t 32 is much better than with -t 8. But the max latency also rose (though only slightly).
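A quick way to compare thread counts, using the same test pool, is a small loop like this (shorter 30-second runs; without --no-cleanup the bench removes its objects again):
Code:
# compare write throughput and max latency for different client thread counts
for t in 1 8 16 32; do
    echo "== $t threads =="
    rados -p test bench -b 4194304 30 write -t $t
done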
 
I don't have an answer to your question but do have some related comments.

We just set up a four-node Ceph cluster using old SATA disks with SMART errors that we had lying around. Each node has 16GB RAM and 10Gb InfiniBand.

I also noticed that a higher number of threads resulted in better performance.

Wish I had four SSDs lying around so I could benchmark with the journal on SSD.

I am disappointed in the read speeds of Ceph. I suspect that network communication is what introduces the unexpected latency.
 

Hi e100,
you don't get good read performance even over the InfiniBand connection?

I searched a bit and found a hint on the ceph mailing list: some SSDs are very slow at sync writes (Ceph uses dsync for the journal).
We use three different SSD models so that they don't all die at the same time.

Here are the results from my SSDs:
Code:
dd if=/root/randfile of=/mnt/test bs=350k count=10000 oflag=direct,dsync

# ssd1: Corsair Force GS
161 MB/s

# ssd2: INTEL SSDSC2CW12
126 MB/s

# ssd3: Samsung SSD 840
52.6 MB/s
The Samsung is much too slow... I have ordered a new Corsair SSD and will report whether the latency changes.

Udo
 
Hi e100,
you don't get good read performance even over the InfiniBand connection?
One thing I learned: putting the Ceph public and cluster networks on different IB connections gave me the best performance.
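For reference, that split is configured in ceph.conf roughly like this (the subnets are placeholders for the two IB networks):
Code:
[global]
    # client traffic (monitors, RBD clients / KVM)
    public network  = 10.10.10.0/24
    # OSD replication and heartbeat traffic
    cluster network = 10.10.20.0/24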

I was getting 187 MB/sec sequential read inside a Windows VM (virtio 0.1-52). I can get 1200 MB/sec reading from my Areca array in the same VM, so I know the bottleneck is not the VM itself.
Reading from 12 OSDs, that is less than 15 MB/sec per OSD.

What really puzzled me is that during the sequential read the Ceph servers were not reading from the disks; they were serving the data from cache.
Getting only 187 MB/sec when reading from the cache of 4 nodes seems rather low, even with my crappy hardware.
I even tested with 16 OSDs: same speed. Not sure what is preventing more performance.

I've checked the bandwidth using iperf; the IB network is working fine.
Any thoughts/suggestions?
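A raw bandwidth check over IPoIB can be done with something like this (the host name is a placeholder):
Code:
# on one node
iperf -s
# on the other node: 4 parallel streams, 30 seconds
iperf -c nodeb -P 4 -t 30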
 
Hi,
what values do you get from the host with
Code:
rados -p test bench -b 4194304 60 write -t 32 --no-cleanup
rados -p test bench -b 4194304 60 seq -t 32 --no-cleanup
(you have to create the pool test first: "ceph osd pool create test 1600")
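The seq test simply reads back the objects left behind by the write run, which is why --no-cleanup is needed there. Afterwards the benchmark objects and the test pool can be removed again, e.g.:
Code:
rados -p test cleanup
ceph osd pool delete test test --yes-i-really-really-mean-it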

Udo
 
rados -p test bench -b 4194304 60 write -t 32 --no-cleanup
Code:
 Total time run:         61.299273
Total writes made:      1232
Write size:             4194304
Bandwidth (MB/sec):     80.392 

Stddev Bandwidth:       22.1143
Max bandwidth (MB/sec): 112
Min bandwidth (MB/sec): 0
Average Latency:        1.58592
Stddev Latency:         0.889294
Max latency:            4.17732
Min latency:            0.273837

rados -p test bench -b 4194304 60 seq -t 32 --no-cleanup
Code:
 Total time run:        6.391866
Total reads made:     1232
Read size:            4194304
Bandwidth (MB/sec):    770.980 

Average Latency:       0.165004
Max latency:           0.753776
Min latency:           0.058182
OK, now that seems like what I was expecting: 770 MB/sec across 12 OSDs is about 64 MB/sec per OSD, which for these old disks is about their average speed.

Inside the VM I get:
Code:
 Sequential Read :   186.845 MB/s
Sequential Write :    71.298 MB/s

The write speed seems about right, but the read speed in the VM is far below what rados achieves.

Same VM, different disk that is one local Areca Array:
Code:
 Sequential Read :  1301.366 MB/s
Sequential Write :   985.813 MB/s

Both the RBD and the LVM disk in the VM have the same settings and driver versions.
I've tested writethrough, writeback, directsync, and none; the sequential read on RBD always hovers around 187 MB/sec.
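(For reference, the cache mode is just the disk option in the VM config; roughly like this, where the VM ID, storage and volume names are made up:)
Code:
qm set 101 --virtio0 ceph-rbd:vm-101-disk-1,cache=writeback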
 
You run the benchmark with 32 threads (-t 32). A VM uses only a single IO thread.
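To see what a single IO thread gets from rados directly, the same benchmarks can be run with -t 1 (same test pool as above):
Code:
rados -p test bench -b 4194304 60 write -t 1 --no-cleanup
rados -p test bench 60 seq -t 1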

Hi,
but this confused me completely...

because with a single thread I got quite low values with rados (4M block size):
write: 93.180 MB/s
read: 43.655 MB/s

inside a VM I got this with dd (1MB block size on a filesystem):
write: 214 MB/s
read: 168 MB/s
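(A test like that typically looks something like this; the file path is a placeholder:)
Code:
# sequential write, 1MB blocks, bypassing the guest page cache
dd if=/dev/zero of=/mnt/test/ddfile bs=1M count=4096 oflag=direct
# sequential read of the same file
dd if=/mnt/test/ddfile of=/dev/null bs=1M iflag=direct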

How can the VM be faster than the host? Nevertheless, IMHO the VM speed is still not high enough.

With many threads the read speed is the same as (or a little higher than) the write speed; in the VM (and with rados with 1 thread) it's much slower.
Is this normal?

Udo
 
With many threads the read speed is the same as (or a little higher than) the write speed; in the VM (and with rados with 1 thread) it's much slower.
Is this normal?

I have no real idea. But I think it would be better to ask those questions on the ceph mailing lists.
 
With rados and a single thread I got:
Code:
 Total time run:        24.016654
Total reads made:     1232
Read size:            4194304
Bandwidth (MB/sec):    205.191 

Average Latency:       0.0194893
Max latency:           0.026378
Min latency:           0.012442

That corresponds to what I see in the VM itself.

Seems the single IO thread in KVM is the limitation here.
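One way to check that, assuming fio is available in the guest, is to run several parallel sequential readers and see whether the total scales with the number of jobs (directory and sizes are placeholders):
Code:
fio --name=seqread --rw=read --bs=1M --size=2G --numjobs=4 \
    --ioengine=libaio --iodepth=16 --direct=1 \
    --directory=/mnt/test --group_reporting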
 
Hi,
with the new SSD for the journal I get better performance (610 MB/s write with 4M blocks instead of 460 MB/s), but the max latencies still sometimes look no better...

Udo
 
