External CEPH performance and more ...

ednt

Renowned Member
Mar 16, 2017
Hi,

we just installed a separate CEPH cluster with Debian 8 and Kraken (because up to now Stretch is not officially supported by CEPH).
The cluster works and everything is connected via 10Gb network.

A first file copy test on one of the OSDs itself resulted in 6500Mb/s. Fantastic.

Then we installed a new Proxmox 4.4 server.
After a bit of trial and error we were able to set up 2 storages via RBD: one for CTs, one for VMs.
A copy test inside of a VM resulted in 150Mb/s.
Ok, let's test a CT, because of direct access... 200Mb/s
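
For reference, the two storage definitions in /etc/pve/storage.cfg look roughly like this (storage IDs, pool name and monitor IPs are only placeholders here), plus the cluster's admin keyring copied to /etc/pve/priv/ceph/<storage-id>.keyring:

rbd: ceph-vm
        monhost 192.168.1.11 192.168.1.12 192.168.1.13
        pool rbd
        content images
        username admin

rbd: ceph-ct
        monhost 192.168.1.11 192.168.1.12 192.168.1.13
        pool rbd
        content rootdir
        username admin
        krbd 1

As far as I understand, krbd is needed for the container storage because CTs go through the kernel RBD client.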

Now an additional test was needed:
We installed a CEPH MDS and created a CEPHfs.
Mounted on the Proxmox PC, we got a copy speed of 5500Mb/s.
So it was not a problem of our CEPH cluster.
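
What we did on the Proxmox server was roughly the following (monitor IP, secret file and ISO path are placeholders):

mkdir -p /mnt/cephfs
# mount the CEPHfs with the kernel client
mount -t ceph 192.168.1.11:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
# copy a 6GB ISO into the mount and take the time
time cp /root/debian.iso /mnt/cephfs/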

Then we created a storage inside Proxmox which uses the mounted CEPHfs.
Using this storage for a CT resulted in 1300Mb/s.
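
That storage is simply a directory storage on top of the mount point, roughly like this in /etc/pve/storage.cfg:

dir: cephfs-dir
        path /mnt/cephfs
        content rootdir,images
        shared 1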

All not what we expected.
Then I discovered that ceph 0.80.7 is in use with Proxmox 4.4.
So this morning I decided to set up a Proxmox 5.0 beta.

Now we also reached 1300Mb/s in a CT which is on an RBD storage.

But still, compared to the rate on the same machine when a local CEPHfs mount is used, it is a bit frustrating (5500 -> 1300Mb/s).

Is this normal?


At the moment we also cannot use apt-get update for 5.0. It always fails with an error about a CA file:

Err:6 https://enterprise.proxmox.com/debian/pve stretch/pve-enterprise amd64 Packages
server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none

Switching to pvetest also changed nothing.
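
In case it is only a broken CA bundle on the new host, the usual Debian-side checks would be something like:

# reinstall and regenerate the CA bundle mentioned in the error message
apt-get install --reinstall ca-certificates
update-ca-certificates
# check which repository files are actually active
cat /etc/apt/sources.list /etc/apt/sources.list.d/*.list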


One other thing:
When we now click on one of the CEPH entries in the Proxmox GUI, it results in an endless 'connection error'.
OK, the CEPH part of the GUI is maybe only meant for a CEPH installation managed by Proxmox itself.
But since a ceph.conf is missing, this part should probably not be executed at all.
Or is there a way to show a dashboard for an external CEPH cluster?

Bernd
 
No, if you read a bit below, you can see that I was able to achieve nearly the same speed when I mount the CEPHfs on the Proxmox server by hand and start a file copy in the mounted directory.

The speed of >5000Mb/s is available.

But not inside of a CT or VM :(

Btw. sync on a CEPH cluster ???
 
No, if you read a bit below, you can see that I was able to achieve nearly the same speed ?

I think you should use a reasonable benchmark tool to compare performance. Benchmarking is difficult and you can run into many traps. At least you should post the exact commands you use. For benchmarks inside a VM/container the guest configuration is also relevant.
 
Hi,

we just installed a separate CEPH cluster with Debian 8 and Kraken (because up to now Stretch is not officially supported by CEPH).
The cluster works and everything is connected via 10Gb network.

A first file copy test on one of the OSDs itself resulted in 6500Mb/s. Fantastic.
Hi Bernd,
sorry, but it looks like you are measuring caching!
What kind of single OSD provides 6.5GB/s write speed?? Are all your OSDs PCIe NVRAM devices?

Depending on your journal device, you have doubled writes if the journal and the OSD are on the same device.
See also here regarding speed: https://www.sebastien-han.fr/blog/2...-if-your-ssd-is-suitable-as-a-journal-device/
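
A quick, rough (and non-destructive) variant of the journal test from that post would be something like this, with the mount point of the journal SSD as a placeholder:

# ~400MB of small sync writes with O_DIRECT + O_DSYNC to a file on the journal SSD
dd if=/dev/zero of=/mnt/journal-ssd/ddtest bs=4k count=100000 oflag=direct,dsync
rm /mnt/journal-ssd/ddtest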

You also don't write anything about the number of OSDs and OSD nodes...

Have a look with "rados bench -p test 60 write --no-cleanup" if you have a Ceph pool named test.
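
For example (the PG count is only a placeholder, adjust it to your cluster):

# create a test pool, run a 60s write benchmark, then read the data back sequentially
ceph osd pool create test 128 128
rados bench -p test 60 write --no-cleanup
rados bench -p test 60 seq
# remove the benchmark objects afterwards
rados -p test cleanup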

For write speeds of about 1GB/s you need a good and big Ceph cluster (and more than 10GbE).

Udo
 
Hi,

my rates are given in bits per second, not bytes per second.

How I did my tests:

Very simple, but like in daily use: I always used mc in a Debian 8 environment.
Proxmox 4.4 is also a Debian 8, so the 'outer' environment is/was always the same.
I copied a 6GB ISO file. No cache is that large.
Since I copied the file with mc, it shows the transfer rate. (And I also measured the time.)
It is also not the absolute value that makes me wonder.
It is the discrepancy between the time on the drive mounted on Proxmox and the time inside a container stored on the same mounted drive. As mentioned before: the OS is the same, the 6GB file is the same.
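
To rule out caching on both sides, a fairer version of the same copy test would be something like this (paths are placeholders):

# drop the page cache first, then copy and force the data out before stopping the clock
echo 3 > /proc/sys/vm/drop_caches
time sh -c 'cp /root/debian.iso /target/debian.iso && sync'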

We use 32 OSDs with 14TB, 4 monitors, and journals on SSDs.
But this doesn't matter for the different results, because all scenarios use the same setup.

Bernd
 
But this doesn't matter for the different results, because all scenarios use the same setup.

This is a very naive viewpoint. You are testing totally different things, because a file test inside the VM results in a totally different access pattern.
Again, I recommend using a reasonable benchmark tool (fio, ...).
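
For example, something along these lines (file names, pool and image name are only placeholders; the second test needs fio built with RBD support):

# sequential write test on a file inside the guest, with a size larger than the guest's RAM
fio --name=seqwrite --rw=write --bs=4M --size=6G --direct=1 --end_fsync=1 --filename=/root/fio-testfile

# the same against the RBD pool directly from the host, bypassing the guest
rbd create -p rbd fio-test --size 8192
fio --name=rbdwrite --ioengine=rbd --clientname=admin --pool=rbd --rbdname=fio-test --rw=write --bs=4M --size=6G --direct=1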
 
Hi,

maybe I'm 'naive', but I have to copy files inside my CTs and VMs.
And if this is so much slower, even in a container (which is not the case when I use a local disk at the Proxmox server as storage),
then I have 2 possibilities:
1. Using 'real' servers and CEPH
2. Using Proxmox and use 'local' storage

I thought I could use the best of both worlds.
Maybe I was wrong.

I'm not interested in benchmarks, I'm interested in practical results.
It doesn't help if the benchmark is ultra fast but copying a real file is slow.

If I use iSCSI on top of CEPH as storage (a single point of failure) for a Hyper-V server, I get the full speed.
But that's not what I want :(

So I'm now doing some further testing.