Is Ceph stable for a critical production environment?

cesarpk

Well-Known Member
Mar 31, 2012
Hello everyone,

I have read this link:
http://pve.proxmox.com/wiki/Ceph_Server

But since that page says it is a technology preview, for the moment I think I will test Ceph running independently of the PVE nodes.

With that in mind:
1. How stable is Ceph for use in a critical production environment?
2. Can I run databases in the KVM VMs without disk-access performance problems? (I am planning either a single 10 Gb Ethernet NIC or dual 10 Gb Ethernet over LACP 802.3ad, in both cases dedicated exclusively to Ceph network traffic - a sketch of such a bond configuration follows below.)
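For reference, a minimal sketch of an 802.3ad bond dedicated to the Ceph network on a Debian/Proxmox host; the interface names and addresses are only examples, and the switch ports must also be configured for LACP:

Code:
# /etc/network/interfaces - hypothetical NICs eth2/eth3 reserved for Ceph traffic
auto bond0
iface bond0 inet static
        address 10.10.10.1
        netmask 255.255.255.0
        bond-slaves eth2 eth3
        bond-mode 802.3ad
        bond-miimon 100
        bond-xmit-hash-policy layer3+4

Note that a single TCP stream still only uses one physical link; LACP helps with many parallel connections, which is what Ceph generates.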

I welcome your comments, especially about the downsides of Ceph.

Best regards
 
Just to clarify, are you asking if Ceph itself is stable enough to run in a production environment? Or are you asking about the stability of the upcoming release of Proxmox Ceph Server, where Proxmox and Ceph run on the same node and all Ceph management can be done from the Proxmox GUI?

If you are asking about the stability of Ceph on its own, then my comment is that it is very much stable enough to be used in a production environment. I have 3 setups where Ceph is used as the storage backbone, serving about 30 users on average in each setup. Its stability and resiliency are hard to match with other solutions out there. But I would suggest that if you need faster I/O, such as for a database server, it helps to have a separate SSD-backed Ceph pool and put the virtual server there; it increases performance a lot. Things also speed up with a minimum of 3 nodes and 12 OSDs. None of my setups has a 10 Gb backbone, and file transfer is still acceptable.
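For reference, a rough sketch of how such a pool can be pinned to SSD OSDs, assuming a CRUSH rule (here rule id 1) that only selects the SSD OSDs has already been created; the pool name and PG counts are just examples:

Code:
ceph osd pool create ssd-pool 512 512          # pool name and pg_num/pgp_num are examples
ceph osd pool set ssd-pool crush_ruleset 1     # pin the pool to the SSD-only CRUSH rule

The disk images of the database VMs would then be placed on that pool.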

If your interest is in the new Proxmox Ceph Server, I would say hold off on that and give it time to mature a bit. That way it can guarantee safe operation after a few bugs have been taken care of.
 
Hi,
Ceph is stable, but you can also run into trouble with it.

This normally happens due to admin mistakes ;-) e.g. if your disks are too full, or you change too much at one time (reweighting disks, increasing the number of PGs...).
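For reference, a minimal sketch of the kind of checks and gradual changes meant here; the pool name "rbd" and the target pg_num are just examples:

Code:
ceph -s                            # overall cluster state
ceph health detail                 # lists near-full / full OSDs, stuck PGs, etc.

ceph osd pool get rbd pg_num       # current placement group count
ceph osd pool set rbd pg_num 256   # raise pg_num in modest steps...
ceph osd pool set rbd pgp_num 256  # ...then pgp_num, letting the cluster settle in between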

I have done some tests with the new pve-ceph solution and it looks good. On the other hand, I have a Ceph cluster in production where the performance is not good enough (but this is not a general Ceph problem).
I have been trying for weeks to isolate the issue, but it's not easy to find (the pve-ceph cluster from pvetest performs better, using only Gigabit, than the 10 Gbit Ceph cluster...).

Udo
 
Thanks, symmcom and udo, for your comments.

@udo:
What is the issue you are trying to isolate?

@experienced users:
Please share your feedback.

Best regards
Cesar
 
Just wanted to note that there will be a new Ceph release (Firefly) in a few weeks. This is expected
to be even more stable, and it is marked for long-term support.
 
udo said: "...I have a Ceph cluster in production where the performance is not good enough (but this is not a general Ceph problem)..."

Hi,

May I ask what "not so good" means in figures? I'm running a test Ceph installation: 3 nodes, each with 4 SATA disks (4 OSDs per node), using 2 Gbit links (bonding). I'm getting up to 120 MB/s, and I'm wondering whether 10 Gbit links would still improve rates in this case... :)
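For reference, a quick way to check whether the bonded network is already the limit here (the address is an assumption); with most bonding modes a single TCP stream only uses one 1 Gbit link, so ~120 MB/s may simply be one link saturated:

Code:
iperf -s                      # on one Ceph node
iperf -c 192.168.0.11 -P 4    # on another node: 4 parallel streams to exercise both bond links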
 
Meaning: will there only be releases with Ceph LTS, or also releases with non-LTS Ceph, so that newer Ceph features not yet propagated to LTS can be supported?

We will include the LTS libraries by default, but the user is free to run a newer version.
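For reference, a rough sketch of how one might pull a newer Ceph from the upstream repository instead of the bundled LTS packages; the repository URL and release codenames are assumptions based on the usual ceph.com layout at the time, and the Ceph release key also needs to be imported:

Code:
# /etc/apt/sources.list.d/ceph.list
deb http://ceph.com/debian-firefly/ wheezy main

apt-get update
apt-get install ceph-common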
 

Tell me exactly what benchmark you run and I can run a similar one in our lab.
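For reference, a common way to benchmark the cluster itself, independent of any VM, is rados bench; "test" is a hypothetical pool name:

Code:
rados bench -p test 60 write --no-cleanup   # 60-second write benchmark, keep the objects
rados bench -p test 60 seq                  # sequential read benchmark using those objects
rados -p test cleanup                       # remove the benchmark objects afterwards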
 
Hi,
In my setup I have 4 storage nodes with 52 x 4 TB HDDs (13 in each host), a separate 10 Gb Ceph cluster network, and 10 Gb to the PVE hosts. My reads are only approx. 40 MB/s (inside the VM), and with scrub and deep-scrub enabled only 25 MB/s...
If I clear the VM cache and read the file again (so it comes from the cache of the OSD hosts) I get 177 MB/s, which is OK for one thread.
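For reference, scrubbing can be paused temporarily while measuring, to separate its impact from the underlying read performance (remember to unset the flags afterwards):

Code:
ceph osd set noscrub
ceph osd set nodeep-scrub
# ... run the read test ...
ceph osd unset noscrub
ceph osd unset nodeep-scrub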

If I test the individual components, everything looks OK: network speed is 9.7 Gbit/s with iperf; reading from a single disk is fast; and if I move HDDs from one node to another, the "rebuild" reaches 400 MB/s, sometimes 1237 MB/s, and so on...
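For reference, a minimal way to check a single OSD disk's raw sequential read speed while bypassing the page cache; the device name is only an example:

Code:
dd if=/dev/sdb of=/dev/null bs=4M count=1024 iflag=direct   # raw sequential read of one OSD disk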
I moved the mon off the OSD host, switched the OSDs from self-formatted XFS to ceph-deploy (slightly different format parameters (inodes)), changed the OS from Debian to Ubuntu (perhaps a driver problem?), and right now I am moving all disks from one node to a new one, because the old one has a strange format for the omap files (/var/lib/ceph/osd/ceph-26/current/omap/003543.ldb instead of 003543.sst) - perhaps after all that it will be better?!

Udo
 
Is there a reason for staying with 0.67.x instead of going with 0.72.2 right now? I've been running a 0.72.2 cluster and it seems quite stable.
 
udo said: "...My reads are only approx. 40 MB/s (inside the VM), and with scrub and deep-scrub enabled only 25 MB/s..."

Interesting figures; do you use SSDs for the journals? I am using one SSD per 4 OSDs, which is the recommended maximum. Without these separate journals, my throughput is significantly lower; I also increased the journal size to 10 GB per OSD (the default was only 1 GB, if I remember correctly).
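For reference, the journal size is set in ceph.conf (the value is in MB) and only takes effect when a journal is created, e.g. for new OSDs or after flushing and recreating an existing journal:

Code:
# /etc/ceph/ceph.conf
[osd]
        osd journal size = 10240    # 10 GB journal per OSD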
I used a Win2k8R2 VM for testing; the image cache setting also made a difference in speed - I found "writeback" to be the fastest, and it also "scaled" quite well: running CrystalDiskMark on 3 different Ceph volumes in parallel, each volume still got > 60 MB/s (random read/write test, block size = 512 KB). I found this quite promising...
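For reference, the cache mode can be set per virtual disk from the Proxmox CLI as well as the GUI; the VM ID, storage name, and disk name below are just examples:

Code:
qm set 100 --virtio0 ceph-storage:vm-100-disk-1,cache=writeback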
 
Hi,
Yes, I'm using SSDs for journaling (a 3 GB file, not a partition), but only 2 SSDs for 13 OSD disks. But the performance issue also shows up during reads, and the SSDs are only used for writes, so I guess it has nothing to do with this...


Udo
 
I've not tested Ceph with SSDs for journals or for OSDs.
In my setup of 4 nodes and 12 crappy (old, used, ready-to-fail) SATA disks for OSDs, performance was barely acceptable.

I did find it promising that I was unable to permanently break Ceph, even when disks failed (as expected they would).
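For reference, that kind of resilience is easy to try out yourself: mark an OSD out and watch the cluster re-replicate, then bring it back (the OSD id is just an example):

Code:
ceph osd out 5    # take OSD 5 out of data placement; Ceph starts backfilling elsewhere
ceph -w           # watch recovery progress until HEALTH_OK
ceph osd in 5     # put it back in once the "failure" test is done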
 
