Help with my first HA CEPH Setup

rengiared

Hi

I would like to ask for your help building my first HA Ceph setup, and I have some general questions about how Ceph operates.

For my first basic test I only had 3 old workstations available, just to see the basic functionality of Ceph. It worked quite flawlessly apart from the poor performance, but that is down to my test setup with only one OSD and Gbit Ethernet per server.

My currently planned setup would look like this (3 of them, of course):
Dell PowerEdge R720
2x Xeon E5-2630v2, 2.6GHz
128GB RAM
PERC H710 controller
Dual hot-plug 750W power supplies
iDRAC Enterprise for HA fencing

The first big question is whether it is possible, or even advisable, to install Proxmox VE on the Dell Internal Dual SD Module.
Of course I would replace the default 2x 2GB SD cards with 2x 32GB SDHC cards (http://geizhals.at/a1131370.html).
The background is that this would let me save two HDD caddies.

Should I mirror the recommended journal SSD?
What happens if a non-mirrored journal SSD fails?

With the storage I'm quite unsure what to pick: either 4x 4TB 7.2k SAS disks (http://geizhals.at/a860321.html) or rather 6x 1TB 10k SATA disks (http://geizhals.at/a764000.html).
I think the 6x 1TB disks would give better performance because of the higher spindle count and faster rotation speed.
And 6TB of storage would be perfectly adequate for us now and for the foreseeable future.

The Ceph storage network would use 2x 10GBase-T interfaces per node; here is a simple Visio diagram of how I imagined it.

[Attachment: ceph-ha.JPG – diagram of the planned Ceph storage network]
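
For reference, a minimal sketch of how the two 10GBase-T ports could be set up as a bonded Ceph network in /etc/network/interfaces (interface names, bond mode and addresses are just assumptions, not taken from the diagram):

Code:
# /etc/network/interfaces (excerpt) - hypothetical example
auto bond0
iface bond0 inet static
        address 10.10.10.11
        netmask 255.255.255.0
        slaves eth2 eth3
        bond_miimon 100
        bond_mode active-backup
# the matching "public network" / "cluster network" entries in ceph.conf
# would then point to 10.10.10.0/24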


What do you think, will I get decent performance? In the end I expect the disks, not the Ceph network, to be the limiting factor.


Furthermore, I have a few short questions about Ceph and some situations that may come up:

- How is the loss of one HDD handled? Delete the associated OSD, replace the disk and create a new OSD? (See the sketch below.)
- If I want to swap the disks for larger ones in the future, do I just replace them disk by disk, each time deleting and recreating the OSD and waiting for the rebuild after every single change?
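
For what it's worth, a rough sketch of the usual CLI sequence for replacing a failed disk (assuming the dead disk was osd.3 and its replacement shows up as /dev/sdd; please check the Ceph/Proxmox docs for your version before relying on this):

Code:
# take the failed OSD out of the cluster
ceph osd out 3
service ceph stop osd.3
ceph osd crush remove osd.3
ceph auth del osd.3
ceph osd rm 3

# swap the physical disk, then create a new OSD on the replacement
pveceph createosd /dev/sdd

# wait until "ceph -s" reports HEALTH_OK again before touching the next disk

A disk-by-disk capacity upgrade would be the same procedure repeated per disk, waiting for the rebuild to finish each time.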


I'm very grateful for any help and advice I can get.
Regards
 
Hello

We've been using a 4-node Ceph cluster for production KVMs for a few months. We also back up our data often and have hot-spare non-Ceph PVE systems ready to use.

We use one OSD per node. Each node has a 4-disk RAID 10 plus hot spare, used as a single OSD. The reason is that we want to stay far away from OSD failures, as during failure recovery data input can be very, very slow.

For our RAID-based storage, search the forum and the Ceph mailing list for 'anti-cephalopod'.

I'd test putting the journal on the OSD instead of using a separate SSD. If the storage speed is fine that way, I think it is the more reliable setup.

For the journal we use a 200GB Intel DC S3700. I picked that after reading a lot of suggestions on the Ceph mailing list.
 
Thanks for your answer and the info.
As far as I can see you weren't looking for speed, only reliability, but I'm looking for both :D

That's why I think the underlying RAID wouldn't fit my needs 100%.
In the meantime I've been thinking about something like 12 disks with only 7.2k rpm.
If I could find some comparative numbers between different disk types (SAS, SATA, 10k, 7.2k, number of disks), that would be great.
Especially with the disks I have to find the sweet spot between price, reliability and speed, because I'm trying to get the whole project done for about ~20k€.

In the meantime I've answered a few of my questions above myself, so now I know I would have to mirror the journal SSD, otherwise all OSDs on that node would fail along with it.

One big question remains: if I have a 3-node system running and want to add a 4th node, how can I update the integrated storage, since I can't edit the monitors?
[Attachment: eeee.JPG – screenshot of the RBD storage configuration]
Do I have to shut down all VMs, remove the storage and add it again with the same name plus the additional monitors?
And afterwards the VMs will start as if nothing happened?
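
In case it helps: the monitor list of an existing RBD storage can also be edited directly in /etc/pve/storage.cfg, without removing the storage. A sketch with made-up storage name and monitor IPs (pool and username are placeholders too); as far as I know, running guests keep their existing RBD connections, and the monhost list is only consulted when a new connection is opened:

Code:
# /etc/pve/storage.cfg (excerpt)
rbd: ceph-storage
        monhost 10.10.10.11;10.10.10.12;10.10.10.13;10.10.10.14
        pool rbd
        username admin
        content images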
 
As far as reliability and speed go: make sure to practice having an OSD fail. The more disks you have, the more often you'll run into that.

Also look into setting the 'noout' flag. If you shut a node down for hardware changes and that takes longer than Ceph's timeout for marking OSDs out, you'd want noout already set. It is not something you want to have to deal with after a node has been marked out automatically.
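
Setting and clearing the flag is one command each:

Code:
ceph osd set noout      # before the planned shutdown / hardware work
ceph osd unset noout    # afterwards, so failed OSDs get marked out again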

It would be good to have a dedicated forum thread for posting disk speeds of different types of Ceph clusters. If you find a test, I'll run it too so you can compare results.
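
The built-in rados bench would be an easy test to standardise on, for example (a sketch; 'test' is an assumed pool created just for benchmarking):

Code:
rados bench -p test 60 write --no-cleanup   # 60 second write test
rados bench -p test 60 seq                  # sequential reads of the objects just written
rados -p test cleanup                       # remove the benchmark objects afterwards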

Adding a node - I do not remember the exact steps. From what I recall it was easy to do: just join the new node to the cluster and add it to Ceph from the CLI, then add the mon and OSD using the Ceph GUI. Check the wiki page.
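
From memory the steps on the new node were roughly the following (a sketch only, the wiki page is authoritative):

Code:
pvecm add <ip-of-existing-cluster-node>   # join the Proxmox VE cluster
pveceph install                           # install the Ceph packages
pveceph createmon                         # add a monitor on the new node (or via GUI)
pveceph createosd /dev/sdX                # add its disks as OSDs (or via GUI)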

On mirroring the SSD journal - from what I read on the Ceph mailing list, that is not a good thing to do. I do not remember the exact reasons, but I wanted to mention it so you can check for yourself.
 
Hi,
if you are looking for speed, take a look at DRBD!

We have a Ceph cluster with 5 OSD hosts (each with 12x 4TB HDDs plus an SSD DC S3700 for the journal), connected via 10Gb (not Base-T, because of its much higher latency!) and the speed is not very high!
Especially for single threads... some file servers and so on.
If you have many, many VMs the overall performance is perhaps better, but for my use cases I got much more speed with DRBD than with Ceph!
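
The single-thread point is easy to see with rados bench by limiting the number of concurrent operations ('test' is an assumed benchmark pool):

Code:
rados bench -p test 60 write -t 1    # one op in flight, roughly single-client behaviour
rados bench -p test 60 write -t 16   # default parallelism, usually much higher throughput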

Nevertheless, Ceph is a nice thing and I'm looking forward to further improvements, but for me it is not really fast.

Udo
 
