RAID 1 and CEPH

jgibbons

New Member
Feb 4, 2021
2
0
1
22
Hi guys!

I read that Ceph does not support hardware RAID. Does this mean that I cannot create a virtual disk on a Dell PERC controller, consisting of two disks in RAID 1, and pass it to Ceph? I like the idea of the Ceph drives having extra redundancy.
 

t.lamprecht

Proxmox Staff Member
Staff member
Jul 28, 2015
5,511
1,760
164
South Tyrol/Italy
shop.proxmox.com
Hi!

Will it somehow work? Probably, at least in a broad sense.

Will it cause problems, hurt performance, and maybe even lead to subtle errors where no one can really help? Highly probable.

I like the idea of the CEPH drives having extra redundancy.
Ceph is the redundancy! At least it should be.

Ceph really wants to manage its disks directly and alone, i.e., nothing in between beyond a plain dumb HBA, SATA controller, or PCIe bus; no RAID of any kind.

Just add more OSDs spread over different hosts (and, if you want, different racks/rooms) and let Ceph handle redundancy on its own. It can do that much better (at object-level granularity, with a view of the full cluster and all its disks) than any plain RAID, especially proprietary HW RAID controllers, which in my experience aren't really a joy to work with...
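In practice that means skipping the PERC virtual disk entirely: put the controller in HBA/passthrough mode and hand each raw disk to Ceph as its own OSD. A sketch (the device names are placeholders for your actual disks):

```shell
# One OSD per raw disk, no RAID virtual disk in between.
pveceph osd create /dev/sdb
pveceph osd create /dev/sdc
```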
 

jgibbons

One more note: For Ceph to work correctly you need a cluster of at least "3" nodes!
Ahh yes, I did note that. I come from a VMware vSAN background. We are planning on launching a 4-node cluster for extra redundancy. Does this mean that we could configure 2 physical disks in each server with the ability to lose a single disk per node?
 

t.lamprecht

Does this mean that we could configure 2 physical disks in each server with the ability to lose a single disk per node?

It can be more than that, depending on space usage and the time between (disk) outages.

At its core, Ceph is an object store: each object is stored multiple times in the cluster for redundancy. Normally the 3/2 config is chosen, meaning three copies per object, two of which must be written out before Ceph gives the writer the OK (avoiding split brain and such).
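The 3/2 semantics can be sketched in a few lines; this is a toy model to show the rule, not how Ceph is actually implemented:

```python
# Toy model of a replicated pool: "size" copies are kept per object,
# "min_size" copies must hit disk before a write is acknowledged.
# Illustrative only -- not real Ceph code.
SIZE = 3      # copies kept per object
MIN_SIZE = 2  # copies required before the client gets an OK

def write_acked(replicas_up: int) -> bool:
    """A write completes only once min_size replicas have it."""
    return replicas_up >= MIN_SIZE

def usable_capacity(raw_tb: float, size: int = SIZE) -> float:
    """Replication divides raw capacity by the copy count."""
    return raw_tb / size

print(write_acked(3))        # True: healthy pool
print(write_acked(1))        # False: pool blocks I/O below min_size
print(usable_capacity(8.0))  # e.g. 8 TB raw -> ~2.67 TB usable at size 3
```

Note the capacity side effect: a 4-node cluster with 8 TB raw yields only about a third of that as usable space at size 3.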
Now you'll say: yeah well, any RAID does that, big deal. But there's more:
  1. Ceph does the placement in a smart way. For example, it won't store an object twice on the same OSD (= a disk in Ceph terminology) and, if possible, not even on two OSDs in the same host.
  2. Once an OSD (disk) or host goes down, Ceph notices and tries to restore the lost object copies from the remaining ones, re-doing its smart placement until enough copies are available again. This means that as long as there's enough space for the copies and enough different OSDs to spread them across, all is well (mostly; if OSDs start failing by the minute you may have a more general problem ;)).
  3. This is all handled fully automatically, no stressful manually triggered resilver process required. If an OSD fails, just plug in a new disk, set it up as an OSD (can be done over the web interface), and Ceph does the rest. Even if a whole host burns down: set up Proxmox VE on a new one, add it to the cluster, install Ceph, and you'll be good again.
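Points 1 and 2 can be illustrated with a toy placement function. The host/OSD names are hypothetical, and real Ceph uses the far more sophisticated CRUSH algorithm for this:

```python
import hashlib

# Toy host-aware placement: pick N OSDs on distinct hosts, pseudo-randomly
# per object name. Hypothetical cluster layout; real Ceph uses CRUSH.
OSDS = {"osd.0": "node1", "osd.1": "node1",
        "osd.2": "node2", "osd.3": "node2",
        "osd.4": "node3", "osd.5": "node3",
        "osd.6": "node4", "osd.7": "node4"}

def place(obj, copies=3, up=None):
    """Choose `copies` OSDs on distinct hosts from the set of up OSDs."""
    candidates = up if up is not None else set(OSDS)
    # Deterministic pseudo-random order per object name.
    order = sorted(candidates,
                   key=lambda o: hashlib.sha256((obj + o).encode()).hexdigest())
    chosen, hosts = [], set()
    for osd in order:
        if OSDS[osd] not in hosts:   # never two copies on one host
            chosen.append(osd)
            hosts.add(OSDS[osd])
        if len(chosen) == copies:
            break
    return chosen

before = place("rbd_data.123")
# Simulate an OSD failure: re-placement simply excludes the dead OSD,
# restoring the lost copy on another host.
after = place("rbd_data.123", up=set(OSDS) - {before[0]})
assert before[0] not in after
assert len({OSDS[o] for o in after}) == 3  # still on three distinct hosts
```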
That, combined with cheap snapshots, thin provisioning, division of the total capacity into pools and namespaces for different projects/users/..., and scalability in all directions, makes it a really good universal solution for shared storage.
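On Proxmox VE, such a replicated pool with the 3/2 config is one command; the pool name here is just an example:

```shell
# Create a replicated pool: 3 copies per object, 2 required for writes.
pveceph pool create vm-storage --size 3 --min_size 2
```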

You may want to check out our Ceph chapter in our reference documentation: https://pve.proxmox.com/pve-docs/chapter-pveceph.html
 

pablomart81

New Member
Dec 9, 2020
13
0
1
41
I have seen that if the SSD disk that I use to store the Ceph DB fails, it takes down all the OSDs that are linked to that DB.
In that case, can the DB disk be part of a mirror at the HW RAID level?
If that is not possible due to performance issues, what would be the procedure to bring the OSDs back online?
 

alexskysilk

Renowned Member
Oct 16, 2015
803
105
63
Chatsworth, CA
www.skysilk.com
I have seen that if the SSD disk that I use to store the Ceph DB fails, it takes down all the OSDs that are linked to that DB.
The short answer is: don't share a DB device across multiple OSDs unless you have a sufficiently large deployment. If the deployment is large enough, having multiple OSDs out on one node doesn't pose a significant risk.

But to answer your question specifically: yes, you can put your DB device on a RAID controller. I've never done it, so I can't speak to its performance; a SAS 12G or 24G controller would probably give reasonable results.
what would be the procedure to bring the OSDs back online?
If the DB device is truly dead, you'll need to wipe and recreate all the OSDs. If it's alive and present, just rescan the LVMs and bring them back online (a reboot is the simplest way to accomplish this).
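If the DB device is alive again, the rescan amounts to re-activating the OSDs' LVM volumes; a sketch of the non-reboot route (exact steps depend on your setup):

```shell
# Re-scan and re-activate all OSD logical volumes, then start the daemons.
ceph-volume lvm activate --all
systemctl start ceph-osd.target
```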
 
