Considering CEPH

stuartbh

Dec 2, 2019
Forum members,

I am interested in moving to CEPH and also want to configure it intelligently. I have enumerated my concerns below; perhaps forum members can post their experiences so I can design a deployment that makes sense.

1) I have a mix of SSDs, NVMe, and mechanical hard drives on 4 Intel NUCs I am using.
2) I have several x3650 M1 and one x3650 M3 servers that I wish to add into the cluster (they have a mix of SAS and SATA mechanical drives).
3) I would like to have CEPH use erasure coding instead of replication
4) I would like CEPH to engage compression, and encryption
5) Where would BlueStore fit into all of this (as this is a new CEPH cluster)?

Any ideas on how I can best plan migrating to CEPH given these considerations would be most welcome.

Stuart
 
1) I have a mix of SSDs, NVMe, and mechanical hard drives on 4 Intel NUCs I am using.
While you didn't mention how many of which type per node, this is likely not ideal. For best results, have 4 of the same type of backing device per node.
3) I would like to have CEPH use erasure coding instead of replication
No, you don't. EC is completely unsuited to virtualization workloads, and even then you'd want a LOT more nodes. 4 nodes doesn't leave room for any workable profile.
4) I would like CEPH to engage compression, and encryption
Inline compression is "free" and easy. Encryption is less so, and presents challenges and tradeoffs in practice. Consider WHY you want it before proceeding; if it's just "because I want to", I'd suggest not.
5) Where would BlueStore fit into all of this (as this is a new CEPH cluster)?
That's the on-disk format for your OSDs. It's the default in current versions of Ceph, so you don't need to worry about it.
 
Alexskysilk,

Please realize I am not trying to argue your points but obtain a more nuanced understanding of them and their relation to my environment. Surely, I want to do what is sensible but also, this is not an enterprise production environment either (it's a home lab).

While you didn't mention how many of which type per node, this is likely not ideal. For best results, have 4 of the same type of backing device per node.

Well the nodes have different sizes and types (NVMe vs SSD) in them and I was hoping that by using CEPH I could have them all leveraged together and replicas would be spread across them all. So if one cluster member had a 2TB NVMe and a 500GB SSD and the other had a 2TB SSD and 256GB NVMe they could all participate as devices within the amalgam of CEPH devices to provide network wide space. This is for a home lab so performance is not a top priority, as long as things are not painfully slow.

No, you don't. EC is completely unsuited to virtualization workloads, and even then you'd want a LOT more nodes. 4 nodes doesn't leave room for any workable profile.

I am not disputing your assertion; however, I would like to understand it. My understanding is EC is analogous to RAID5, and RAID5 is surely not antithetical to virtualization (qcow2 files). That is interesting that you say that about 4 nodes, as there are several videos of people building home lab CEPH clusters with just 3 nodes.

Inline compression is "free" and easy. Encryption is less so, and presents challenges and tradeoffs in practice. Consider WHY you want it before proceeding; if it's just "because I want to", I'd suggest not.

That's the on-disk format for your OSDs. It's the default in current versions of Ceph, so you don't need to worry about it.

The reason to encrypt is that when a device dies and needs to be replaced there is no meaningful data thereupon, which to me is a very good reason to encrypt data. That said, what are the challenges to using encryption? All of my servers have AES capable Intel processors in them also.

Stuart
 
Well the nodes have different sizes and types (NVMe vs SSD) in them and I was hoping that by using CEPH I could have them all leveraged together and replicas would be spread across them all. So if one cluster member had a 2TB NVMe and a 500GB SSD and the other had a 2TB SSD and 256GB NVMe they could all participate as devices within the amalgam of CEPH devices to provide network wide space. This is for a home lab so performance is not a top priority, as long as things are not painfully slow.
You CAN do that, but realize that the cluster will operate as fast as the slowest OSD. The number of OSDs is still consequential, and such a large disparity in OSD capacity will yield very lopsided utilization; in this case, your cluster will perform as if you only had one slow disk.
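A rough sketch of why mixed capacities give lopsided utilization, assuming data is placed in proportion to each drive's capacity (which is how default CRUSH weights behave). The drive names are hypothetical; the sizes are the four from the example quoted above.

```python
# Hypothetical drives from the example: name -> capacity in GB.
drives = {
    "node1-nvme": 2000,
    "node1-ssd": 500,
    "node2-ssd": 2000,
    "node2-nvme": 256,
}

def placement_share(capacities):
    """Fraction of all data (and so roughly of all I/O) each drive
    receives when placement is proportional to capacity."""
    total = sum(capacities.values())
    return {name: cap / total for name, cap in capacities.items()}

shares = placement_share(drives)
# Print the drives from largest to smallest share.
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {share:6.1%}")
```

Under that assumption the two 2TB drives end up holding most of the data and serving most of the requests, so the slower of the two largely sets the pace.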

My understanding is EC is analogous to RAID5
Kind of, but not in the important way. RAID scales in only one dimension: disks on the host. Ceph stripes EC across nodes, which means every stripe can have at most n OSD members (where n is the number of OSD nodes). With 4 nodes, the widest profile that fits is 3+1 (k=3 data chunks, m=1 parity chunk), i.e. single parity at most.
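To make the node-count limit concrete, here is a minimal sketch that lists the k+m profiles that fit on a given number of hosts, assuming one chunk per host (failure domain = host). The function name and output layout are illustrative and not part of any Ceph tool.

```python
def ec_profiles(hosts):
    """Yield (k, m, usable_fraction) for every erasure-code profile with
    k data chunks and m parity chunks that fits one chunk per host."""
    for m in range(1, hosts):
        # k starts at 2: with k=1 the "stripe" is just a replicated copy.
        for k in range(2, hosts - m + 1):
            yield k, m, k / (k + m)

profiles = list(ec_profiles(4))
for k, m, usable in profiles:
    print(f"k={k} m={m}: tolerates {m} lost host(s), {usable:.0%} usable")
```

With 4 hosts the widest profile is k=3, m=1. The only double-parity option, k=2, m=2, occupies all four hosts, so no spare host is left to rebuild onto after a failure.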
RAID5 is surely not antithetical to virtualization (qcow2 files)
It is, on two levels. 1. A single stripe means only one queue to process IO requests; a RAID10 equivalent on 4 disks would be TWICE as available, with two queues. In a virtualized environment you have multiple initiators (each VM would be one), all vying for the attention of the same underlying storage. 2. RAID5 has single parity, which means you are one fault away from operating without a safety net. Moreover, single parity in a Ceph environment is a non-starter, since you'd be effectively dead in the water whenever a node reboots.
That is interesting that you say that about 4 nodes, as there are several videos of people building home lab CEPH clusters with just 3 nodes.
None of them try to run EC ;)

The reason to encrypt is that when a device dies and needs to be replaced there is no meaningful data thereupon,
That is the default in any distributed storage. Data doesn't have to be encrypted to be completely unusable from a single OSD.
 
alexskysilk,

I want to first say thank you for engaging in this colloquy as I do appreciate your perspective and learning more about CEPH.

You CAN do that, but realize that the cluster will operate as fast as the slowest OSD. The number of OSDs is still consequential, and such a large disparity in OSD capacity will yield very lopsided utilization; in this case, your cluster will perform as if you only had one slow disk.

So, let us presume I ascertain the speed of the different disks I have (be they mechanical, SSD, or NVMe). How do I group the drives by speed so that one group (say mechanical) is used for backups at night and another (say NVMe) is used for storing active virtual machines, or what have you? I would presume CEPH has some sort of volumes that can differentiate between the groups of drives?

Kind of, but not in the important way. RAID scales in only one dimension: disks on the host. Ceph stripes EC across nodes, which means every stripe can have at most n OSD members (where n is the number of OSD nodes). With 4 nodes, the widest profile that fits is 3+1 (k=3 data chunks, m=1 parity chunk), i.e. single parity at most.

It is, on two levels. 1. A single stripe means only one queue to process IO requests; a RAID10 equivalent on 4 disks would be TWICE as available, with two queues. In a virtualized environment you have multiple initiators (each VM would be one), all vying for the attention of the same underlying storage. 2. RAID5 has single parity, which means you are one fault away from operating without a safety net. Moreover, single parity in a Ceph environment is a non-starter, since you'd be effectively dead in the water whenever a node reboots.

None of them try to run EC ;)

Well, I presume you are right, as I have never heard of any of those small CEPH installations using EC; that is something I did not consider until I thought more deeply about it after reading your reply.

That is the default in any distributed storage. Data doesn't have to be encrypted to be completely unusable from a single OSD.

I do want to make sure my data is encrypted.

As I am contemplating all of your responses I am beginning to realize that CEPH is not really usable for the home lab user, or for my purposes. It seems (as far as I can tell thus far) that it does not offer the feature set I am looking for and described originally. I will have to see what other methodologies are available as I now realize CEPH cannot encrypt my data either.

ZFS runs my VMs on RAID-Z1 with no problem; I have never lost a single bit of data nor had any ugly response times, and ZFS has been encrypting and compressing my data flawlessly without even a single burp. Thus, for the features I desire, maybe CEPH is simply the wrong technology choice and I should just stick with ZFS via NFS shares.

Thank you for your thoughts on all of this.

Stuart
 
The use case is secondary to what it takes to run Ceph with satisfactory results.

You need, at minimum, 3 nodes, 4 OSDs per node, enough network links at an adequate speed to supply the disks, ~4 GB RAM per OSD, and 1 core per OSD. You haven't mentioned the networking, RAM, or cores, but since the only prerequisite you meet is the node count, I expect your conclusion is correct.
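The rule of thumb above can be turned into a quick per-node check. This is only a sketch: the node in the example call is a made-up placeholder, and the per-OSD figures are the ones quoted in this post, covering the OSDs alone and not the OS or any VMs on the same host.

```python
# Figures taken from the rule of thumb above.
RAM_GB_PER_OSD = 4
CORES_PER_OSD = 1
MIN_OSDS_PER_NODE = 4

def check_node(osds, ram_gb, cores):
    """Return a list of shortfalls for one node; an empty list means it passes."""
    problems = []
    if osds < MIN_OSDS_PER_NODE:
        problems.append(f"only {osds} OSDs, want {MIN_OSDS_PER_NODE}")
    if ram_gb < osds * RAM_GB_PER_OSD:
        problems.append(f"{ram_gb} GB RAM, want {osds * RAM_GB_PER_OSD} GB for OSDs")
    if cores < osds * CORES_PER_OSD:
        problems.append(f"{cores} cores, want {osds * CORES_PER_OSD} for OSDs")
    return problems

# Hypothetical node: 2 OSDs, 16 GB RAM, 4 cores.
print(check_node(osds=2, ram_gb=16, cores=4))
```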
 
I run 2 Ceph clusters in my home environment, and while they are not set up to production standards, hardware- or network-wise, they do meet my needs.

My first cluster is 7 nodes with two 1Gb links and 2 SSD OSDs per node. It won't win any speed contests, but it offers performance similar to when I was using 2 QNAP NASes to host my VMs via NFS.


The second cluster is 3 nodes with two 1Gb links and 4 HDD OSDs per node. This cluster hosts my Docker volumes and has similar performance to when they were on a single QNAP NAS over NFS, but it also solved the issue of SQLite databases corrupting when using NFS.