First, let me give some context on our setup: we have a 3-node cluster where we will be using Ceph for hyperconverged storage.
We are still familiarizing ourselves with Ceph and would love to have someone more experienced chime in.
All of our storage hardware is SSD-based (24x 2TB NVMe, 8 per server).
We want to be able to tolerate one server going down with no downtime for our VMs.
The question I've been working on is: what is the most storage-efficient configuration we can choose to maximize our available space?
After diving into the Ceph documentation, this is what I found regarding erasure-coded pools:
K is the number of OSDs' worth of usable storage we will have, and we can afford to lose up to M OSDs, the total OSD count being K+M.
min_size should be set to K+1, and if we drop below min_size, we can no longer write to the Ceph RBDs.
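To make sure I'm reading this part correctly, here is the arithmetic I'm working from (just a rough sketch; the k/m names mirror how I read the erasure-code-profile fields, so please tell me if my interpretation is off):

```python
# Rough sketch of the erasure-coded pool math as I currently understand it.
# k and m mirror how I read the erasure-code-profile fields in the docs;
# the 2 TB OSD size matches our NVMe drives.

def ec_summary(k: int, m: int, osd_size_tb: float = 2.0) -> dict:
    """Summarize an EC profile: raw vs. usable capacity, loss tolerance, min_size."""
    total_osds = k + m
    return {
        "total_osds": total_osds,
        "usable_tb": k * osd_size_tb,        # K OSDs' worth of usable storage
        "raw_tb": total_osds * osd_size_tb,  # K+M OSDs of raw capacity
        "efficiency": k / total_osds,        # fraction of raw space we get to use
        "max_osd_losses": m,                 # OSDs we can lose without data loss
        "min_size": k + 1,                   # below this, writes to the pool stop
    }

print(ec_summary(k=4, m=2))  # the 4+2 example from the docs: 66.7% efficiency, min_size 5
```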
If we aim for a 4+2 (16+8, ~66% efficiency) erasure-coded pool, we can afford to lose a third of our drives and recover from that without data loss.
But we would still have downtime because of the min_size parameter (K+1 would total 17).
Following this logic, I am assuming the most storage-efficient Ceph configuration for a 3-node cluster with 24 OSDs is K=15 and M=9 (62.5% efficiency), which would let us keep operating normally with one server down, since min_size = K+1 = 16.
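Here is how I'm comparing the two layouts under that logic (again only a sketch of my reasoning, not a claim about how Ceph actually places chunks; the one-chunk-per-OSD assumption in particular is mine):

```python
# Comparing the two layouts I'm considering, following my own assumption that
# every placement group ends up with one chunk per OSD, so losing one server
# removes 8 chunks from each PG.

OSDS_PER_HOST = 8
HOSTS = 3
TOTAL_OSDS = OSDS_PER_HOST * HOSTS  # 24

def check_profile(k: int, m: int) -> None:
    """Print efficiency and whether writes survive one whole host going down."""
    min_size = k + 1
    surviving = TOTAL_OSDS - OSDS_PER_HOST  # chunks left per PG with one host down
    print(f"{k}+{m}: efficiency {k / (k + m):.1%}, min_size {min_size}, "
          f"surviving chunks {surviving}, writes still allowed: {surviving >= min_size}")

check_profile(16, 8)  # 66.7%, but 16 surviving < min_size 17 -> I/O pauses
check_profile(15, 9)  # 62.5%, 16 surviving == min_size 16 -> should stay writable
```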
Are any of my assumptions here wrong? Have I misinterpreted the Ceph docs in any way?
Is anyone else out there running a 3-node cluster with Ceph?
I would love to hear some other opinions regarding my setup.
Thank you in advance,