Designing a Ceph cluster

Feb 12, 2021
Hello,
I am new to the Proxmox world. I am architecting and designing a new Proxmox infrastructure for my datacenter so we can get the benefits of the hyper-converged infrastructure model and rent VPSs to our customers with high-performance compute on NVMe disks, with the specs below.

We have 4 compute nodes, consisting of:

Processor: 32 cores × 4 nodes = 128 cores
RAM: 192 GB × 4 nodes = 768 GB

The Ceph cluster consists of:

Each node: 2 × 2 TB storage drives × 4 nodes = 16 TB raw, with 8 OSDs.

Each OSD needs 1 core and 3 GB RAM for monitoring.

1 core × 8 OSDs = 8 cores consumed from the compute nodes (128 cores), leaving 120 cores.

3 GB RAM × 8 OSDs = 24 GB RAM consumed from the compute nodes (768 GB), leaving 744 GB RAM.

We require 2 replicas of the data, so the 16 TB of raw storage comes down to 50% = 8 TB usable in the cluster.

Note: if our cluster fills to more than 90% (even by 0.25%), the cluster shuts down immediately.

Let's imagine our cluster is 75% full: 75% of 8 TB = 6 TB, spread across 4 nodes, so each node holds 1.5 TB.

If one node fails, the 6 TB is spread over 3 nodes, so each node holds 2 TB.

Our peak for storage selling is 6 TB (75% of 8 TB).
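To double-check the arithmetic above, here is a small Python sketch of the same figures (the variable names are only for illustration, they are not Ceph settings):

```python
# Capacity and resource math for the proposed 4-node cluster (figures from the post above).
raw_tb = 2 * 2 * 4                        # 2 drives x 2 TB x 4 nodes = 16 TB raw
replicas = 2                              # 2 copies of every object
usable_tb = raw_tb / replicas             # 8 TB usable
sold_tb = usable_tb * 0.75                # 6 TB planned peak for selling

nodes = 4
per_node_healthy = sold_tb / nodes        # 1.5 TB per node while all nodes are up
per_node_degraded = sold_tb / (nodes - 1) # 2.0 TB per node if one node fails

osds = 8
cores_left = 128 - osds * 1               # 120 cores left for VMs
ram_left_gb = 768 - osds * 3              # 744 GB RAM left for VMs

print(usable_tb, sold_tb, per_node_healthy, per_node_degraded, cores_left, ram_left_gb)
```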

Please, Proxmox geeks, I need some advice:

1- When I want to extend the Ceph cluster, can I plug in nodes one at a time with the same specs?
2- In case I need to extend compute capacity, can I plug in a compute node and attach it to the Ceph cluster?
 
I honestly don't understand any logic or question here.

The only thing I think I understand is that you want 2 replicas, which is generally a really stupid idea in case of any node failure, since setting min_size=1 isn't really an option.

Can you please rephrase your questions and add information, like how many Ceph cluster nodes you have, for example?
 
I designed a Ceph cluster consisting of 4 nodes:

Processor: 32 cores × 4 nodes = 128 cores
RAM: 192 GB × 4 nodes = 768 GB

Your recommendation please:
What number of replicas should I use?
In case I want to add a compute node (CPU and RAM, without storage), can I add it and attach it to the Ceph cluster?
 
I suggest you have 5 nodes; it's dangerous to have an even number of nodes.
I think that what you want is very similar to one of our clusters, see screenshot below.
We have separate SSD and HDD-SAS pools, distributed across 2 datacenters (one PG replica in each DC).
In a third datacenter we have the fifth node, using ZFS storage for backups.
And we have 3 backups:
· on fs-cepf
· on a Proxmox Backup Server (VM on 5th node)
· on a NAS over ZFS, synced with Dropbox (VM on 5th node)

[Attached screenshot: Captura de pantalla 2021-03-14 a las 12.48.01.png]
 
Thank you for your advice. I have already redesigned my cluster to 5 nodes, but I am really confused about the recommended number of placement groups per pool, considering that we expect to extend the number of nodes. Can you advise about:
Number of PGs
Number of replicas and minimum replicas? We may have 10 NVMe drives, each with 2 TB.
 
I'm also not entirely sure what the question is but will try to give some general advice.

1. Hyperconverged means that compute and storage can exist together on the same nodes, so yes, you can add compute nodes to the Ceph cluster. In fact, Ceph likes many nodes and many disks. In other words, it's better to have more small disks than a few large disks (as each disk can work in parallel to another disk), and it's better to have more small nodes than a few large nodes (as each node can work in parallel to another node). This is just very general advice, but it means that if you want, you can include all the nodes in the same compute/storage hyperconverged cluster and spread the disks more evenly.

2. As mentioned, having min_size=1 is very risky. In short, you want to have size=3 and min_size=2, which means each piece of data is automatically copied to 3 different failure domains (by default, nodes) and as long as there are at least 2 copies online, the system works. If you have size=2 and min_size=2, then the pool stops serving the relevant data if you restart one node, because the number of copies online goes below min_size. If you reduce min_size to 1, you risk data loss if there is an issue while you restart one node. If you have 4 + 5 nodes (compute + storage) you can easily spread the data out over 3 copies, but of course that reduces available storage to 1/3. Still, anyone will say that the default setting of size=3 and min_size=2 is the safest. It is also possible to have different size settings, failure domain settings, and device classes (HDD, SSD, NVMe) for different pools within the same cluster, so you can decide the safety and speed level depending on the type of data stored.
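To make the capacity trade-off concrete, here is a rough Python sketch using the 10 × 2 TB NVMe figure mentioned earlier (the 85% headroom is Ceph's default nearfull warning threshold; the rest is just illustration):

```python
# Usable-capacity comparison for replicated pools (rough illustration only).
raw_tb = 10 * 2      # 10 NVMe drives x 2 TB = 20 TB raw
nearfull = 0.85      # Ceph starts warning around 85% full by default

for size in (2, 3):
    usable = raw_tb / size
    practical = usable * nearfull
    print(f"size={size}: {usable:.1f} TB usable, ~{practical:.1f} TB before nearfull warnings")
```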

3. Number of PGs: data is divided into chunks and those chunks are placed into placement groups. With more placement groups, the system needs to keep track of more pieces of data. However, if you are running low on space, more placement groups mean the system can spread the data around more effectively, especially if the disks aren't the same size. Ceph has an autoscaler, which can automatically change the number of PGs depending on the amount of data. My feeling is that it tends to select a lower PG count than I would pick if I wanted to be sure to use all the space.
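If you prefer to size a pool manually, the classic rule of thumb is roughly 100 PGs per OSD divided by the replica count, rounded to a power of two. A small Python sketch of that rule (the function name is just mine, not a Ceph API):

```python
# Rule-of-thumb PG count: ~100 PGs per OSD, divided by the replica count,
# rounded to the nearest power of two. The pg_autoscaler can manage this for you instead.
def suggested_pg_num(osds: int, pool_size: int, pgs_per_osd: int = 100) -> int:
    target = osds * pgs_per_osd / pool_size
    power = 1
    while power * 2 <= target:
        power *= 2
    # pick whichever power of two is closer to the target
    return power if target - power <= power * 2 - target else power * 2

# 10 OSDs (the 10 x 2 TB NVMe drives from this thread) with size=3:
print(suggested_pg_num(osds=10, pool_size=3))  # -> 256
```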
 
