Ceph storage distribution

mxscbv

Member
Jan 25, 2022
I have a 3-node cluster (e.g. server1, server2, server3) located at DC1, with Ceph and HA configured.

Now I'm adding 3 more nodes (e.g. server4, server5, server6) at DC2 and creating a 6-node cluster.

My question is: how do I set up Ceph to always have a copy on a node at the other DC? For example, if I run a VM on server1 (at DC1), I want to always have a copy on one of the servers at DC2, as a failover in case one of the DCs fails.

What do I need to do? Just increase the number of replicas?

Thanks.
 
First off, how far away are these two DCs? Or more precisely, what is the latency?

The replica placement itself can be done either with a custom CRUSH rule or by leveraging the newer stretch cluster mode of Ceph.

But in any case, you need to think about two cluster stacks: the Proxmox VE one and the Ceph one.
Either way, you will need a witness node for both Ceph and Proxmox VE in a 3rd location.

Also calculate with more OSD space, because each DC will need to hold 2 replicas and the pools will run with size=4, min_size=2.
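For illustration, such a custom CRUSH rule could look roughly like this. This is a sketch only: the rule name `replicated_2dc`, its id, and the assumption that the CRUSH map already contains two `datacenter` buckets with the hosts moved under them are illustrative, not taken from this thread. The rule would be added via the usual getcrushmap / decompile / edit / compile / setcrushmap cycle:

```
# Sketch only -- assumes the CRUSH map already has two 'datacenter'
# buckets (e.g. dc1 and dc2) under the default root, each holding 3 hosts.
rule replicated_2dc {
    id 2                                  # any unused rule id
    type replicated
    step take default                     # start at the root of the hierarchy
    step choose firstn 2 type datacenter  # pick both datacenters
    step chooseleaf firstn 2 type host    # 2 replicas on distinct hosts in each
    step emit
}
```

The pool is then pointed at the rule and sized to 4 copies total, e.g. `ceph osd pool set <pool> crush_rule replicated_2dc`, `ceph osd pool set <pool> size 4` and `ceph osd pool set <pool> min_size 2`.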
 
> First off, how far away are these two DCs? Or more precisely, what is the latency?
These are cities next to each other, so the latency is negligible.
> The replica placement itself can be done either with a custom CRUSH rule or by leveraging the newer stretch cluster mode of Ceph.
I can only see the 'replicated_rule' CRUSH rule. How do I use the newer one?
> Either way, you will need a witness node for both Ceph and Proxmox VE in a 3rd location.
What is its purpose, and where can I read the docs?
 
> These are cities next to each other, so the latency is negligible.
Test it! Corosync, used for the Proxmox VE cluster communication, needs low latency (~2 ms). Any latency will also affect how fast your Ceph cluster is, as data needs to be written to all involved OSDs on non-cached writes.
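A quick way to get a first number, reusing the hostnames from the opening post (adjust to your environment):

```
# Round-trip latency from a DC1 node to a DC2 node; corosync wants a
# stable value well below ~2 ms, so watch the avg and max of the summary.
ping -c 100 -q server4
```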

I assume that you will have some kind of dark fiber with enough bandwidth between the DCs?

> I can only see the 'replicated_rule' CRUSH rule. How do I use the newer one?
Check out the Ceph docs regarding stretch clusters: https://docs.ceph.com/en/latest/rados/operations/stretch-mode/
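The core of it, as a rough sketch based on those docs: the MON names a-e and the datacenter names below are placeholders, and `stretch_rule` is assumed to exist as a CRUSH rule spreading replicas across the datacenter buckets.

```
# Tag each monitor with its location; 'e' is the tiebreaker in the 3rd site.
ceph mon set_location a datacenter=dc1
ceph mon set_location b datacenter=dc1
ceph mon set_location c datacenter=dc2
ceph mon set_location d datacenter=dc2
ceph mon set_location e datacenter=dc3

# Stretch mode requires the connectivity election strategy.
ceph mon set election_strategy connectivity

# Enable stretch mode with 'e' as tiebreaker; the pools then run with
# size=4, min_size=2 spread over the two data sites.
ceph mon enable_stretch_mode e stretch_rule datacenter
```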

> What is its purpose, and where can I read the docs?
Both clusters (Proxmox VE and the Ceph MONs) work by forming a majority. If you have an equal number of nodes in each DC and an equal number of Ceph MONs in each DC (at least 2 each), what happens if one of the two DCs fails? You still have 50% of the votes, but you need more than that. Therefore, the Ceph cluster as well as Proxmox VE will be in a blocked state. This is why you need a witness/tiebreaker node somewhere else.

In a hyperconverged setup, this should be another Proxmox VE node, because we use the /etc/pve directory to store the Ceph config file and automatically sync it throughout the hyperconverged cluster. This way, that node will be the tiebreaker for both Proxmox VE and Ceph.
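To make the vote math concrete: with 3 + 3 nodes, losing a DC leaves 3 of 6 votes, which is exactly 50% and therefore no quorum; with a 7th node as tiebreaker, the surviving DC plus the tiebreaker holds 4 of 7 votes and stays quorate. On any Proxmox VE node you can check the current state:

```
# Shows expected votes, total votes and the quorum threshold.
pvecm status
```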
 
