Fault tolerance of Ceph

proxmox_larry

Hey guys,

I'm currently running a 4-node HA Ceph cluster and I was curious about testing its fault tolerance.
From what I have observed, the cluster itself is still accessible even with only two remaining nodes (due to a changed votequorum configuration).

What do I have to configure so that Ceph is still accessible with 2 remaining nodes?
size=4
min_size=2
Install 4 managers and 4 monitors?

Did I miss something?

Thanks!
 
Did I miss something?
The problematic case here is that you have a potential split-brain situation:
* if 2 of your hosts are enough for quorum and your cluster is still writable after 2 nodes are down, then those 2 nodes (that are down) can also become quorate -> you have 2 conflicting views of the cluster (both of which are quorate)

This is why you always need more than half of the votes, not just exactly half of them. It is also why you usually have odd-sized clusters (3, 5, 7) in such setups; both Ceph and PVE's cluster stack belong in this category.
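To make that concrete, here is a minimal sketch in Python (my own illustration, not anything from the PVE or Ceph code) of the strict-majority rule:

```python
def quorum(total_votes: int) -> int:
    """Strict majority: strictly more than half of all votes."""
    return total_votes // 2 + 1

# 4-node cluster: quorum is 3, so in a 2/2 network split
# *neither* half is quorate -- split brain is impossible.
print(quorum(4))       # 3
print(2 >= quorum(4))  # False: two nodes alone can never be quorate

# If quorum were exactly half (2 of 4), both halves of a 2/2 split
# would consider themselves quorate at the same time.
```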

I hope this explains it!
 
Thank you for your fast reply - I understand your concern when it comes to quorum. But here is the thing: votequorum regulates the number of votes my cluster needs.
When I have 4 nodes, 3 votes are needed.
When I shut down one node, it needs 2 votes from the 3 active ones.
When I shut down a second node, two nodes remain and the expected vote count is two. That's the maximum number of nodes I can lose while keeping the cluster running.

I don’t understand how the powered-off nodes could become quorate...

My goal is to keep the cluster and Ceph running even if I lose 2 out of 4 nodes.
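If the votequorum change in play is corosync's last_man_standing option (an assumption on my part, the thread doesn't name it), the behaviour described above matches how that option steps expected_votes down as nodes leave cleanly; a very rough model:

```python
def recalced_expected(expected: int, nodes_up: int) -> int:
    """Very rough model of corosync's last_man_standing: after a
    stable window in a quorate cluster, expected_votes is lowered
    to the number of nodes currently alive."""
    return min(expected, nodes_up)

expected = 4
for up in (4, 3, 2):                 # shutting nodes down one at a time
    expected = recalced_expected(expected, up)
    needed = expected // 2 + 1       # strict majority of *current* expected_votes
    print(f"up={up} expected_votes={expected} quorum={needed} quorate={up >= needed}")

# The catch: this only works when nodes leave one at a time with a
# stable window in between. A sudden 2/2 network split still sees
# expected_votes=4 on both sides, so neither side becomes quorate.
```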
 
Like I said, my 4-node cluster is still quorate when it loses two nodes, and Ceph is also accessible.
It works.

What others are saying is this: in the unlikely case that all 4 servers are up but 2 of them cannot talk to the other two, you would end up with two separate halves of your cluster, both working and both making changes. When the 4 servers are then able to talk to each other again, you would end up with corruption and a bunch of issues.

Hence it is always suggested to set things up so that this is never possible and only one "half" of the cluster could ever reach quorum.

But as others have said, if you had 4 Ceph mons and 2 went down, Ceph would go read-only. You can't force Ceph to have extra votes: a mon is more than just a vote, it manages every part of I/O and interaction with the Ceph cluster.
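The monitor math follows the same strict-majority rule; a quick sketch (the helper name is mine):

```python
def mon_failures_tolerated(monmap_size: int) -> int:
    """Ceph monitors form quorum with a strict majority of the monmap,
    so this is how many mon failures the cluster can survive."""
    return (monmap_size - 1) // 2

for mons in (3, 4, 5):
    print(f"{mons} mons -> survives {mon_failures_tolerated(mons)} mon failure(s)")

# 3 mons -> 1, 4 mons -> 1, 5 mons -> 2: a fourth mon adds no fault
# tolerance, it only raises the quorum requirement from 2 to 3.
# The same rule applies to corosync votes on the PVE side.
```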
 
I see your point, guys, that helps me a lot!

So in the case of HA there isn't really a difference between a 3- and a 4-node cluster?
The availability and performance are nearly the same?
 
I see your point, guys, that helps me a lot!

So in the case of HA there isn't really a difference between a 3- and a 4-node cluster?
The availability and performance are nearly the same?
No, indeed. For availability: with 3 nodes you can lose 1, with 4 nodes you can lose 1, with 5 nodes you can lose 2, and with 6 nodes you can lose 2.
 
Hi,
Interested in this matter too...
Is there a rough rule of thumb for calculating the fault-tolerance level of a hyperconverged setup (all nodes running OSDs as well)?
I.e.: in a hyperconverged cluster of X nodes, you can lose Y nodes and still read from and write to Ceph (regardless of Ceph reporting undersized/degraded PGs and the VMs running slowly).
I assume it also has to do with the number of OSDs per node, and if that's so, let's consider it to be 6.
Any thoughts?
Cheers,

Leo
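
Not an authoritative answer, but the usual back-of-the-envelope reasoning can be sketched, assuming replicated pools, CRUSH failure domain = host (the PVE default) and a monitor on every node; the helper and its names are my own illustration:

```python
def node_losses_while_rw(total_nodes: int, size: int, min_size: int) -> int:
    """Rough rule of thumb for a hyperconverged cluster
    (replicated pool, failure domain = host, mon on every node).

    Two independent limits apply and the stricter one wins:
      - quorum: a strict majority of mons/nodes must stay up
      - pool:   each PG needs at least min_size replicas on distinct hosts
    """
    quorum_limit = (total_nodes - 1) // 2  # node losses quorum survives
    pool_limit = size - min_size           # host losses before PGs drop below min_size
    return min(quorum_limit, pool_limit)

print(node_losses_while_rw(4, 3, 2))  # 1 (size=3/min_size=2 defaults)
print(node_losses_while_rw(5, 3, 2))  # 1 (here the pool is the limit, not quorum)
print(node_losses_while_rw(5, 4, 2))  # 2 (size=4 lets the pool match quorum)
```

Under the failure-domain-host assumption each PG keeps at most one replica per host, so the number of OSDs per node (6 in your example) mostly affects rebalancing load and recovery time, not how many whole nodes you can lose.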
 
