Ceph min_size for large clusters

alejandroed

New Member
Sep 9, 2021
Hi guys!

I have a dilemma with medium-to-large clusters of between 5 and 15 nodes. With Ceph replica 3 and the default min_size of 2, a two-node failure interrupts service, and a two-node failure in a 15-node cluster is not unlikely at all.

How dangerous do you think it is to use a min_size of 1 while running 5 monitors in the cluster?

Talking about hyperconverged clusters, what do you think is a reasonable cluster node count and Ceph min_size?

For example, I think that three 5-node clusters with 3 monitors each and min_size 2 are more reliable than one 15-node cluster with min_size 1 and five monitors. In the first scenario we can tolerate "up to" 3 node failures (one per cluster) without stopping operations, while in the second one only two are allowed.
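The trade-off I mean can be sketched with some simple arithmetic (a rough model, assuming the CRUSH failure domain is the host, so each failed node removes at most one replica of any given placement group):

```python
# Rough model: I/O on a PG pauses once its live replicas drop below
# min_size, and with failure domain = host, each failed node costs at
# most one replica of a given PG.

def node_failures_before_io_pause(size: int, min_size: int) -> int:
    """Simultaneous node failures tolerated before client I/O can pause."""
    return size - min_size

print(node_failures_before_io_pause(3, 2))  # 1  (replica 3, min_size 2)
print(node_failures_before_io_pause(3, 1))  # 2  (replica 3, min_size 1)
print(node_failures_before_io_pause(5, 2))  # 3  (replica 5, min_size 2)
```

So min_size 1 buys one more tolerated failure, at the cost of running on a single copy.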
 

wigor

Member
Dec 5, 2019
Hi!

With min_size 1, you have only one copy of your data left in case of problems.
min_size is not about monitors, but about data copies, i.e. the nodes that serve data via OSDs. You have to distinguish between the hypervisor cluster and the Ceph cluster / OSD nodes.
If you are planning to lose more than one node at a time, then you should increase the replica count to 5 or so. Then you can lose 3 Ceph OSD nodes.
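For reference, the pool settings for that would look something like this (a sketch; "rbd" is just an example pool name, adjust to yours):

```shell
# Raise the replica count to 5 and keep min_size 2, so up to
# 3 OSD hosts can fail before client I/O on the pool pauses.
ceph osd pool set rbd size 5
ceph osd pool set rbd min_size 2
```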
 

alejandroed

Yes, I know that min_size is not relevant to monitors, but you can't run a Ceph cluster without quorate monitors. In the 15-node (5-monitor) scenario with min_size 1, you have three replicas (replica=3 is not negotiable), and even if two monitors fail, you would still have quorum in your Ceph cluster, at least one healthy copy, and no interruption of operations.

On the other hand, in the three 5-node clusters (3 monitors each) you can't go to min_size=1, because in case of a two-node failure you might survive with only one replica available, but not without monitor quorum if the failing nodes are monitors.

There is another possibility: run three 5-node clusters with every node a monitor in all the clusters. This configuration plus min_size=1 would allow operating with up to 6 node failures (2 nodes in each cluster) while keeping Proxmox and Ceph quorate. Maybe a 5-node all-monitors cluster is insane?
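A quick sanity check of the quorum arithmetic for the all-monitors variant (a sketch; it only counts majorities, not which specific nodes fail):

```python
# A Ceph monitor quorum needs a strict majority of monitors alive.

def mon_failures_tolerated(monitors: int) -> int:
    """Monitor failures tolerated while a majority (quorum) survives."""
    return (monitors - 1) // 2

# Three independent 5-node clusters, every node a monitor:
per_cluster = mon_failures_tolerated(5)
print(per_cluster)      # 2 failures per cluster
print(per_cluster * 3)  # up to 6 failures total, if spread 2 per cluster

# With only 3 monitors per cluster, two failed nodes could both be
# monitors, which would break quorum for that cluster:
print(mon_failures_tolerated(3))  # only 1 monitor failure tolerated
```

Note the 6-failure figure only holds if the failures are spread evenly; 3 failures landing in one cluster would break that cluster's quorum.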

I'm talking about the best design to survive a catastrophe with 15 hyperconverged nodes; replica 5 is not an option due to the tremendous amount of space it wastes.
 
