Hi Proxmox Community,
We are currently running Proxmox 8.3.1 with multiple Kubernetes clusters on top, accessing Ceph via the ceph-csi plugin. The VMs running on Proxmox have access to the Ceph public network. In one of our environments we have 3 Proxmox nodes (Dell R6615s), each with 8x 800 GB SSDs presented through PERC H755 controllers in non-RAID mode. The server profile is set to Performance.
While this is working quite well for us, we have a huge amount of replication redundancy on our database volumes, and I believe it is hurting both IOPS and throughput (I've sketched the write amplification below the list):
- Elasticsearch
- All of our indices are configured with a replication factor of 1, so each write into Elasticsearch translates into 6 writes on disk (two Elasticsearch nodes write the data, multiplied by Ceph's 3x replication)
- MySQL
- We run Galera clusters with all nodes in master mode, so each write translates into 9 writes on disk (three Galera nodes write the data, multiplied by Ceph's 3x replication)
- Redis
- This is less important, but Redis is doing the same thing: the cluster is 3 masters and 3 slaves, so in theory every background write task causes 6 writes to the underlying disks (a master and its slave write the data, multiplied by Ceph's 3x replication).
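To make the multiplication explicit, here is a rough back-of-the-envelope sketch (Python just for the arithmetic): the Ceph pool size of 3 is my assumption based on the default replicated pool, and the per-application copy counts are the ones listed above.

```python
# Back-of-the-envelope write amplification: application-level copies
# multiplied by Ceph pool replication. A pool size of 3 is an assumption
# (it matches the 6/9/6 numbers above); adjust to your actual pool size.
CEPH_POOL_SIZE = 3

workloads = {
    "Elasticsearch (1 replica per index)": 2,   # primary shard + 1 replica shard
    "MySQL/Galera (3-node multi-master)": 3,    # every node applies every write
    "Redis (3 masters + 3 slaves)": 2,          # a master + its slave
}

for name, app_copies in workloads.items():
    total = app_copies * CEPH_POOL_SIZE
    print(f"{name}: {app_copies} app copies x {CEPH_POOL_SIZE} Ceph replicas = {total} disk writes")
```

If that arithmetic holds, every logical database write lands on the SSDs 6 to 9 times.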
Given the above, what are the community's thoughts on this? We are also keen to look into anything else we can tune to improve our I/O.
Thanks
Chris