Ceph min_size 1 for Elasticsearch / MySQL Clusters

Dec 23, 2024
Hi Proxmox Community,

We are currently running Proxmox 8.3.1 with multiple Kubernetes clusters on top, accessing Ceph via the ceph-csi plugin. The VMs running on Proxmox have access to the Ceph public network. In one of our environments we have 3 Proxmox nodes, each a Dell R6615 with 8x 800GB SSDs presented through a PERC H755 in non-RAID mode. The system profile is set to Performance.

While this is working quite well for us, we have a large amount of redundant replication in our database volumes, and I believe it is costing us IOPS and throughput (see the arithmetic sketch after this list):

  • Elasticsearch
    • All of our indices are configured with a replica count of 1, so each write into Elasticsearch lands on two Elasticsearch nodes, and the size=3 Ceph pool then triples each of those: 6 disk writes in total.
  • MySQL
    • We run Galera clusters with every node as a master, so each write is applied by three nodes, again tripled by Ceph: 9 disk writes in total.
  • Redis
    • This is less important, but Redis does the same thing - the cluster is 3 masters and 3 replicas, so each key is persisted by one master and one replica, meaning in theory every background write task causes 6 writes to the underlying disks.
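The amplification above is just the application-level copy count multiplied by the Ceph pool's replica count; a few lines of Python (copy counts taken from the setup described above) make it explicit:

```python
# Back-of-the-envelope write amplification: every logical write is
# persisted by N application nodes, and each of those node-level writes
# is then replicated across the Ceph pool's `size` (3 here) OSDs.
CEPH_POOL_SIZE = 3

workloads = {
    "Elasticsearch (replicas=1)":     2,  # primary shard + 1 replica shard
    "MySQL Galera (3 masters)":       3,  # every node applies every write
    "Redis (3 masters + 3 replicas)": 2,  # each key: one master + one replica
}

for name, app_copies in workloads.items():
    total = app_copies * CEPH_POOL_SIZE
    print(f"{name}: {app_copies} x {CEPH_POOL_SIZE} = {total} disk writes")
```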
I have read countless forum posts about why a min_size of 1 is a bad idea with Ceph due to bit-rot etc. However, for these specific workloads, where the applications already do their own replication, is there a reason we shouldn't drop these pools to min_size=1 (reducing the pool size accordingly) to cut the replication I/O? We are also looking to add CRUSH location labels into ceph-csi so that reads are served from the closest OSD.
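One detail worth separating out: it is the pool's size (replica count) that determines the write fan-out; min_size only sets how many copies must be available for I/O to continue, so lowering min_size alone will not reduce write load. Below is a minimal sketch of changing both on a dedicated pool, using the rados Python bindings; the pool name es-data is a placeholder for whatever pool backs these volumes:

```python
import json
import rados  # python3-rados, shipped with Ceph

POOL = "es-data"  # hypothetical pool backing the database PVs

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    # size = number of copies each write fans out to;
    # min_size = copies that must be up for I/O to proceed.
    # CLI equivalent: ceph osd pool set es-data size 2
    #                 ceph osd pool set es-data min_size 1
    for var, val in (("size", "2"), ("min_size", "1")):
        cmd = json.dumps({"prefix": "osd pool set",
                          "pool": POOL, "var": var, "val": val})
        ret, outbuf, outs = cluster.mon_command(cmd, b"")
        print(f"set {var}={val}: ret={ret} {outs}")
finally:
    cluster.shutdown()
```

Even at size=2/min_size=1 the cluster keeps serving I/O with one copy down, but any further failure in that window means data loss, which is exactly the trade-off those forum posts warn about. Note also that CRUSH location labels can only localize reads; writes always go through the PG's primary OSD.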

Given the above, what are the community's thoughts on this? We are also keen to hear about anything else we can tune to improve our I/O.

Thanks
Chris

For Elasticsearch, replication doesn't only get you failover - you also get higher read speed, because search results can be served from either copy of a shard. In your case, though, it may make sense to set the replica count to 0 and, of course, take snapshots of your indices (a sketch follows below). As always, it depends on what you are storing and retrieving from Elasticsearch, and it also makes sense to tune the number of shards per index, etc.
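A minimal sketch of both steps, assuming a current (8.x) elasticsearch Python client; the endpoint, the index pattern logs-*, and the snapshot repository backups are all placeholders, and the repository must already be registered:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # adjust endpoint/auth for your cluster

# Drop index-level replication, since Ceph already stores redundant copies.
es.indices.put_settings(
    index="logs-*",  # placeholder index pattern
    settings={"index": {"number_of_replicas": 0}},
)

# With replicas=0, losing a data node means losing shards until they are
# restored, so snapshot regularly into a pre-registered repository.
es.snapshot.create(
    repository="backups",          # placeholder repository name
    snapshot="nightly-2024-12-23", # placeholder snapshot name
    wait_for_completion=False,
)
```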