Hello, I have a basic question about reliability in a Ceph cluster.
A short description of my setup:
- 5 nodes with 5 SATA SSDs each (1.8TB)
- Pool with size = 5 and min_size = 3, default CRUSH rule ("replicated_rule")
My Ceph pool currently has a utilization of approx. 70%.
Yesterday I simulated the failure of a node by shutting it down. As a result, the utilization increased to approx. 91%, which triggered a Ceph warning. After I brought the node back up, the utilization dropped back below 90% (89%). Overnight the total capacity of the cluster grew back, while the used space remained constant.
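As a rough sanity check of the numbers I am seeing, I tried the following back-of-envelope calculation (a sketch only: it assumes equal-sized nodes, one replica per host, and that the displayed pool utilization roughly tracks required raw space vs. online raw space; the real ceph df / MAX AVAIL math also factors in the fullest OSD):

    # Back-of-envelope check (assumptions: 5 equal nodes with 5 x 1.8 TB SSDs,
    # size=5 so each host carries one full copy of the pool's data, and the
    # displayed utilization roughly equals required raw space / online raw space).
    nodes = 5
    raw_per_node_tb = 5 * 1.8                  # 9 TB raw per node
    total_raw_tb = nodes * raw_per_node_tb     # 45 TB raw in total

    raw_needed_tb = 0.70 * total_raw_tb        # ~31.5 TB raw for 5 copies at 70% utilization

    # One node shut down: 9 TB of raw capacity is offline, but Ceph still wants
    # to keep 5 copies of the same data.
    online_raw_tb = (nodes - 1) * raw_per_node_tb
    print(raw_needed_tb / online_raw_tb)       # ~0.875 -> in the ballpark of the ~91% I saw

So the jump itself seems plausible to me, which makes the second question below all the more important.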
I had actually wanted to switch off a second node as well, but I was afraid that the pool would then be 100% full.
Now to my question. I had assumed that the 5/3 setup of my pool would let the cluster tolerate the failure of 2 nodes (for both Corosync and Ceph). Apparently this is not the case? And what has Ceph been doing all night? My pool is really not slow, and the Ceph traffic runs over 2x 10G LACP.
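For context, this is the reasoning behind my assumption (a minimal sketch, assuming the host failure domain places exactly one replica on each node):

    # Sketch of my assumption: with one replica per host, a host failure removes
    # exactly one copy of each object, and a PG keeps serving I/O as long as at
    # least min_size copies remain up.
    size, min_size = 5, 3

    def pool_stays_active(failed_hosts: int) -> bool:
        return size - failed_hosts >= min_size

    print(pool_stays_active(1))  # True
    print(pool_stays_active(2))  # True -> why I expected to survive two node failures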
