Ceph and failed drives

May 28, 2025
Good afternoon. We are new to Proxmox and looking to implement Ceph. We are planning to have 3 identical servers with 15 OSDs in each server. Each OSD will be an SSD with 1.6 TB of storage. What I am after is: how many drives can fail on one node before that node would be considered degraded? Can we configure each node to handle two drive failures, or does one drive failure put the node into a degraded state?

Appreciate any help and pointers.

Thanks
Chris
 
Each OSD is separate. If one goes down, Ceph will recreate its data copies/chunks on other drives. In a 3-node cluster that recovery has to happen on the same node in order to maintain the 3/2 replication (with more than 3 nodes, it can go to any node that doesn't already hold a copy).
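
To make that concrete, here is a rough sketch in plain Python (not Ceph's actual CRUSH algorithm, just an illustration assuming a replicated pool with size=3 and the failure domain set to host, which is the default Proxmox setup):

```python
import random

# Rough sketch, NOT Ceph's real CRUSH code: 3 hosts with 15 OSDs each,
# replicated pool with size=3 and a host-level failure domain, so every
# placement group keeps exactly one copy per host.
HOSTS = {f"node{i}": [f"node{i}-osd{j}" for j in range(15)] for i in range(3)}

def place_pg(up_osds_per_host):
    """Pick one surviving OSD on each host to hold a copy of a placement group."""
    return {host: random.choice(osds) for host, osds in up_osds_per_host.items() if osds}

# Healthy cluster: one copy per host -> 3 replicas.
print(len(place_pg(HOSTS)), "replicas when healthy")

# One OSD fails on node0: its copies get re-created on other OSDs of the SAME
# host (with only 3 hosts there is nowhere else to put them) -> back to 3.
one_osd_down = {h: [o for o in osds if o != "node0-osd0"] for h, osds in HOSTS.items()}
print(len(place_pg(one_osd_down)), "replicas after one OSD failure, once recovery finishes")

# A whole node down: only 2 hosts left -> at most 2 copies until it comes back.
node_down = {h: osds for h, osds in HOSTS.items() if h != "node0"}
print(len(place_pg(node_down)), "replicas while a node is offline")
```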

Ceph will be in a warning state until all placement groups are back to 3 replicas.
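
If you want to watch that from a script rather than the dashboard, something like this works (assuming the ceph CLI is installed on the node; the exact JSON keys can vary slightly between Ceph releases):

```python
import json
import subprocess

# Minimal health check: "ceph status --format json" reports HEALTH_OK /
# HEALTH_WARN / HEALTH_ERR; during recovery after an OSD failure you will
# typically see HEALTH_WARN with checks such as PG_DEGRADED or OSD_DOWN.
status = json.loads(subprocess.check_output(["ceph", "status", "--format", "json"]))
print(status["health"]["status"])
for name, check in status["health"].get("checks", {}).items():
    print(name, "-", check["summary"]["message"])
```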

Note this means that in a 3-node cluster, if an OSD goes down, its data needs to fit onto the remaining OSDs on that node...see this post. With a node failure, Ceph has both a minimum and a maximum of 2 copies on the 2 remaining nodes. It will keep functioning until another OSD fails, at which point some chunks will have fewer than 2 replicas until they can be copied to working OSDs.
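
For the capacity side of that, a back-of-the-envelope calculation with your numbers (3 nodes x 15 OSDs x 1.6 TB, size=3) looks like this; treat it as illustrative only, since it ignores Ceph's default near-full (85%) and full (95%) ratios, which will stop writes earlier in practice:

```python
# Rough sizing for 3 nodes x 15 OSDs x 1.6 TB with a replicated size=3 pool.
osds_per_node = 15
osd_tb = 1.6
nodes = 3
replicas = 3

raw_total = nodes * osds_per_node * osd_tb
usable = raw_total / replicas                 # ~24 TB, i.e. one node's worth of raw space
print(f"raw {raw_total:.0f} TB, usable ~{usable:.0f} TB before fill headroom")

# To self-heal a single OSD failure, that OSD's data must fit onto the node's
# remaining 14 OSDs, so average utilisation has to stay below 14/15:
max_fill = (osds_per_node - 1) / osds_per_node
print(f"theoretical max average OSD fill to absorb one OSD loss: ~{max_fill:.0%}")
```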
 