Best practice for replacing all OSDs in CEPH cluster

silashorton

New Member
Mar 5, 2025
For our Proxmox cluster, we're running Ceph on 5 nodes, each with 8 OSDs (40 OSDs total). The OSDs are 2 TB SSDs, and we're planning to replace each one with a 4 TB SSD.

What's the best practice for replacing all of the OSDs in our Ceph cluster? For data integrity, would we need to out+destroy one OSD at a time, or could we out+destroy multiple OSDs at once from one node?

Thank you!
 
"...one OSD at a time or could we out+destroy multiple OSDs at once from one node?"
I am not a Ceph specialist, but in an unmodified setup the failure domain is "host", so one whole node can fail without the cluster getting into trouble.

This leads to the conclusion that all OSDs of one node can be replaced at once.
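
To verify that assumption before pulling a whole node, you can check the CRUSH rule and the pool replication settings. A minimal sketch, assuming the default replicated rule and a placeholder pool name:

```
# Look for "type": "host" in the chooseleaf step of the rule in use
ceph osd crush rule dump

# Check how many replicas the pool keeps (placeholder pool name)
ceph osd pool get <poolname> size
ceph osd pool get <poolname> min_size
```

With the default size=3 and failure domain "host", every PG keeps its copies on three different nodes, which is why losing all OSDs of a single node at once is survivable.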

I would probably test this with a "slow start" approach, watching the load from the rebalancing traffic...
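
For that monitoring, the standard Ceph CLI is enough; a small sketch (the osd_max_backfills value is only an illustrative starting point, not a recommendation):

```
# Overall health plus recovery/backfill progress, refreshed every 5 s
watch -n 5 ceph -s

# Per-OSD fill level, to see data moving onto the new disks
ceph osd df tree

# Optionally throttle backfill if client I/O suffers (illustrative value)
ceph config set osd osd_max_backfills 1
```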

Disclaimer: I've never done this!
 
You don’t have space to just add them? Most servers have plenty of 2.5” slots, and Ceph handles mixed drive sizes really well. Otherwise, yes, just do one node at a time; you can even do it "live": delete the OSDs on one host, swap the drives, add the new OSDs back, wait for the rebalance to finish, and then move on to the next node. Ceph will automatically start rebalancing the second you remove an OSD. A rough sketch of that per-node swap is below.
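
A minimal sketch of that per-node workflow, assuming the stock Proxmox pveceph tooling; the OSD IDs and device paths are placeholders for whatever that node actually holds:

```
# Repeat for each OSD on the node being swapped (e.g. osd.0 .. osd.7)
ceph osd out 0                      # stop new data from landing on this OSD
ceph osd safe-to-destroy 0          # confirm PGs stay available without it
systemctl stop ceph-osd@0           # stop the OSD daemon on that node
pveceph osd destroy 0 --cleanup     # remove the OSD and wipe the old disk

# Physically swap the 2 TB drives for the 4 TB ones, then per new drive:
pveceph osd create /dev/sdX

# Wait until all PGs are active+clean before starting on the next node
ceph -s
```

Whether you destroy one OSD at a time or all eight on a node before swapping, the key point is to wait for the cluster to return to HEALTH_OK between nodes, so redundancy is never reduced across more than one failure domain at a time.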
 