Request for Best Practices and Guidance on Adding New Disks/OSDs to Our Ceph Cluster

Jun 24, 2025
Objective: Increase storage capacity while ensuring the stability and performance of the cluster.


We would like to request Proxmox’s advice on the following points:


  • The best practices for adding new disks/OSDs to a production Ceph cluster.
  • Any specific configurations or tuning parameters that should be considered when expanding the OSDs.
  • The recommended procedure for adding these disks and integrating them into the Ceph cluster with no downtime.
  • The preliminary checks or validations of cluster health that should be performed before starting.


We look forward to your recommendations to carry out this extension under the best conditions.


Best regards,
 
The preliminary checks or validations of cluster health that should be performed before starting.
The cluster should be healthy, so:

Code:
$ pveceph status | grep -c HEALTH_OK
1
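
If that count is not 1, a quick way to see what Ceph is complaining about before proceeding (a minimal check, assuming the standard Ceph CLI is available on the node) is:

Code:
$ ceph health detail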

The recommended procedure for adding these disks and integrating them into the Ceph cluster with no downtime.
Plug the disks in (whether they are detected automatically depends on your hardware) and simply add the OSD as outlined in the documentation:

Code:
$ pveceph osd create /dev/sd<character(s)>

The data will then be rebalanced onto the new OSDs; depending on your network, this can take a while.
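
If you want to follow the rebalancing, one possible way (a sketch, assuming the standard Ceph CLI on the node) is:

Code:
$ ceph -s    # shows recovery/backfill progress and overall health
$ ceph -w    # follows cluster events live while the data moves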
 
The best practices for adding new disks/OSDs to a production Ceph cluster.
The general Ceph guidelines apply when adding new disks. Keep the following points in mind as you add them.

Ideally, all hosts and all disks (in a pool) will be the same. For hosts, this means similar CPU, RAM, and networking. For disks, it means similar size and performance. The least performant host in the cluster will determine your performance, and the least performant disk in a pool will determine that pool's performance. This is more pronounced in smaller clusters.

If you use disks of different sizes, spread them evenly across the hosts. For example, if you have five PVE hosts and your 10 disks are a mix of 1 TB and 2 TB, put one of each in every host. If you have a bigger mix of drives, aim for a similar number of disks per host and a similar total capacity per host.
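
One way to verify the resulting distribution (just a suggestion, assuming the standard Ceph CLI) is to compare per-host totals and utilization:

Code:
$ ceph osd df tree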

If you have disks with different performance, consider separate pools for the different performance tiers. For example, avoid mixing NVMe with HDDs or SSDs in one pool, as the slowest drive will limit overall performance. Please note that disk performance is more than just read and write speeds; IOPS and latency are very important to Ceph.
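
As an illustration of how such a split can look (a sketch only; the rule names and the pool name here are placeholders, not something from your cluster), Ceph device classes can be used to keep slower and faster OSDs behind separate CRUSH rules and pools:

Code:
$ ceph osd crush rule create-replicated replicated_ssd default host ssd
$ ceph osd crush rule create-replicated replicated_hdd default host hdd
$ ceph osd pool set <poolname> crush_rule replicated_ssd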

Any specific configurations or tuning parameters that should be considered when expanding the OSDs.

None come to mind other than putting disks with different performance in different pools. Others on the forums might have some suggestions for you.

The recommended procedure for adding these disks and integrating them into the Ceph cluster with no downtime.

There is nothing you need to do to avoid downtime. Install the drives and add them to Ceph. If you have to power off a host to install the drives, do it one host at a time. You might want to set noout while a host is down, but that is a preference rather than a requirement.
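
If you do decide to set noout while a host is powered off, it looks like this (remember to unset it afterwards):

Code:
$ ceph osd set noout
# ... power off the host, install the drives, boot it again ...
$ ceph osd unset noout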

The preliminary checks or validations of cluster health that should be performed before starting.

You should always check Ceph's health before making any significant changes. Adding drives is a lower risk change, but it is still good practice to always check the health.

Look at Ceph in the web GUI or run ceph -s from the command line. I also like to look at Datacenter > NODE > Ceph > OSD and scan the OSDs looking for outliers like high Apply/Commit Latency.
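
For the latency check from the command line, a rough equivalent (again assuming the Ceph CLI is available on the node) is:

Code:
$ ceph -s           # overall health, PG states, any warnings
$ ceph osd perf     # per-OSD commit/apply latency; look for outliers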