Best Practices for new Ceph cluster


Hi all.

I have an existing PVE/Ceph cluster that I am currently upgrading. The PVE portion is rather straightforward: I'm adding the new nodes to the PVE cluster, moving VMs off the old ones, then removing the old ones from the cluster. Easy peasy.

However, what I don't know is the best approach for my Ceph cluster. I currently have a three-node Ceph cluster whose nodes are also members of the Proxmox cluster (they run Proxmox and double as OSD nodes... which is causing problems, as you would expect). Now I have three new dedicated Ceph servers with which I want to replace the old ones, and I don't know the best way to go about it. Should I set up a new Ceph cluster with the new nodes, add that storage to the PVE nodes, and move images between the two? Or should I add the new nodes to the existing PVE/Ceph cluster and remove the old nodes over time, dealing with all the rebalancing? Is there a way to add the new servers/OSDs so that they sit in a separate pool with a separate RBD storage, instead of joining the same pool and immediately rebalancing onto the new drives?
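For what it's worth, the first option (separate cluster, then move disks) can be sketched roughly like this. The storage ID `new-ceph`, the monitor addresses, the pool name, and VM/disk IDs below are placeholders for illustration, not values from this thread:

```shell
# On a PVE node: add the NEW (external) Ceph cluster as an RBD storage.
# The client keyring goes in /etc/pve/priv/ceph/<storage-id>.keyring.
pvesm add rbd new-ceph \
    --monhost "10.0.0.11 10.0.0.12 10.0.0.13" \
    --pool rbd \
    --username admin \
    --content images

# Then move each VM disk to the new storage (can be done online).
# Newer PVE releases call this "qm disk move"; older ones "qm move-disk".
qm move-disk 101 scsi0 new-ceph --delete 1
```

The `--delete 1` flag removes the source copy after a successful move; leave it off if you want to keep the old image around until you've verified the VM boots from the new storage.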

What is the best practice when replacing Ceph nodes like this?
I take it that you are determined to replace the Ceph nodes rather than expand the cluster in this upgrade. IMO, it would be better to use one-way Ceph RBD mirroring from the existing cluster to the new cluster. Once replication is complete, demote the images on the existing cluster and promote them on the new one. Ensure everything is working, then remove the old cluster and the mirror configuration. I haven't tried it myself, but a quick Google search turns up useful information from Red Hat and others.
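For reference, a one-way mirror along those lines would look roughly like this. This is a sketch only: the pool name `rbd`, the site names, the image name, and snapshot-based (rather than journal-based) mirroring are all assumptions; see the Ceph rbd-mirror documentation for the full bootstrap procedure:

```shell
# An rbd-mirror daemon runs on the NEW cluster and pulls from the old one
# (package rbd-mirror, service ceph-rbd-mirror@<client-id>).

# On the OLD cluster: enable per-image mirroring on the pool and
# create a bootstrap token for the peer.
rbd mirror pool enable rbd image
rbd mirror pool peer bootstrap create --site-name old rbd > token

# On the NEW cluster: enable mirroring and import the token.
# --direction rx-only gives the one-way setup described above.
rbd mirror pool enable rbd image
rbd mirror pool peer bootstrap import --site-name new --direction rx-only rbd token

# On the OLD cluster: enable mirroring for each image.
rbd mirror image enable rbd/vm-101-disk-0 snapshot

# Cut over once "rbd mirror image status" shows the copy is up to date:
rbd mirror image demote rbd/vm-101-disk-0      # on the old cluster
rbd mirror image promote rbd/vm-101-disk-0     # on the new cluster
```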

YMMV, but that seems better than add a node, remove a node, repeat, which would rebalance the cluster six times.
Thanks for that advice. I am definitely replacing the older Ceph nodes, not adding to them, and would like to avoid the rebalancing act at all costs.

I may have another way to go... The reason the current Ceph cluster is being replaced is that it never quite worked right and started having issues once I loaded it past a certain point. As a result, I've had to move a number of images back from Ceph to local storage while I investigated. Now I think I have enough local disk space on the right servers to move everything off the Ceph cluster, delete it, and start fresh with a new one. HA and migrations go away during this time, but given the options, it seems like a good choice.
Just a quick update. In one cluster I was able to simply delete all the OSDs and pools, add new OSDs and create new pools. That worked perfectly. On a second cluster, where all the VM images were already on Ceph, I added all the new OSDs and removed the OSDs from one of the existing nodes. After that settled, I removed the second node, and finally the third once all that settled.
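For anyone following along, draining the OSDs from one node before removing that node looks roughly like this. The OSD id `0` is a placeholder; repeat for each OSD on the node, and on Proxmox `pveceph osd destroy` wraps the last steps:

```shell
# Mark the OSD out so its placement groups migrate to the remaining OSDs.
ceph osd out 0

# Wait for recovery to finish before touching the next OSD
# ("ceph osd safe-to-destroy" can confirm it holds no needed data).
ceph -s

# Stop the daemon and remove the OSD from the cluster.
systemctl stop ceph-osd@0
ceph osd purge 0 --yes-i-really-mean-it
```

Doing this node by node, and letting the cluster settle each time as described above, keeps the pools available throughout.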
The only thing I had to do after that was manually edit the crush map to remove references to the old nodes.
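That CRUSH cleanup can often be done with `ceph osd crush remove` instead of hand-editing; decompiling and recompiling the map is the fallback. The host name `old-node1` is a placeholder:

```shell
# Simple case: remove the now-empty host bucket from the CRUSH map.
ceph osd crush remove old-node1

# Fallback: decompile, edit, recompile, and re-inject the map.
ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt
# ...delete the stale host/bucket entries in crush.txt by hand...
crushtool -c crush.txt -o crush.new
ceph osd setcrushmap -i crush.new
```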

