Replacing a Node in a Ceph Cluster

Oct 8, 2019
We have a cluster of 3 nodes running Proxmox in an HA Ceph cluster with a 10GbE mesh network for Ceph. We want to completely replace one of the nodes in the cluster and were wondering what the best way of doing that would be. I tried looking through the documentation, but was unable to find anything that addressed this specifically. Any assistance would be greatly appreciated.
 
You can just replace the node. You may want to set the OSD flags noout and nobackfill beforehand, but Ceph will handle the rest.
There's enough documentation on how to detach and destroy OSDs, and it doesn't matter whether you do this while they're in or out. The same goes for the node itself.
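
As a rough illustration of the flag step mentioned above, here is a minimal sketch that builds the `ceph osd set` commands for `noout` and `nobackfill`. The helper names `flag_commands` and `run` are made up for this example, and it defaults to a dry run that only prints the commands; treat it as a sketch under those assumptions, not a vetted procedure.

```python
import shlex
import subprocess

# Flags named in the reply above; the helper names below are hypothetical.
FLAGS = ("noout", "nobackfill")


def flag_commands(action, flags=FLAGS):
    """Build one `ceph osd set|unset <flag>` command per flag."""
    if action not in ("set", "unset"):
        raise ValueError("action must be 'set' or 'unset'")
    return [["ceph", "osd", action, flag] for flag in flags]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("+", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: shows what would be executed before taking the node down.
    run(flag_commands("set"))
```

Calling `run(flag_commands("unset"))` later would print the matching commands for clearing the flags once the replacement is in place.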
 
Don't forget about quorum with only 3 nodes in the cluster. You could create one virtual node as a temporary cluster member, and delete that VM after adding the new node.
 
That is for the faint-hearted. Replacing a node should take minutes. You would need more than bad luck for another node to fail at exactly this time ...
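
The quorum concern in the two posts above comes down to simple majority arithmetic. The sketch below assumes a strict-majority rule with one vote per member; the function names are invented for this example.

```python
def quorum(members):
    """Votes needed for a strict majority of `members` (one vote each)."""
    return members // 2 + 1


def tolerated_failures(members):
    """How many members can fail while a majority still remains."""
    return members - quorum(members)


if __name__ == "__main__":
    for members in (2, 3, 4):
        print(f"{members} members: quorum={quorum(members)}, "
              f"tolerated failures={tolerated_failures(members)}")
```

With 3 members one failure is tolerated, while a cluster reduced to 2 members tolerates none. That is the short window the temporary virtual node would cover, and also why the risk is small if the replacement only takes minutes.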
 
Thanks for your reply. To clarify some things: is your suggestion to basically take down the node we want to replace, force it to rejoin the cluster using the Recovery documentation, then recreate the OSDs and wait for the Ceph pool to reach a recovered state? Just want to make sure we're taking everything into account to avoid any major pitfalls.
 
That's what I would do, yes. If you want to reuse the OSDs, there might even be a way to restore them without having the cluster go through a backfill. I haven't dug into the necessary steps so far, but that should be possible. If you deploy a completely new node, I'd just create the new OSDs, destroy the old ones that were removed, and then enable backfilling again.
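
For the "completely new node" case described above, the order of operations could be sketched as a command plan. The device paths and OSD IDs are placeholders, `replacement_plan` is a hypothetical helper, and the `pveceph osd create` spelling is an assumption that depends on the Proxmox VE version (older releases used a different subcommand name). The script only prints the commands and executes nothing.

```python
import shlex


def replacement_plan(new_devices, old_osd_ids, flags=("nobackfill", "noout")):
    """Return shell command strings: create new OSDs, purge old ones, clear flags."""
    plan = []
    # Run on the new node: one OSD per (placeholder) device path.
    for dev in new_devices:
        plan.append(["pveceph", "osd", "create", dev])
    # Remove the OSDs that lived on the old node (IDs are placeholders).
    for osd_id in old_osd_ids:
        plan.append(["ceph", "osd", "purge", str(osd_id), "--yes-i-really-mean-it"])
    # Re-enable backfilling by clearing the flags that were set beforehand.
    for flag in flags:
        plan.append(["ceph", "osd", "unset", flag])
    return [shlex.join(cmd) for cmd in plan]


if __name__ == "__main__":
    for line in replacement_plan(["/dev/sdb", "/dev/sdc"], [4, 5]):
        print(line)
```

Reviewing the printed plan against the output of `ceph osd tree` before running anything by hand is a sensible precaution, since the real OSD IDs and devices will differ.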
 
