How to recover from a single Ceph node?

jsterr

Renowned Member
Jul 24, 2020
Hi Community,

I tried to recover (testing setup only) from a single Ceph node after losing 2 nodes of a 3-node Ceph cluster (completely; the nodes won't ever come back).

I managed to fix the PVE Corosync quorum, but I can't get Ceph running again because I can't run any commands or actions via the command line, not even reading out the current monmap. I managed to get two new PVE nodes into the cluster, but Ceph still won't run.

Does anyone know how I can recover from a single Ceph node (which still has my data) to a fully working 3-node Ceph cluster again?

Thanks for your help,
Jonas
 
First, what about the MONs? That is a bit unclear to me. If you have only one MON left, you need to modify the monmap.

Check the MON status first to see what state it is in: https://docs.ceph.com/en/latest/rad...hooting-mon/#using-the-monitor-s-admin-socket (and the "understanding mon status" section right after).
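
A minimal sketch of that status check, assuming the surviving monitor's ID is `pve1` (a made-up name) and that the script runs on that node, where `ceph daemon mon.<id> mon_status` talks to the local admin socket and so answers even without quorum. The sample data is invented for illustration; only the field names follow the `mon_status` JSON.

```python
import json
import subprocess


def read_mon_status(mon_id: str) -> dict:
    """Query the local monitor through its admin socket and parse the JSON."""
    out = subprocess.run(
        ["ceph", "daemon", f"mon.{mon_id}", "mon_status"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def summarize_mon_status(status: dict) -> dict:
    """Pull out the fields that matter when a monitor cannot form a quorum."""
    own_name = status.get("name")
    known = [m["name"] for m in status.get("monmap", {}).get("mons", [])]
    return {
        "name": own_name,
        "state": status.get("state"),
        "quorum": status.get("quorum", []),
        # Monitors still listed in the monmap besides this one.
        "other_mons": [n for n in known if n != own_name],
    }


# Hypothetical output of a lone survivor in a former 3-MON cluster.
SAMPLE_STATUS = {
    "name": "pve1",
    "state": "probing",
    "quorum": [],
    "monmap": {"mons": [{"name": "pve1"}, {"name": "pve2"}, {"name": "pve3"}]},
}

print(summarize_mon_status(SAMPLE_STATUS))
```

On a real node, `summarize_mon_status(read_mon_status("pve1"))` would replace the sample. An empty quorum list together with dead peers still listed in the monmap is the situation that the monmap edit addresses.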

If it has the other MONs still in the monmap, extract the monmap and remove them. See https://docs.ceph.com/en/reef/rados.../#removing-monitors-from-an-unhealthy-cluster
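
The monmap edit from the linked docs can be sketched roughly as follows. This only builds and prints the command sequence; it does not run anything. The monitor IDs (`pve1`, `pve2`, `pve3`), the map path `/tmp/monmap` and the `ceph-mon@<id>` systemd unit name are assumptions for illustration, so adjust them to your setup and compare with the documentation before running anything.

```python
import shlex


def monmap_repair_commands(surviving_mon, dead_mons, map_path="/tmp/monmap"):
    """Return the commands (as argv lists) in the order they would be run."""
    unit = f"ceph-mon@{surviving_mon}"  # assumed systemd unit name
    cmds = [
        # The monitor must be stopped before its store is touched.
        ["systemctl", "stop", unit],
        ["ceph-mon", "-i", surviving_mon, "--extract-monmap", map_path],
    ]
    # Drop every monitor that will never come back.
    cmds += [["monmaptool", map_path, "--rm", dead] for dead in dead_mons]
    cmds += [
        # Inspect the edited map before injecting it.
        ["monmaptool", "--print", map_path],
        ["ceph-mon", "-i", surviving_mon, "--inject-monmap", map_path],
        ["systemctl", "start", unit],
    ]
    return cmds


for argv in monmap_repair_commands("pve1", ["pve2", "pve3"]):
    print(shlex.join(argv))
```

Printing the plan instead of executing it keeps the sketch safe to try; the commands themselves would be run as root on the surviving node.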

After that, the single remaining MON should start and be quorate. Then you can add more nodes and new MONs, OSDs and so on to get back to a healthy cluster.
This assumes that the single remaining node has one replica of all PGs, that is, there are no "unfound" PGs in the cluster.
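
Once the MON is quorate again, one way to check that assumption is to look for the `OBJECT_UNFOUND` health check, which is how Ceph reports unfound objects. The sketch below assumes that `ceph health detail --format json` returns a `checks` dictionary keyed by health check code, each entry carrying a `summary.message`; treat that layout as an assumption and compare it with your own output. The sample dictionaries are invented.

```python
import json
import subprocess


def read_health() -> dict:
    """Fetch cluster health as JSON (needs a quorate monitor)."""
    out = subprocess.run(
        ["ceph", "health", "detail", "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def unfound_message(health: dict):
    """Return the OBJECT_UNFOUND summary message, or None if the check is absent."""
    check = health.get("checks", {}).get("OBJECT_UNFOUND")
    if check is None:
        return None
    return check.get("summary", {}).get("message", "")


# Hypothetical health reports, shaped like the assumed JSON layout.
DEGRADED_ONLY = {"status": "HEALTH_WARN", "checks": {"PG_DEGRADED": {}}}
WITH_UNFOUND = {
    "status": "HEALTH_ERR",
    "checks": {"OBJECT_UNFOUND": {"summary": {"message": "1/100 objects unfound"}}},
}

print(unfound_message(DEGRADED_ONLY))
print(unfound_message(WITH_UNFOUND))
```

If nothing is reported as unfound, degraded PGs should recover on their own once the new OSDs are in.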


I hope this helps :)
 
Hi Aaron! Thanks! It did help! Steps: recover the PVE quorum, edit the monmap on the last node, delete the old OSDs, add 2 new nodes, create MONs, create OSDs, recover.
 
