Replace dead node that had ceph services running

lucentwolf

Dec 19, 2019

Hello all

I have a dead node (its system disk failed) that was running Ceph OSDs and a monitor. The manual says that the node can be removed, and that a new node with the same IP address and hostname can then be added, provided it is a fresh PVE install.

However, with Ceph things are more difficult, since I was unable to
(a) remove the OSDs (they're still down and out); and
(b) remove the monitor service.

Any help or hints are highly appreciated ;-)

This thread is a spin-off, opened on aaron's suggestion.
 
Okay, so the Proxmox VE and Ceph cluster themselves are healthy?

In that case, the MON that was on that node should be removed with ceph mon remove {mon-id} (docs).
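Before running that command, it can help to confirm that the dead MON is still listed in the monmap but no longer part of the quorum. This is a minimal Python sketch of that check. It assumes it runs on a healthy node where the ceph CLI works and that the JSON printed by ceph quorum_status --format json carries the quorum_names and monmap.mons[].name fields. The helper names and the example MON names (pve1, pve2, pve3) are hypothetical.

```python
import json
import subprocess


def mon_is_removable(quorum_status_json: str, mon_id: str) -> bool:
    """True if mon_id is still listed in the monmap but is not part of the quorum."""
    status = json.loads(quorum_status_json)
    in_monmap = any(mon["name"] == mon_id for mon in status["monmap"]["mons"])
    in_quorum = mon_id in status["quorum_names"]
    return in_monmap and not in_quorum


def mon_remove_command(mon_id: str) -> list[str]:
    """argv for the command from the thread: ceph mon remove {mon-id}."""
    return ["ceph", "mon", "remove", mon_id]


def remove_dead_mon(mon_id: str, run=subprocess.run) -> None:
    """Hypothetical helper: check the quorum first, then remove the dead MON.

    `run` defaults to subprocess.run; it is a parameter only so it can be stubbed.
    """
    status = run(
        ["ceph", "quorum_status", "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    if not mon_is_removable(status, mon_id):
        raise RuntimeError(f"mon.{mon_id} is unknown or still in quorum; refusing to remove it")
    run(mon_remove_command(mon_id), check=True)
```

Refusing to act when the MON is still in quorum guards against removing a live monitor by typo, which would cost the cluster a quorum member.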
You will probably also need to edit /etc/pve/ceph.conf: remove that MON from the mon_host line and delete the MON's own section further down in the file.
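The ceph.conf cleanup is a hand edit in the thread. As a hedged illustration of what the edit amounts to, this hypothetical Python helper works on the file as text. It drops one address from the mon_host line and removes the [mon.{mon-id}] section up to the next section header. Assumptions: mon_host lists plain addresses separated by spaces or commas (no bracketed v2/v1 address vectors), and the dead MON's address matches one entry exactly. The rewritten line is joined with spaces.

```python
import re

# "mon_host" or "mon host", keeping the original indentation and " = " part.
MON_HOST_RE = re.compile(r"^(\s*mon[ _]host\s*=\s*)(.*?)\s*$")
SECTION_RE = re.compile(r"^\s*\[(.+?)\]\s*$")


def drop_mon_from_ceph_conf(text: str, mon_id: str, mon_addr: str) -> str:
    """Return ceph.conf text without mon_addr in mon_host and without [mon.<mon_id>]."""
    out = []
    skipping = False
    for line in text.splitlines(keepends=True):
        header = SECTION_RE.match(line)
        if header:
            # Skip from the dead MON's section header until the next section header.
            skipping = header.group(1) == f"mon.{mon_id}"
        if skipping:
            continue
        host = MON_HOST_RE.match(line)
        if host:
            addrs = [a for a in re.split(r"[,\s]+", host.group(2)) if a and a != mon_addr]
            line = host.group(1) + " ".join(addrs) + "\n"
        out.append(line)
    return "".join(out)
```

Keep a copy of the original file before writing the result back to /etc/pve/ceph.conf, and compare the two before relying on it.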

The OSDs should be removed with ceph osd purge {id} --yes-i-really-mean-it (docs).
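To purge exactly the OSDs that lived on the dead node, their IDs can be read from the CRUSH tree. This is a sketch under stated assumptions. It parses the JSON from ceph osd tree --format json, treats an OSD as "out" when its reweight is 0, and builds (but does not run) one purge command per OSD. The host name pve2 and the helper names are hypothetical.

```python
import json


def dead_osds_on_host(osd_tree_json: str, host: str) -> list[int]:
    """IDs of OSDs under CRUSH host bucket `host` that are both down and out.

    Assumption for this sketch: "out" means reweight == 0 in the osd tree JSON.
    """
    tree = json.loads(osd_tree_json)
    by_id = {node["id"]: node for node in tree["nodes"]}
    host_node = next(
        (n for n in tree["nodes"] if n["type"] == "host" and n["name"] == host), None
    )
    if host_node is None:
        raise LookupError(f"no CRUSH host bucket named {host!r}")
    return sorted(
        osd_id
        for osd_id in host_node.get("children", [])
        if by_id[osd_id]["type"] == "osd"
        and by_id[osd_id].get("status") == "down"
        and by_id[osd_id].get("reweight") == 0
    )


def purge_commands(osd_ids: list[int]) -> list[list[str]]:
    """One `ceph osd purge {id} --yes-i-really-mean-it` argv per OSD."""
    return [["ceph", "osd", "purge", str(i), "--yes-i-really-mean-it"] for i in osd_ids]
```

Only building the commands leaves room to review the list before anything irreversible runs.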

The node will also have an entry in the CRUSH map (Ceph -> Configuration in the GUI). To remove it, you can run ceph osd crush remove {bucket-name} (docs) where the bucket name should be the hostname of the lost node.
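Removing the host bucket only makes sense once its OSDs are gone. This hedged sketch checks that the bucket is empty in the ceph osd tree --format json output before building the removal command. The function name and the example host pve2 are hypothetical.

```python
import json


def crush_host_remove_command(osd_tree_json: str, host: str) -> list[str]:
    """argv for `ceph osd crush remove {bucket-name}`, only if the host bucket is empty."""
    tree = json.loads(osd_tree_json)
    for node in tree["nodes"]:
        if node["type"] == "host" and node["name"] == host:
            if node.get("children"):
                # OSDs are still attached to this bucket; purge them first.
                raise RuntimeError(f"host bucket {host!r} still holds {node['children']}")
            return ["ceph", "osd", "crush", "remove", host]
    raise LookupError(f"no CRUSH host bucket named {host!r}")
```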

If you don't see anything related to Ceph from that node anymore, you can go ahead and remove it from the Proxmox VE cluster (docs). There might be remnants in the /etc/pve/priv/authorized_keys and /etc/pve/priv/known_hosts that you should remove.
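The cluster removal itself is done with pvecm delnode {nodename} on one of the remaining nodes. For the leftover SSH entries, this is a hedged Python sketch that filters both files as plain text. It assumes unhashed known_hosts lines whose first field lists host names or IPs (comma-separated), and authorized_keys lines whose last field is the comment root@{nodename}, as is typical on Proxmox VE. Check the files first, since this is an assumption. The node name pve2 and the address are examples.

```python
def filter_known_hosts(text: str, names: set[str]) -> str:
    """Drop known_hosts lines whose host field mentions any of `names`."""
    kept = []
    for line in text.splitlines(keepends=True):
        fields = line.split()
        if fields and not fields[0].startswith("#"):
            hosts = set(fields[0].split(","))
            if hosts & names:
                continue  # entry belongs to the removed node
        kept.append(line)
    return "".join(kept)


def filter_authorized_keys(text: str, host: str) -> str:
    """Drop authorized_keys lines whose trailing comment is root@<host>."""
    comment = f"root@{host}"
    return "".join(
        line for line in text.splitlines(keepends=True)
        if not (line.split() and line.split()[-1] == comment)
    )
```

Hashed known_hosts entries (starting with |1|) cannot be matched this way; ssh-keygen -R is the usual tool for those.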

There will also be a directory named after the node in /etc/pve/nodes. It contains all the node-specific configs. You can move that folder to another location to keep it on the side, in case you still need something from it.
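Moving the directory aside can be sketched in Python as follows. The backup location /root/pve-node-backup and the function name are hypothetical. The helper refuses to overwrite an existing backup so that an earlier copy is never clobbered.

```python
import shutil
from pathlib import Path


def park_node_dir(node: str,
                  pve_nodes: Path = Path("/etc/pve/nodes"),
                  backup_root: Path = Path("/root/pve-node-backup")) -> Path:
    """Move <pve_nodes>/<node> into <backup_root>/<node> and return the new path."""
    src = pve_nodes / node
    if not src.is_dir():
        raise FileNotFoundError(src)
    backup_root.mkdir(parents=True, exist_ok=True)
    dest = backup_root / node
    if dest.exists():
        raise FileExistsError(dest)  # keep any earlier backup intact
    # Across filesystems shutil.move copies the tree and then deletes the source.
    shutil.move(str(src), str(dest))
    return dest
```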
 
aaron, thank you so much! The cleanup worked fine. I'll add the replacement node soon and report back on how it works out.

_^.^_
 
All right: ran apt dist-upgrade on all nodes (including the replacement node) -> added the replacement node to the cluster, installed Ceph, wiped the old Ceph OSD disks -> created new OSDs -> it's remapping and backfilling -> all looks fine :cool:
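The steps above can be written out as an ordered command plan for the replacement node. This Python sketch only builds and prints the plan; it does not execute anything. Assumptions: PVE 6.x command syntax (pveceph osd create; older releases used pveceph createosd), an existing cluster member reachable at a hypothetical IP 10.0.0.11, and a hypothetical OSD disk /dev/sdb. ceph-volume lvm zap --destroy erases the disk, so check device names carefully.

```python
import shlex


def replacement_node_plan(existing_node_ip: str, osd_disks: list[str]) -> list[list[str]]:
    """Commands for the replacement node, in the order described in the thread."""
    plan = [
        ["apt", "update"],
        ["apt", "dist-upgrade"],             # the thread ran this on all nodes
        ["pvecm", "add", existing_node_ip],  # join via an existing cluster member
        ["pveceph", "install"],              # install the Ceph packages
    ]
    for disk in osd_disks:
        # Wipe the old OSD disk (LVM metadata and data), then recreate the OSD.
        plan.append(["ceph-volume", "lvm", "zap", disk, "--destroy"])
        plan.append(["pveceph", "osd", "create", disk])
    return plan


def render(plan: list[list[str]]) -> str:
    """Shell-quoted, one command per line, for review before running anything."""
    return "\n".join(shlex.join(cmd) for cmd in plan)
```

Printing render(replacement_node_plan("10.0.0.11", ["/dev/sdb"])) gives a checklist you can review and then run step by step.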

Again, aaron: thanks a lot!!!
 