remove dead CEPH monitor after removing cluster node?

athompso

I removed a PVE cluster node that was also a CEPH monitor (no OSD, just MON).
Of course, I forgot to remove the CEPH monitor before removing the node from the cluster.

When I attempt to remove the monitor from the PVE GUI, of course it fails because it's trying to cleanly remove it.

If I delete it from ceph.conf, it continues to show up in the GUI but now with hostname "unknown".

It was never referenced in storage.cfg (oops...) so I don't have to worry about that.

The only problem appears to be the permanent CEPH warning state, which makes monitoring (no pun intended) difficult, but I would still like to fix that.

(I'm not the only one with this problem; http://forum.proxmox.com/threads/18941-How-to-remove-a-dead-CEPH-nodes reports a similar situation, but with no answers. Maybe my post will get some answers...)
 
I did read that. Several times, today.

Ah... on about the 4th or 5th pass through that document, I decided to test "ceph mon remove 3", and that does the trick.

Thereafter, I also needed to edit /etc/pve/ceph.conf (aka /etc/ceph/ceph.conf) to remove the reference to the missing monitor, and /etc/pve/storage.cfg to add the monitors that were missing from it.
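For anyone following along, the two steps above can be sketched roughly as below. This is only a sketch under assumptions: the monitor id 3 is the one from this thread, the config is assumed to use one [mon.<id>] section per monitor, and MON_ID, CONF and print_section_free_conf are names made up for illustration. The script only prints the ceph command and a preview of the config; it does not change anything.

```shell
#!/bin/sh
# Sketch only: MON_ID, CONF and print_section_free_conf are illustrative names.
MON_ID="${MON_ID:-3}"                  # the monitor id used in this thread
CONF="${CONF:-/etc/ceph/ceph.conf}"    # config path as mentioned in the post

# Print the config file without the [mon.<id>] section.
# The file itself is not modified; output goes to stdout.
print_section_free_conf() {
    awk -v sec="[mon.$1]" '
        /^\[/ { skip = ($0 == sec) }   # at each section header, decide whether to skip
        !skip { print }                # keep every line outside the target section
    ' "$2"
}

# Step 1 (to be run by hand on a working node): drop the monitor from the monmap.
echo "ceph mon remove ${MON_ID}"

# Step 2: preview the config without the dead monitor's section, if the file exists.
if [ -r "$CONF" ]; then
    print_section_free_conf "$MON_ID" "$CONF"
fi
```

Review the preview before editing the real file, and check separately whether the dead monitor's address is also listed in a global monitor host line or in /etc/pve/storage.cfg, since this sketch only handles the per-monitor section.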
 
I ran into the same situation: after deleting a Ceph monitor, it is gone from the Ceph config, but it still shows up in the list of monitors as "unknown", and it is also still in the list of monitors when creating RBD storage. I did not find any config file for it in /etc/pve, so I suppose it lives in another location. Does anyone have a trick to get rid of it?
 
I have done ceph mon remove NODEID and then removed it from /etc/ceph/ceph.conf, but as you mentioned it is still in the GUI as "unknown".
It takes some time until the state in the GUI is updated.
 
If the node has failed and is not online anymore, how can we do this?
I have done ceph mon remove NODEID and then removed it from /etc/ceph/ceph.conf, but as you mentioned it is still in the GUI as "unknown".
Does this work in PVE 5.3? Can you explain the steps?
 
Has anyone got a solution for this? The Ceph monitor was removed, but it is still visible in the GUI with a question mark.
Edit: Solved. As the monitor node was dead anyway, I just removed the node from the cluster with 'pvecm delnode nodename' and removed the node's directory from /etc/pve/nodes while connected to a working node.
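The fix described above could be written down roughly like this. It is a sketch, not a tested procedure: "nodename" is the placeholder from the post, cleanup_commands and NODE are names invented here, and the rm form is an assumption about how the directory was removed. The helper only prints the commands so they can be checked before running them by hand on a working node.

```shell
#!/bin/sh
# Sketch only: NODE and cleanup_commands are illustrative names.
NODE="${NODE:-nodename}"   # placeholder, as in the post above

# Print the two cleanup commands for review instead of executing them.
cleanup_commands() {
    node="$1"
    # Refuse empty names and names with a slash, so the rm target stays
    # directly under /etc/pve/nodes.
    case "$node" in
        ""|*/*|.|..) echo "refusing suspicious node name: '$node'" >&2; return 1 ;;
    esac
    echo "pvecm delnode $node"
    echo "rm -r /etc/pve/nodes/$node"
}

cleanup_commands "$NODE"
```

Printing first is deliberate: /etc/pve is shared across the cluster, so a mistyped node name in the rm line would remove a live node's directory everywhere.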
 