Replacing a failed Ceph/Proxmox node

Drkrieger

Jul 27, 2018
Hello!

I'm in the process of testing out a small 3-node Proxmox/Ceph cluster, and in my failure/fault testing I've run into an issue rebuilding a node. I've gotten the node to join the Proxmox cluster just fine, but the old monitor for that node still exists in the Ceph configuration.
When I removed the node for testing, I used the 'ceph' commands to remove the OSDs as well as the monitor (not realizing there was a 'pveceph destroymon' command), and now I'm unable to remove the monitor via the GUI or the command line. Whenever I try to run 'pveceph destroymon <hostname>', it gives me the following error:
root@ceph-01:~# pveceph destroymon mon.ceph-01
400 Parameter verification failed.
monid: value does not match the regex pattern
pveceph destroymon <monid> [OPTIONS]

When I try and remove it in the web interface, I get this error:
monitor filesystem '/var/lib/ceph/mon/ceph-ceph-01' does not exist on this node (500)

I tried creating the new monitor with the same name, but I got an error stating that the monitor already exists (even though it's not in quorum, or functioning at all). Listing the Ceph monitors with the ceph command line shows only the monitors that are actually active, but the web interface still shows the old monitor.

Is there any other way to remove the old config for the original monitor?

Thanks in advance for any help!
 
Never mind, I found the solution. For those of you who break things like me, the solution is to edit '/etc/pve/ceph.conf' and clear out the lines that belong to the original monitor. As soon as this is done, you can create the monitor again without issue.
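
For anyone who would rather script that cleanup than edit by hand, here is a minimal sketch in Python. It assumes the stale monitor's lines live in their own `[mon.<id>]` section of the file, which is how the file looked on my reading of this setup but may not match every Proxmox/Ceph version (some keep monitor addresses in `[global]` instead, and this sketch does not touch those). The function name `drop_mon_section` and the sample contents (fsid, addresses) are made up for illustration. Back up '/etc/pve/ceph.conf' before writing anything back to it.

```python
def drop_mon_section(conf_text: str, mon_id: str) -> str:
    """Return conf_text without the [mon.<mon_id>] section.

    Assumption: the monitor is described by its own INI-style section.
    Everything from that section header up to the next header is dropped.
    """
    header = f"[mon.{mon_id}]"
    kept = []
    skipping = False
    for line in conf_text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # A new section starts: skip it only if it is the stale monitor's.
            skipping = stripped == header
        if not skipping:
            kept.append(line)
    return "".join(kept)


# Hypothetical file contents, for illustration only.
SAMPLE = """[global]
    fsid = 00000000-0000-0000-0000-000000000000

[mon.ceph-01]
    host = ceph-01
    mon addr = 10.0.0.1:6789

[mon.ceph-02]
    host = ceph-02
    mon addr = 10.0.0.2:6789
"""

cleaned = drop_mon_section(SAMPLE, "ceph-01")
print(cleaned)
```

To use it on a real node you would read '/etc/pve/ceph.conf' into a string, review the printed result, and only then write it back. The sketch deliberately works on text rather than on the file so nothing is changed until you decide to.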
 