[Ceph] Can't delete ghost monitor [SOLVED]

fcarucci

Hello,

I have a 3-node Ceph cluster with 3 OSDs. After some misadventures I ended up reinstalling one node from scratch and adding it back to the Proxmox cluster. Ceph is rebalancing as expected.
But the old monitor from the reinstalled node shows up as "unknown" and I can't delete it. I've read all the forum posts, but I can't get it to go away, and I cannot create a new monitor on that node.

This is the error I get when I try to delete the existing unknown monitor:
hostname lookup 'undefined' failed - failed to get address info for: undefined: Name or service not known (500)

This is what I get if I try to stop the monitor:
entry has no host

This is what I get if I try to create a new one from the GUI or the command line:
command 'monmaptool --clobber --addv pve '[v2:10.0.20.1:3300,v1:10.0.20.1:6789]' --print /tmp/monmap' failed: exit code 1
root@pve:~# pveceph destroymon pve
monitor filesystem '/var/lib/ceph/mon/ceph-pve' does not exist on this node

root@pve:~# pveceph createmon
monmaptool: monmap file /tmp/monmap
monmaptool: map already contains mon.pve


Mon dump
Code:
root@pve-ceph1:~# ceph mon dump
epoch 3
fsid 8ae0a6fc-9140-4301-95ba-08ec6c78b220
last_changed 2024-03-16T09:05:05.556663-0700
created 2024-03-15T22:06:10.274733-0700
min_mon_release 18 (reef)
election_strategy: 1
0: [v2:10.0.20.4:3300/0,v1:10.0.20.4:6789/0] mon.pve-ceph1
1: [v2:10.0.20.5:3300/0,v1:10.0.20.5:6789/0] mon.pve-ceph2
2: [v2:10.0.20.1:3300/0,v1:10.0.20.1:6789/0] mon.pve
dumped monmap epoch 3

This is my ceph.conf
Code:
[global]
         auth_client_required = cephx
         auth_cluster_required = cephx
         auth_service_required = cephx
         cluster_network = 10.30.0.4/24
         fsid = 8ae0a6fc-9140-4301-95ba-08ec6c78b220
         mon_allow_pool_delete = true
         mon_host = 10.0.20.4 10.0.20.5
         ms_bind_ipv4 = true
         ms_bind_ipv6 = false
         osd_pool_default_min_size = 2
         osd_pool_default_size = 3
         public_network = 10.0.20.4/24
         mon_cluster_log_to_file = false


[osd]
         osd_scrub_begin_hour = 0
         osd_scrub_end_hour = 7


[client]
         keyring = /etc/pve/priv/$cluster.$name.keyring


[mds]
         keyring = /var/lib/ceph/mds/ceph-$id/keyring


[mds.pve-ceph1]
         host = pve-ceph1
         mds_standby_for_name = pve


[mds.pve-ceph2]
         host = pve-ceph2
         mds_standby_for_name = pve


[mon.pve-ceph1]
         public_addr = 10.0.20.4


[mon.pve-ceph2]
         public_addr = 10.0.20.5

How do I get rid of this monitor? Thanks!
 
Hi,

This is what I get if I try to create a new one from the GUI or the command line:
command 'monmaptool --clobber --addv pve '[v2:10.0.20.1:3300,v1:10.0.20.1:6789]' --print /tmp/monmap' failed: exit code 1
root@pve:~# pveceph destroymon pve
monitor filesystem '/var/lib/ceph/mon/ceph-pve' does not exist on this node
Could you try the following:

Code:
mkdir -p /var/lib/ceph/mon/ceph-<Ceph-MonID>
pveceph mon destroy <Ceph-MonID>

Replace `<Ceph-MonID>` in the commands above with the name of the Ceph monitor (`pve` in your case). See [0] for more information.

[0] https://pve.proxmox.com/pve-docs/chapter-pveceph.html#pve_ceph_monitors
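
For anyone hitting the same symptom: in the first post, `ceph mon dump` still lists `mon.pve` at 10.0.20.1, while `mon_host` in ceph.conf only contains 10.0.20.4 and 10.0.20.5. Below is a minimal sketch for spotting such leftover entries. It assumes the dump format shown above and a space- or comma-separated `mon_host` line. `ghost_mons` is a hypothetical helper name, not a Ceph or Proxmox command.

```shell
#!/usr/bin/env bash
# Sketch only: print monitor IDs that appear in a saved "ceph mon dump"
# but whose address is not listed under mon_host in ceph.conf.
# ghost_mons is a hypothetical helper, not part of Ceph or Proxmox VE.

ghost_mons() {
    local dump_file="$1" conf_file="$2"
    local hosts ip
    # Take everything after "mon_host =" and treat commas as spaces.
    hosts=$(sed -n 's/^[[:space:]]*mon_host[[:space:]]*=[[:space:]]*//p' "$conf_file")
    hosts=${hosts//,/ }
    # Monmap entries look like:
    #   2: [v2:10.0.20.1:3300/0,v1:10.0.20.1:6789/0] mon.pve
    grep -E '^[0-9]+: ' "$dump_file" | while read -r _rank addrs name; do
        ip=${addrs#\[v2:}   # drop the leading "[v2:"
        ip=${ip%%:*}        # keep the address before the port
        case " $hosts " in
            *" $ip "*) ;;                 # address is listed in mon_host
            *) echo "${name#mon.}" ;;     # candidate ghost monitor
        esac
    done
}

# Possible usage on a node (paths are assumptions; adjust to your setup):
#   ceph mon dump > /tmp/mon_dump.txt
#   ghost_mons /tmp/mon_dump.txt /etc/pve/ceph.conf
```

With the dump and ceph.conf from the first post this should print `pve`, which is the ID to use in the commands above. A monitor missing from `mon_host` is only a hint, so check the output before destroying anything.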