[SOLVED] Ghost monitor in CEPH cluster

Whatever

Member
After an upgrade from 5.x to 6.x, one of the Ceph monitors became a "ghost",
with status "stopped" and address "unknown".
It can be neither started, re-created, nor destroyed; the errors are:
create: monitor address '10.10.10.104' already in use (500)
destroy: no such monitor id 'pve-node4' (500)

I deleted the "alive" mons, pools, OSDs and mgrs and tried to recreate everything from scratch; the mon on pve-node4 still has the status described above.

One more thing to note: even though the PVE GUI shows 4 mons (3 active), there is only one monitor entry in ceph.conf:

Code:
[global]
     auth_client_required = cephx
     auth_cluster_required = cephx
     auth_service_required = cephx
     cluster_network = 10.10.10.0/24
     fsid = c2d639ef-c720-4c85-ac77-2763ecaa0a5e
     mon_allow_pool_delete = true
     mon_host = 10.10.10.101 10.10.10.102 10.10.10.103
     osd_journal_size = 5120
     osd_pool_default_min_size = 2
     osd_pool_default_size = 3
     public_network = 10.10.10.0/24

[client]
     keyring = /etc/pve/priv/$cluster.$name.keyring

[osd]
     osd_class_update_on_start = false
     osd_max_backfills = 2
     osd_memory_target = 2147483648

[mon.pve-node1]
     host = pve-node1
     mon_addr = 10.10.10.101:6789
Code:
root@pve-node4:~# ceph -s
  cluster:
    id:     c2d639ef-c720-4c85-ac77-2763ecaa0a5e
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum pve-node1,pve-node3,pve-node2 (age 2h)
    mgr: pve-node1(active, since 12h), standbys: pve-node2, pve-node3
    osd: 0 osds: 0 up, 0 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:
Any ideas on how to delete the mon entry on pve-node4 or reinstall it?
Thanks in advance
 

dcsapak

Proxmox Staff Member
The systemd service is probably still enabled. Try
Code:
systemctl disable ceph-mon@pve-node4
on pve-node4
 

Whatever

Member
Yep, the systemd service was enabled, but disabling it changes nothing.

Ceph log on pve-node4 when the mon starts:
Code:
Oct 04 13:41:25 pve-node4 systemd[1]: Started Ceph cluster monitor daemon.
Oct 04 13:41:25 pve-node4 ceph-mon[436732]: 2019-10-04 13:41:25.495 7f5aed4ec440 -1 mon.pve-node4@-1(???) e14 not in monmap and have been in a quorum before; must have been removed
Oct 04 13:41:25 pve-node4 ceph-mon[436732]: 2019-10-04 13:41:25.495 7f5aed4ec440 -1 mon.pve-node4@-1(???) e14 commit suicide!
Oct 04 13:41:25 pve-node4 ceph-mon[436732]: 2019-10-04 13:41:25.495 7f5aed4ec440 -1 failed to initialize
Oct 04 13:41:25 pve-node4 systemd[1]: ceph-mon@pve-node4.service: Main process exited, code=exited, status=1/FAILURE
Oct 04 13:41:25 pve-node4 systemd[1]: ceph-mon@pve-node4.service: Failed with result 'exit-code'.
Oct 04 13:41:35 pve-node4 systemd[1]: ceph-mon@pve-node4.service: Service RestartSec=10s expired, scheduling restart.
Oct 04 13:41:35 pve-node4 systemd[1]: ceph-mon@pve-node4.service: Scheduled restart job, restart counter is at 6.
Oct 04 13:41:35 pve-node4 systemd[1]: Stopped Ceph cluster monitor daemon.
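If I read that log right, the mon's local store remembers being in quorum while the current monmap no longer lists pve-node4. The monmap can be checked with plain Ceph tools (just the commands; presumably pve-node4 simply isn't listed there anymore):
Code:
ceph mon dump    # list the monitors in the current monmap
ceph mon stat    # quick quorum summary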
Any other suggestions?
 

Whatever

Member
Not sure if it's related, but I don't have any OSDs in the cluster at the moment.

Code:
root@pve-node4:~# systemctl | grep ceph-
● ceph-mon@pve-node4.service                                                                                                     loaded failed     failed    Ceph cluster monitor daemon                                                  
● ceph-osd@32.service                                                                                                            loaded failed     failed    Ceph object storage daemon osd.32                                            
  ceph-mgr.target                                                                                                                loaded active     active    ceph target allowing to start/stop all ceph-mgr@.service instances at once   
  ceph-osd.target                                                                                                                loaded active     active
 

Whatever

Member
Nothing changes :(
Code:
root@pve-node4:~# pveceph purge
detected running ceph services- unable to purge data
root@pve-node4:~# pveceph createmon
monitor 'pve-node4' already exists
root@pve-node4:~#
 

Alwin

Proxmox Staff Member
As @dcsapak said, disable the mon service and then remove the directory /var/lib/ceph/mon/pve-node4.
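Roughly, on pve-node4 (a hedged sketch; on some installs the mon data directory carries a cluster-name prefix, e.g. /var/lib/ceph/mon/ceph-pve-node4):
Code:
systemctl disable --now ceph-mon@pve-node4   # stop the restart loop and disable the unit
rm -rf /var/lib/ceph/mon/pve-node4           # or /var/lib/ceph/mon/ceph-pve-node4, depending on the prefix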
 

Whatever

Member
As @dcsapak said, disable the mon service and then remove the directory /var/lib/ceph/mon/pve-node4.
I did. I even deleted the whole /var/lib/ceph folder and all ceph* related services in /etc/systemd/.. and rebooted that node.

But pveceph purge still says:
Code:
root@pve-node4:~# pveceph purge
detected running ceph services- unable to purge data
What does pveceph purge check to detect "running ceph services"? How can I completely remove Ceph from the node and reinstall it?

Thanks
 

Alwin

Proxmox Staff Member
What does pveceph purge check to detect "running ceph services"? How can I completely remove Ceph from the node and reinstall it?
Any running Ceph service will keep 'pveceph purge' from removing the configs. You don't need to re-install Ceph; once you have removed all Ceph services on all nodes and their directories (as well as ceph.conf), the Ceph cluster ceases to exist. Then you can create a new cluster.
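For example, a quick (non-authoritative) way to see which Ceph units systemd still knows about, and to clear them out:
Code:
systemctl list-units 'ceph*' --all      # units still loaded, including failed ones
systemctl list-unit-files 'ceph*'       # units that are still enabled
# stop and disable whatever shows up (names as in this thread), then clear the failed state
systemctl stop ceph-mon@pve-node4 ceph-osd@32
systemctl disable ceph-mon@pve-node4 ceph-osd@32
systemctl reset-failed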
 

Fabian_E

Proxmox Staff Member
There was another user with the same issue, and they were able to get rid of the ghost monitor by manually adding it using the Ceph tools and then removing it using the PVE GUI. Here is the thread.
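In CLI terms that workaround would look roughly like this (a hedged sketch; the address is the one from the error in the first post, and the removal step can also be done from the PVE GUI):
Code:
ceph mon add pve-node4 10.10.10.104:6789   # re-add the ghost mon so it becomes a real monmap entry again
ceph mon remove pve-node4                  # then remove it cleanly (or Ceph -> Monitor -> Destroy in the GUI)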
 

Whatever

Member
Any running Ceph service will keep 'pveceph purge' from removing the configs. You don't need to re-install Ceph; once you have removed all Ceph services on all nodes and their directories (as well as ceph.conf), the Ceph cluster ceases to exist. Then you can create a new cluster.
Thanks. I managed to delete Ceph and reinstall it.
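For anyone hitting the same thing, a rough sketch of the kind of sequence that gets a node back to a clean Ceph state (hedged; the unit names and network are the ones from this thread, and it assumes nothing stored on Ceph needs to survive):
Code:
# on every node: stop and disable any remaining Ceph units
systemctl stop ceph-mon.target ceph-mgr.target ceph-osd.target
systemctl disable ceph-mon@pve-node4 ceph-osd@32
systemctl reset-failed
# purge the configuration, then remove leftover local state
pveceph purge
rm -rf /var/lib/ceph
# if ceph.conf survives, remove /etc/pve/ceph.conf by hand, as suggested above
# set Ceph up again
pveceph install
pveceph init --network 10.10.10.0/24
pveceph createmon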
 
