Removing dead ceph monitor

Apr 22, 2025
Hello,


I recently had a node crash. This node had a monitor on it. I managed to remove this node's OSDs from the CRUSH map, and I also removed the host bucket from CRUSH. However, the entries for the node virt07-clus are still showing up in /etc/pve/ceph.conf:

Code:
[global]
        auth_client_required = cephx
        auth_cluster_required = cephx
        auth_service_required = cephx
        cluster_network = 10.40.0.3/24
        fsid = 63efe2d2-7ce8-4dac-9bab-d14fac873567
        mon_allow_pool_delete = true
        mon_host = 10.30.0.2 10.30.0.1 10.30.0.3 10.30.0.4 10.30.0.7 10.30.0.6
        ms_bind_ipv4 = true
        ms_bind_ipv6 = false
        osd_pool_default_min_size = 2
        osd_pool_default_size = 3
        public_network = 10.30.0.3/24

[client]
        keyring = /etc/pve/priv/$cluster.$name.keyring

[client.crash]
        keyring = /etc/pve/ceph/$cluster.$name.keyring

[mon.virt01-clus]
        public_addr = 10.30.0.1

[mon.virt02-clus]
        public_addr = 10.30.0.2

[mon.virt03-clus]
        public_addr = 10.30.0.3

[mon.virt04-clus]
        public_addr = 10.30.0.4

[mon.virt06-clus]
        public_addr = 10.30.0.6

[mon.virt07-clus]
        public_addr = 10.30.0.7


Also, the monmap doesn't mention the old monitor:

Code:
[binary monmap printed to the terminal; only the readable strings are kept]
virt01-clus
virt02-clus
virt03-clus
virt04-clus
virt06-clus
got monmap epoch 12
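
The monmap is stored as a binary blob, so printing it straight to the terminal is mostly unreadable. A minimal sketch of getting a decoded view instead, assuming a node with a working quorum and the `ceph` and `monmaptool` CLIs installed; the function name `print_monmap` is made up for this example:

```shell
# Show the monitor map in readable form.
# Assumes a working quorum; print_monmap is a hypothetical helper name.
print_monmap() {
    # Decoded view straight from the cluster.
    ceph mon dump

    # Alternatively, save the binary map to a temp file and decode it offline.
    map=$(mktemp)
    ceph mon getmap -o "$map"
    monmaptool --print "$map"
    rm -f "$map"
}
```

Calling `print_monmap` should list each monitor by name and address, which makes it easier to confirm that virt07-clus really is gone from the map.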

Also, /etc/corosync/corosync.conf doesn't mention the old monitor.

Are there any commands to remove that dead host, or do I have to copy the file, make the changes and then copy it back for corosync to pick up the changes? I rummaged through the forum but only found threads about "ghost monitors" that show up in the GUI.

Thanks and best regards,

Alex
 
Hello @SteveITS ,

thanks for the reply. I thought the files under /etc/pve would get synced using corosync, because it is part of the cluster file system? I removed the old node following the docs, but there is still a mention of it in ceph.conf. Right now it is (apparently) just a cosmetic problem, but maybe somebody has recommendations on how to remove the entry?

Edit: When I destroy an existing monitor from the GUI, it is also removed from ceph.conf, but the dead monitor and its IP are still in the ceph.conf file.

When I manually create an empty mon dir under /var/lib/ceph/mon/virt07-clus and then run pveceph mon destroy virt07-clus (see https://forum.proxmox.com/threads/ceph-cant-delete-ghost-monitor-solved.143588/), the [mon.virt07-clus] section gets successfully purged from ceph.conf. Now all that remains is the IP in the mon_host line.
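
To see which addresses in mon_host no longer have a matching [mon.*] section, something like the following might help. It is only a sketch: `orphan_mon_hosts` is a made-up helper name, and it assumes mon_host holds plain IPs separated by spaces or commas, as in the ceph.conf posted above (not the bracketed v2:/v1: address form).

```shell
# List mon_host IPs that have no matching public_addr in any [mon.*] section.
# Usage: orphan_mon_hosts /etc/pve/ceph.conf
# Assumption: mon_host contains plain IPs separated by spaces or commas.
orphan_mon_hosts() {
    awk -F'=' '
        # Section header: remember whether we are inside a [mon.*] section.
        /^[ \t]*\[/ { in_mon = ($0 ~ /^[ \t]*\[mon\./); next }
        {
            key = $1; val = $2
            gsub(/[ \t]/, "", key)
        }
        key == "mon_host" { hosts = val }
        in_mon && key == "public_addr" { gsub(/[ \t]/, "", val); known[val] = 1 }
        END {
            n = split(hosts, ip, /[ \t,]+/)
            for (i = 1; i <= n; i++)
                if (ip[i] != "" && !(ip[i] in known)) print ip[i]
        }
    ' "$1"
}
```

The function only reads the file, so it can be run against /etc/pve/ceph.conf directly or against a copy.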

Is it safe to remove the IP address of the dead monitor from the mon_host line? If so, should you copy ceph.conf, make the edit and copy it back, or should you edit and save /etc/pve/ceph.conf directly?
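
Whichever way the edit ends up being made, working on a copy first at least allows a diff before anything live is touched. A rough sketch under that assumption: `strip_mon_host` and the /root/... paths are made up for this example, and it does not answer whether removing the address is safe in a given cluster.

```shell
# Write a copy of a ceph.conf with one IP removed from the mon_host line.
# Usage: strip_mon_host SRC DEST IP
# Note: the remaining addresses are re-joined with single spaces.
strip_mon_host() {
    src=$1; dest=$2; ip=$3
    awk -v ip="$ip" '
        /^[ \t]*mon_host[ \t]*=/ {
            # Keep everything up to and including "=", then rebuild the list.
            eq = index($0, "=")
            line = substr($0, 1, eq)
            n = split(substr($0, eq + 1), addr, /[ \t,]+/)
            for (i = 1; i <= n; i++)
                if (addr[i] != "" && addr[i] != ip) line = line " " addr[i]
            print line
            next
        }
        { print }   # every other line is copied unchanged
    ' "$src" > "$dest"
}

# Example (paths are placeholders):
#   cp /etc/pve/ceph.conf /root/ceph.conf.bak
#   strip_mon_host /etc/pve/ceph.conf /root/ceph.conf.new 10.30.0.7
#   diff -u /etc/pve/ceph.conf /root/ceph.conf.new
```

The address is compared as a whole token, so removing 10.30.0.7 would not touch an entry like 10.30.0.70.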
 
Hello, I have the same problem. I edited /etc/pve/ceph.conf.

And now I get this error on all nodes: "Error initializing cluster client: InvalidArgumentError('RADOS invalid argument (error calling conf_read_file)')".
All monitors and quorums are shown as none, and no managers are shown in the GUI.
The cluster store is shown with a "?" and is not available; the VMs are currently running.
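
Since the error mentions conf_read_file, one possible cause (only a guess) is a malformed line left behind by the manual edit. A very rough check, which is not Ceph's actual parser and does not handle backslash line continuations, is to list every line that is not blank, a comment, a [section] header, or a key = value pair; `lint_conf` is a made-up helper name:

```shell
# Print lines of a ceph.conf-style file that do not look like a blank line,
# a comment, a [section] header, or a "key = value" pair.
# Usage: lint_conf /etc/pve/ceph.conf   (prints nothing if every line looks fine)
lint_conf() {
    awk '
        /^[ \t]*$/                 { next }   # blank line
        /^[ \t]*[#;]/              { next }   # comment
        /^[ \t]*\[[^]]+\][ \t]*$/  { next }   # [section] header
        /^[ \t]*[A-Za-z_][^=]*=/   { next }   # key = value
        { printf "%s:%d: %s\n", FILENAME, FNR, $0 }
    ' "$1"
}
```

Any line it prints is worth comparing against a backup or against the ceph.conf posted earlier in this thread.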

Does anybody have a solution to fix the problem?