Howdy, NOOB again. With a little help I got a POC cluster going with Ceph. Unfortunately, I had to re-install one of the nodes. I removed it from the cluster (pvecm delnode pve102) and wiped all of its disks before re-installation.
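For reference, the removal went roughly like this. The pvecm command is what I actually ran; the wipe step below is just representative of how I cleared the drives (device name is an example, the exact wipe method may have differed):

Code:
# on one of the remaining nodes, after shutting pve102 down
pvecm delnode pve102

# clear each former OSD disk before re-installing (example device)
ceph-volume lvm zap /dev/sdX --destroy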
After re-installation and re-creation of the OSDs on all disks, the Ceph cluster returned to a semi-healthy state. I created a manager and a monitor on the re-installed node without error.
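For reference, the CLI equivalent of what I did on pve102 is roughly:

Code:
# on the re-installed node pve102 (the GUI buttons should do the same thing)
pveceph mgr create
pveceph mon create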
Unfortunately the monitor shows as stopped in the GUI, and clicking start doesn't help. I did some googling and checked the monitor service's status at the CLI, where it shows as running.
Here is some info. What do I do to get the GUI to show complete health (or, better yet, to actually get back to complete health)?
Code:
root@pve102:~# systemctl status ceph-mon@pve102.service
● ceph-mon@pve102.service - Ceph cluster monitor daemon
     Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; preset: enabled)
    Drop-In: /usr/lib/systemd/system/ceph-mon@.service.d
             └─ceph-after-pve-cluster.conf
     Active: active (running) since Wed 2024-10-02 16:57:40 PDT; 4min 12s ago
   Main PID: 12526 (ceph-mon)
      Tasks: 25
     Memory: 136.5M
        CPU: 1.260s
     CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@pve102.service
             └─12526 /usr/bin/ceph-mon -f --cluster ceph --id pve102 --setuser ceph --setgroup ceph

Oct 02 16:57:40 pve102 systemd[1]: Started ceph-mon@pve102.service - Ceph cluster monitor daemon.

root@pve102:~# pvecm status
Cluster information
-------------------
Name:             sv4-pve-c1
Config Version:   7
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Wed Oct 2 17:11:27 2024
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000002
Ring ID:          1.6c
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 172.31.30.101
0x00000002          1 172.31.30.102 (local)
0x00000003          1 172.31.30.103

root@pve102:~# ceph -s
  cluster:
    id:     1101c540-2741-48d9-b64d-189700d0b84f
    health: HEALTH_WARN
            3 daemons have recently crashed

  services:
    mon: 2 daemons, quorum pve101,pve103 (age 4h)
    mgr: pve101(active, since 4h), standbys: pve103, pve102
    osd: 34 osds: 34 up (since 17m), 34 in (since 99m)

  data:
    pools:   2 pools, 33 pgs
    objects: 4.78k objects, 19 GiB
    usage:   61 GiB used, 113 TiB / 114 TiB avail
    pgs:     33 active+clean

  io:
    client: 3.3 KiB/s wr, 0 op/s rd, 0 op/s wr

root@pve103:/etc/pve# cat /etc/pve/ceph.conf
[global]
        auth_client_required = cephx
        auth_cluster_required = cephx
        auth_service_required = cephx
        cluster_network = 10.0.201.1/16
        fsid = 1101c540-2741-48d9-b64d-189700d0b84f
        mon_allow_pool_delete = true
        mon_host = 10.0.201.1 10.0.203.1 10.0.202.1
        ms_bind_ipv4 = true
        ms_bind_ipv6 = false
        osd_pool_default_min_size = 2
        osd_pool_default_size = 3
        public_network = 10.0.201.1/16

[client]
        keyring = /etc/pve/priv/$cluster.$name.keyring

[client.crash]
        keyring = /etc/pve/ceph/$cluster.$name.keyring

[mon.pve101]
        public_addr = 10.0.201.1

[mon.pve102]
        public_addr = 10.0.202.1

[mon.pve103]
        public_addr = 10.0.203.1

root@pve103:/etc/pve#