Ceph monitor "out of quorum" on 3-node cluster, can I remove and re-add it?

millercentral

Apr 17, 2023
I have a 3-node Proxmox cluster running Ceph. Recently it gave a warning that one of the three monitors is down or "out of quorum".

Code:
root@pve-02:~# ceph -s
  cluster:
    id:     f9b7ff0a-17b9-40d8-b897-cebfffb0ee8d
    health: HEALTH_WARN
            1/3 mons down, quorum pve-01,pve-03

  services:
    mon: 3 daemons, quorum pve-01,pve-03 (age 81m), out of quorum: pve-02
    mgr: pve-03(active, since 5w), standbys: pve-01, pve-02
    mds: 1/1 daemons up, 2 standby
    osd: 3 osds: 3 up (since 3m), 3 in (since 5w)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 97 pgs
    objects: 279.30k objects, 1.0 TiB
    usage:   3.1 TiB used, 7.8 TiB / 11 TiB avail
    pgs:     97 active+clean

  io:
    client:   12 KiB/s rd, 204 KiB/s wr, 1 op/s rd, 27 op/s wr

Neither restarting the monitor from the Proxmox UI nor rebooting the node resolved it.

Is the correct next step to remove and re-add the monitor on pve-02? Are there any risks to the Ceph cluster to be aware of before doing that? Thanks in advance for any help.
 
The correct first step is to find out why it is out of quorum: check the logs (/var/log/ceph/ceph-mon*) of that MON and of the other two while you restart the MON on pve-02. They will probably give you clues about what is going on and whether it can be fixed.
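
As a rough sketch of that log check, something like the following could be run on pve-02 right after restarting the monitor. The monitor ID pve-02 comes from the `ceph -s` output above; the exact log file name, the systemd unit name `ceph-mon@pve-02.service`, and the disk-space and clock checks are assumptions based on a default Proxmox Ceph install, not something confirmed in this thread.

```shell
#!/usr/bin/env bash
# Diagnostic sketch for a monitor that is out of quorum.
# MON_ID defaults to the node named in the `ceph -s` output; adjust as needed.
MON_ID="${MON_ID:-pve-02}"

mon_diag() {
    # Is the daemon running at all, or failing right after start?
    systemctl status "ceph-mon@${MON_ID}.service" --no-pager

    # Recent daemon messages from the journal
    journalctl -u "ceph-mon@${MON_ID}.service" -n 100 --no-pager

    # Tail of the monitor log (file name is an assumption; see /var/log/ceph/ceph-mon*)
    tail -n 100 "/var/log/ceph/ceph-mon.${MON_ID}.log"

    # Two things worth ruling out (assumed suspects, not a diagnosis):
    # a full filesystem under the monitor store, and clock skew
    df -h /var/lib/ceph/mon
    timedatectl status

    # How the quorate monitors currently see the monitor map
    ceph mon stat
}

# Nothing runs until the function is called explicitly: mon_diag
```

Call `mon_diag` on the affected node, and compare the timestamps with the logs of the monitors on pve-01 and pve-03 for the same restart.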

You should be able to remove it and re-add it with zero impact on the cluster, given that you have the other two MONs up and quorate.
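
If you do go the remove and re-add route, a minimal sketch with the Proxmox CLI might look like this. It assumes a Proxmox VE release where the subcommands are `pveceph mon destroy` and `pveceph mon create`, and that it is run on pve-02 itself. The `run` wrapper and the `DRY_RUN` variable are hypothetical helpers added here so the script only prints the commands by default.

```shell
#!/usr/bin/env bash
# Sketch: remove and re-create the monitor on the affected node.
# Prints the commands unless DRY_RUN=0 is set (run/DRY_RUN are illustrative helpers).
MON_ID="${MON_ID:-pve-02}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

readd_mon() {
    # Confirm the other two monitors still form a quorum before touching anything
    run ceph quorum_status --format json-pretty

    # Remove the broken monitor from the cluster
    run pveceph mon destroy "$MON_ID"

    # Create a fresh monitor on this node
    run pveceph mon create

    # Check that all three monitors are back in quorum
    run ceph mon stat
}
```

Calling `readd_mon` as-is only lists the steps; setting `DRY_RUN=0` first executes them. Checking quorum before the destroy step matters because with only two monitors left, losing another one would cost the cluster its quorum.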