[SOLVED] Ceph: One monitor down after a server crash

Dark26

Hello,

One of my 3 servers crashed.

After the reboot, everything is OK, except that the monitor on that server wouldn't start.

I tried restarting the server, then destroying the monitor and recreating it. No luck.

So now I only have two monitors in the quorum:

Code:
root@p2:~# ceph -s
  cluster:
    id:     4124cd8e-01ed-4a0d-b97b-737100ffccd2
    health: HEALTH_WARN
            mon p1 is low on available space

  services:
    mon: 2 daemons, quorum p1,p3 (age 40h)
    mgr: p1(active, since 46h), standbys: p3, p2
    mds: cephfs:1 {0=p3=up:active} 2 up:standby
    osd: 3 osds: 3 up (since 40h), 3 in (since 45h)

  data:
    pools:   3 pools, 250 pgs
    objects: 16.19k objects, 62 GiB
    usage:   184 GiB used, 173 GiB / 357 GiB avail
    pgs:     250 active+clean

  io:
    client:   62 KiB/s rd, 733 KiB/s wr, 1 op/s rd, 40 op/s wr

I think the monitor daemon itself is running:

Code:
root@p2:/var/lib/ceph# service ceph-mon@p2 status
● ceph-mon@p2.service - Ceph cluster monitor daemon
   Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
  Drop-In: /usr/lib/systemd/system/ceph-mon@.service.d
           └─ceph-after-pve-cluster.conf
   Active: active (running) since Sat 2019-11-23 18:09:53 CET; 1 day 16h ago
Main PID: 28053 (ceph-mon)
    Tasks: 27
   Memory: 778.9M
   CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@p2.service
           └─28053 /usr/bin/ceph-mon -f --cluster ceph --id p2 --setuser ceph --setgroup ceph

nov. 23 18:09:53 p2 systemd[1]: Started Ceph cluster monitor daemon.
nov. 24 00:00:00 p2 ceph-mon[28053]: 2019-11-24 00:00:00.833 7f96be5d2700 -1 Fail to open '/proc/266427/cmdline' error = (2) No such file or directory
nov. 24 00:00:00 p2 ceph-mon[28053]: 2019-11-24 00:00:00.861 7f96be5d2700 -1 received  signal: Hangup from <unknown> (PID: 266427) UID: 0
nov. 24 00:00:00 p2 ceph-mon[28053]: 2019-11-24 00:00:00.885 7f96be5d2700 -1 received  signal: Hangup from pkill -1 -x ceph-mon|ceph-mgr|ceph-mds|ceph-osd|ceph-fuse|radosgw  (PID: 266429) UID:
nov. 25 00:00:00 p2 ceph-mon[28053]: 2019-11-25 00:00:00.751 7f96be5d2700 -1 received  signal: Hangup from killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw  (PID: 805408) UID
nov. 25 00:00:00 p2 ceph-mon[28053]: 2019-11-25 00:00:00.775 7f96be5d2700 -1 received  signal: Hangup from pkill -1 -x ceph-mon|ceph-mgr|ceph-mds|ceph-osd|ceph-fuse|radosgw  (PID: 805409) UID:
lines 1-17/17 (END)

But in the web interface, the monitor shows status: stopped, address: unknown, quorum: no.
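
To confirm from the shell which monitor is missing from the quorum, the "mon:" line of `ceph -s` can be compared against the expected monitor names. The sketch below is only an illustration: `missing_from_quorum` is a hypothetical helper, not a Ceph command, and the expected names (p1, p2, p3) are assumed from the output above. On the affected node, `ceph daemon mon.p2 mon_status` should also show the state the daemon itself thinks it is in.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): given the "mon:" line from
# `ceph -s` and the expected monitor names, print every expected
# monitor that is absent from the quorum list.
missing_from_quorum() {
    mon_line=$1; shift
    # Extract the comma-separated list that follows "quorum " (e.g. "p1,p3").
    in_quorum=$(printf '%s\n' "$mon_line" | sed -n 's/.*quorum \([^ ]*\).*/\1/p')
    for mon in "$@"; do
        case ",$in_quorum," in
            *",$mon,"*) ;;                 # present in the quorum
            *) printf '%s\n' "$mon" ;;     # expected but absent
        esac
    done
}

# Using the line from the `ceph -s` output above; prints "p2".
missing_from_quorum "mon: 2 daemons, quorum p1,p3 (age 40h)" p1 p2 p3
```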

Any idea how to "clean up" the monitor so that I can recreate it correctly?

Thanks.

dark26


Solution: by following this guide:

https://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/

I managed to repair the monitor.
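
For reference, the manual remove and re-add procedure described on that page boils down to roughly the following. This is a sketch, not a record of the exact commands that were run here: it assumes the default cluster name "ceph", the default data directory /var/lib/ceph/mon/ceph-<id>, and arbitrary temporary file names under /tmp. The function `mon_rebuild_plan` is hypothetical and only prints the commands, so they can be reviewed before anything is executed.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for rebuilding one monitor,
# it does NOT execute them.
# Assumptions (not from the original post): cluster name "ceph",
# default mon data dir, temp files /tmp/mon.keyring and /tmp/monmap.
mon_rebuild_plan() {
    id=$1
    dir="/var/lib/ceph/mon/ceph-$id"
    cat <<EOF
systemctl stop ceph-mon@$id
ceph mon remove $id
mv $dir $dir.bak
ceph auth get mon. -o /tmp/mon.keyring
ceph mon getmap -o /tmp/monmap
ceph-mon -i $id --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
chown -R ceph:ceph $dir
systemctl start ceph-mon@$id
EOF
}

# Print the plan for the broken monitor on node p2.
mon_rebuild_plan p2
```

The old data directory is moved aside rather than deleted so that it can be restored if something goes wrong, and the chown matches the `--setuser ceph --setgroup ceph` options visible in the service status above. On Proxmox, the monitor may also need to be removed from or re-added to /etc/pve/ceph.conf, which this sketch does not cover.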
 