[SOLVED] Ceph: One monitor down after a server crash

Dark26

Nov 27, 2017
Hello,

One of my 3 servers crashed.

After the reboot, everything is OK, except that the monitor on that server won't start.

I tried restarting the server, then destroying the monitor and recreating it. No luck.

So now I only have two monitors:

Code:
root@p2:~# ceph -s
  cluster:
    id:     4124cd8e-01ed-4a0d-b97b-737100ffccd2
    health: HEALTH_WARN
            mon p1 is low on available space

  services:
    mon: 2 daemons, quorum p1,p3 (age 40h)
    mgr: p1(active, since 46h), standbys: p3, p2
    mds: cephfs:1 {0=p3=up:active} 2 up:standby
    osd: 3 osds: 3 up (since 40h), 3 in (since 45h)

  data:
    pools:   3 pools, 250 pgs
    objects: 16.19k objects, 62 GiB
    usage:   184 GiB used, 173 GiB / 357 GiB avail
    pgs:     250 active+clean

  io:
    client:   62 KiB/s rd, 733 KiB/s wr, 1 op/s rd, 40 op/s wr

I think the monitor is running:

Code:
root@p2:/var/lib/ceph# service ceph-mon@p2 status
● ceph-mon@p2.service - Ceph cluster monitor daemon
   Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
  Drop-In: /usr/lib/systemd/system/ceph-mon@.service.d
           └─ceph-after-pve-cluster.conf
   Active: active (running) since Sat 2019-11-23 18:09:53 CET; 1 day 16h ago
Main PID: 28053 (ceph-mon)
    Tasks: 27
   Memory: 778.9M
   CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@p2.service
           └─28053 /usr/bin/ceph-mon -f --cluster ceph --id p2 --setuser ceph --setgroup ceph

nov. 23 18:09:53 p2 systemd[1]: Started Ceph cluster monitor daemon.
nov. 24 00:00:00 p2 ceph-mon[28053]: 2019-11-24 00:00:00.833 7f96be5d2700 -1 Fail to open '/proc/266427/cmdline' error = (2) No such file or directory
nov. 24 00:00:00 p2 ceph-mon[28053]: 2019-11-24 00:00:00.861 7f96be5d2700 -1 received  signal: Hangup from <unknown> (PID: 266427) UID: 0
nov. 24 00:00:00 p2 ceph-mon[28053]: 2019-11-24 00:00:00.885 7f96be5d2700 -1 received  signal: Hangup from pkill -1 -x ceph-mon|ceph-mgr|ceph-mds|ceph-osd|ceph-fuse|radosgw  (PID: 266429) UID:
nov. 25 00:00:00 p2 ceph-mon[28053]: 2019-11-25 00:00:00.751 7f96be5d2700 -1 received  signal: Hangup from killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw  (PID: 805408) UID
nov. 25 00:00:00 p2 ceph-mon[28053]: 2019-11-25 00:00:00.775 7f96be5d2700 -1 received  signal: Hangup from pkill -1 -x ceph-mon|ceph-mgr|ceph-mds|ceph-osd|ceph-fuse|radosgw  (PID: 805409) UID:

But in the web interface, the monitor shows status: stopped, address: unknown, quorum: no.
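
For anyone hitting the same mismatch (systemd says running, the cluster says the monitor is not in quorum), here is a minimal sketch of some read-only checks. It is not something from the original troubleshooting. The `run` and `check_mon` helper names and the dry-run default are my own assumptions for illustration; the `ceph` subcommands themselves are standard ones, and `ceph daemon mon.<id> mon_status` has to be run on the host where that monitor lives.

```shell
#!/bin/sh
# Read-only checks for a monitor that is running but missing from quorum.
# DRY_RUN=1 (the default here, an assumption for safety) only prints commands.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: $1 is the monitor id (p2 in this thread).
check_mon() {
    mon_id="$1"
    run ceph mon stat                          # which monitors are in quorum
    run ceph mon dump                          # the monmap the cluster currently uses
    run ceph daemon "mon.$mon_id" mon_status   # local daemon's own view (admin socket)
    run journalctl -u "ceph-mon@$mon_id" -n 50 --no-pager
}
```

Calling `check_mon p2` prints the four commands; with `DRY_RUN=0` it runs them instead. If the local `mon_status` reports a state such as probing while `ceph mon dump` does not list the monitor, that would suggest the daemon is up but not part of the monmap.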

Any idea how to "clean up" the monitor so that I can recreate it correctly?

Thanks.

dark26


Solution: I managed to repair the monitor by following the procedure documented here:

https://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/
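
The post does not record exactly which steps from that page were used, so the following is only a sketch of the manual "remove, then re-add" sequence as I understand it from the Ceph documentation. Assumptions on my part: the `run` and `readd_mon` helper names, the temporary file names, moving the old data directory aside to `.bak`, the `chown` to the `ceph` user, and starting through `systemctl` (the documentation starts the daemon directly with `ceph-mon -i {mon-id} --public-addr {ip:port}`). It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch: remove a broken monitor from the cluster and rebuild its data store.
# DRY_RUN=1 (the default here, an assumption for safety) only prints commands.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: $1 = monitor id (p2 in this thread), $2 = scratch dir.
readd_mon() {
    mon_id="$1"
    tmp="${2:-/tmp}"
    mon_dir="/var/lib/ceph/mon/ceph-$mon_id"

    # Remove the old monitor and keep its data directory as a backup.
    run systemctl stop "ceph-mon@$mon_id"
    run ceph mon remove "$mon_id"
    run mv "$mon_dir" "$mon_dir.bak"

    # Rebuild the monitor store from the current keyring and monmap.
    run mkdir "$mon_dir"
    run ceph auth get mon. -o "$tmp/mon.keyring"
    run ceph mon getmap -o "$tmp/monmap"
    run ceph-mon -i "$mon_id" --mkfs --monmap "$tmp/monmap" --keyring "$tmp/mon.keyring"

    # The daemon runs as user ceph (see --setuser above), so fix ownership.
    run chown -R ceph:ceph "$mon_dir"
    run systemctl start "ceph-mon@$mon_id"
}
```

Running `readd_mon p2` prints the nine commands for review. Only after checking them against the linked page and your own cluster would it make sense to rerun with `DRY_RUN=0` on the affected node, and then confirm with `ceph -s` that all three monitors are back in quorum.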
 
