[SOLVED] ceph mgr and mon issue after upgrade to Luminous

RobFantini

After the upgrade, here is the ceph status:
Code:
# ceph -s
  cluster:
    id:     75bc38f7-d42c-449b-88ed-488c7778a551
    health: HEALTH_OK
  services:
    mon: 3 daemons, quorum 2,sys8,1
    mgr: no daemons active
    osd: 18 osds: 18 up, 18 in
  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 bytes
    usage:   2005 MB used, 7952 GB / 7954 GB avail
    pgs:

So no managers are active.

I tried to delete a mon and was then going to recreate it, but got this:
Code:
# pveceph destroymon sys8
ceph manager directory '/var/lib/ceph/mgr/ceph-sys8' not found

Is there another way to delete mons?

I figure delete and then add will create the managers.
 
RobFantini said:
I figure delete and then add will create the managers.

This is correct.

RobFantini said:
# pveceph destroymon sys8
ceph manager directory '/var/lib/ceph/mgr/ceph-sys8' not found

Did you do this on the host where the 'sys8' monitor is? (The web interface does this automatically on the correct host.)
 
The mons were deleted; the missing mgr directory was just a warning.

Now, like in another recent thread, I cannot create a mon.
Code:
# pveceph createmon
got timeout
 
Did you destroy all monitors? Because then your cluster is no longer quorate. Then you have to extract the monmap, edit it to contain the monitors you want (at least a quorate amount) and inject it back into all nodes.
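A rough sketch of that extract, edit, inject sequence, meant for a node whose monitor still has its data directory. The values of MON_ID, DEAD_MON and MONMAP are assumptions for illustration, and so is the `run` dry-run helper; by default the script only prints the commands. `monmaptool --rm` is used here as one way to edit the map, so adapt it to the monitors you actually want to keep.

```shell
#!/bin/sh
# Sketch only: rebuild the monmap on a surviving monitor.
# MON_ID, DEAD_MON and MONMAP are illustrative assumptions, not fixed values.
MON_ID="${MON_ID:-1}"          # id of the monitor that still exists on this node
DEAD_MON="${DEAD_MON:-sys8}"   # name of a monitor to drop from the map
MONMAP="${MONMAP:-/tmp/monmap}"
DRY_RUN="${DRY_RUN:-1}"        # 1 = print commands only, 0 = execute them

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

fix_monmap() {
    # The monitor must be stopped, otherwise its store is locked.
    run systemctl stop "ceph-mon@${MON_ID}" &&
    run ceph-mon -i "$MON_ID" --extract-monmap "$MONMAP" &&
    run monmaptool --print "$MONMAP" &&
    run monmaptool --rm "$DEAD_MON" "$MONMAP" &&
    run ceph-mon -i "$MON_ID" --inject-monmap "$MONMAP" &&
    run systemctl start "ceph-mon@${MON_ID}"
}

fix_monmap
```

Set DRY_RUN=0 only after checking the printed commands against your own cluster, and keep a copy of the original monmap file first.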
 
OK, I am following: http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/?highlight=extract monmap

2-No quorum? Grab the monmap directly from another monitor (this assumes the monitor you are grabbing the monmap from has id ID-FOO and has been stopped): ceph-mon -i ID-FOO --extract-monmap /tmp/monmap

I am using the CLI on the only node that has a mon.

Code:
sys10  ~ # ls /var/lib/ceph/mon/
ceph-1/

sys10  ~ # ceph-mon -i  1 --extract-monmap /tmp/monmap
2017-09-04 10:45:57.943927 7ff4458c1f80 -1 IO error: lock /var/lib/ceph/mon/ceph-1/store.db/LOCK: Resource temporarily unavailable

2017-09-04 10:45:57.944052 7ff4458c1f80 -1 error opening mon data directory at '/var/lib/ceph/mon/ceph-1': (22) Invalid argument

Am I doing something wrong?
 
Now that ceph status times out, PVE is starting to have issues. For instance, when I click 'Summary' for a node or VM, the 'status' info is missing.

A VM also reported a time issue, which can lead to data issues.

While I'd like to debug and fix this, I may need to scrap our ceph setup and start over.

Just in case we run into issues with VMs: what is the command to stop ceph?
 
Code:
USAGE: pveceph stop [<service>]
  Stop ceph services.
  <service>  (mon|mds|osd|mgr)\.[A-Za-z0-9\-]{1,32}
             Ceph service name.
 
Thanks, and another question.
After this:
Code:
sys10  ~ # systemctl stop ceph-mon@1
sys10  ~ # ceph-mon -i  1 --extract-monmap /tmp/monmap
2017-09-04 11:11:55.788874 7f44e5c6ff80 -1 wrote monmap to /tmp/monmap
How do I decode /tmp/monmap into an editable text file? I know how to do it for the CRUSH map, but I cannot find how to do it for the monmap yet. Still searching.
 
I think I found it:
monmaptool --print /tmp/monmap
Code:
monmaptool: monmap file /tmp/monmap
epoch 35
fsid 75bc38f7-d42c-449b-88ed-488c7778a551
last_changed 2017-09-04 10:13:46.194524
created 2017-02-26 09:11:33.212436
0: 10.11.12.8:6789/0 mon.sys8
1: 10.11.12.10:6789/0 mon.1
So I will try to just inject that file.
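Before injecting, it may help to see which monitors the map names and whether this host actually holds a data directory for each. A minimal sketch, assuming the `monmaptool --print` output format shown above; `list_mons`, `check_mon_dirs` and MON_ROOT are hypothetical helper names, and the sample input is copied from this thread.

```shell
#!/bin/sh
# Sketch only: helper names and MON_ROOT are illustrative assumptions.
MON_ROOT="${MON_ROOT:-/var/lib/ceph/mon}"

# list_mons: read `monmaptool --print` output on stdin,
# print "name address" for every monitor entry.
list_mons() {
    awk '/^[0-9]+: / { sub(/^mon\./, "", $3); print $3, $2 }'
}

# check_mon_dirs: report whether each listed monitor has a data
# directory on THIS host (each node only holds its own monitor).
check_mon_dirs() {
    list_mons | while read -r name addr; do
        if [ -d "$MON_ROOT/ceph-$name" ]; then
            echo "mon.$name ($addr): data directory present"
        else
            echo "mon.$name ($addr): no data directory on this host"
        fi
    done
}

# Sample input copied from the monmaptool output in this thread.
check_mon_dirs <<'EOF'
monmaptool: monmap file /tmp/monmap
epoch 35
fsid 75bc38f7-d42c-449b-88ed-488c7778a551
last_changed 2017-09-04 10:13:46.194524
created 2017-02-26 09:11:33.212436
0: 10.11.12.8:6789/0 mon.sys8
1: 10.11.12.10:6789/0 mon.1
EOF
```

On a live node the input would come from `monmaptool --print /tmp/monmap | check_mon_dirs` instead of the pasted sample.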
 
Code:
sys10  ~ # ceph-mon -i 1 --inject-monmap /tmp/monmap
sys10  ~ # systemctl start ceph-mon@1

# systemctl status ceph-mon@1
● ceph-mon@1.service - Ceph cluster monitor daemon
   Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
  Drop-In: /lib/systemd/system/ceph-mon@.service.d
           └─ceph-after-pve-cluster.conf
   Active: active (running) since Mon 2017-09-04 11:22:45 EDT; 21s ago
 Main PID: 21556 (ceph-mon)
    Tasks: 20
   CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@1.service
           └─21556 /usr/bin/ceph-mon -f --cluster ceph --id 1 --setuser ceph --setgroup ceph

Sep 04 11:22:45 sys10 systemd[1]: Started Ceph cluster monitor daemon.

However, ceph -s still times out.
 
If only mon@1 is running, that's no surprise. mon@sys8 has to run too for the cluster to be quorate.
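To make the arithmetic behind that concrete: monitors need a strict majority of the monitors listed in the monmap. A small sketch of that rule; `majority` and `has_quorum` are hypothetical helper names, not Ceph commands.

```shell
#!/bin/sh
# Sketch only: these helpers just do the majority arithmetic.

# majority TOTAL: smallest count that is more than half of TOTAL.
majority() {
    echo $(( $1 / 2 + 1 ))
}

# has_quorum RUNNING TOTAL: succeeds if RUNNING monitors out of
# TOTAL listed in the monmap are enough to form a quorum.
has_quorum() {
    [ "$1" -ge "$(majority "$2")" ]
}

for total in 1 2 3; do
    echo "monmap with $total mons needs $(majority "$total") running"
done

if has_quorum 1 2; then
    echo "1 of 2 running: quorum"
else
    echo "1 of 2 running: no quorum"
fi
```

With the two-entry monmap in this thread (sys8 and 1), both monitors must be up; with a single-entry monmap, one monitor alone is enough.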
 
Probably I need to inject the monmap into the other mon. I will try to do so on the correct node.
You really shouldn't run commands without understanding what they do.
Your monmap contains two mons: 1 and sys8. So the IPs they are assigned should be those of the nodes that run those monitors.
 
So on sys8:
Code:
# ceph-mon -i sys8  --inject-monmap /tmp/monmap    
2017-09-04 11:31:42.511553 7f58f27bcf80 -1 monitor data directory at '/var/lib/ceph/mon/ceph-sys8' does not exist: have you run 'mkfs'?

Is there a way around that?
 
It's basically telling you that the monitor doesn't exist. So you need to create it.
 
