ceph: no active mgr

lifeboy

Renowned Member
I deleted the initial 1st node of the setup by removing all the packages relevant to Proxmox and deleting some directories (after making copies), like /var/lib/pve-cluster and some /etc config files. Then I followed the Proxmox-on-Debian installation in principle, but the repositories and more were of course already in place.

The snag is that I have two Ceph OSDs which contain some images that I do not want to reconstruct, so I simply want to activate these OSDs in the new Ceph cluster on that Proxmox node. So far so good; I have added them to Ceph as follows:
Code:
root@yster4:~# ceph-volume lvm activate 0 7ae16f65-2247-402b-9d22-156933faa1b8
Running command: ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-017e914f-8df0-4417-816b-2c7882150372/osd-block-7ae16f65-2247-402b-9d22-156933faa1b8 --path /var/lib/ceph/osd/ceph-0
Running command: ln -snf /dev/ceph-017e914f-8df0-4417-816b-2c7882150372/osd-block-7ae16f65-2247-402b-9d22-156933faa1b8 /var/lib/ceph/osd/ceph-0/block
Running command: chown -R ceph:ceph /dev/dm-8
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
Running command: ln -snf /dev/sdc4 /var/lib/ceph/osd/ceph-0/block.db
Running command: chown -R ceph:ceph /dev/sdc4
Running command: systemctl enable ceph-volume@lvm-0-7ae16f65-2247-402b-9d22-156933faa1b8
Running command: systemctl start ceph-osd@0
--> ceph-volume lvm activate successful for osd ID: 0
root@yster4:~# ceph-volume lvm activate 1 78d51ea4-f175-4ac2-af49-eec1bfd042ac
Running command: ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-e0536371-ea0e-422d-93af-0b1f5d6d55a0/osd-block-78d51ea4-f175-4ac2-af49-eec1bfd042ac --path /var/lib/ceph/osd/ceph-1
Running command: ln -snf /dev/ceph-e0536371-ea0e-422d-93af-0b1f5d6d55a0/osd-block-78d51ea4-f175-4ac2-af49-eec1bfd042ac /var/lib/ceph/osd/ceph-1/block
Running command: chown -R ceph:ceph /dev/dm-7
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-1
Running command: ln -snf /dev/sdc5 /var/lib/ceph/osd/ceph-1/block.db
Running command: chown -R ceph:ceph /dev/sdc5
Running command: systemctl enable ceph-volume@lvm-1-78d51ea4-f175-4ac2-af49-eec1bfd042ac
Running command: systemctl start ceph-osd@1
--> ceph-volume lvm activate successful for osd ID: 1
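
For reference, the OSD id and fsid arguments I passed to activate above come from the LVM tags; they can be re-listed at any time with the standard ceph-volume query (just noting this as a sanity check, output omitted):
Code:
root@yster4:~# ceph-volume lvm list
# prints each OSD with its "osd id", "osd fsid", block and block.db devices,
# which is where the "0 7ae16f65-..." style arguments above were taken from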

Then ceph status was:
Code:
root@yster4:~# ceph status
  cluster:
    id:     35030018-ce1d-4336-b62f-7e5be5334cd4
    health: HEALTH_OK
 
  services:
    mon: 1 daemons, quorum yster4
    mgr: no daemons active
    osd: 0 osds: 0 up, 0 in
 
  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 bytes
    usage:   0 kB used, 0 kB / 0 kB avail
    pgs:

So I did:

Code:
root@yster4:~# ceph auth add osd.0 osd 'allow *' mon 'allow profile osd' -i /var/lib/ceph/osd/ceph-0/keyring
added key for osd.0
root@yster4:~# ceph auth add osd.1 osd 'allow *' mon 'allow profile osd' -i /var/lib/ceph/osd/ceph-1/keyring
added key for osd.1

root@yster4:~# ceph -s
  cluster:
    id:     35030018-ce1d-4336-b62f-7e5be5334cd4
    health: HEALTH_WARN
            no active mgr
 
  services:
    mon: 1 daemons, quorum yster4
    mgr: no daemons active
    osd: 2 osds: 0 up, 0 in
 
  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 bytes
    usage:   0 kB used, 0 kB / 0 kB avail
    pgs:     

root@yster4:~# ceph osd tree
ID CLASS WEIGHT  TYPE NAME       STATUS REWEIGHT PRI-AFF
-1       1.81940 root default                           
-2       1.81940     host yster4                         
 0       0.90970         osd.0     down        0 1.00000
 1       0.90970         osd.1     down        0 1.00000
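
As a sanity check that the keyrings I imported match what the monitor now holds, plain ceph auth queries should do (shown here only as a sketch):
Code:
root@yster4:~# ceph auth get osd.0
root@yster4:~# cat /var/lib/ceph/osd/ceph-0/keyring
# the key reported by the monitor should be identical to the one on disk;
# same check for osd.1 with the ceph-1 paths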

Now the problem is that there is no ceph manager active.

I tried:
Code:
root@yster4:~# ceph-mgr -i ceph-yster4
and then I replaced the generated key with the one I have in client.admin:
Code:
root@yster4:~# ceph auth get client.admin
exported keyring for client.admin
[client.admin]
   key = AQD89Q9bUezDFBAAXZZMuuz9eWVwhH/xwNeLhQ==
   auid = 0
   caps mds = "allow"
   caps mgr = "allow *"
   caps mon = "allow *"
   caps osd = "allow *"

So now I have:
Code:
root@yster4:~# cat /var/lib/ceph/mgr/ceph-yster4/keyring
[mgr.yster4]
#    key = AQArhCxbYiYKIhAAL/e20YNpVtEnRen3czRD0g==
    key = AQD89Q9bUezDFBAAXZZMuuz9eWVwhH/xwNeLhQ==
where the generated key is replaced with my original one.
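
For comparison, my understanding of the documented manual mgr bootstrap is to give the mgr its own auth entity instead of reusing the client.admin key, roughly like this (a sketch based on the Ceph docs, not something I have run here yet):
Code:
root@yster4:~# mkdir -p /var/lib/ceph/mgr/ceph-yster4
root@yster4:~# ceph auth get-or-create mgr.yster4 mon 'allow profile mgr' osd 'allow *' mds 'allow *' \
      > /var/lib/ceph/mgr/ceph-yster4/keyring
root@yster4:~# chown -R ceph:ceph /var/lib/ceph/mgr/ceph-yster4
root@yster4:~# systemctl enable --now ceph-mgr@yster4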

Code:
root@yster4:~# systemctl status ceph ceph-osd
● ceph.service - PVE activate Ceph OSD disks
   Loaded: loaded (/etc/systemd/system/ceph.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Fri 2018-06-22 07:33:19 SAST; 56min ago
  Process: 155712 ExecStart=/usr/sbin/ceph-disk --log-stdout activate-all (code=exited, status=0/SUCCESS)
 Main PID: 155712 (code=exited, status=0/SUCCESS)
      CPU: 139ms

Jun 22 07:33:19 yster4 systemd[1]: Starting PVE activate Ceph OSD disks...
Jun 22 07:33:19 yster4 systemd[1]: Started PVE activate Ceph OSD disks.
Unit ceph-osd.service could not be found.

root@yster4:~# ceph osd tree
ID CLASS WEIGHT  TYPE NAME       STATUS REWEIGHT PRI-AFF
-1       1.81940 root default                           
-2       1.81940     host yster4                         
 0       0.90970         osd.0     down        0 1.00000
 1       0.90970         osd.1     down        0 1.00000
root@yster4:~# ceph -s
  cluster:
    id:     35030018-ce1d-4336-b62f-7e5be5334cd4
    health: HEALTH_WARN
            no active mgr
 
  services:
    mon: 1 daemons, quorum yster4
    mgr: no daemons active
    osd: 2 osds: 0 up, 0 in
 
  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 bytes
    usage:   0 kB used, 0 kB / 0 kB avail
    pgs:
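
Side note on the "Unit ceph-osd.service could not be found" line above: the OSDs run as templated systemd units, so their state has to be queried per instance (ordinary systemd commands):
Code:
root@yster4:~# systemctl status ceph-osd@0 ceph-osd@1
root@yster4:~# journalctl -u ceph-osd@0 -n 50
# shows why a given OSD stays down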

How can I activate a ceph manager?
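
Or, since this is a Proxmox node, I assume the PVE tooling could set the mgr up for me; if I remember correctly the command on PVE 5.x is pveceph createmgr (please correct me if that is wrong):
Code:
root@yster4:~# pveceph createmgr
# should create the mgr.<nodename> auth entry, its keyring under
# /var/lib/ceph/mgr/ceph-<nodename>/ and enable ceph-mgr@<nodename>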
 
The log shows:
Code:
Jun 22 07:28:07 yster4 pvestatd[2568]: rados_connect failed - Operation not supported
Jun 22 07:28:07 yster4 pvestatd[2568]: rados_connect failed - Operation not supported
Jun 22 07:28:09 yster4 ceph-osd[154021]: 2018-06-22 07:28:09.729463 7f15fba53e00 -1 osd.0 69 log_to_monitors {default=true}
Jun 22 07:28:09 yster4 ceph-osd[154021]: 2018-06-22 07:28:09.730421 7f15fba53e00 -1 osd.0 69 init authentication failed: (1) Operation not permitted
Jun 22 07:28:09 yster4 systemd[1]: ceph-osd@0.service: Main process exited, code=exited, status=1/FAILURE
Jun 22 07:28:09 yster4 systemd[1]: ceph-osd@0.service: Unit entered failed state.
Jun 22 07:28:09 yster4 systemd[1]: ceph-osd@0.service: Failed with result 'exit-code'.
Jun 22 07:28:17 yster4 pvestatd[2568]: rados_connect failed - Operation not supported
Jun 22 07:28:17 yster4 pvestatd[2568]: rados_connect failed - Operation not supported
Jun 22 07:28:20 yster4 systemd[1]: ceph-osd@1.service: Service hold-off time over, scheduling restart.
Jun 22 07:28:20 yster4 systemd[1]: Stopped Ceph object storage daemon osd.1.
Jun 22 07:28:20 yster4 systemd[1]: Starting Ceph object storage daemon osd.1...
Jun 22 07:28:20 yster4 systemd[1]: Started Ceph object storage daemon osd.1.
Jun 22 07:28:20 yster4 ceph-osd[154201]: starting osd.1 at - osd_data /var/lib/ceph/osd/ceph-1 /var/lib/ceph/osd/ceph-1/journal
Jun 22 07:28:26 yster4 ceph-osd[154201]: 2018-06-22 07:28:26.255910 7fb477146e00 -1 osd.1 69 log_to_monitors {default=true}
Jun 22 07:28:27 yster4 pvestatd[2568]: rados_connect failed - Operation not supported
Jun 22 07:28:27 yster4 pvestatd[2568]: rados_connect failed - Operation not supported
Jun 22 07:28:29 yster4 ceph-mon[143582]: 2018-06-22 07:28:29.841228 7ff768f5e700 -1 log_channel(cluster) log [ERR] : Health check failed: no active mgr (MGR_DOWN)
Jun 22 07:28:29 yster4 systemd[1]: ceph-osd@0.service: Service hold-off time over, scheduling restart.
Jun 22 07:28:29 yster4 systemd[1]: Stopped Ceph object storage daemon osd.0.
Jun 22 07:28:29 yster4 systemd[1]: Starting Ceph object storage daemon osd.0...
Jun 22 07:28:29 yster4 systemd[1]: Started Ceph object storage daemon osd.0.
Jun 22 07:28:29 yster4 ceph-osd[154358]: starting osd.0 at - osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
Jun 22 07:28:34 yster4 ceph-osd[154358]: 2018-06-22 07:28:34.100513 7f4c3838ee00 -1 osd.0 69 log_to_monitors {default=true}
Jun 22 07:28:37 yster4 pvestatd[2568]: rados_connect failed - Operation not supported
Jun 22 07:28:37 yster4 pvestatd[2568]: rados_connect failed - Operation not supported
Jun 22 07:28:47 yster4 pvestatd[2568]: rados_connect failed - Operation not supported
Jun 22 07:28:47 yster4 pvestatd[2568]: rados_connect failed - Operation not supported
Jun 22 07:28:57 yster4 pvestatd[2568]: rados_connect failed - Operation not supported
Jun 22 07:28:57 yster4 pvestatd[2568]: rados_connect failed - Operation not supported
Jun 22 07:29:00 yster4 systemd[1]: Starting Proxmox VE replication runner...
Jun 22 07:29:00 yster4 systemd[1]: Started Proxmox VE replication runner.
Jun 22 07:29:07 yster4 pvestatd[2568]: rados_connect failed - Operation not supported
Jun 22 07:29:07 yster4 pvestatd[2568]: rados_connect failed - Operation not supported
Jun 22 07:29:17 yster4 pvestatd[2568]: rados_connect failed - Operation not supported
Jun 22 07:29:17 yster4 pvestatd[2568]: rados_connect failed - Operation not supported
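
One more thing I notice in the log: osd.0 fails with "init authentication failed: (1) Operation not permitted". Comparing with the OSD caps the Luminous docs use, my auth entries are missing the mgr cap; adjusting them would look like the lines below (only an assumption on my part that this is related to the error):
Code:
root@yster4:~# ceph auth caps osd.0 mon 'allow profile osd' mgr 'allow profile osd' osd 'allow *'
root@yster4:~# ceph auth caps osd.1 mon 'allow profile osd' mgr 'allow profile osd' osd 'allow *'
root@yster4:~# systemctl restart ceph-osd@0 ceph-osd@1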
 
