[SOLVED] CephFS stuck in 'resolve' state

I got a little excited about being able to run multiple active MDS instances for CephFS and brought all MDS instances up as active concurrently on 3 separate Ceph clusters. After restarting the MDS instances they are stuck in the 'resolve' state and appear to be blocked by the 'allow ' MDS capability instead of 'allow *'.

I would appreciate some assistance with resolving the problem. I only store ISO images on CephFS, to provide a shared file system for all cluster nodes, so the problem isn't critical and I have some time to try to fix things before blowing away CephFS and starting over.

Firstly, how I set it up on Jewel (before upgrading to Luminous):
Code:
# quote the heredoc delimiter so that Ceph's $cluster/$id metavariables stay literal in ceph.conf
cat >> /etc/ceph/ceph.conf <<'EOF'
[mds]
     mds data = /var/lib/ceph/mds/$cluster-$id
     keyring = /var/lib/ceph/mds/$cluster-$id/keyring
[mds.kvm1a]
     host = kvm1a
[mds.kvm1b]
     host = kvm1b
[mds.kvm1c]
     host = kvm1c
EOF


On each host (kvm1a, kvm1b, kvm1c):
Code:
id='kvm1a';    # set to the local hostname on each node (kvm1b / kvm1c on the others)
apt-get -y install ceph-mds;
mkdir -p /var/lib/ceph/mds/ceph-$id;
# NB: the mds cap below is 'allow ' (trailing space, missing '*'); this is the cap that later fails to parse and leaves a restarted MDS stuck in 'resolve'
ceph auth get-or-create mds.$id mds 'allow ' osd 'allow *' mon 'allow rwx' > /var/lib/ceph/mds/ceph-$id/keyring;
chown ceph.ceph /var/lib/ceph/mds -R;
systemctl enable ceph-mds@$id;
systemctl start ceph-mds@$id;


Finally create CephFS:
Code:
ceph osd pool create cephfs_data 12;    # 2 x OSDs
ceph osd pool create cephfs_metadata 12;    # 2 x OSDs
ceph fs new cephfs cephfs_metadata cephfs_data;
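
A quick sanity check after creation, for example (output omitted here):
Code:
ceph fs ls;
ceph mds stat;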

I like being able to easily mount and unmount '/var/lib/vz', so I define it in /etc/fstab. There is a regression with 'noauto' and 'x-systemd.automount' in Debian Stretch's systemd, so I disable the delayed mount to ensure the systemd requirements still resolve appropriately and then mount it via rc.local (see the note after the block for the automount entry I would otherwise use):
Code:
vi /etc/fstab;
  id=admin,conf=/etc/ceph/ceph.conf /var/lib/vz fuse.ceph defaults,_netdev,noauto,nonempty,x-systemd.requires=ceph.target 0 0
vi /etc/rc.local
  sleep 60 && mount /var/lib/vz;
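
For reference, the fstab entry I would expect to use once the systemd automount regression is fixed looks something like this (untested here, so the exact option set is an assumption):
Code:
  # hypothetical automount-based entry instead of the rc.local workaround
  id=admin,conf=/etc/ceph/ceph.conf /var/lib/vz fuse.ceph defaults,_netdev,nonempty,noauto,x-systemd.automount,x-systemd.requires=ceph.target 0 0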

The last piece is telling Proxmox that '/var/lib/vz' is shared:
Code:
vi /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        maxfiles 0
        shared
        content iso,vztmpl


Everything was working perfectly, but only one of the 3 MDS instances is ever active, so I wanted to try the multi-MDS feature available in Ceph Luminous:
Code:
[root@kvm1a ~]# ceph fs status
cephfs - 3 clients
======
+------+--------+-------+---------------+-------+-------+
| Rank | State  |  MDS  |    Activity   |  dns  |  inos |
+------+--------+-------+---------------+-------+-------+
|  0   | active | kvm1a | Reqs:    0 /s |   46  |   29  |
+------+--------+-------+---------------+-------+-------+
+-----------------+----------+-------+-------+
|       Pool      |   type   |  used | avail |
+-----------------+----------+-------+-------+
| cephfs_metadata | metadata | 1232k |  968G |
|   cephfs_data   |   data   | 12.6G |  968G |
+-----------------+----------+-------+-------+

+-------------+
| Standby MDS |
+-------------+
|    kvm1b    |
|    kvm1c    |
+-------------+
MDS version: ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)


[root@kvm1a ~]# ceph fs set cephfs allow_multimds yes
enabled creation of more than 1 active MDS
[root@kvm1a ~]# ceph fs set cephfs max_mds 3

[root@kvm1a ~]# ceph fs status
cephfs - 3 clients
======
+------+--------+-------+---------------+-------+-------+
| Rank | State  |  MDS  |    Activity   |  dns  |  inos |
+------+--------+-------+---------------+-------+-------+
|  0   | active | kvm1a | Reqs:    0 /s |   46  |   29  |
|  1   | active | kvm1c | Reqs:    0 /s |   10  |   11  |
|  2   | active | kvm1b | Reqs:    0 /s |   10  |   11  |
+------+--------+-------+---------------+-------+-------+
+-----------------+----------+-------+-------+
|       Pool      |   type   |  used | avail |
+-----------------+----------+-------+-------+
| cephfs_metadata | metadata | 1235k |  968G |
|   cephfs_data   |   data   | 12.6G |  968G |
+-----------------+----------+-------+-------+

+-------------+
| Standby MDS |
+-------------+
+-------------+
MDS version: ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)


Everything was still working at this stage. CephFS had created the additional ranks and divided the metadata workload among the MDS instances. Ceph status was complaining about there being no standby MDS instances, because 'standby_count_wanted' defaults to 1 (see the note after the output below):
Code:
[root@kvm1a ~]# ceph -s
  cluster:
    id:     97eac23e-10e4-4b53-aa2d-e013e50ff782
    health: HEALTH_WARN
            insufficient standby MDS daemons available
            noout flag(s) set

  services:
    mon: 3 daemons, quorum 0,1,2
    mgr: kvm1a(active), standbys: kvm1b, kvm1c
    mds: cephfs-3/3/3 up  {0=kvm1a=up:active,1=kvm1c=up:active,2=kvm1b=up:active}
    osd: 6 osds: 6 up, 6 in
         flags noout

  data:
    pools:   3 pools, 280 pgs
    objects: 199k objects, 782 GB
    usage:   2316 GB used, 3269 GB / 5586 GB avail
    pgs:     280 active+clean

  io:
    client:   8186 B/s rd, 1004 kB/s wr, 0 op/s rd, 88 op/s wr
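
If one deliberately wants to run all MDS daemons as active with no standby, that warning can presumably be silenced by lowering the wanted standby count; I have not tried this, so treat it as an assumption:
Code:
ceph fs set cephfs standby_count_wanted 0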


When restarting an MDS instance, however, it remained in the 'resolve' state and CephFS became unavailable:
Code:
[root@kvm1c ~]# systemctl restart ceph-mds@kvm1c
[root@kvm1c ~]# systemctl status ceph-mds@kvm1c
● ceph-mds@kvm1c.service - Ceph metadata server daemon
   Loaded: loaded (/lib/systemd/system/ceph-mds@.service; enabled; vendor preset: enabled)
  Drop-In: /lib/systemd/system/ceph-mds@.service.d
           └─ceph-after-pve-cluster.conf
   Active: active (running) since Mon 2017-11-20 07:33:08 SAST; 5s ago
 Main PID: 10984 (ceph-mds)
    Tasks: 21
   CGroup: /system.slice/system-ceph\x2dmds.slice/ceph-mds@kvm1c.service
           └─10984 /usr/bin/ceph-mds -f --cluster ceph --id kvm1c --setuser ceph --setgroup ceph

Nov 20 07:33:08 kvm1c systemd[1]: Started Ceph metadata server daemon.
Nov 20 07:33:08 kvm1c ceph-mds[10984]: starting mds.kvm1c at -

[root@kvm1c ~]# ceph fs status
cephfs - 3 clients
======
+------+---------+-------+---------------+-------+-------+
| Rank |  State  |  MDS  |    Activity   |  dns  |  inos |
+------+---------+-------+---------------+-------+-------+
|  0   |  active | kvm1a | Reqs:    0 /s |   46  |   29  |
|  1   | resolve | kvm1c |               |    0  |    0  |
|  2   |  active | kvm1b | Reqs:    0 /s |   10  |   11  |
+------+---------+-------+---------------+-------+-------+
+-----------------+----------+-------+-------+
|       Pool      |   type   |  used | avail |
+-----------------+----------+-------+-------+
| cephfs_metadata | metadata | 1235k |  968G |
|   cephfs_data   |   data   | 12.6G |  968G |
+-----------------+----------+-------+-------+

+-------------+
| Standby MDS |
+-------------+
+-------------+
MDS version: ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)


[root@kvm1c ~]# ceph -s
  cluster:
    id:     97eac23e-10e4-4b53-aa2d-e013e50ff782
    health: HEALTH_WARN
            1 filesystem is degraded
            insufficient standby MDS daemons available
            noout flag(s) set

  services:
    mon: 3 daemons, quorum 0,1,2
    mgr: kvm1a(active), standbys: kvm1b, kvm1c
    mds: cephfs-3/3/2 up  {0=kvm1a=up:active,1=kvm1c=up:resolve,2=kvm1b=up:active}
    osd: 6 osds: 6 up, 6 in
         flags noout

  data:
    pools:   3 pools, 280 pgs
    objects: 199k objects, 782 GB
    usage:   2316 GB used, 3269 GB / 5586 GB avail
    pgs:     280 active+clean

  io:
    client:   2728 B/s rd, 1300 kB/s wr, 0 op/s rd, 118 op/s wr


The following log output is from 'kvm1c' where I restarted the mds instance:
Too long to include, available here: https://paste.ofcode.org/jGkGjSzfuWmLh3QKruEpV6


I recovered from my stupidity perfectly on the other 2 clusters (I didn't restart the MDS processes there):

The short version:
Code:
[root@kvm5b ~]# ceph fs set cephfs max_mds 1
[root@kvm5b ~]# ceph mds deactivate cephfs:2
telling mds.1:2 10.254.1.4:6800/1105446330 to deactivate
[root@kvm5b ~]# watch -d "ceph -s"
[root@kvm5b ~]# ceph mds deactivate cephfs:1
telling mds.1:1 10.254.1.5:6800/963736084 to deactivate
[root@kvm5b ~]# watch -d "ceph -s"
[root@kvm5b ~]# ceph fs set cephfs standby_count_wanted 1
[root@kvm5b ~]# ceph fs set cephfs allow_multimds no
disallowed increasing the cluster size past 1


The granular output version:
Code:
[root@kvm1 ~]# ceph -s
  cluster:
    id:     c49b0dce-44a7-4546-9b16-3864d30f8833
    health: HEALTH_WARN
            noout flag(s) set

  services:
    mon: 3 daemons, quorum 0,1,2
    mgr: kvm1(active), standbys: kvm2, kvm3
    mds: cephfs-3/3/3 up  {0=kvm2=up:active,1=kvm3=up:active,2=kvm1=up:active}
    osd: 14 osds: 14 up, 14 in
         flags noout

  data:
    pools:   3 pools, 1084 pgs
    objects: 391k objects, 1565 GB
    usage:   4701 GB used, 3030 GB / 7732 GB avail
    pgs:     1084 active+clean

  io:
    client:   681 B/s rd, 411 kB/s wr, 0 op/s rd, 52 op/s wr

[root@kvm1 ~]# ceph fs set cephfs max_mds 1
[root@kvm1 ~]# ceph -s
  cluster:
    id:     c49b0dce-44a7-4546-9b16-3864d30f8833
    health: HEALTH_WARN
            noout flag(s) set

  services:
    mon: 3 daemons, quorum 0,1,2
    mgr: kvm1(active), standbys: kvm2, kvm3
    mds: cephfs-3/3/1 up  {0=kvm2=up:active,1=kvm3=up:active,2=kvm1=up:active}
    osd: 14 osds: 14 up, 14 in
         flags noout

  data:
    pools:   3 pools, 1084 pgs
    objects: 391k objects, 1565 GB
    usage:   4701 GB used, 3030 GB / 7732 GB avail
    pgs:     1084 active+clean

  io:
    client:   1022 B/s rd, 500 kB/s wr, 0 op/s rd, 63 op/s wr

[root@kvm1 ~]# ceph mds deactivate cephfs:2
telling mds.1:2 1.1.7.9:6800/2986760451 to deactivate
[root@kvm1 ~]# ceph -s
  cluster:
    id:     c49b0dce-44a7-4546-9b16-3864d30f8833
    health: HEALTH_WARN
            noout flag(s) set

  services:
    mon: 3 daemons, quorum 0,1,2
    mgr: kvm1(active), standbys: kvm2, kvm3
    mds: cephfs-2/2/1 up  {0=kvm2=up:active,1=kvm3=up:active}, 1 up:standby
    osd: 14 osds: 14 up, 14 in
         flags noout

  data:
    pools:   3 pools, 1084 pgs
    objects: 391k objects, 1565 GB
    usage:   4701 GB used, 3030 GB / 7732 GB avail
    pgs:     1084 active+clean

  io:
    client:   23526 B/s rd, 528 kB/s wr, 1 op/s rd, 80 op/s wr

[root@kvm1 ~]# ceph mds deactivate cephfs:1
telling mds.1:1 1.1.7.11:6800/3470392200 to deactivate
[root@kvm1 ~]# ceph fs set cephfs standby_count_wanted 1
[root@kvm1 ~]# ceph fs set cephfs allow_multimds no
disallowed increasing the cluster size past 1

[root@kvm1 ~]# ceph fs status
cephfs - 3 clients
======
+------+--------+------+---------------+-------+-------+
| Rank | State  | MDS  |    Activity   |  dns  |  inos |
+------+--------+------+---------------+-------+-------+
|  0   | active | kvm2 | Reqs:    0 /s |   26  |   24  |
+------+--------+------+---------------+-------+-------+
+-----------------+----------+-------+-------+
|       Pool      |   type   |  used | avail |
+-----------------+----------+-------+-------+
| cephfs_metadata | metadata |  184k |  490G |
|   cephfs_data   |   data   |  557M |  490G |
+-----------------+----------+-------+-------+

+-------------+
| Standby MDS |
+-------------+
|     kvm3    |
|     kvm1    |
+-------------+
MDS version: ceph version 12.2.1 (1a629971a9bcaaae99e5539a3a43f800a297f267) luminous (stable)

[root@kvm1 ~]# ceph -s
  cluster:
    id:     c49b0dce-44a7-4546-9b16-3864d30f8833
    health: HEALTH_WARN
            noout flag(s) set

  services:
    mon: 3 daemons, quorum 0,1,2
    mgr: kvm1(active), standbys: kvm2, kvm3
    mds: cephfs-1/1/1 up  {0=kvm2=up:active}, 2 up:standby
    osd: 14 osds: 14 up, 14 in
         flags noout

  data:
    pools:   3 pools, 1084 pgs
    objects: 391k objects, 1565 GB
    usage:   4701 GB used, 3030 GB / 7732 GB avail
    pgs:     1084 active+clean

  io:
    client:   12953 B/s rd, 931 kB/s wr, 0 op/s rd, 86 op/s wr
 
I'm essentially looking for assistance with resolving the 'resolve' state of a CephFS MDS instance, so that I can undo multi-MDS in CephFS:

Herewith the log output after restarting the MDS instance:
Code:
2017-11-20 07:33:08.339749 7f02e2b2a640  0 set uid:gid to 64045:64045 (ceph:ceph)
2017-11-20 07:33:08.339783 7f02e2b2a640  0 ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable), process (unknown), pid 10984
2017-11-20 07:33:08.345473 7f02e2b2a640  0 pidfile_write: ignore empty --pid-file
2017-11-20 07:33:12.435106 7f02db096700  1 mds.kvm1c handle_mds_map standby
2017-11-20 07:33:12.442049 7f02db096700  1 mds.1.171 handle_mds_map i am now mds.1.171
2017-11-20 07:33:12.442060 7f02db096700  1 mds.1.171 handle_mds_map state change up:boot --> up:replay
2017-11-20 07:33:12.442078 7f02db096700  1 mds.1.171 replay_start
2017-11-20 07:33:12.442088 7f02db096700  1 mds.1.171  recovery set is 0,2
2017-11-20 07:33:12.442097 7f02db096700  1 mds.1.171  waiting for osdmap 1385 (which blacklists prior instance)
2017-11-20 07:33:12.448277 7f02d4889700  0 mds.1.cache creating system inode with ino:0x101
2017-11-20 07:33:12.448690 7f02d4889700  0 mds.1.cache creating system inode with ino:0x1
2017-11-20 07:33:12.452387 7f02d3887700  1 mds.1.171 replay_done
2017-11-20 07:33:12.452411 7f02d3887700  1 mds.1.171 making mds journal writeable
2017-11-20 07:33:13.457172 7f02db096700  1 mds.1.171 handle_mds_map i am now mds.1.171
2017-11-20 07:33:13.457185 7f02db096700  1 mds.1.171 handle_mds_map state change up:replay --> up:resolve
2017-11-20 07:33:13.457214 7f02db096700  1 mds.1.171 resolve_start
2017-11-20 07:33:13.457221 7f02db096700  1 mds.1.171 reopen_log
2017-11-20 07:33:13.457253 7f02db096700  1 mds.1.171  recovery set is 0,2
2017-11-20 07:33:13.458169 7f02dd869700  1 mds.mds.kvm1b ms_verify_authorizer: auth cap parse error: MDSAuthCaps parse failed, stopped at 'allow ' of 'allow '
 parsing 'allow '
2017-11-20 07:33:13.458209 7f02dd869700  0 log_channel(cluster) log [WRN] : mds.kvm1b mds cap 'allow ' does not parse: MDSAuthCaps parse failed, stopped at 'allow ' of 'allow '
2017-11-20 07:33:13.458223 7f02dd869700  0 -- 10.254.1.4:6800/2347371316 >> 10.254.1.3:6800/3788195021 conn(0x55d9d17e1800 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg: got bad authorizer
2017-11-20 07:33:13.458395 7f02de06a700  1 mds.mds.kvm1a ms_verify_authorizer: auth cap parse error: MDSAuthCaps parse failed, stopped at 'allow ' of 'allow '
 parsing 'allow '
2017-11-20 07:33:13.458417 7f02de06a700  0 log_channel(cluster) log [WRN] : mds.kvm1a mds cap 'allow ' does not parse: MDSAuthCaps parse failed, stopped at 'allow ' of 'allow '
2017-11-20 07:33:13.458427 7f02de06a700  0 -- 10.254.1.4:6800/2347371316 >> 10.254.1.2:6800/3691710949 conn(0x55d9d17e3000 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg: got bad authorizer
2017-11-20 07:33:13.458493 7f02dd068700  0 -- 10.254.1.4:6800/2347371316 >> 10.254.1.2:6800/3691710949 conn(0x55d9d1783800 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=0).handle_connect_reply connect got BADAUTHORIZER
2017-11-20 07:33:13.458689 7f02dd869700  1 mds.mds.kvm1b ms_verify_authorizer: auth cap parse error: MDSAuthCaps parse failed, stopped at 'allow ' of 'allow '
 parsing 'allow '
2017-11-20 07:33:13.458713 7f02dd869700  0 log_channel(cluster) log [WRN] : mds.kvm1b mds cap 'allow ' does not parse: MDSAuthCaps parse failed, stopped at 'allow ' of 'allow '
2017-11-20 07:33:13.458722 7f02dd869700  0 -- 10.254.1.4:6800/2347371316 >> 10.254.1.3:6800/3788195021 conn(0x55d9d17e1800 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg: got bad authorizer
2017-11-20 07:33:13.458833 7f02de06a700  1 mds.mds.kvm1a ms_verify_authorizer: auth cap parse error: MDSAuthCaps parse failed, stopped at 'allow ' of 'allow '
 parsing 'allow '
2017-11-20 07:33:13.458852 7f02de06a700  0 log_channel(cluster) log [WRN] : mds.kvm1a mds cap 'allow ' does not parse: MDSAuthCaps parse failed, stopped at 'allow ' of 'allow '
2017-11-20 07:33:13.458860 7f02de06a700  0 -- 10.254.1.4:6800/2347371316 >> 10.254.1.2:6800/3691710949 conn(0x55d9d17e3000 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg: got bad authorizer
2017-11-20 07:33:13.459013 7f02dd068700  0 -- 10.254.1.4:6800/2347371316 >> 10.254.1.2:6800/3691710949 conn(0x55d9d1783800 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=0).handle_connect_reply connect got BADAUTHORIZER
2017-11-20 07:33:13.459247 7f02de06a700  0 -- 10.254.1.4:6800/2347371316 >> 10.254.1.3:6800/3788195021 conn(0x55d9d1785000 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=0).handle_connect_reply connect got BADAUTHORIZER
2017-11-20 07:33:13.459904 7f02de06a700  0 -- 10.254.1.4:6800/2347371316 >> 10.254.1.3:6800/3788195021 conn(0x55d9d1785000 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=0).handle_connect_reply connect got BADAUTHORIZER
2017-11-20 07:33:13.660120 7f02dd869700  1 mds.mds.kvm1a ms_verify_authorizer: auth cap parse error: MDSAuthCaps parse failed, stopped at 'allow ' of 'allow '
 parsing 'allow '
 
Alright... I needed to adjust the capabilities of the mds users, update the keyrings, restart the mds daemons, wait for everything to become active again, and could then finally decrease the number of active MDS instances. I ultimately restored everything back to the state I had before I started fiddling...

The problem I was experiencing:
Code:
id='kvm1a';
ceph auth get-or-create mds.$id mds 'allow *' osd 'allow *' mon 'allow rwx';
Error EINVAL: key for mds.kvm1a exists but cap mds does not match

[root@kvm1a ~]# ceph auth get mds.kvm1a
exported keyring for mds.kvm1a
[mds.kvm1a]
        key = AQCCTVVZhMWlGhAAdsrBdnEtQwDgdqILRLN4gQ==
        caps mds = "allow "
        caps mgr = "allow profile mds"
        caps mon = "allow rwx"
        caps osd = "allow *"


I did the following on each of the nodes:
Code:
[root@kvm1b ~]# id='kvm1b';
[root@kvm1b ~]# ceph auth caps mds.$id mds 'allow *' osd 'allow *' mon 'allow rwx';
updated caps for mds.kvm1b
[root@kvm1b ~]# ceph auth get-or-create mds.$id mds 'allow *' osd 'allow *' mon 'allow rwx';
[mds.kvm1b]
        key = AQDtQ1dZ5OI9HhAAqVZdFPDJShZrpjqBviY7BQ==
[root@kvm1b ~]# ceph auth get-or-create mds.$id mds 'allow *' osd 'allow *' mon 'allow rwx' > /var/lib/ceph/mds/ceph-$id/keyring;
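
Since 'ceph auth caps' only changes the caps stored by the monitors and leaves the key itself untouched, the same fix could presumably also be applied for all three daemons from a single node, without regenerating the local keyring files (a sketch, not what I actually ran):
Code:
# apply the corrected MDS caps for all three daemons from one node
for id in kvm1a kvm1b kvm1c; do
    ceph auth caps mds.$id mds 'allow *' osd 'allow *' mon 'allow rwx';
done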

After restarting the MDS instances on each node (e.g. systemctl restart ceph-mds@kvm1a):
Code:
ceph -s : mds: cephfs-3/3/3 up  {0=kvm1b=up:reconnect,1=kvm1c=up:resolve,2=kvm1a=up:reconnect}
Then transitions to:
Code:
ceph -s : mds: cephfs-3/3/3 up  {0=kvm1b=up:reconnect,1=kvm1c=up:rejoin,2=kvm1a=up:rejoin}
and finally to:
Code:
ceph -s : mds: cephfs-3/3/1 up  {0=kvm1b=up:active,1=kvm1c=up:active,2=kvm1a=up:active}

CephFS is fully functional again:
Code:
[root@kvm1a ~]# ceph fs status
cephfs - 0 clients
======
+------+--------+-------+---------------+-------+-------+
| Rank | State  |  MDS  |    Activity   |  dns  |  inos |
+------+--------+-------+---------------+-------+-------+
|  0   | active | kvm1b | Reqs:    0 /s |   46  |   29  |
|  1   | active | kvm1c | Reqs:    0 /s |    0  |    0  |
|  2   | active | kvm1a | Reqs:    0 /s |    0  |    0  |
+------+--------+-------+---------------+-------+-------+
+-----------------+----------+-------+-------+
|       Pool      |   type   |  used | avail |
+-----------------+----------+-------+-------+
| cephfs_metadata | metadata | 1246k |  968G |
|   cephfs_data   |   data   | 12.6G |  968G |
+-----------------+----------+-------+-------+

+-------------+
| Standby MDS |
+-------------+
+-------------+
MDS version: ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)
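
From here I could scale back down the same way as on the other clusters; reconstructed from those earlier steps, the sequence is roughly:
Code:
ceph fs set cephfs max_mds 1;
ceph mds deactivate cephfs:2;    # wait for rank 2 to stop before continuing
ceph mds deactivate cephfs:1;    # wait for rank 1 to stop as well
ceph fs set cephfs standby_count_wanted 1;
ceph fs set cephfs allow_multimds no;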
 
Once I'd removed the concurrent MDS instances I restored the MDS auth caps back to their prior state:

On each node:
Code:
[root@kvm1c ~]# id='kvm1c';
[root@kvm1c ~]# ceph auth caps mds.$id mds 'allow ' osd 'allow *' mon 'allow rwx';
updated caps for mds.kvm1c
[root@kvm1c ~]# ceph auth get-or-create mds.$id mds 'allow ' osd 'allow *' mon 'allow rwx';
[mds.kvm1c]
        key = AQBVRldZeWbTHxAAtiD4rTsFXmGcRIvJoV59Ow==
[root@kvm1c ~]#
[root@kvm1c ~]# ceph auth get-or-create mds.$id mds 'allow ' osd 'allow *' mon 'allow rwx' > /var/lib/ceph/mds/ceph-$id/keyring;