I got a little excited about being able to run multiple active MDS instances for CephFS and brought all MDS instances up as active concurrently on 3 separate Ceph clusters. After restarting the MDS instances they are stuck in the 'resolve' state and appear to be limited by the 'allow ' MDS capability, instead of 'allow *'.
I would appreciate some assistance with resolving the problem. I only store ISO images on CephFS, to provide a shared file system for all cluster nodes, so the problem isn't critical and I have some time to try to fix things before blowing away CephFS and starting over.
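For reference, the caps on each MDS key can be inspected, and presumably widened, with something like the following; this is only a sketch of what I assume the fix would involve (not applied yet):
Code:
ceph auth get mds.kvm1c;   # show the current caps for the restarted MDS
# widen the restrictive mds cap from 'allow ' to 'allow *' (sketch, untested):
ceph auth caps mds.kvm1c mds 'allow *' osd 'allow *' mon 'allow rwx';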
Firstly, how I set it up on Jewel (before upgrading to Luminous):
Code:
# quote the heredoc delimiter so $cluster and $id are written literally into ceph.conf
cat >> /etc/ceph/ceph.conf <<'EOF'
[mds]
mds data = /var/lib/ceph/mds/$cluster-$id
keyring = /var/lib/ceph/mds/$cluster-$id/keyring
[mds.kvm1a]
host = kvm1a
[mds.kvm1b]
host = kvm1b
[mds.kvm1c]
host = kvm1c
EOF
On each host (kvm1a, kvm1b, kvm1c):
Code:
id='kvm1a'; # set to the local host's name (kvm1b / kvm1c on the other nodes)
apt-get -y install ceph-mds;
mkdir -p /var/lib/ceph/mds/ceph-$id;
ceph auth get-or-create mds.$id mds 'allow ' osd 'allow *' mon 'allow rwx' > /var/lib/ceph/mds/ceph-$id/keyring; # note: the mds cap here is 'allow ' (no wildcard), which I now suspect is the problem
chown ceph.ceph /var/lib/ceph/mds -R;
systemctl enable ceph-mds@$id;
systemctl start ceph-mds@$id;
Finally create CephFS:
Code:
ceph osd pool create cephfs_data 12; # 2 x OSDs
ceph osd pool create cephfs_metadata 12; # 2 x OSDs
ceph fs new cephfs cephfs_metadata cephfs_data;
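To sanity-check the result (optional), the following should list the new file system and show one active MDS with two standbys:
Code:
ceph fs ls;
ceph mds stat;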
I like being able to easily (un)mount '/var/lib/vz', so I define it in /etc/fstab. There is a regression with 'noauto' and 'x-systemd.automount' in systemd on Debian Stretch, so I disable the delayed mount to ensure the systemd dependencies resolve properly, and then mount it via rc.local:
Code:
vi /etc/fstab;
# add this entry:
id=admin,conf=/etc/ceph/ceph.conf /var/lib/vz fuse.ceph defaults,_netdev,noauto,nonempty,x-systemd.requires=ceph.target 0 0

vi /etc/rc.local;
# add this line:
sleep 60 && mount /var/lib/vz;
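After a reboot the mount can be verified with a quick check (optional):
Code:
mount | grep '/var/lib/vz';
df -h /var/lib/vz;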
The last piece is telling Proxmox that '/var/lib/vz' is shared:
Code:
vi /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        maxfiles 0
        shared
        content iso,vztmpl
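A quick way to confirm that Proxmox picks up the storage definition (optional; assumes the standard pvesm tool is available):
Code:
pvesm status;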
Everything was working perfectly, but only one of the 3 MDS instances is ever active, so I wanted to try the multi-MDS feature available in Ceph Luminous:
Code:
[root@kvm1a ~]# ceph fs status
cephfs - 3 clients
======
+------+--------+-------+---------------+-------+-------+
| Rank | State | MDS | Activity | dns | inos |
+------+--------+-------+---------------+-------+-------+
| 0 | active | kvm1a | Reqs: 0 /s | 46 | 29 |
+------+--------+-------+---------------+-------+-------+
+-----------------+----------+-------+-------+
| Pool | type | used | avail |
+-----------------+----------+-------+-------+
| cephfs_metadata | metadata | 1232k | 968G |
| cephfs_data | data | 12.6G | 968G |
+-----------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
| kvm1b |
| kvm1c |
+-------------+
MDS version: ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)
[root@kvm1a ~]# ceph fs set cephfs allow_multimds yes
enabled creation of more than 1 active MDS
[root@kvm1a ~]# ceph fs set cephfs max_mds 3
[root@kvm1a ~]# ceph fs status
cephfs - 3 clients
======
+------+--------+-------+---------------+-------+-------+
| Rank | State | MDS | Activity | dns | inos |
+------+--------+-------+---------------+-------+-------+
| 0 | active | kvm1a | Reqs: 0 /s | 46 | 29 |
| 1 | active | kvm1c | Reqs: 0 /s | 10 | 11 |
| 2 | active | kvm1b | Reqs: 0 /s | 10 | 11 |
+------+--------+-------+---------------+-------+-------+
+-----------------+----------+-------+-------+
| Pool | type | used | avail |
+-----------------+----------+-------+-------+
| cephfs_metadata | metadata | 1235k | 968G |
| cephfs_data | data | 12.6G | 968G |
+-----------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
+-------------+
MDS version: ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)
Everything was still working at this stage. CephFS had created the additional ranks and divided the metadata among the MDS instances. Ceph status was complaining about there being no standby MDS instances, because 'standby_count_wanted' defaults to 1:
Code:
[root@kvm1a ~]# ceph -s
  cluster:
    id:     97eac23e-10e4-4b53-aa2d-e013e50ff782
    health: HEALTH_WARN
            insufficient standby MDS daemons available
            noout flag(s) set

  services:
    mon: 3 daemons, quorum 0,1,2
    mgr: kvm1a(active), standbys: kvm1b, kvm1c
    mds: cephfs-3/3/3 up {0=kvm1a=up:active,1=kvm1c=up:active,2=kvm1b=up:active}
    osd: 6 osds: 6 up, 6 in
         flags noout

  data:
    pools:   3 pools, 280 pgs
    objects: 199k objects, 782 GB
    usage:   2316 GB used, 3269 GB / 5586 GB avail
    pgs:     280 active+clean

  io:
    client:  8186 B/s rd, 1004 kB/s wr, 0 op/s rd, 88 op/s wr
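Presumably the standby warning could have been silenced by telling CephFS not to expect any standbys; a sketch (I did not apply this at the time):
Code:
ceph fs set cephfs standby_count_wanted 0;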
When restarting an MDS instance, however, it remained in the 'resolve' state and CephFS became unavailable:
Code:
[root@kvm1c ~]# systemctl restart ceph-mds@kvm1c
[root@kvm1c ~]# systemctl status ceph-mds@kvm1c
● ceph-mds@kvm1c.service - Ceph metadata server daemon
   Loaded: loaded (/lib/systemd/system/ceph-mds@.service; enabled; vendor preset: enabled)
  Drop-In: /lib/systemd/system/ceph-mds@.service.d
           └─ceph-after-pve-cluster.conf
   Active: active (running) since Mon 2017-11-20 07:33:08 SAST; 5s ago
 Main PID: 10984 (ceph-mds)
    Tasks: 21
   CGroup: /system.slice/system-ceph\x2dmds.slice/ceph-mds@kvm1c.service
           └─10984 /usr/bin/ceph-mds -f --cluster ceph --id kvm1c --setuser ceph --setgroup ceph

Nov 20 07:33:08 kvm1c systemd[1]: Started Ceph metadata server daemon.
Nov 20 07:33:08 kvm1c ceph-mds[10984]: starting mds.kvm1c at -
[root@kvm1c ~]# ceph fs status
cephfs - 3 clients
======
+------+---------+-------+---------------+-------+-------+
| Rank | State | MDS | Activity | dns | inos |
+------+---------+-------+---------------+-------+-------+
| 0 | active | kvm1a | Reqs: 0 /s | 46 | 29 |
| 1 | resolve | kvm1c | | 0 | 0 |
| 2 | active | kvm1b | Reqs: 0 /s | 10 | 11 |
+------+---------+-------+---------------+-------+-------+
+-----------------+----------+-------+-------+
| Pool | type | used | avail |
+-----------------+----------+-------+-------+
| cephfs_metadata | metadata | 1235k | 968G |
| cephfs_data | data | 12.6G | 968G |
+-----------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
+-------------+
MDS version: ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)
[root@kvm1c ~]# ceph -s
  cluster:
    id:     97eac23e-10e4-4b53-aa2d-e013e50ff782
    health: HEALTH_WARN
            1 filesystem is degraded
            insufficient standby MDS daemons available
            noout flag(s) set

  services:
    mon: 3 daemons, quorum 0,1,2
    mgr: kvm1a(active), standbys: kvm1b, kvm1c
    mds: cephfs-3/3/2 up {0=kvm1a=up:active,1=kvm1c=up:resolve,2=kvm1b=up:active}
    osd: 6 osds: 6 up, 6 in
         flags noout

  data:
    pools:   3 pools, 280 pgs
    objects: 199k objects, 782 GB
    usage:   2316 GB used, 3269 GB / 5586 GB avail
    pgs:     280 active+clean

  io:
    client:  2728 B/s rd, 1300 kB/s wr, 0 op/s rd, 118 op/s wr
The following log output is from 'kvm1c', where I restarted the MDS instance:
Too long to include, available here: https://paste.ofcode.org/jGkGjSzfuWmLh3QKruEpV6
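If more verbose logs would help, I assume the MDS debug level on kvm1c could be raised temporarily along these lines (sketch):
Code:
ceph tell mds.kvm1c injectargs '--debug_mds 10 --debug_ms 1';
# and lowered again afterwards:
ceph tell mds.kvm1c injectargs '--debug_mds 1/5 --debug_ms 0/5';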
I recovered from my mistake on the other 2 clusters without issue (I hadn't restarted the MDS processes there):
The short version:
Code:
[root@kvm5b ~]# ceph fs set cephfs max_mds 1
[root@kvm5b ~]# ceph mds deactivate cephfs:2
telling mds.1:2 10.254.1.4:6800/1105446330 to deactivate
[root@kvm5b ~]# watch -d "ceph -s"
[root@kvm5b ~]# ceph mds deactivate cephfs:1
telling mds.1:1 10.254.1.5:6800/963736084 to deactivate
[root@kvm5b ~]# watch -d "ceph -s"
[root@kvm5b ~]# ceph fs set cephfs standby_count_wanted 1
[root@kvm5b ~]# ceph fs set cephfs allow_multimds no
disallowed increasing the cluster size past 1
The granular output version:
Code:
[root@kvm1 ~]# ceph -s
  cluster:
    id:     c49b0dce-44a7-4546-9b16-3864d30f8833
    health: HEALTH_WARN
            noout flag(s) set

  services:
    mon: 3 daemons, quorum 0,1,2
    mgr: kvm1(active), standbys: kvm2, kvm3
    mds: cephfs-3/3/3 up {0=kvm2=up:active,1=kvm3=up:active,2=kvm1=up:active}
    osd: 14 osds: 14 up, 14 in
         flags noout

  data:
    pools:   3 pools, 1084 pgs
    objects: 391k objects, 1565 GB
    usage:   4701 GB used, 3030 GB / 7732 GB avail
    pgs:     1084 active+clean

  io:
    client:  681 B/s rd, 411 kB/s wr, 0 op/s rd, 52 op/s wr
[root@kvm1 ~]# ceph fs set cephfs max_mds 1
[root@kvm1 ~]# ceph -s
  cluster:
    id:     c49b0dce-44a7-4546-9b16-3864d30f8833
    health: HEALTH_WARN
            noout flag(s) set

  services:
    mon: 3 daemons, quorum 0,1,2
    mgr: kvm1(active), standbys: kvm2, kvm3
    mds: cephfs-3/3/1 up {0=kvm2=up:active,1=kvm3=up:active,2=kvm1=up:active}
    osd: 14 osds: 14 up, 14 in
         flags noout

  data:
    pools:   3 pools, 1084 pgs
    objects: 391k objects, 1565 GB
    usage:   4701 GB used, 3030 GB / 7732 GB avail
    pgs:     1084 active+clean

  io:
    client:  1022 B/s rd, 500 kB/s wr, 0 op/s rd, 63 op/s wr
[root@kvm1 ~]# ceph mds deactivate cephfs:2
telling mds.1:2 1.1.7.9:6800/2986760451 to deactivate
[root@kvm1 ~]# ceph -s
  cluster:
    id:     c49b0dce-44a7-4546-9b16-3864d30f8833
    health: HEALTH_WARN
            noout flag(s) set

  services:
    mon: 3 daemons, quorum 0,1,2
    mgr: kvm1(active), standbys: kvm2, kvm3
    mds: cephfs-2/2/1 up {0=kvm2=up:active,1=kvm3=up:active}, 1 up:standby
    osd: 14 osds: 14 up, 14 in
         flags noout

  data:
    pools:   3 pools, 1084 pgs
    objects: 391k objects, 1565 GB
    usage:   4701 GB used, 3030 GB / 7732 GB avail
    pgs:     1084 active+clean

  io:
    client:  23526 B/s rd, 528 kB/s wr, 1 op/s rd, 80 op/s wr
[root@kvm1 ~]# ceph mds deactivate cephfs:1
telling mds.1:1 1.1.7.11:6800/3470392200 to deactivate
[root@kvm1 ~]# ceph fs set cephfs standby_count_wanted 1
[root@kvm1 ~]# ceph fs set cephfs allow_multimds no
disallowed increasing the cluster size past 1
[root@kvm1 ~]# ceph fs status
cephfs - 3 clients
======
+------+--------+------+---------------+-------+-------+
| Rank | State | MDS | Activity | dns | inos |
+------+--------+------+---------------+-------+-------+
| 0 | active | kvm2 | Reqs: 0 /s | 26 | 24 |
+------+--------+------+---------------+-------+-------+
+-----------------+----------+-------+-------+
| Pool | type | used | avail |
+-----------------+----------+-------+-------+
| cephfs_metadata | metadata | 184k | 490G |
| cephfs_data | data | 557M | 490G |
+-----------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
| kvm3 |
| kvm1 |
+-------------+
MDS version: ceph version 12.2.1 (1a629971a9bcaaae99e5539a3a43f800a297f267) luminous (stable)
[root@kvm1 ~]# ceph -s
  cluster:
    id:     c49b0dce-44a7-4546-9b16-3864d30f8833
    health: HEALTH_WARN
            noout flag(s) set

  services:
    mon: 3 daemons, quorum 0,1,2
    mgr: kvm1(active), standbys: kvm2, kvm3
    mds: cephfs-1/1/1 up {0=kvm2=up:active}, 2 up:standby
    osd: 14 osds: 14 up, 14 in
         flags noout

  data:
    pools:   3 pools, 1084 pgs
    objects: 391k objects, 1565 GB
    usage:   4701 GB used, 3030 GB / 7732 GB avail
    pgs:     1084 active+clean

  io:
    client:  12953 B/s rd, 931 kB/s wr, 0 op/s rd, 86 op/s wr