reset ceph cluster

RobFantini

We have a 3-node test ceph cluster.

I'd like to redo ceph from scratch.

Is there a simple way to destroy the ceph cluster? I've tried to remove all OSDs, but one is stuck and can't be removed using the PVE CLI:
Code:
0       1.82                    osd.0   DNE

and ceph -s :
Code:
ceph -s
    cluster 4267b4fe-78bb-4670-86e5-60807f39e6c1
     health HEALTH_ERR 435 pgs degraded; 45 pgs incomplete; 4 pgs inconsistent; 4 pgs recovering; 480 pgs stale; 45 pgs stuck inactive; 480 pgs stuck stale; 480 pgs stuck unclean; recovery 253909/390921 objects degraded (64.951%); 19/130307 unfound (0.015%); 25 scrub errors; no osds; 1 mons down, quorum 0,1,2 0,2,3
     monmap e16: 4 mons at {0=10.11.12.41:6789/0,1=10.11.12.182:6789/0,2=10.11.12.42:6789/0,3=10.11.12.46:6789/0}, election epoch 3844, quorum 0,1,2 0,2,3
     osdmap e8224: 0 osds: 0 up, 0 in
      pgmap v4085336: 480 pgs, 3 pools, 498 GB data, 127 kobjects
            0 kB used, 0 kB / 0 kB avail
            253909/390921 objects degraded (64.951%); 19/130307 unfound (0.015%)
                   1 stale+active+degraded+inconsistent
                 271 stale+active+degraded+remapped
                   1 stale+active+recovering+degraded
                   3 stale+active+degraded+remapped+inconsistent
                  45 stale+incomplete
                   3 stale+active+recovering+degraded+remapped
                 156 stale+active+degraded

So, is there a way to remove a ceph setup, or should we just reinstall PVE on the 3 hosts?
 
The easiest way I can think of would be with ceph-deploy.

Sadly, I don't know whether that's available in the repositories a vanilla PVE system has registered. If it's not (check with "aptitude install ceph-deploy"), you'll have to add the repo:

NOTE: all of the following steps only need to be done on one node. ceph-deploy also requires passwordless SSH between the nodes to work.
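If passwordless SSH isn't set up yet, something along these lines should do it (just a sketch, run from the node you'll drive ceph-deploy from; node1-3 are placeholders for your real hostnames):

Code:
ssh-keygen -t rsa        # accept the defaults, empty passphrase
ssh-copy-id root@node1
ssh-copy-id root@node2
ssh-copy-id root@node3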

EDIT: After looking at what I put together here, reinstalling the machines might be faster than this... even though it's certainly less elegant.

Code:
wget -q -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | apt-key add -
echo deb http://ceph.com/debian-dumpling/ $(lsb_release -sc) main | tee /etc/apt/sources.list.d/ceph.list #exchange 'dumpling' for 'firefly' if you installed firefly
apt-get update && apt-get install ceph-deploy

Now, ceph-deploy always operates out of a working directory, where it stores the config and keys for your cluster. So:

Code:
mkdir /root/cephdeploy/
cd ~/cephdeploy/
cp /etc/ceph/ceph.conf .
cp /etc/pve/priv/ceph.client.admin.keyring . #filename or folder might not be correct, can't check at the moment, sorry for that. might also be in /etc/pve/nodes/X/priv/...

You should now be able to issue

Code:
cp ceph.conf ceph.conf.bck
ceph-deploy purgedata node1 node2 node3
ceph-deploy forgetkeys
where node{1-3} are resolvable hostnames of your nodes.
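If those names don't resolve yet, plain /etc/hosts entries on the node you run ceph-deploy from are enough, e.g. (the addresses here are placeholders, use your own):

Code:
# /etc/hosts -- adjust addresses and names to your network
10.11.12.41  node1
10.11.12.42  node2
10.11.12.46  node3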

You can then issue

Code:
ceph-deploy new node1
to create a fresh ceph.conf (you only need the fsid from it, if anything at all) and a fresh ceph.client.admin.keyring.
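The next step is to copy the fsid from that fresh ceph.conf into your ceph.conf.bck; if you'd rather script it than edit the file by hand, something like this should work (only a sketch, assuming the usual "fsid = <uuid>" line in both files):

Code:
# grab the new fsid from the freshly generated ceph.conf ...
NEW_FSID=$(awk -F' *= *' '$1 ~ /fsid/ {print $2}' ceph.conf)
# ... and splice it into the backed-up config (tolerates leading whitespace)
sed -i "s/^\([[:space:]]*fsid[[:space:]]*=\).*/\1 $NEW_FSID/" ceph.conf.bck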

Use the fsid from the new ceph.conf in your ceph.conf.bck file (by hand or via the sketch above), then

Code:
cp ceph.conf.bck ceph.conf
ceph-deploy config push node1 node2 node3
cp ceph.client.admin.keyring /etc/pve/priv/ceph/ceph.client.admin.keyring #replace target file with where you had found it initially

Final words: this is a somewhat roundabout way of doing things, but it should be faster and less error-prone than doing everything manually (which entails removing almost everything from /var/lib/ceph/ as well as using ceph-disk to overwrite a couple of sectors of your OSD disks to void them).
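For reference, the manual route would look roughly like this on every node (only a sketch, and obviously destructive; /dev/sdX stands in for whatever disks your OSDs lived on):

Code:
service ceph stop              # stop any ceph daemons still running on this node
umount /var/lib/ceph/osd/*     # unmount any OSD filesystems that are still mounted
rm -rf /var/lib/ceph/*         # drop mon/OSD data and keys
ceph-disk zap /dev/sdX         # wipe the partition table of each former OSD disk -- DESTRUCTIVE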
 
OK thanks for that!

However, I ran into this:
Code:
ceph4-ib  ~/cephdeploy # ceph-deploy purgedata ceph4 ceph2 ceph1
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.9): /usr/bin/ceph-deploy purgedata ceph4 ceph2 ceph1
[ceph_deploy.install][DEBUG ] Purging data from cluster ceph hosts ceph4 ceph2 ceph1
[ceph4][DEBUG ] connected to host: ceph4 
[ceph4][DEBUG ] detect platform information from remote host
[ceph4][DEBUG ] detect machine type
[ceph4][DEBUG ] find the location of an executable
[ceph2][DEBUG ] connected to host: ceph2 
[ceph2][DEBUG ] detect platform information from remote host
[ceph2][DEBUG ] detect machine type
[ceph2][DEBUG ] find the location of an executable
[ceph1][DEBUG ] connected to host: ceph1 
[ceph1][DEBUG ] detect platform information from remote host
[ceph1][DEBUG ] detect machine type
[ceph1][DEBUG ] find the location of an executable
[ceph_deploy.install][ERROR ] ceph is still installed on: ['ceph4', 'ceph2', 'ceph1']
[ceph_deploy][ERROR ] RuntimeError: refusing to purge data while ceph is still installed
So I did this on all nodes:

Code:
aptitude purge ceph

Then I got the same error again:
Code:
ceph4-ib  ~/cephdeploy # ceph-deploy purgedata ceph4 ceph2 ceph1
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.9): /usr/bin/ceph-deploy purgedata ceph4 ceph2 ceph1
[ceph_deploy.install][DEBUG ] Purging data from cluster ceph hosts ceph4 ceph2 ceph1
[ceph4][DEBUG ] connected to host: ceph4 
[ceph4][DEBUG ] detect platform information from remote host
[ceph4][DEBUG ] detect machine type
[ceph4][DEBUG ] find the location of an executable
[ceph2][DEBUG ] connected to host: ceph2 
[ceph2][DEBUG ] detect platform information from remote host
[ceph2][DEBUG ] detect machine type
[ceph2][DEBUG ] find the location of an executable
[ceph1][DEBUG ] connected to host: ceph1 
[ceph1][DEBUG ] detect platform information from remote host
[ceph1][DEBUG ] detect machine type
[ceph1][DEBUG ] find the location of an executable
[ceph_deploy.install][ERROR ] ceph is still installed on: ['ceph4', 'ceph2', 'ceph1']
[ceph_deploy][ERROR ] RuntimeError: refusing to purge data while ceph is still installed

Note: ceph-deploy was already installed. Apart from that I followed the instructions; here is the history:
Code:
 1031  Tuesday  2014-07-29  [14:21:59 -0400] which ceph-deploy
 1032  Tuesday  2014-07-29  [14:22:15 -0400] mkdir /root/cephdeploy/
 1033  Tuesday  2014-07-29  [14:22:19 -0400] cd ~/cephdeploy/
 1034  Tuesday  2014-07-29  [14:22:22 -0400] cp /etc/ceph/ceph.conf .
 1035  Tuesday  2014-07-29  [14:22:39 -0400] cp /etc/pve/priv/ceph.client.admin.keyring .
 1036  Tuesday  2014-07-29  [14:22:46 -0400] cp ceph.conf ceph.conf.bck
 1037  Tuesday  2014-07-29  [14:23:02 -0400] ceph-deploy purgedata ceph4 ceph2 ceph1
 1038  Tuesday  2014-07-29  [14:23:26 -0400] aps ceph
 1039  Tuesday  2014-07-29  [14:23:37 -0400] aptitude purge ceph
 1040  Tuesday  2014-07-29  [14:23:59 -0400] ceph-deploy purgedata ceph4 ceph2 ceph1

I may have done something wrong... hopefully the above will help the next person.

So I'm going to reinstall.

Thanks again for the help

Best Regards.
Rob Fantini
 
Maybe ceph-deploy is looking for more than just the "ceph" package (ceph-common and librbd come to mind).
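You can check what is actually still installed on a node with something like this (dpkg and grep are standard; the pattern is just a guess at the relevant package names):

Code:
dpkg -l | grep -Ei 'ceph|rbd|rados'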
Luckily, ceph-deploy can take care of that too:

Code:
ceph-deploy uninstall {hostname [hostname] ...}
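For your nodes that would presumably be:

Code:
ceph-deploy uninstall ceph4 ceph2 ceph1
ceph-deploy purgedata ceph4 ceph2 ceph1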

To get the packages back onto your nodes after you've issued purgedata, you can use
Code:
ceph-deploy install {hostname [hostname] ...}
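which, again assuming the node names from above, would presumably be:

Code:
ceph-deploy install ceph4 ceph2 ceph1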