reset ceph cluster

RobFantini

We have a 3-node test ceph cluster.

I'd like to redo ceph from scratch.

Is there a simple way to destroy the ceph cluster? I've tried to remove all OSDs, but one is stuck and can't be removed using the PVE CLI:
Code:
0       1.82                    osd.0   DNE
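For reference, the standard CLI sequence for dropping a dead OSD entry (osd.0 from the output above) would be roughly the following; this is only a sketch of the usual ceph commands, and it may not clear an entry already marked DNE:
Code:
ceph osd crush remove osd.0   # remove it from the CRUSH map
ceph auth del osd.0           # delete its authentication key
ceph osd rm 0                 # remove it from the OSD map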

And here is ceph -s:
Code:
ceph -s
    cluster 4267b4fe-78bb-4670-86e5-60807f39e6c1
     health HEALTH_ERR 435 pgs degraded; 45 pgs incomplete; 4 pgs inconsistent; 4 pgs recovering; 480 pgs stale; 45 pgs stuck inactive; 480 pgs stuck stale; 480 pgs stuck unclean; recovery 253909/390921 objects degraded (64.951%); 19/130307 unfound (0.015%); 25 scrub errors; no osds; 1 mons down, quorum 0,1,2 0,2,3
     monmap e16: 4 mons at {0=10.11.12.41:6789/0,1=10.11.12.182:6789/0,2=10.11.12.42:6789/0,3=10.11.12.46:6789/0}, election epoch 3844, quorum 0,1,2 0,2,3
     osdmap e8224: 0 osds: 0 up, 0 in
      pgmap v4085336: 480 pgs, 3 pools, 498 GB data, 127 kobjects
            0 kB used, 0 kB / 0 kB avail
            253909/390921 objects degraded (64.951%); 19/130307 unfound (0.015%)
                   1 stale+active+degraded+inconsistent
                 271 stale+active+degraded+remapped
                   1 stale+active+recovering+degraded
                   3 stale+active+degraded+remapped+inconsistent
                  45 stale+incomplete
                   3 stale+active+recovering+degraded+remapped
                 156 stale+active+degraded

So is there a way to remove a ceph setup, or should we just reinstall PVE on the 3 hosts?
 
The easiest way I can think of would be with ceph-deploy.

Sadly I don't know whether that's available in the repositories a vanilla PVE system has registered. If it's not (check with "aptitude install ceph-deploy"), you'll have to add the repo:

NOTE: all the following steps only need to be done on one node. ceph-deploy also requires password-less ssh to work between the nodes.
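If password-less ssh is not in place yet, a minimal sketch would be something like this, run on the node you drive ceph-deploy from (node1/node2/node3 are placeholders for your real hostnames):
Code:
ssh-keygen -t rsa          # accept the defaults, empty passphrase
ssh-copy-id root@node1
ssh-copy-id root@node2
ssh-copy-id root@node3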

EDIT: After looking at what I put together here, reinstalling the machines might be faster than this... even though it's certainly less elegant.

Code:
wget -q -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | apt-key add -
echo deb http://ceph.com/debian-dumpling/ $(lsb_release -sc) main | tee /etc/apt/sources.list.d/ceph.list #exchange 'dumpling' for 'firefly' if you installed firefly
apt-get update && apt-get install ceph-deploy

Now, ceph-deploy always operates out of a working directory, where it stores the config and keys for your cluster. So:

Code:
mkdir /root/cephdeploy/
cd ~/cephdeploy/
cp /etc/ceph/ceph.conf .
cp /etc/pve/priv/ceph.client.admin.keyring . #filename or folder might not be correct, can't check at the moment, sorry for that. might also be in /etc/pve/nodes/X/priv/...

You should now be able to issue

Code:
cp ceph.conf ceph.conf.bck
ceph-deploy purgedata node1 node2 node3
ceph-deploy forgetkeys
where node{1-3} are resolvable hostnames of your nodes.
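If those names don't resolve yet, adding them to /etc/hosts on the ceph-deploy node is enough; the IPs below are placeholders, not taken from this cluster:
Code:
cat >> /etc/hosts <<EOF
192.0.2.11 node1
192.0.2.12 node2
192.0.2.13 node3
EOF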

You can then issue

Code:
ceph-deploy new node1
to create a fresh ceph.conf (you only need the fsid from it, if anything at all)
and a fresh ceph.client.admin.keyring
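Pulling the fsid out of the freshly generated ceph.conf and dropping it into the backup can be done by hand, or roughly like this (a sketch, assuming the usual "fsid = <uuid>" line in the [global] section):
Code:
grep fsid ceph.conf                            # note the new fsid
NEWFSID=$(awk '/fsid/ {print $3}' ceph.conf)
sed -i "s/fsid = .*/fsid = $NEWFSID/" ceph.conf.bck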

Use the fsid from the new ceph.conf in your ceph.conf.bck file (by hand, or as sketched above), then

Code:
cp ceph.conf.bck ceph.conf
ceph-deploy config push node1 node2 node3
cp ceph.client.admin.keyring /etc/pve/priv/ceph/ceph.client.admin.keyring #replace target file with where you had found it initially

Final words: this is sort of a roundabout way of doing things, but it should be faster than doing everything manually (which entails removing almost everything from /var/lib/ceph/ as well as using ceph-disk to overwrite a couple of sectors of your OSD disks to void them) and less error-prone.
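For completeness, that manual route would look roughly like the following on each node; treat it as a rough sketch only, and double-check the device names before zapping anything:
Code:
service ceph stop                  # stop any remaining ceph daemons
rm -rf /var/lib/ceph/mon/* /var/lib/ceph/osd/* /var/lib/ceph/bootstrap-*/*
ceph-disk zap /dev/sdX             # /dev/sdX is a placeholder for each OSD disk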
 
OK, thanks for that!

However, I ran into this:
Code:
 ceph4-ib  ~/cephdeploy # ceph-deploy purgedata ceph4 ceph2 ceph1
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.9): /usr/bin/ceph-deploy purgedata ceph4 ceph2 ceph1
[ceph_deploy.install][DEBUG ] Purging data from cluster ceph hosts ceph4 ceph2 ceph1
[ceph4][DEBUG ] connected to host: ceph4 
[ceph4][DEBUG ] detect platform information from remote host
[ceph4][DEBUG ] detect machine type
[ceph4][DEBUG ] find the location of an executable
[ceph2][DEBUG ] connected to host: ceph2 
[ceph2][DEBUG ] detect platform information from remote host
[ceph2][DEBUG ] detect machine type
[ceph2][DEBUG ] find the location of an executable
[ceph1][DEBUG ] connected to host: ceph1 
[ceph1][DEBUG ] detect platform information from remote host
[ceph1][DEBUG ] detect machine type
[ceph1][DEBUG ] find the location of an executable
[ceph_deploy.install][ERROR ] ceph is still installed on: ['ceph4', 'ceph2', 'ceph1']
[ceph_deploy][ERROR ] RuntimeError: refusing to purge data while ceph is still installed
So I did this on all nodes:

Code:
aptitude purge ceph

Then I got what looks like the same error:
Code:
ceph4-ib  ~/cephdeploy # ceph-deploy purgedata ceph4 ceph2 ceph1
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.9): /usr/bin/ceph-deploy purgedata ceph4 ceph2 ceph1
[ceph_deploy.install][DEBUG ] Purging data from cluster ceph hosts ceph4 ceph2 ceph1
[ceph4][DEBUG ] connected to host: ceph4 
[ceph4][DEBUG ] detect platform information from remote host
[ceph4][DEBUG ] detect machine type
[ceph4][DEBUG ] find the location of an executable
[ceph2][DEBUG ] connected to host: ceph2 
[ceph2][DEBUG ] detect platform information from remote host
[ceph2][DEBUG ] detect machine type
[ceph2][DEBUG ] find the location of an executable
[ceph1][DEBUG ] connected to host: ceph1 
[ceph1][DEBUG ] detect platform information from remote host
[ceph1][DEBUG ] detect machine type
[ceph1][DEBUG ] find the location of an executable
[ceph_deploy.install][ERROR ] ceph is still installed on: ['ceph4', 'ceph2', 'ceph1']
[ceph_deploy][ERROR ] RuntimeError: refusing to purge data while ceph is still installed

Note that ceph-deploy was already installed. Besides that, I followed the instructions; here is the history:
Code:
 1031  Tuesday  2014-07-29  [14:21:59 -0400] which ceph-deploy
 1032  Tuesday  2014-07-29  [14:22:15 -0400] mkdir /root/cephdeploy/
 1033  Tuesday  2014-07-29  [14:22:19 -0400] cd ~/cephdeploy/
 1034  Tuesday  2014-07-29  [14:22:22 -0400] cp /etc/ceph/ceph.conf .
 1035  Tuesday  2014-07-29  [14:22:39 -0400] cp /etc/pve/priv/ceph.client.admin.keyring .
 1036  Tuesday  2014-07-29  [14:22:46 -0400] cp ceph.conf ceph.conf.bck
 1037  Tuesday  2014-07-29  [14:23:02 -0400] ceph-deploy purgedata ceph4 ceph2 ceph1
 1038  Tuesday  2014-07-29  [14:23:26 -0400] aps ceph
 1039  Tuesday  2014-07-29  [14:23:37 -0400] aptitude purge ceph
 1040  Tuesday  2014-07-29  [14:23:59 -0400] ceph-deploy purgedata ceph4 ceph2 ceph1

I may have done something wrong... hopefully the above will help the next person.

So I'm going to reinstall.

Thanks again for the help

Best Regards.
Rob Fantini
 
Maybe ceph-deploy is looking for more than just the "ceph" package (ceph-common and librbd come to mind).
Luckily, ceph-deploy can take care of that too:

Code:
ceph-deploy uninstall {hostname [hostname] ...}
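To see which ceph-related packages are actually still present on a node (presumably what purgedata is tripping over), a quick check is:
Code:
dpkg -l | grep -i ceph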

To get the packages back onto your nodes after you've issued purgedata, you can use
Code:
ceph-deploy install {hostname [hostname] ...}
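Putting it together with the hostnames used earlier in this thread, a retry could look roughly like this (just a sketch of the same steps, not a tested sequence):
Code:
ceph-deploy uninstall ceph4 ceph2 ceph1
ceph-deploy purgedata ceph4 ceph2 ceph1
ceph-deploy forgetkeys
ceph-deploy install ceph4 ceph2 ceph1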
 
