[SOLVED] Ceph Help...

jkirker

Member
Feb 1, 2016
I set up a test environment and started over a few times.

What's strange is that each time I restart the Ceph network, even after writing 0's to all the OSDs to make sure things were cleared out, I end up with:
HEALTH_WARN 1 pgs degraded; 1 pgs stuck degraded; 64 pgs stuck unclean; 1 pgs stuck undersized; 1 pgs undersized

My Ceph status shows:
Code:
cluster 53d58faa-b12b-4fdb-a131-21370562f573

health HEALTH_WARN
1 pgs degraded
1 pgs stuck degraded
64 pgs stuck unclean
1 pgs stuck undersized
1 pgs undersized
monmap e5: 5 mons at {0=172.16.0.60:6789/0,1=172.16.0.61:6789/0,2=172.16.0.48:6789/0,3=172.16.0.47:6789/0,4=172.16.0.46:6789/0}
election epoch 34, quorum 0,1,2,3,4 4,3,2,0,1
osdmap e65: 8 osds: 8 up, 8 in; 63 remapped pgs
pgmap v3879: 576 pgs, 2 pools, 0 bytes data, 1 objects
307 MB used, 22304 GB / 22305 GB avail
512 active+clean
41 active+remapped
22 active
1 active+undersized+degraded


What I don't understand is that I haven't populated the CephFS with any data yet, but I still have a dirty FS.

For the test environment I have 3 compute nodes, 2 Ceph storage nodes, and 4 OSDs in each storage node. All 5 nodes are active as monitors.

For my Pool I've got:
Size 2/2 | pg_num 512
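
(For reference, the per-pool replica and PG settings can be dumped with something like this:)
Code:
# list every pool with its size, min_size, pg_num, etc.
ceph osd dump | grep pool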

All OSDs are up, all monitors are up. The CRUSH map couldn't be any simpler.

Where is the 1 object coming from? Can it be removed? What can I do to get back to a Healthy Status?
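
If it helps, I'm guessing something like this is the right way to dig further (the pool name rbd is just my assumption):
Code:
# show exactly which PG is degraded/undersized and why
ceph health detail
ceph pg dump_stuck unclean
# list whatever objects actually exist in the pool
rados -p rbd ls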
 
Hi,
I guess you have one pool with replica 3 left?!

BTW, you don't need 5 mons!! That's very oversized for your cluster ;-)
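
If you want to trim them back down later, a monitor can be dropped with something like this (mon ID 4 is just an example; stop its ceph-mon daemon on that node first):
Code:
ceph mon remove 4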

Look with something like:
Code:
ceph osd lspools

ceph osd pool get data size
ceph osd pool get rbd size
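
If one of the pools still reports size 3, something like this should bring it in line with your two storage nodes (replace <poolname> with the pool in question):
Code:
ceph osd pool set <poolname> size 2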
Udo
 
Thanks Udo...

While browsing around I did find an orphaned disk image called vm-103-disk-1, which is probably the culprit.

Any idea how I can remove it or kill it?

I found this image under the Storage view and then the Content tab.
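
I'm guessing something like this would remove it from the shell (pool name assumed), but I'd like to confirm before I run it:
Code:
# confirm the orphaned image is really there
rbd -p rbd ls
# then delete it
rbd -p rbd rm vm-103-disk-1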