Help: full OSD, need help

Andrew Holybee

Active Member
Mar 27, 2017
52
0
26
42
Ceph showed plenty of free space when I went to move a disk, but it said 1 OSD was full, so I stopped the move. It still shows the error. How do I delete the aborted move and go back to using the VM where it was?
 

Ashley

Member
Jun 28, 2016
267
15
18
33
Can you explain exactly what you mean by moving a disk? Within Ceph the data is spread across all OSDs, not held on one particular disk.
 

Andrew Holybee
I had a VM running on an NFS path and clicked Move disk to move it to Ceph. The move hit an issue: it filled one OSD more than the others and halted. When I clicked stop, I couldn't figure out how to delete the partial copy. I was hoping it would just fall back to the path it came from, but Ceph still shows issues. There seem to be false copies of the VM (see attached). We should only have one VM 103, and it's a large one, but it lists multiples. However, only one OSD is near full, not all of them, so this seems like a false reading.

We tried manually removing the disks, but it didn't work, using:
rbd ls --pool <poolname> -> list the disks in the pool
rbd info --pool <poolname> vm-XXX-disk-N -> show information about a disk
rbd rm --pool <poolname> vm-XXX-disk-N -> delete the disk
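For reference, a dry-run sketch of those commands with the placeholders filled in (assuming the pool name cephStor and VM 103 mentioned later in this thread; the echo lines only print the commands so the syntax can be checked before anything is actually run against the cluster):

```shell
# Dry run: print each rbd command so the syntax can be verified first.
# Pool name cephStor and disk numbers are assumptions from this thread.
POOL="cephStor"
echo "rbd ls --pool $POOL"
for N in 1 2 3 4 5; do
    echo "rbd info --pool $POOL vm-103-disk-$N"
done
```

Removing the echoes then runs the real commands.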

A consultant was able to get the VMs working, but the cluster hasn't healed. Any ideas on how to find the bad copy and delete it?
 

Attachments

  • ceph status.jpeg (154 KB)
  • ceph storage.jpeg (163.3 KB)

Andrew Holybee
The only thing I can think of is that when we ran the commands above, we left out the dash before the disk number and it said invalid directory:
rbd info --pool <poolname> vm-xxx-diskx -> instead of disk-x
rbd rm --pool <poolname> vm-xxx-diskx -> instead of disk-x
I am wondering if that is what did it.
 

Andrew Holybee
It seems like the syntax is still hanging us up. I can't even get rbd info to work. Can someone help me with the syntax? Here is a screenshot. You can see all the extra disks for VM 103; I am just trying to run info so I can tell which one is the right one and which is just filling up Ceph.
 

Attachments

  • cephstor info syntax.JPG (52.9 KB)

Ashley
You can just delete them using the command

rbd -p *poolname* rm *diskname*

So yours should be

rbd -p cephStor rm vm-103-disk-1
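A more cautious variant of that delete, as a sketch (rbd status lists an image's watchers, so a disk still attached to a running VM will show a watcher, while a leftover copy from an aborted move should show none):

```shell
# Check for watchers before deleting a suspected leftover image.
# Pool and image names here are assumptions taken from this thread.
POOL="cephStor"
IMG="vm-103-disk-3"          # one of the suspected leftover copies
rbd status "$POOL/$IMG"      # an unused image should report no watchers
rbd rm "$POOL/$IMG"          # only delete once you are sure it is unused
```

This requires a live cluster, so run the status check first and read its output before the rm.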
 

Andrew Holybee
Disks 1 and 2 should be the good ones; 3, 4, and 5 should be the bad ones. What would the syntax for info be so I can check? Would it be:
rbd -p cephStor info vm-103-disk-1
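That is the right syntax. To compare all five images in one pass, a loop over the disk numbers works (a sketch, assuming the cephStor pool; the size and creation details in the info output help tell the real disks from the aborted copies):

```shell
# Print rbd info for each vm-103 disk so the images can be compared.
# Requires a reachable Ceph cluster; names are from this thread.
POOL="cephStor"
for N in 1 2 3 4 5; do
    echo "--- vm-103-disk-$N ---"
    rbd -p "$POOL" info "vm-103-disk-$N"
done
```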
 

Andrew Holybee
Awesome, thanks. I was just making sure the way I typed the info command was right. Attached you can see the VM 103 hardware screen, which shows disks 1 and 2 in use and not 3/4/5, and in the OSD percentages you can see the one that is almost full, preventing the cluster from fully healing. It is backing up that VM, so I am hoping it sees those other disks aren't being used and removes them. If not, I will try to remove them this way. Thanks so much.
 

Attachments

  • osd percentage.JPG (66.1 KB)
  • vm103 hardware.JPG (59 KB)

Andrew Holybee
This fixed it, thanks so much. I have removed disk 5 and will wait until after hours to remove 4 and 3. Saved my bacon; what a great community.
 

Andrew Holybee
The strange thing is Ceph still hasn't healed; it says noout set. We do have pm05 not attached to Ceph, as we are thinking about dismantling Ceph since it has been problematic.
 

Attachments

  • ceph not healed.jpeg (347.1 KB)

Andrew Holybee
Yes, this is what I wanted to try first. Would the syntax be "ceph osd unset noout"?

And pm05, not being part of Ceph, wouldn't be affected by this, correct?
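For reference, clearing the flag and confirming it is gone could look like this (a sketch; ceph -s prints any set cluster flags, such as noout, in its health section):

```shell
ceph osd unset noout   # clear the noout flag cluster-wide
ceph -s                # health output should no longer mention noout
```

The noout flag only affects OSD out-marking inside the Ceph cluster, so a node with no OSDs is not touched by it.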
 

Andrew Holybee
OK, cool. pm05 is a new server that we added, so it was never part of Ceph. The issue we have had with Ceph and Proxmox is that it will seemingly randomly lose quorum and fence. Or I will take down one node for maintenance and it will take down another node, or multiple nodes.
 

Andrew Holybee
OK, so I unset noout and that cleared the error, but Ceph still hasn't healed. Do I need to reload Ceph, or will that take down the OSDs? Here is the health detail.
 

Attachments

  • ceph still stuck.JPG (65.9 KB)
  • set noout.JPG (73.5 KB)

Andrew Holybee
It looks like we just have the 3 PGs that haven't been restored. Is there a command I need to run to manually get them to repair?
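A sketch for inspecting stuck PGs (here <pgid> is a placeholder for an actual id taken from the health output; note that ceph pg repair is meant for PGs flagged inconsistent, while degraded or undersized PGs normally backfill on their own once the full OSD has free space again):

```shell
ceph health detail     # lists the problem PGs with their ids
ceph pg <pgid> query   # show one PG's state and which OSDs hold it
ceph pg repair <pgid>  # only for PGs reported as inconsistent
```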
 
