Help: full OSD, need help

Andrew Holybee

Active Member
Mar 27, 2017
52
0
26
42
Ceph showed plenty of free space when I went to move a disk, but it said 1 OSD was full, so I stopped the move. It still shows the error. How do I delete the aborted move and go back to using the VM where it was?
 

Ashley

Member
Jun 28, 2016
267
15
18
33
Can you explain exactly what you mean by moving a disk? Within Ceph the data is spread across all OSDs, not held on one particular disk.
 

Andrew Holybee
I had a VM running on an NFS path and clicked Move disk to move it to Ceph. The move hit an issue: it filled one OSD more than the others and halted. When I clicked stop, I couldn't figure out how to delete the partial copy. I was hoping it would just fall back to the path it came from, but Ceph still shows issues. There seem to be false copies of the VM (see attached). We should only have one VM 103, and it's a large one, but it lists multiples. However, only one OSD is near full, not all of them, so this seems like a false reading.

We tried manually removing the disks, but it didn't work, using:
rbd ls --pool <poolname> -> list the disks in the pool
rbd info --pool <poolname> vm-XXX-disk-N -> show information about a disk
rbd rm --pool <poolname> vm-XXX-disk-N -> delete the disk
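For reference, a dry-run sketch of those commands with the placeholders filled in (assuming the pool name cephStor and VM 103 mentioned later in this thread; the echo lines only print the commands so the syntax can be checked before anything is actually run against the cluster):

```shell
# Dry run: print each rbd command so the syntax can be verified first.
# Pool name cephStor and disk numbers are assumptions from this thread.
POOL="cephStor"
echo "rbd ls --pool $POOL"
for N in 1 2 3 4 5; do
    echo "rbd info --pool $POOL vm-103-disk-$N"
done
```

Removing the echoes then runs the real commands.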

A consultant was able to get the VMs working, but the cluster hasn't healed. Any ideas on how to find the bad copy and delete it?
 

Attachments

  • ceph status.jpeg (154 KB)
  • ceph storage.jpeg (163.3 KB)

Andrew Holybee
The only thing I can think of is that when we ran the commands above, we left out the dash before the disk number and it said invalid directory:
rbd info --pool <poolname> vm-xxx-diskx -> instead of disk-x
rbd rm --pool <poolname> vm-xxx-diskx -> instead of disk-x
I am wondering if that is what did it.
 

Andrew Holybee
It seems like the syntax is still hanging us up. I can't even get rbd info to work. Can someone help me with the syntax? Here is a screenshot. You can see all the extra disks for VM 103; I am just trying to run info so I can tell which one is the right one and which is just filling up Ceph.
 

Attachments

  • cephstor info syntax.JPG (52.9 KB)

Ashley
You can just delete them using the command

rbd -p *poolname* rm *diskname*

So yours should be

rbd -p cephStor rm vm-103-disk-1
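A more cautious variant of that delete, as a sketch (rbd status lists an image's watchers, so a disk still attached to a running VM will show a watcher, while a leftover copy from an aborted move should show none):

```shell
# Check for watchers before deleting a suspected leftover image.
# Pool and image names here are assumptions taken from this thread.
POOL="cephStor"
IMG="vm-103-disk-3"          # one of the suspected leftover copies
rbd status "$POOL/$IMG"      # an unused image should report no watchers
rbd rm "$POOL/$IMG"          # only delete once you are sure it is unused
```

This requires a live cluster, so run the status check first and read its output before the rm.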
 

Andrew Holybee
Disks 1 and 2 should be the good ones; 3, 4, and 5 should be the bad ones. What would the syntax for info be so I can check? Would it be:
rbd -p cephStor info vm-103-disk-1
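That is the right syntax. To compare all five images in one pass, a loop over the disk numbers works (a sketch, assuming the cephStor pool; the size and creation details in the info output help tell the real disks from the aborted copies):

```shell
# Print rbd info for each vm-103 disk so the images can be compared.
# Requires a reachable Ceph cluster; names are from this thread.
POOL="cephStor"
for N in 1 2 3 4 5; do
    echo "--- vm-103-disk-$N ---"
    rbd -p "$POOL" info "vm-103-disk-$N"
done
```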
 

Andrew Holybee
Awesome, thanks. I was just making sure the way I typed the info command was right. Attached you can see the VM 103 hardware screen, which shows disks 1 and 2 in use and not 3/4/5, and in the OSD percentages you can see the one that is almost full, preventing the cluster from fully healing. It is backing up that VM, so I am hoping it sees those other disks aren't being used and removes them. If not, I will try to remove them this way. Thanks so much.
 

Attachments

  • osd percentage.JPG (66.1 KB)
  • vm103 hardware.JPG (59 KB)

Andrew Holybee
This fixed it, thanks so much. I have removed disk 5 and will wait until after hours to remove 4 and 3. Saved my bacon; what a great community.
 

Andrew Holybee
The strange thing is Ceph still hasn't healed; it says noout set. We do have pm05 not attached to Ceph, as we are thinking about dismantling Ceph since it has been problematic.
 

Attachments

  • ceph not healed.jpeg (347.1 KB)

Andrew Holybee
Yes, this is what I wanted to try first. Would the syntax be "ceph osd unset noout"?

And pm05, not being part of Ceph, wouldn't be affected by this, correct?
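For reference, clearing the flag and confirming it is gone could look like this (a sketch; ceph -s prints any set cluster flags, such as noout, in its health section):

```shell
ceph osd unset noout   # clear the noout flag cluster-wide
ceph -s                # health output should no longer mention noout
```

The noout flag only affects OSD out-marking inside the Ceph cluster, so a node with no OSDs is not touched by it.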
 

Andrew Holybee
OK, cool. pm05 is a new server that we added, so it was never part of Ceph. The issue we have had with Ceph and Proxmox is that it will seemingly randomly lose quorum and fence. Or I will take down one node for maintenance and it will take down another node, or multiple nodes.
 

Andrew Holybee
OK, so I unset noout and that cleared the error, but Ceph still hasn't healed. Do I need to reload Ceph, or will that take down the OSDs? Here is the health detail.
 

Attachments

  • ceph still stuck.JPG (65.9 KB)
  • set noout.JPG (73.5 KB)

Andrew Holybee
It looks like we just have the 3 PGs that haven't been restored. Is there a command I need to run to manually get them to repair?
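A sketch for inspecting stuck PGs (here <pgid> is a placeholder for an actual id taken from the health output; note that ceph pg repair is meant for PGs flagged inconsistent, while degraded or undersized PGs normally backfill on their own once the full OSD has free space again):

```shell
ceph health detail     # lists the problem PGs with their ids
ceph pg <pgid> query   # show one PG's state and which OSDs hold it
ceph pg repair <pgid>  # only for PGs reported as inconsistent
```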
 
