Recovering Data from Ceph OSD Cluster

Kombonut

Oct 10, 2021
Hey guys,

I have a Ceph cluster with 3 OSDs on 3 nodes, 1 OSD per node. 2 of the OSDs went offline and won't come back (pretty sure the disks died). 1 OSD is still alive, along with its monitor. I can see the data from ceph -s:

Code:
id:     c42a9057-9b43-4e68-afe8-d2cac60a8a6c
    health: HEALTH_WARN
            mon SpaceDewdy3 is low on available space
            1 osds down
            2 hosts (2 osds) down
            Reduced data availability: 33 pgs inactive
            Degraded data redundancy: 236254/354381 objects degraded (66.667%), 33 pgs degraded, 33 pgs undersized
            33 pgs not deep-scrubbed in time
            33 pgs not scrubbed in time
 
  services:
    mon: 3 daemons, quorum SpaceDewdy3,tasty2,capstone (age 29m)
    mgr: SpaceDewdy3(active, since 45h), standbys: tasty2
    osd: 3 osds: 1 up (since 29m), 2 in (since 37m)
 
  data:
    pools:   2 pools, 33 pgs
    objects: 118.13k objects, 457 GiB
    usage:   457 GiB used, 2.3 TiB / 2.7 TiB avail
    pgs:     100.000% pgs not active
             236254/354381 objects degraded (66.667%)
             33 undersized+degraded+peered

And from ceph osd tree

Code:
ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME             STATUS  REWEIGHT  PRI-AFF
-1         8.18697  root default                                   
-3         2.72899      host SpaceDewdy3                           
 2    hdd  2.72899          osd.2           down         0  1.00000
-7         2.72899      host capstone                             
 0    hdd  2.72899          osd.0             up   1.00000  1.00000
-9               0      host neocapstone                           
-5         2.72899      host tasty2                               
 1    hdd  2.72899          osd.1           down   1.00000  1.00000

When I try to run rbd ls -l Ceph-3t, it hangs. I think the pool might be corrupted.

I just want to get the data off the functioning OSD, since I used it as a file share on my CT (container). Would that be possible?
 
Ceph distributes all data in small chunks across the different OSDs, so even if you retrieve the data from one OSD, it would unfortunately be unusable.
 
Yes, but restoring data from Ceph can be quite complex. I think you need a monitor as well, since it stores information about where each file is located.
 
The monitors are still up, that's good.
As @UdoB wrote, recovery should be possible if the pools had size=3,min_size=2 set before the failure. With this you've got one full copy per OSD.
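The degraded figures in the ceph -s output from the first post can be checked against this. A quick arithmetic sketch, assuming both pools use size=3 (the thread does not confirm the pool settings):

```shell
#!/bin/sh
# Figures copied from the `ceph -s` output in the first post.
degraded=236254   # object copies reported as degraded
total=354381      # object copies the cluster expects in total

# Assumption: every pool has size=3, so total copies = objects * 3.
pool_size=3

objects=$((total / pool_size))                # compare with "118.13k objects"
missing_per_object=$((degraded / objects))    # copies lost per object
surviving=$((pool_size - missing_per_object)) # copies still available

echo "objects=$objects missing_per_object=$missing_per_object surviving=$surviving"
```

Under that assumption the numbers work out to exactly one surviving copy of every object, which is consistent with the single OSD that is still up holding a full copy.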

I would just delete one of the failed OSDs (be cautious, you must *not* delete the healthy one), replace the failed disk and create a new OSD.
Then wait for full sync. After that continue with the 2nd failed OSD.
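Roughly, the replacement could look like the dry-run sketch below. It only prints commands and runs nothing. The pveceph subcommands are my assumption of how this is done on a current Proxmox VE node (nobody in the thread posted the exact commands), /dev/sdX is a placeholder for the new disk, and the guard on osd.0 is just there to make the "do not delete the healthy one" warning explicit.

```shell
#!/bin/sh
# Dry-run sketch: prints a plan, does not touch the cluster.
HEALTHY_OSD=0   # per `ceph osd tree`, osd.0 on host capstone is the only OSD up

replace_osd_plan() {
    osd_id=$1    # ID of the failed OSD, e.g. 2 or 1
    new_dev=$2   # hypothetical device path of the replacement disk

    # Refuse to plan anything against the one OSD that still holds the data.
    if [ "$osd_id" = "$HEALTHY_OSD" ]; then
        echo "refusing: osd.$osd_id is the healthy OSD" >&2
        return 1
    fi

    echo "ceph osd out $osd_id"          # mark the failed OSD out
    echo "pveceph osd destroy $osd_id"   # remove it (run on the node that hosts it)
    echo "pveceph osd create $new_dev"   # create a new OSD on the replacement disk
    echo "ceph -s"                       # repeat until recovery has finished
}

replace_osd_plan 2 /dev/sdX
```

Only move on to the second failed OSD once ceph -s no longer reports degraded objects.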

I am currently on the road, so I can't test this right now.
 
OSD 0 is still up, and since this is a 3-node cluster with 1 OSD per node, you should have a full copy of the data on that disk, as already mentioned. If you can, add new disks to the other hosts and create OSDs on them. Once Ceph has 2 copies/replicas, the pool(s) should be operational again. If you can't quickly add new disks to replace the failed ones and want to back up the data from the Ceph cluster to somewhere else, you could think about setting min_size=1. But that should be a measure of last resort, only to get the data out!

Usually, you should not set min_size for the pools to 1, because it increases the chance of data corruption!
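For reference, the last-resort sequence would look something like the dry-run sketch below. It only prints the commands. The pool name Ceph-3t is taken from the rbd command in the first post; the cluster has 2 pools, so the same would apply to the other one, and restoring min_size to 2 afterwards assumes that was the value before.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands, changes nothing.
POOL="Ceph-3t"   # pool name from the first post; the second pool is not named in the thread

min_size_plan() {
    pool=$1
    echo "ceph osd pool get $pool size"        # check the replica count first
    echo "ceph osd pool get $pool min_size"    # note the current value
    echo "ceph osd pool set $pool min_size 1"  # last resort: allow I/O with one copy
    echo "# copy the data off the cluster here"
    echo "ceph osd pool set $pool min_size 2"  # put it back (assumes it was 2)
}

min_size_plan "$POOL"
```

Printing the plan first makes it easy to double-check the pool name before anything is changed on a cluster that is down to a single copy.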

There is no need to delete OSDs just now, so leave the old faulty ones alone to reduce the risk of removing the wrong one.
 
Hey guys, sorry for the delay on this. I had to get new HDDs, then replaced the dead ones in my servers and let the Ceph pool propagate its data across the OSDs. It worked. The data is there and I can access it. I will be backing it up now. Thanks for the insights on this.
 