CEPH PG Data Recovery / PG Down

anush.intech

Sep 13, 2020
I have a 2-node Ceph cluster backing a pool.

The pool runs with 2 replicas.

I took one of the nodes down for maintenance while the other node kept running.

Later a power outage caused the second node to restart. That node had RAID card caching enabled and the cache battery had degraded, so the contents of the cache were lost.

The same node came back up once we cleared the write-back cache. At that point the pool was not functional at all, because it was missing the data that had been lost with the cache.

We then started the first node, the one we had taken down for maintenance. When it came up, a few PGs went into recovery_unfound, and I later ran mark_unfound_lost delete on them. This cleared the errors, but all the PGs then went into the down state.

Apart from this, only the OSDs of one node stay up at a time: when I try to start the OSDs on either node, the OSDs on the other node go down automatically.

I am fine with losing the data written after the point where we shut down the first node for maintenance.

All I am hoping for is to bring the pool up so that I can at least get the data from that first node.

I have tried mounting the Ceph BlueStore OSDs via FUSE and recovering the data from the PG data present there, but with no luck so far, as I have no idea where to find the RBD image hash ID.
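To make the question concrete, here is a minimal sketch of how the image ID might be picked out of a listing of object names. It assumes the images are RBD format 2, where data objects are named rbd_data.<image id>.<object number as 16 hex digits>, and that a text listing of object names from the OSD is available (for example, directory names from the FUSE mount saved to a file). The function name, the regular expression, and the sample names are hypothetical illustrations, not output from this cluster.

```python
import re
from collections import defaultdict

# Assumed naming scheme for format 2 RBD data objects:
#   rbd_data.<image id>.<object number, 16 hex digits>
RBD_DATA = re.compile(r"rbd_data\.([0-9a-f]+)\.([0-9a-f]{16})")


def group_rbd_objects(lines):
    """Map each image id found in `lines` to its sorted object numbers."""
    images = defaultdict(set)
    for line in lines:
        # A line may carry extra text (paths, JSON), so search inside it.
        for image_id, objno in RBD_DATA.findall(line):
            images[image_id].add(int(objno, 16))
    return {image_id: sorted(nums) for image_id, nums in images.items()}


if __name__ == "__main__":
    # Hypothetical sample names, for illustration only.
    sample = [
        "rbd_data.1e2a6b8b4567.0000000000000000",
        "rbd_data.1e2a6b8b4567.000000000000000a",
        "rbd_id.vm-100-disk-0",
    ]
    for image_id, nums in group_rbd_objects(sample).items():
        print(image_id, len(nums), "objects, highest index", nums[-1])
```

Each distinct ID in the result would correspond to one image, and the object numbers indicate which chunks of that image are present on the OSD. Which ID belongs to which image name still has to be determined separately.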

Any help would be greatly appreciated.
 
