Is it possible to recover Ceph data?

imoniker

Member
Aug 28, 2023
I'm testing PVE Ceph on a 3-node cluster.
Today, after several host reboots, Ceph crashed with the following error:
rocksdb: Corruption: Corruption: IO error: No such file or directoryWhile open a file for random read: /var/lib/ceph/mon/ceph-pve3n1/store.db/001272.ldb: No such file or directory
I deleted the mon config from ceph.conf and removed the directories under /var/lib/ceph/mon/ on all 3 nodes. When I tried to run pveceph mon create, it returned:
unable to get monitor info from DNS SRV with service name: ceph-mon

Is there any way to recover the data on Ceph?
The other Ceph services, such as the OSDs, seemed OK.
 

Attachment: ceph.PNG
Thanks for the reply. I had 3 monitors, and all of them went down. It's a test cluster, so it doesn't matter.
I searched on this forum and I also found this:
How to recover a ceph cluster if all monitors are down:
https://access.redhat.com/documenta...-ceph-monitor-for-bare-metal-deployments_diag
I'll deliberately delete the database and test a recovery in the following weeks.
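
For reference, the upstream Ceph troubleshooting documentation describes rebuilding a lost monitor store from the cluster maps kept on the OSDs. Below is a minimal dry-run sketch of that idea in Python, not a verified procedure for this cluster. It only prints the commands and runs nothing. The OSD paths, the scratch directory /root/mon-store, and the keyring path are assumptions for illustration. The real procedure also requires the OSDs to be stopped, a keyring that holds the mon. and client.admin keys with suitable caps, and moving the rebuilt store.db into the monitor directory afterwards.

```python
import shlex


def build_recovery_commands(osd_paths, mon_store, keyring):
    """Return the recovery commands as argv lists. Nothing is executed here."""
    cmds = []
    # Step 1: pull the cluster map out of every (stopped) OSD into one scratch store.
    for osd in osd_paths:
        cmds.append([
            "ceph-objectstore-tool", "--data-path", osd, "--no-mon-config",
            "--op", "update-mon-db", "--mon-store-path", mon_store,
        ])
    # Step 2: rebuild a monitor store from the collected maps.
    cmds.append(["ceph-monstore-tool", mon_store, "rebuild", "--", "--keyring", keyring])
    return cmds


if __name__ == "__main__":
    # Hypothetical layout: three OSDs on this node, scratch store under /root.
    osds = [f"/var/lib/ceph/osd/ceph-{i}" for i in range(3)]
    keyring = "/etc/pve/priv/ceph.client.admin.keyring"  # assumed keyring location
    for argv in build_recovery_commands(osds, "/root/mon-store", keyring):
        print(shlex.join(argv))
```

Printing the commands first makes it easy to review them against the official documentation before running anything on a real node.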

But it's really annoying that simply booting and rebooting the servers was enough to lose the Ceph mon DB file.

I guess I need to do some crash testing (black out the cluster, black out individual servers, reboot servers randomly and frequently, deliberately delete the mon DB file, write until Ceph is full, replace nodes, unplug hard drives...) before I can really put Ceph into production. It seems there are no better options (RAID or ZFS)...

Have you done this kind of crash test before? If you could summarize some common problems and solutions, that would be great. Losing data is really annoying.
 
In my test environment I frequently rebooted all nodes within less than a minute.
I wanted to see if something abnormal could crash the cluster. ^^
It happened about once in 30 times, I guess.
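
To turn that rough "once in 30" figure into a test plan, here is a small back-of-the-envelope sketch in Python. It assumes each reboot cycle fails independently with the same probability, which is a simplification, and the 1/30 rate is only the estimate above, not a measured value. The function names are made up for illustration.

```python
import math


def prob_at_least_one_failure(p, cycles):
    """Chance of seeing one or more failures in `cycles` independent trials."""
    return 1.0 - (1.0 - p) ** cycles


def cycles_for_confidence(p, confidence):
    """Smallest number of cycles that hits at least one failure with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


if __name__ == "__main__":
    p = 1 / 30  # estimated failure rate per reboot cycle (assumption)
    print(f"P(failure within 30 cycles) = {prob_at_least_one_failure(p, 30):.2f}")
    print(f"Cycles needed for 95% confidence: {cycles_for_confidence(p, 0.95)}")
```

Under these assumptions, 30 reboot cycles reproduce the problem only about two times out of three, so a test run that is meant to reliably trigger it needs roughly 90 cycles.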
 
That's true. I'll continue my crash testing.
If there is anything I'll let you know.
I would rather have a crash and its fix happen in the test environment.
I also need some experience fixing a crash in case it really happens...
 
