Hello, I'm managing a 4-node Proxmox 7.4-17 cluster with Ceph. After a messy shutdown caused by a long power outage, a couple of VM images were corrupted, but we were able to restore them from backup. However, two of the OSD services refuse to come back up, showing this kind of error:
Code:
Apr 18 11:25:58 pve03.impmc.upmc.fr ceph-osd[2976008]: 2024-04-18T11:25:58.854+0200 7f20e86d5080 -1 rocksdb: verify_sharding unable to list column families: Corruption: CURRENT file does not end with newline
Apr 18 11:25:58 pve03.xxx.fr ceph-osd[2976008]: 2024-04-18T11:25:58.854+0200 7f20e86d5080 -1 bluestore(/var/lib/ceph/osd/ceph-10) _open_db erroring opening db:
Apr 18 11:25:59 pve03.xxx.fr ceph-osd[2976008]: 2024-04-18T11:25:59.322+0200 7f20e86d5080 -1 osd.10 0 OSD:init: unable to mount object store
Apr 18 11:25:59 pve03.xxx.fr ceph-osd[2976008]: 2024-04-18T11:25:59.322+0200 7f20e86d5080 -1 ** ERROR: osd init failed: (5) Input/output error
Apr 18 11:25:59 pve03.xxx.fr systemd[1]: ceph-osd@10.service: Main process exited, code=exited, status=1/FAILURE
I have been searching the web for a solution, but everything I find is very technical and refers to bugs in old versions of Ceph that are supposedly fixed. I tried ceph-bluestore-tool, but the obvious approach does not seem to be enough:
Code:
# ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-10 fsck
2024-04-18T11:54:04.644+0200 7fcb4245a3c0 -1 rocksdb: verify_sharding unable to list column families: Corruption: CURRENT file does not end with newline
2024-04-18T11:54:04.644+0200 7fcb4245a3c0 -1 bluestore(/var/lib/ceph/osd/ceph-10) _open_db erroring opening db:
repair failed: (5) Input/output error
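For reference, RocksDB's CURRENT file should contain only the name of the active MANIFEST file followed by a newline (e.g. `MANIFEST-000005`), and RocksDB raises exactly this "does not end with newline" corruption when that file is empty or its last byte is not a newline. That would fit a write cut short by the power loss. Because BlueStore keeps RocksDB inside BlueFS, the file is not visible on a normal filesystem. The script below is a minimal, read-only sketch of how one could inspect it. It assumes the BlueFS contents were first exported with something like `ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-10 --out-dir /root/osd10-bluefs` (the output directory is hypothetical) and that CURRENT ends up under the export's `db/` subdirectory (layout assumed). `check_current` is a hypothetical helper name, not part of Ceph, and the script does not repair anything.

```shell
#!/bin/sh
# Sketch only: checks whether a RocksDB CURRENT file would pass RocksDB's
# "ends with newline" check. Assumes the file was exported from BlueFS first
# (e.g. with ceph-bluestore-tool bluefs-export); the export path is hypothetical.
# check_current is a hypothetical helper name, not part of Ceph.

check_current() {
    f=$1
    if [ ! -f "$f" ]; then
        echo "missing: $f"
        return 2
    fi
    if [ ! -s "$f" ]; then
        # An empty CURRENT file is rejected with the same error message.
        echo "empty: $f"
        return 1
    fi
    # Hex value of the last byte; going through od also catches NUL bytes,
    # which a plain command substitution would silently drop.
    last=$(tail -c 1 "$f" | od -An -tx1 | tr -d ' \n')
    if [ "$last" != "0a" ]; then
        echo "no trailing newline: $f"
        return 1
    fi
    # $(cat ...) strips the trailing newline, leaving just the MANIFEST name.
    printf 'ok: %s points to %s\n' "$f" "$(cat "$f")"
    return 0
}

# Run only when a path is given, e.g.:
#   sh check_current.sh /root/osd10-bluefs/db/CURRENT
if [ $# -gt 0 ]; then
    check_current "$1"
fi
```

A non-zero exit status means RocksDB would refuse to open the database for the reason shown in the logs. It only confirms the diagnosis; it does not by itself make the OSD usable again.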
Do you know of anything else I could try? Or, since all the data has been recovered, is there a way to just tell Ceph to forget about these two OSDs and reclaim their storage space?
Thank you for your help.