[SOLVED] Restore / rebuild Ceph monitor after HW crash

cmonty14

Hello!
Due to an HD crash I was forced to rebuild a server node from scratch, meaning I installed the OS and Proxmox VE (apt install proxmox-ve postfix open-iscsi) fresh on the server.
Then I installed Ceph (pveceph install) on the fresh system.
Then I ran pvecm add 192.168.10.11 -ring0_addr 192.168.10.12 -ring1_addr 192.168.20.12 to add the node to the existing cluster.

This all worked well.

As a next step I started the installation of Ceph (pveceph install) and finally executed pveceph createmon.

The Ceph status shows that the relevant node is out of quorum:
ceph health detail
HEALTH_WARN noout flag(s) set; 20 osds down; 3 hosts (22 osds) down; Reduced data availability: 1429 pgs inactive; Degraded data redundancy: 7147773/14446416 objects degraded (49.478%), 1444 pgs degraded, 1845 pgs undersized; mon ld4257 is low on available space; 1/3 mons down, quorum ld4257,ld4465
OSDMAP_FLAGS noout flag(s) set
OSD_DOWN 20 osds down
[...]
MON_DOWN 1/3 mons down, quorum ld4257,ld4465
mon.ld4464 (rank 1) addr 10.97.206.98:6789/0 is down (out of quorum)
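
To pull only the names of the down monitors out of that output, a small sed filter along these lines might help. This is just a sketch: the function name down_mons is made up for illustration, and the pattern assumes the "mon.<name> (rank N) addr ... is down" line format shown above.

```shell
# Hypothetical helper: read `ceph health detail` output on stdin and
# print the name of every monitor that is reported as down.
down_mons() {
  sed -n 's/^ *mon\.\([^ ]*\) (rank [0-9]*) addr .* is down.*$/\1/p'
}

# On a live cluster this would be: ceph health detail | down_mons
# Here the two lines quoted above are fed in as sample input.
printf '%s\n' \
  'MON_DOWN 1/3 mons down, quorum ld4257,ld4465' \
  'mon.ld4464 (rank 1) addr 10.97.206.98:6789/0 is down (out of quorum)' \
  | down_mons
```

With the sample input this prints ld4464, the monitor that is out of quorum.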


Question:
Does it make sense to continue like this?
Will it be possible to rebuild the cluster?

As I understand it, I must fix the issue with the failed monitor service on node ld4464.
How can I do this?

THX
 
Hi,
thanks for this input.

After successfully removing the relevant node ld4464 from Ceph the relevant error message is gone.

What would be the next steps?
Do you advise re-adding this node ld4464 to the existing Ceph cluster?
Or should I first fix the OSDs and ensure that they will start up?

THX
 
After successfully removing the relevant node ld4464 from Ceph the relevant error message is gone.
You removed the MON as described in the link? If so, then you can use the pveceph tool to create a new one.
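
Roughly, the sequence could look like the sketch below. It assumes the monitor ID is ld4464 and that pveceph createmon is run on the rebuilt node itself; the run wrapper and the DRY_RUN variable are made up here so the steps can be previewed without touching the cluster.

```shell
# Hypothetical wrapper: print the command when DRY_RUN is set, otherwise run it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

recreate_mon() {
  # Only needed if the stale entry is still in the monmap;
  # run this on a node whose monitor is in quorum.
  run ceph mon remove "$1"
  # Run this on the rebuilt node to create the new monitor.
  run pveceph createmon
  # Check that the monitor shows up in the quorum again.
  run ceph mon stat
}

# Preview only: prints the commands instead of executing them.
DRY_RUN=1 recreate_mon ld4464
```

Without DRY_RUN set, the same function executes the commands for real, so it is worth reading the preview first.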

Or should I first fix the OSDs and ensure that they will start up?
I would start first with the OSDs, to get the cluster back to a healthy state. It should only need a restart of the ceph-osd.target to get them started.
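
As a sketch, the restart and the follow-up checks on each affected node could look like this. The run wrapper and DRY_RUN are made-up helpers for previewing the commands; systemctl, ceph osd stat and ceph health detail are the standard tools.

```shell
# Hypothetical wrapper: print the command when DRY_RUN is set, otherwise run it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

restart_osds() {
  run systemctl restart ceph-osd.target   # start all OSD units on this node
  run ceph osd stat                       # how many OSDs are up and in
  run ceph health detail                  # see which warnings remain
}

# Preview only: prints the commands instead of executing them.
DRY_RUN=1 restart_osds
```

If some OSDs stay down after the restart, the journal of the individual ceph-osd units is the next place to look.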
 
