[SOLVED] Ceph monitors and manager corrupt - how to restore?

Natmac21007

I have a development setup with 3 nodes that unexpectedly had a few power outages, which caused some corruption. I have tried to follow the Ceph documentation for troubleshooting monitors, but I can't get the monitors to restart, and I can't get the manager to restart either.

I deleted one of the monitors and was able to create a new one and get it running, but I think the main problem now might be getting the manager running. In the PVE console I can't create any new managers, or destroy the one that was previously running on only one of the nodes.

I was hoping there might be a way to remove the current configuration without deleting the OSDs and the data on them, and then restore the cluster somehow.
 
Can you post your ceph status and health detail output (ceph -s / ceph health detail)? If you had 3 monitors and only restored one, it is likely that you do not have a quorate Ceph cluster.
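That is, from any of the nodes:

Code:
ceph -s
ceph health detail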

Additionally the status output from all monitors on all hosts would be interesting:
Code:
systemctl status ceph-mon@<host-name>
 
ceph -s was just timing out previously. I hadn't been able to get the other monitors running, so I was working towards getting those restored, but I thought it might be helpful to have a mgr running as well.

On nodes 1 & 2, where I haven't got the mons back up and running, ceph -s / ceph health detail basically produce:
2023-04-12T21:14:29.576+1000 7f6786386700 0 monclient(hunting): authenticate timed out after 300

On node 3 I believe I have got the mon running, but it also times out.
When I run systemctl status ceph-mon@<node 3> it says it failed, so I guess I haven't fixed that.
 
When I run systemctl status ceph-mon@<node 3> it says it failed, so I guess I haven't fixed that.

What is the exact output of this command (including the logs)?

It might also be interesting to get the whole log of the monitor to search for potential issues:

Code:
journalctl -b -u ceph-mon@<node 3>
 
root@pve-hv03:~# systemctl status ceph-mon@pve-hv03
ceph-mon@pve-hv03.service - Ceph cluster monitor daemon
Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
Drop-In: /usr/lib/systemd/system/ceph-mon@.service.d
└─ceph-after-pve-cluster.conf
Active: failed (Result: exit-code) since Wed 2023-04-12 21:43:03 AEST; 3s ago
Process: 212932 ExecStart=/usr/bin/ceph-mon -f --cluster ${CLUSTER} --id pve-hv03 --setuser ceph --setgroup c>
Main PID: 212932 (code=exited, status=1/FAILURE)
CPU: 169ms

Apr 12 21:43:03 pve-hv03 systemd[1]: ceph-mon@pve-hv03.service: Scheduled restart job, restart counter is at 5.
Apr 12 21:43:03 pve-hv03 systemd[1]: Stopped Ceph cluster monitor daemon.
Apr 12 21:43:03 pve-hv03 systemd[1]: ceph-mon@pve-hv03.service: Start request repeated too quickly.
Apr 12 21:43:03 pve-hv03 systemd[1]: ceph-mon@pve-hv03.service: Failed with result 'exit-code'.
Apr 12 21:43:03 pve-hv03 systemd[1]: Failed to start Ceph cluster monitor daemon.

journalctl:
Apr 12 21:42:53 pve-hv03 systemd[1]: ceph-mon@pve-hv03.service: Scheduled restart job, restart counter is at 4.
Apr 12 21:42:53 pve-hv03 systemd[1]: Stopped Ceph cluster monitor daemon.
Apr 12 21:42:53 pve-hv03 systemd[1]: Started Ceph cluster monitor daemon.
Apr 12 21:42:53 pve-hv03 ceph-mon[212932]: 2023-04-12T21:42:53.829+1000 7fc89daab700 -1 error opening mon data directory at '/var/lib/ceph/mon/ceph-pve-hv03': (13) Permission denied
Apr 12 21:42:53 pve-hv03 systemd[1]: ceph-mon@pve-hv03.service: Main process exited, code=exited, status=1/FAILURE
Apr 12 21:42:53 pve-hv03 systemd[1]: ceph-mon@pve-hv03.service: Failed with result 'exit-code'.
Apr 12 21:43:03 pve-hv03 systemd[1]: ceph-mon@pve-hv03.service: Scheduled restart job, restart counter is at 5.
Apr 12 21:43:03 pve-hv03 systemd[1]: Stopped Ceph cluster monitor daemon.
Apr 12 21:43:03 pve-hv03 systemd[1]: ceph-mon@pve-hv03.service: Start request repeated too quickly.
Apr 12 21:43:03 pve-hv03 systemd[1]: ceph-mon@pve-hv03.service: Failed with result 'exit-code'.
Apr 12 21:43:03 pve-hv03 systemd[1]: Failed to start Ceph cluster monitor daemon.

Thanks,
Nathan
 
Could you attach the complete log output from journalctl?

Code:
journalctl --since '1 week ago' -u ceph-mon@pve-hv03 > ceph-mon_log.txt
 
I have also attached the log from one of the other cluster nodes (pve-hv02) that I haven't tried fixing the monitor on yet.

I'm wondering: can I just delete/remove all the configuration data on the nodes without destroying the OSDs, rebuild the Ceph cluster from scratch, add the OSDs back in, and my data will still be there?

There is different info on hv01, so I added that log as well.
 

So the problems started with a power outage on the evening of the 6th of April, and then several more on the evening of the 7th. I was remoted in from home working on them when I noticed the power issues on the evening of the 7th, so I shut them down over Easter.

I'm on leave, but a colleague returned to work today and restarted them. I remoted in to check on things and noticed no monitors running, no manager running, and no dashboard/OSD info in the Proxmox console.
 
Looks like the issue is buried here:

Code:
Apr 12 21:53:16 pve-hv03 ceph-mon[215347]: 2023-04-12T21:53:16.334+1000 7f3c4f4dc700 -1 error opening mon data directory at '/var/lib/ceph/mon/ceph-pve-hv03': (13) Permission denied

What do the permissions on this folder look like? Maybe a simple fix for this would be to bring them in order.

Code:
ls -alh /var/lib/ceph/mon/ceph-pve-hv03
 
I wondered about that permission denied error. In the Ceph monitor troubleshooting docs there was a step to chown.

root@pve-hv03:~# ls -alh /var/lib/ceph/mon/ceph-pve-hv03
total 16K
drwxr-xr-x 3 root root 4.0K Apr 12 20:29 .
drwxr-xr-x 3 ceph ceph 4.0K Apr 12 17:49 ..
-rw------- 1 root root 8 Apr 12 20:29 kv_backend
drwxr-xr-x 2 ceph ceph 4.0K Apr 12 17:21 store.db
 
I checked the other nodes and kv_backend was owned by ceph, so I ran chown on this node and tried to restart, but it still failed.
 
Did you chown the folder itself as well (i.e. chown -R ceph:ceph /var/lib/ceph/mon/ceph-pve-hv03)?

You probably need to run reset-failed as well before you can restart: systemctl reset-failed && systemctl start ceph-mon@pve-hv03. If this doesn't work, the log output would be interesting again.
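A minimal sketch of both steps, using the data directory from your earlier output:

Code:
# take ownership of the monitor data directory and everything inside it
chown -R ceph:ceph /var/lib/ceph/mon/ceph-pve-hv03
# clear the failed state (systemd stopped retrying after 5 failures) and start again
systemctl reset-failed ceph-mon@pve-hv03
systemctl start ceph-mon@pve-hv03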
 
Based on some reading of the Ceph docs I tried to restart the mon on node 2 and got this error:

root@pve-hv02:~# ceph-mon -i pve-hv02
2023-04-12T22:36:02.536+1000 7f00250a9700 -1 error: monitor data filesystem reached concerning levels of available storage space (available: 0% 79 MiB)
you may adjust 'mon data avail crit' to a lower value to make this go away (default: 5%)
But I can't see which filesystem it is talking about; there should be no issues with space, so I'm guessing a file is corrupted.
 
Hi @shanreich,

Thanks for your help trying to solve the issue; I'm heading to bed now. I'm at the point where, in the morning, I'll want to focus on getting it going again rather than on working out what went wrong.

Can I just delete/remove all the configuration data on the nodes without destroying the OSDs, rebuild the Ceph cluster from scratch, add the OSDs back in, and my data will still be there?

Thank you again,
Nathan.
 
Since your cluster is unhealthy, you would have to follow the tutorial at [1] in order to remove the non-functioning monitors from your cluster.

Additionally, I would check on the host pve-hv02 whether there is enough space available in the (presumably) root filesystem via df -h. Otherwise you will run into the same problem there again (monitor data filesystem reached concerning levels of available storage space).
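For example (assuming the default monitor data path under /var/lib/ceph/mon; adjust if yours differs):

Code:
# which filesystem backs the monitor data, and how much space is left on it?
df -h /var/lib/ceph/mon
# how large is the monitor store itself?
du -sh /var/lib/ceph/mon/ceph-pve-hv02/store.db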

[1] https://docs.ceph.com/en/latest/rad.../#removing-monitors-from-an-unhealthy-cluster
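For reference, the procedure in [1] boils down to roughly the following (a sketch only; the mon IDs and the temporary map path are placeholders, so check the linked documentation before running anything):

Code:
# stop the surviving monitor before editing its map
systemctl stop ceph-mon@pve-hv03
# extract the current monmap from the surviving monitor's store
ceph-mon -i pve-hv03 --extract-monmap /tmp/monmap
# remove the monitors that are no longer functional from the map
monmaptool /tmp/monmap --rm pve-hv01
monmaptool /tmp/monmap --rm pve-hv02
# inject the trimmed map back and start the monitor again
ceph-mon -i pve-hv03 --inject-monmap /tmp/monmap
systemctl start ceph-mon@pve-hv03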
 
Hi @shanreich,

Thanks for your help and pointers. After spending further time looking at the filesystem size issue, I was able to restart the monitors on the first two nodes and then destroy and re-create the third monitor, and it is all back healthy again.

Thanks,
Nathan.
 
TL;DR
If your monitors aren't running, check the logs. A full filesystem where the monitor data is stored will prevent them from starting.

Code:
systemctl status ceph-mon@<host-name>
journalctl --since '1 week ago' -u ceph-mon@<host-name> > ceph-mon_log.txt
systemctl reset-failed && systemctl start ceph-mon@<host-name>
 