ceph osd failure alert

kellogs · Aug 12, 2024

May I know whether using the following module is supported in a Proxmox Ceph deployment?

https://docs.ceph.com/en/nautilus/mgr/alerts/

gurubert · Aug 12, 2024

Better use an external IT infrastructure monitoring system.

The documentation linked above says: "This module is not intended to be a robust monitoring solution. The fact that it is run as part of the Ceph cluster itself is fundamentally limiting in that a failure of the ceph-mgr daemon prevents alerts from being sent. This module can, however, be useful for standalone clusters that exist in environments where existing monitoring infrastructure does not exist."
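To illustrate what an external check could look at, here is a minimal Python sketch that lists OSDs reported as not up. It assumes the input is JSON shaped like the output of `ceph osd tree -f json`, with OSD entries in a `nodes` list carrying `type` and `status` fields. The function name `down_osds` and the sample data are made up for illustration; this is not how any particular monitoring product implements its check.

```python
import json


def down_osds(osd_tree_json: str) -> list[str]:
    """Return the names of OSD nodes whose status is anything other than 'up'."""
    tree = json.loads(osd_tree_json)
    return sorted(
        node["name"]
        for node in tree.get("nodes", [])
        # Only OSD entries carry a status; hosts and roots are skipped.
        if node.get("type") == "osd" and node.get("status") != "up"
    )


# Trimmed sample in the assumed layout of `ceph osd tree -f json`.
sample = json.dumps({"nodes": [
    {"id": -1, "name": "default", "type": "root"},
    {"id": 0, "name": "osd.0", "type": "osd", "status": "up"},
    {"id": 1, "name": "osd.1", "type": "osd", "status": "down"},
]})

print(down_osds(sample))
```

Because a script like this runs outside the cluster, it keeps working as a check even when ceph-mgr is down, which is the limitation the documentation describes.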

kellogs · Aug 12, 2024

gurubert said:
Better use an external IT infrastructure monitoring system.

Hello gurubert,

Thank you for the reply. May I ask whether you have any recommendation for Ceph OSD disk monitoring?

gurubert · Aug 12, 2024

I would recommend Checkmk https://checkmk.com/

But I am biased, as I wrote their Ceph plugin.

leedys90 · Jul 12, 2025

gurubert said:
I would recommend Checkmk https://checkmk.com/

But I am biased, as I wrote their Ceph plugin.

Just the guy I need to speak to! I'm trying to get the mk_ceph plugin working. I've actually got mk_ceph and mk_ceph2 downloaded onto a monitor node. I have put the plugin files in /usr/lib/check_mk_agent/plugins, made them executable, and populated /etc/checkmk/ceph.conf, but a discovery scan still doesn't detect the plugin. I know this is technically a Proxmox forum for Proxmox issues, but considering you're the developer of the Checkmk Ceph plugin, I just had to ask!

@gurubert - Is this your plugin?

https://dahlem-consulting.de/new-ceph-plugin-for-checkmk/

I'm currently trying to get the mk_ceph plugin working, but I will try switching to this new Ceph plugin instead.

gurubert · Jul 14, 2025

What version of Checkmk are you running?
Starting with 2.4 my extension was incorporated upstream and does not need to be installed separately any more.

The mk_ceph.py agent plugin (for Python 3) needs to be deployed to /usr/lib/check_mk_agent/plugins on all Ceph nodes, not on the monitoring server.

The mk_ceph_2.py agent plugin is only for very old hosts that have Python 2 but not Python 3. You should not have such hosts running in production.

The config file for the agent plugin is /etc/check_mk/ceph.cfg and not /etc/checkmk/ceph.conf. It should contain two settings:

Code:

CONFIG=/etc/ceph/ceph.conf
CLIENT=client.admin

These are the defaults. The config file is not necessary if you are using the defaults.
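To make the "defaults unless overridden" behaviour concrete, here is a small Python sketch of reading such a KEY=VALUE file. It is only an illustration under the assumption that the file holds plain `KEY=VALUE` lines; the function name `read_plugin_cfg` is hypothetical, and the real agent plugin may parse its config differently.

```python
# Defaults as stated above; used whenever the file is missing or omits a key.
DEFAULTS = {"CONFIG": "/etc/ceph/ceph.conf", "CLIENT": "client.admin"}


def read_plugin_cfg(text: str) -> dict[str, str]:
    """Merge KEY=VALUE lines from the config text over the defaults."""
    settings = dict(DEFAULTS)
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


# An empty file yields the defaults; a single line overrides one key.
print(read_plugin_cfg(""))
print(read_plugin_cfg("CLIENT=client.monitoring\n"))
```

Here `client.monitoring` is just a placeholder name for a dedicated Ceph client, not something the plugin requires.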

leedys90 · Jul 14, 2025

Hey @gurubert,

Thanks for your reply.

I'm running 2.4.0p7 RAW.

I did see the note regarding the integration of the new Ceph plugin into 2.4+, but I couldn't understand why the Proxmox host discovery scan wasn't detecting Ceph checks, so I started looking at adding the check manually.

mk_ceph.py is in the correct location and has been made executable, as has the config file. However, a discovery scan on the host only picks up the generic Proxmox checks.

I'm new to Checkmk, so any advice would be much appreciated.

gurubert · Jul 14, 2025

Does the host mepprox01 have the Checkmk agent installed?
Is it configured to query the agent?
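One way to narrow this down is to look at the raw agent output on the node (for example, the text printed by running `check_mk_agent` locally) and see whether any Ceph sections are present. The Python sketch below extracts section names from `<<<name>>>` headers. The helper name `section_names` and the sample output are hypothetical, and the assumption that the Ceph plugin's sections start with `ceph` is a guess to verify against your own agent output.

```python
import re

# Section headers look like <<<name>>> or <<<name:option(...)>>>.
SECTION_RE = re.compile(r"^<<<([^:>]+)(?::[^>]*)?>>>$")


def section_names(agent_output: str) -> list[str]:
    """Return the distinct section names in order of first appearance."""
    names: list[str] = []
    for line in agent_output.splitlines():
        match = SECTION_RE.match(line.strip())
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


# Made-up, heavily trimmed agent output for illustration only.
sample_output = """<<<check_mk>>>
Version: 2.4.0p7
<<<df>>>
/dev/sda1 ext4 100 50 50 50% /
<<<cephstatus:sep(0)>>>
{}
"""

names = section_names(sample_output)
ceph_sections = [n for n in names if n.startswith("ceph")]
print(names)
print(ceph_sections)
```

If no Ceph sections show up in the real output, the plugin is not being executed by the agent on that node; if they do show up, the problem is more likely on the monitoring server side, such as how the host is configured to be queried.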

leedys90 · Jul 14, 2025

gurubert said:
Does the host mepprox01 have the Checkmk agent installed?
Is it configured to query the agent?

Yes, the agent is installed. I am assuming that this is the correct service for the agent, as check-mk-agent by itself wasn't found.

Okay, I just changed the setting from "API integrations if configured, else Checkmk agent" to the option that uses API integrations AND the Checkmk agent. That pulled in a lot more checks, but there is still no sign of Ceph stats.

Do I have to activate the ceph plugin somehow?

leedys90 · Jul 14, 2025

Hold up, I just installed the agent on a machine that hasn't had any manual plugin intervention, mepprox04, and made the same change to pull from API AND agent, and that seems to be showing correctly.

Does this mean I should just clean up the plugins directory on prox01 and re-run the discovery?

gurubert · Jul 14, 2025

It's hard to tell from afar. Try it. And maybe you should move the conversation to the Checkmk forum. https://forum.checkmk.com/
