Ceph OSD failure alert

It is better to use an external IT infrastructure monitoring system. The Ceph documentation says this about the built-in alerts module:

This module is not intended to be a robust monitoring solution. The fact that it is run as part of the Ceph cluster itself is fundamentally limiting in that a failure of the ceph-mgr daemon prevents alerts from being sent. This module can, however, be useful for standalone clusters that exist in environments where existing monitoring infrastructure does not exist.
 
I would recommend Checkmk https://checkmk.com/

But I am biased, as I wrote their Ceph plugin.

Just the guy I need to speak to! I'm trying to get the mk_ceph plugin working. I've got mk_ceph and mk_ceph2 downloaded onto a monitor node. I have dumped the binaries in /usr/lib/check_mk_agent/plugins, made them executable, and populated /etc/checkmk/ceph.conf, but a discovery scan still doesn't detect the plugin. I know this is technically a Proxmox forum for Proxmox issues, but considering you're the developer of the Checkmk Ceph plugin, I just had to ask!

@gurubert - Is this your plugin?

https://dahlem-consulting.de/new-ceph-plugin-for-checkmk/

I'm currently trying to get the mk_ceph plugin working, but I will try switching to this new Ceph plugin instead.
 
What version of Checkmk are you running?
Starting with 2.4, my extension was incorporated upstream and no longer needs to be installed separately.

The mk_ceph.py agent plugin (for Python 3) needs to be deployed to /usr/lib/check_mk_agent/plugins on all Ceph nodes, not on the monitoring server.

The mk_ceph_2.py agent plugin is only for very old hosts that have Python 2 but not Python 3. You should not have such hosts running in production.
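A quick way to rule out the two most common deployment mistakes (file missing, or not executable) is a small check run on each Ceph node. This is only a minimal sketch: the helper name `check_plugin` is made up for illustration and is not part of Checkmk. The directory and file name are the ones mentioned above.

```python
import os

# Plugin directory and file name as given in this thread.
PLUGIN_DIR = "/usr/lib/check_mk_agent/plugins"
PLUGIN_NAME = "mk_ceph.py"


def check_plugin(path):
    """Return a list of problems found for one agent plugin file.

    Hypothetical helper for illustration, not a Checkmk API.
    """
    problems = []
    if not os.path.isfile(path):
        problems.append("missing: " + path)
    elif not os.access(path, os.X_OK):
        # The agent only runs plugins that are executable.
        problems.append("not executable: " + path)
    return problems


if __name__ == "__main__":
    found = check_plugin(os.path.join(PLUGIN_DIR, PLUGIN_NAME))
    for problem in found:
        print(problem)
    if not found:
        print("plugin file looks fine")
```

If this reports no problems and the Ceph services still do not show up, the cause is more likely on the monitoring side (for example, the host not being configured to query the agent) than in the plugin deployment.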

The config file for the agent plugin is /etc/check_mk/ceph.cfg and not /etc/checkmk/ceph.conf. It should contain two settings:
Code:
CONFIG=/etc/ceph/ceph.conf
CLIENT=client.admin
These are the defaults. The config file is not necessary if you are using the defaults.
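To illustrate how the defaults behave, here is a rough sketch of reading such a KEY=VALUE file with a fallback to the two defaults above. It is not the actual parsing code of the agent plugin. The function names `parse_ceph_cfg` and `load_ceph_cfg` are assumptions made up for this example, and the comment and blank-line handling is my own guess at sensible behaviour.

```python
# Defaults as stated above.
DEFAULTS = {"CONFIG": "/etc/ceph/ceph.conf", "CLIENT": "client.admin"}


def parse_ceph_cfg(text):
    """Parse KEY=VALUE lines; keys that are not set keep their defaults.

    Illustrative only, not the plugin's real parser.
    """
    settings = dict(DEFAULTS)
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def load_ceph_cfg(path="/etc/check_mk/ceph.cfg"):
    """Read the config file; a missing file simply means 'use the defaults'."""
    try:
        with open(path) as handle:
            return parse_ceph_cfg(handle.read())
    except FileNotFoundError:
        return dict(DEFAULTS)


if __name__ == "__main__":
    print(load_ceph_cfg())
```

The point of the sketch is that a missing /etc/check_mk/ceph.cfg and a file containing exactly the two default lines lead to the same settings, so the file only matters if you use a different ceph.conf path or a different client key.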
 
Hey @gurubert,

Thanks for your reply.

I'm running 2.4.0p7 RAW.

1752487478760.png

I did see the note regarding the integration of the new Ceph plugin into 2.4+, but I couldn't understand why the Proxmox host discovery scan wasn't detecting Ceph checks, so I started looking at adding the check manually.

mk_ceph.py is in the correct location and has been made executable.

1752482718232.png
As is the config file:
1752487627951.png
However, a discovery scan on the host only picks up the generic Proxmox checks:
1752487688061.png

I'm new to Checkmk, so any advice would be much appreciated.
 
Does the host mepprox01 have the Checkmk agent installed?
Is it configured to query the agent?

Yes, the agent is installed

1752492862043.png

And I am assuming that this is the correct service for the agent.

1752492934907.png

I assume so because check-mk-agent by itself wasn't found:

1752492967225.png

1752493146123.png

Okay, I just changed "API integrations if configured, else Checkmk agent" to the option that uses the API integrations AND the Checkmk agent. That seems to have pulled in a lot more checks, but there is still no sign of Ceph stats.

1752493234305.png
1752493248953.png

Do I have to activate the Ceph plugin somehow?
 
Hold up, I just installed the agent on a machine that hasn't had any manual plugin intervention (mepprox04) and made the same change to pull from the API AND the agent, and that one seems to be showing the Ceph checks correctly:

1752493442216.png

Does this mean I should just clean up the plugins directory on mepprox01 and re-run the discovery?