Ceph Manager Alerter

fxandrei

So I have a Proxmox Ceph cluster with 4 nodes, and I've been trying to use the alerter module.

I've configured everything as instructed in the docs: https://docs.ceph.com/docs/master/mgr/alerts/

The problem is that when I try to execute "ceph alerts send", I get this error:

MGR_MODULE_ERROR: Module 'alerts' has failed: %d format: a number is required, not str

This crashes all of my managers, and I cannot get them back, so I end up destroying them and recreating them from the Proxmox GUI.

I'm not sure what to check and where.

Did anyone manage to use this alerter module?
 
MGR_MODULE_ERROR: Module 'alerts' has failed: %d format: a number is required, not str
This seems to be a bug. As a workaround, remove the interval setting with "ceph config rm mgr mgr/alerts/interval". This will revert it to the default of 60 seconds.
https://tracker.ceph.com/issues/45151
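
For anyone wondering what the error message itself means: it is a plain Python TypeError that is raised when a string is handed to a %d format specifier. The sketch below only reproduces that class of error. It assumes the interval comes back from the config store as a string, and the names interval_from_config and format_interval are made up for illustration, not taken from the alerts module.

```python
# Hypothetical stand-in for an interval value read back from the config
# store as a string instead of an integer (an assumption, not Ceph code).
interval_from_config = "60"


def format_interval(interval):
    # %d requires a number; passing a str raises TypeError.
    return "sleeping for %d seconds" % interval


try:
    format_interval(interval_from_config)
except TypeError as exc:
    # Same kind of message as the one in MGR_MODULE_ERROR.
    error_message = str(exc)
    print("fails:", error_message)

# Converting to int first avoids the error.
fixed_message = format_interval(int(interval_from_config))
print(fixed_message)
```

Removing the custom interval sidesteps this, presumably because the module then falls back to its built-in numeric default.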

There is also this issue, whose fix doesn't seem to have made it into Ceph 14.2.9:
https://tracker.ceph.com/issues/43820

There is a commit that seems to fix the error, but it hasn't been backported yet either.
https://github.com/ceph/ceph/commit/368c810a4200a57a772b8122fc606a2b36de7413
 
So I ran that command to remove the interval and re-enabled the alerter. Now I see this in the mgr logs:

-1 ceph_set_health_checks check ALERTS_SMTP_ERROR unexpected key count
1 client.0 error registering admin socket command: (17) File exists
-1 client.0 error registering admin socket command: (17) File exists
 
See the other tracker link above. That issue may be a factor here as well.