Ceph Alerts Module SMTP-Error

Kadrim

Hi there,

I am running Proxmox 6.2-10 with Ceph 14.2.9.

I also installed the ceph-mgr-dashboard package (to get the alerts module) via

apt install ceph-mgr-dashboard

and then configured the alerts module like this:

Bash:
ceph mgr module enable alerts
ceph config set mgr mgr/alerts/smtp_host '172.18.0.112'
ceph config set mgr mgr/alerts/smtp_ssl false
ceph config set mgr mgr/alerts/smtp_port 25
ceph config set mgr mgr/alerts/smtp_destination 'user@example.com'
ceph config set mgr mgr/alerts/smtp_sender 'pve@example.com'
ceph config set mgr mgr/alerts/smtp_user 'pve'
ceph config set mgr mgr/alerts/smtp_password 'my-scrambled-pass'
ceph config set mgr mgr/alerts/smtp_from_name 'Ceph Cluster Foo'
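
To double-check what the manager actually stored, the values can be read back. This is only a minimal sketch, assuming the `ceph` CLI is on the PATH of the node; the `ALERT_KEYS` variable and the loop are illustrative names, not part of Ceph:

```shell
# Option names taken from the configuration above.
ALERT_KEYS="smtp_host smtp_ssl smtp_port smtp_destination smtp_sender smtp_user smtp_from_name"

# Print each stored value (smtp_password is left out on purpose).
for key in $ALERT_KEYS; do
    if command -v ceph >/dev/null 2>&1; then
        printf '%s = %s\n' "$key" "$(ceph config get mgr "mgr/alerts/$key")"
    else
        printf '%s = (ceph CLI not found)\n' "$key"
    fi
done
```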

If I then run the test command, or even deliberately do something that changes the health of the Ceph cluster, no email is sent.

Instead I find the following in my logs:

Code:
# ceph-mgr.pve.log

2020-07-29 11:30:07.220 7f6b7d943700 -1 ceph_set_health_checks check ALERTS_SMTP_ERROR unexpected key count

and the output of ceph health detail:

Bash:
HEALTH_WARN unable to send alert email
ALERTS_SMTP_ERROR unable to send alert email
    [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:727)

On the mail server (mail relay) I can see that a connection is being made, but it is followed by an immediate disconnect:

Code:
Jul 29 11:30:07 Mail postfix/smtpd[352]: connect from unknown[172.18.0.2]
Jul 29 11:30:07 Mail postfix/smtpd[352]: lost connection after UNKNOWN from unknown[172.18.0.2]
Jul 29 11:30:07 Mail postfix/smtpd[352]: disconnect from unknown[172.18.0.2] unknown=0/3 commands=0/3
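
The `WRONG_VERSION_NUMBER` error together with `lost connection after UNKNOWN` is what one would typically see when a client opens a TLS handshake against a port that speaks plaintext SMTP, so it may be worth checking which mode each port on the relay actually offers. A rough sketch using `openssl s_client`: the helper name `probe_cmd` is made up for illustration, and treating port 465 as implicit TLS and every other port as plaintext with STARTTLS is an assumed convention, not something read from the relay's configuration.

```shell
RELAY=172.18.0.112   # mail relay address from the configuration above

# Hypothetical helper: print the openssl command suited to a given port.
probe_cmd() {
    port=$1
    if [ "$port" = "465" ]; then
        # Implicit TLS: the handshake starts immediately after connecting.
        echo "openssl s_client -connect $RELAY:$port"
    else
        # Plaintext port: upgrade with STARTTLS after the SMTP greeting.
        echo "openssl s_client -connect $RELAY:$port -starttls smtp"
    fi
}

probe_cmd 25
probe_cmd 465
```

Running the port 25 command without `-starttls smtp` would be expected to fail with a similar handshake error if the relay only offers plaintext there.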

The mail server does work and can send mail; I even stitched together a simple test program in PHP to check whether SMTP auth works, which it does.

Does anyone have a hint for me?

Thanks in advance!
 
So maybe you meant this tracker?
Hm... no I didn't. :D I just thought about the email alert being more completely broken. :rolleyes:

But if I read that correctly, this should already have been fixed in 14.2.9? Am I wrong?
Some of the things should have landed in 14.2.9, yes. But I suppose there might still be some bugs floating around.
 
OK, as a workaround I activated implicit SSL on port 465 on my mail server with an invalid (aka snakeoil) cert, because the machine is only a relay and not available to the public.

To do that, one has to remove the smtp_port and smtp_ssl variables, so that the module falls back to its defaults (SSL on port 465), via

Bash:
ceph config rm mgr mgr/alerts/smtp_ssl
ceph config rm mgr mgr/alerts/smtp_port
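
To confirm the workaround took effect, the health output can be checked for the alert code after triggering a send. A small sketch, assuming the test command mentioned earlier is `ceph alerts send`; the helper `has_smtp_error` is just an illustrative name:

```shell
# Hypothetical helper: succeed if health output on stdin reports the SMTP alert.
has_smtp_error() {
    grep -q 'ALERTS_SMTP_ERROR'
}

if command -v ceph >/dev/null 2>&1; then
    ceph alerts send          # ask the alerts module to send a mail right away
    if ceph health detail | has_smtp_error; then
        echo "alerts module still reports an SMTP error"
    else
        echo "no SMTP error reported"
    fi
fi
```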

That works for now, but I too think that the alerts module needs some fixing from the Ceph team ;-)