Backup mails cannot be sent for some nodes in the cluster

Jul 31, 2023
I have a Proxmox cluster with 5 nodes, let's call them pve1 to pve5 for simplicity.
In this cluster there is a single global backup job which is configured as follows:

Code:
vzdump: backup-ebe5bd1e-e3fa
    comment daily
    schedule 2:30
    all 1
    enabled 1
    fleecing 0
    mode snapshot
    notes-template {{guestname}}
    notification-mode notification-system
    storage backup

I also have the following notification setup:

Code:
matcher: default-matcher
    comment Route all notifications to mail-to-root
    mode all
    target SMTP

sendmail: mail-to-root
    comment Send mails to root@pam's email address
    disable true
    mailto-user root@pam

smtp: SMTP
    from-address cluster@example.org
    mailto-user root@pam
    mode starttls
    server mail.example.org
    username user@example.org

smtp: status
    from-address cluster@example.org
    mailto status@example.org
    mode starttls
    server mail.example.org
    username user@example.org

matcher: Backup-Notification
    match-field exact:type=vzdump
    mode all
    target status
    target SMTP

Now two of the 5 nodes, let's say pve2 and pve3, get the following error when sending the notification:
Code:
ERROR: could not notify via target `status`: could not notify via endpoint(s): status: Connection error: Connection refused (os error 111)
ERROR: could not notify via target `SMTP`: could not notify via endpoint(s): SMTP: Connection error: Connection refused (os error 111)

This is strange, because if I manually start a backup job (with Notification Mode: Notification System) for a single VM on either of the two nodes, both notifications are sent successfully.

Furthermore, I can reach the mail server mail.example.org on port 587 via telnet from pve2 and pve3 without any problems.
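
A telnet test only shows that the TCP port is open at that moment. As a rough sketch of a slightly deeper check, the following Python function (the name `probe_smtp` and its return strings are my own, not anything Proxmox provides) connects the same way an SMTP client would, issues EHLO and negotiates STARTTLS. It deliberately does not attempt AUTH, so it should not add failed logins on the mail server.

```python
import smtplib
import socket
import ssl


def probe_smtp(host: str, port: int = 587, timeout: float = 10.0) -> str:
    """Report how far an SMTP session gets: refused, timeout, connected or starttls-ok."""
    try:
        # The constructor opens the TCP connection and reads the server greeting.
        with smtplib.SMTP(host, port, timeout=timeout) as client:
            client.ehlo()
            if not client.has_extn("starttls"):
                return "connected (no STARTTLS offered)"
            # Upgrade to TLS with default certificate verification, then greet again.
            client.starttls(context=ssl.create_default_context())
            client.ehlo()
            return "starttls-ok"
    except ConnectionRefusedError:
        # Same condition as "Connection refused (os error 111)" in the backup log.
        return "refused"
    except (socket.timeout, TimeoutError):
        return "timeout"
    except (smtplib.SMTPException, ssl.SSLError, OSError) as exc:
        return f"error: {exc}"
```

Calling `probe_smtp("mail.example.org")` on pve2 and pve3 shortly after a failed backup run, and again some time later, would show whether the refusal is permanent or only temporary.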

So it seems to be some problem within the backup notification process, but I can't find it.

I am at a complete loss as to what could be causing this error.
 
I have found the problem. The backup mails often triggered the log line “lost connection after AUTH from” on the target mail server (Postfix), whereupon fail2ban blocked the node's IP. That also explains the "Connection refused" errors. However, I cannot say what causes this log line. I have tested it with two different Postfix servers, and both show the same behavior.
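
In case it helps anyone checking for the same thing, here is a minimal Python sketch that counts those log lines per client IP, so you can see which nodes are affected. It assumes the usual Postfix smtpd format where the client is logged as `name[ip]`. The function name and the sample lines (with documentation addresses) are made up for illustration.

```python
import re
from collections import Counter

# Matches the smtpd message quoted above; the client appears as name[ip].
LOST_AFTER_AUTH = re.compile(
    r"lost connection after AUTH from [^\s\[]*\[([0-9A-Fa-f:.]+)\]"
)


def count_lost_after_auth(lines):
    """Count 'lost connection after AUTH' events per client IP."""
    counts = Counter()
    for line in lines:
        match = LOST_AFTER_AUTH.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines; real input would come from the mail server's log.
sample = [
    "Jul 31 02:31:07 mail postfix/smtpd[2113]: lost connection after AUTH from unknown[192.0.2.12]",
    "Jul 31 02:31:09 mail postfix/smtpd[2114]: lost connection after AUTH from pve3.example.org[192.0.2.13]",
    "Jul 31 02:31:12 mail postfix/smtpd[2113]: lost connection after AUTH from unknown[192.0.2.12]",
    "Jul 31 02:31:15 mail postfix/smtpd[2115]: disconnect from unknown[192.0.2.14]",
]

for ip, hits in count_lost_after_auth(sample).most_common():
    print(ip, hits)
```

To run it against a real log, pass an open file instead of `sample`, for example `count_lost_after_auth(open(path))`, where `path` is wherever your mail server writes its log. IPs with a high count right after 2:30 would be the nodes that fail2ban is likely to ban.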