2-node cluster - mail loops back to myself on target domains for one node only

Oct 13, 2020
Hello Proxmox-Forum,

we have a storage error. We are running a 2-node cluster of the latest Mail Gateway version in HA mode. One of the nodes has started having trouble delivering mail to external domains (Microsoft-based and others). The error is reproducible. The error message on node1 is as follows:

Bash:
postfix/smtp[697866]: D72D0C0E9F: to=<destination@address.redacted>, relay=mx.some.otherdomain[6.23.8.17]:25, delay=0.42, delays=0.07/0/0.35/0, dsn=5.4.6, status=bounced (mail for address.redacted loops back to myself)

The same message from the same client is delivered via node2 without any issue. The nasty part is that the failed delivery attempt silently drops the message, and no failure report is generated.

As both servers are in one cluster, I am uncertain what may be causing the issue. It is also a bit strange that this started about 3-4 weeks ago, although this setup has been running since the beginning of 2022 and the configuration has not changed.

I have already done a lot of testing for DKIM, SPF, and DNS in general, and the mails get 10/10 with mail-tester, regardless of which node is used.

Have you had this issue before? Is there anything I need to check here?

Thank you and best regards,
Nico
 
mail for address.redacted loops back to myself
sounds odd - things I would check:
* does the DNS-setup for both nodes differ in any way (do they use the same dns-server, do they have /etc/hosts entries)?
* what's the output on both nodes for:
** `dig mx address.redacted`
** `dig address.redacted`

EDIT: do you have any modifications to the configuration templates on that cluster? do both nodes send mail directly to the internet, or is there a smarthost configured? is there some kind of firewall that might modify SMTP and/or DNS traffic?
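
The node-to-node comparison suggested above could be scripted roughly as follows. This is only a sketch: `NODE1` and `NODE2` are placeholder host names, root SSH access between the machines is assumed, and the helper names `same_records` and `compare_query` are made up for illustration. The answers are sorted before comparing because a resolver may return the same A records in a different order on each query.

```shell
#!/bin/sh
# Compare two command outputs while ignoring line order.
# Prints "same" or "differ"; returns 0 only when they match.
same_records() {
    sorted_a=$(printf '%s\n' "$1" | sort)
    sorted_b=$(printf '%s\n' "$2" | sort)
    if [ "$sorted_a" = "$sorted_b" ]; then
        echo same
    else
        echo differ
        return 1
    fi
}

# Run the same command on both nodes over SSH and compare the results.
# NODE1 and NODE2 are hypothetical placeholders and must be set by the caller.
compare_query() {
    out1=$(ssh "root@$NODE1" "$1") || return 2
    out2=$(ssh "root@$NODE2" "$1") || return 2
    same_records "$out1" "$out2"
}

# Example usage (commented out, needs SSH access to both nodes):
#   NODE1=mx01 NODE2=mx02
#   compare_query 'dig +short mx address.redacted'
#   compare_query 'dig +short address.redacted'
#   compare_query 'cat /etc/resolv.conf'
```

A "differ" result for any of these queries would point at a resolver or hosts difference between the nodes. Files such as /etc/hosts are expected to differ in the node's own entry, so they are better compared by eye.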
 
Hello @Stoiko Ivanov,

thanks for your reply.

  • DNS is exactly the same for both nodes.
  • /etc/hosts is unmodified and shows this (example for mx01):
    Code:
    root@mx01:~# cat /etc/hosts
    127.0.0.1       localhost
    ::1             localhost ip6-localhost ip6-loopback
    ff02::1         ip6-allnodes
    ff02::2         ip6-allrouters
    # --- BEGIN PVE ---
    10.11.12.9 mx01.honicon.com mx01
    # --- END PVE ---
  • node1 (the problematic one):
    • Bash:
      dig +short mx microsoft.com
      10 microsoft-com.mail.protection.outlook.com.
    • Code:
      dig +short microsoft.com
      20.76.201.171
      20.112.250.133
      20.70.246.20
      20.236.44.162
      20.231.239.246
  • node2 (the working one):
    • Bash:
      dig +short mx microsoft.com
      10 microsoft-com.mail.protection.outlook.com.
    • Code:
      dig +short microsoft.com
      20.76.201.171
      20.112.250.133
      20.70.246.20
      20.236.44.162
      20.231.239.246

I am not sure. I have already tested DNS as a probable error source and found nothing, since a delivery attempt is actually made. I also ran tcpdump, but nothing stood out.

Are you sure that the message is coming from our gateway, then? It looks a bit as if it were generated at the destination gateway.

Thank you.
 
Are you sure that the message is coming from our gateway, then? It looks a bit as if it were generated at the destination gateway.
if the remote server had rejected the mail, postfix would usually log that the remote server 'said: <smtp-response from server>', for example:
Code:
status=bounced (host gmail-smtp-in.l.google.com[2001:xxxx]  said: 550-5.7.26 This mail has been blocked...
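
The distinction described above can be checked mechanically over a whole log. The following is a rough sketch, not an official tool: `bounce_origin` is a made-up helper name, and the rule "a bounce without 'said:' was produced by the local postfix" is a heuristic taken from the explanation above rather than a guarantee.

```shell
#!/bin/sh
# Classify a postfix delivery log line by where the bounce text came from.
# Prints "remote" when the status text quotes the peer ("... said: ..."),
# "local" for any other bounced line, and "other" for non-bounce lines.
bounce_origin() {
    case "$1" in
        *status=bounced*" said: "*) echo remote ;;
        *status=bounced*)           echo local ;;
        *)                          echo other ;;
    esac
}

# Example usage (commented out; /var/log/mail.log is assumed to be the
# mail log location on the node):
#   grep 'status=bounced' /var/log/mail.log |
#       while IFS= read -r line; do bounce_origin "$line"; done |
#       sort | uniq -c
```

Applied to the "loops back to myself" line quoted at the start of the thread, this prints "local", which matches the reading that the message was generated by the sending node and not by the destination.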

dig +short mx microsoft.com
do you really have issues with mails to microsoft.com itself (as opposed to domains hosted at O365 etc.)?
I also did some tcpdump but there was nothing serious in there.
does tcpdump (especially for dns) look identical from both nodes for a mail that fails?
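
To make the captures from both nodes comparable, the same filter can be used on each node while the failing mail is sent. A minimal sketch, assuming tcpdump is installed on both nodes; `capture_filter` is a made-up helper name and the output file path is only an example:

```shell
#!/bin/sh
# Build a tcpdump filter that keeps DNS traffic plus SMTP traffic to the
# given relay addresses. With no arguments, all port-25 traffic is kept.
capture_filter() {
    hosts=""
    for ip in "$@"; do
        if [ -z "$hosts" ]; then
            hosts="host $ip"
        else
            hosts="$hosts or host $ip"
        fi
    done
    if [ -n "$hosts" ]; then
        echo "port 53 or (port 25 and ($hosts))"
    else
        echo "port 53 or port 25"
    fi
}

# Example usage (commented out, needs root on the node):
#   tcpdump -ni any -w "/tmp/$(hostname)-loop.pcap" "$(capture_filter 6.23.8.17)"
# Afterwards, print only the DNS part of a capture for a side-by-side read:
#   tcpdump -nr "/tmp/$(hostname)-loop.pcap" port 53
```

The relay address 6.23.8.17 is the one from the log line at the top of the thread. Reading both captures next to each other should show whether node1 asks a different question or gets a different answer than node2.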
 
