Issue with caching when destination is offline

Jul 16, 2021
Hi There,

We're running a cluster of two PMGs. They're located in different networks in different locations for geo redundancy. Now we have the following issue:

One location will be offline for a day or two. This also means that the mail servers there are offline, so the second gateway should cache the incoming messages until the main location is available again. The last time this happened it did not work: all incoming messages on the second gateway were immediately rejected. In the logs I have the following lines:

Code:
Dec 27 10:03:36 mailgw-02 postfix/smtpd[1825397]: NOQUEUE: reject: RCPT from unknown[10.10.10.1]: 450 4.1.2 <user1@domain.ch>: Recipient address rejected: Domain not found; from=<sender@externaldomain.ch> to=<user1@domain.ch> proto=ESMTP helo=<mailserver.externaldomain.ch>
Dec 27 10:03:36 mailgw-02 postfix/smtpd[1825397]: disconnect from unknown[10.10.10.1] ehlo=2 starttls=1 mail=1 rcpt=0/1 rset=1 quit=1 commands=6/7
Dec 27 10:05:46 mailgw-02 postfix/postscreen[1825418]: CONNECT from [10.10.10.1]:55855 to [192.168.104.137]:25
Dec 27 10:05:46 mailgw-02 postfix/postscreen[1825418]: PASS OLD [10.10.10.1]:55855
Dec 27 10:05:46 mailgw-02 postfix/smtpd[1825419]: connect from unknown[10.10.10.1]
Dec 27 10:05:53 mailgw-02 postfix/smtpd[1825419]: NOQUEUE: reject: RCPT from unknown[10.10.10.1]: 450 4.1.2 <user1@domain.ch>: Recipient address rejected: Domain not found; from=<sender@externaldomain.ch> to=<user1@domain.ch> proto=ESMTP helo=<mailserver.externaldomain.ch>
Dec 27 10:05:53 mailgw-02 postfix/smtpd[1825419]: disconnect from unknown[10.10.10.1] ehlo=2 starttls=1 mail=1 rcpt=0/1 rset=1 quit=1 commands=6/7
Dec 27 10:06:34 mailgw-02 postfix/postscreen[1825418]: CONNECT from [10.10.10.1]:30106 to [192.168.104.137]:25
Dec 27 10:06:34 mailgw-02 postfix/postscreen[1825418]: PASS OLD [10.10.10.1]:30106
Dec 27 10:06:34 mailgw-02 postfix/smtpd[1825419]: connect from unknown[10.10.10.1]

I tested it again now and got the same behaviour.


In our secondary location we have a double NAT: Router <-- 10.10.10.0/24 --> Firewall <-- 192.168.104.128/27 --> Mailgateway

When the mail is sent via our primary gateway, it gets deferred with `status=deferred (connect to 192.168.101.249[192.168.101.249]:25: No route to host)`. During testing, the first mail sent via our secondary gateway landed in the queue (all others were rejected as shown above). This one mail didn't get the same message as the ones on our primary gateway; it says `Connection timed out`.

I suspect it has to do with our network config, but right now I'm out of ideas.
Does anyone have an idea what the issue might be?
 
The logs look odd - you should configure your firewall/NAT router to correctly pass through the public IP address of the outside mail server
(otherwise most anti-spam measures won't work, since quite a few depend on checking the IP against DNSBLs, and SPF needs the IPs as well)

Apart from that, I'd say: make sure DNS works on your PMG.
`Recipient address rejected: Domain not found;`
sounds like DNS is not working, or you have not entered the recipient's domain as a relay domain.
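
To tell a resolver outage apart from a name that really does not exist, a quick check from the gateway itself can help. The following is only a minimal sketch: the function name `check_resolution` and the host name in the usage comment are made up for illustration, and it only tests address (A/AAAA) lookups through the system resolver, whereas Postfix also looks at MX records.

```python
import socket


def check_resolution(hostname, resolver=socket.getaddrinfo):
    """Classify a lookup as 'ok', 'temporary-failure' or 'not-found'.

    `resolver` is injectable so the function can be exercised without
    a working network; by default the system resolver is used.
    """
    try:
        resolver(hostname, 25, proto=socket.IPPROTO_TCP)
        return "ok"
    except socket.gaierror as err:
        # EAI_AGAIN: the resolver could not give an answer right now
        # (for example, the upstream DNS server is unreachable).
        if err.errno == socket.EAI_AGAIN:
            return "temporary-failure"
        # Any other resolver error is treated as "name not found" here.
        return "not-found"


# Hypothetical usage on the gateway (host name is an assumption):
# print(check_resolution("mail.internal.example"))
```

If this reports a failure only while the link to the main site is down, the forwarding resolver is the likely culprit rather than the relay domain configuration.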
 
Hi @Stoiko Ivanov

I still have to find the reason why `RCPT from unknown[10.10.10.1]:` is being generated, but I suspect a misconfiguration on our NAT there.

The reason for the `Recipient address rejected: Domain not found` lies, as you suspected, in DNS. We use a DNS resolver on our second site, which forwards queries for our domain to our internal DNS server. When the connection went down, the resolver could no longer resolve our internal domain.

What I did not suspect was that in this case PMG will reject the message with a `domain not found` error. I thought it would cache all messages for the domain as soon as it is configured as a relay domain, no matter whether it can resolve the configured transport or not. Is this intentional? Because if I configure an IP address instead of the DNS name it works: the mail is deferred with `no route to host`.
 
What I did not suspect was that in this case PMG will reject the message with a `domain not found` error.
The mail was deferred with a temporary error code 450, so the sending mail server should try again after a while (Postfix, and therefore also PMG, will retry for up to 5 days before giving up and notifying the sender that the mail could not be delivered). This behavior usually works out quite nicely.
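
To check this in the logs, it can help to pull the SMTP reply code out of the reject lines: a code starting with 4 is temporary (the sender should retry), one starting with 5 is permanent. Below is a rough sketch, assuming log lines shaped like the ones posted above; the names `REJECT_RE`, `SAMPLE` and `classify_reject` are just illustrative.

```python
import re

# Matches the "reject: RCPT from host[ip]: 450 4.1.2 ..." part of a
# Postfix smtpd log line and captures the reply code and status code.
REJECT_RE = re.compile(
    r"reject: \w+ from \S+: (?P<code>\d{3}) (?P<dsn>\d\.\d+\.\d+) "
)

# Log line taken from the first post in this thread.
SAMPLE = (
    "Dec 27 10:03:36 mailgw-02 postfix/smtpd[1825397]: NOQUEUE: reject: "
    "RCPT from unknown[10.10.10.1]: 450 4.1.2 <user1@domain.ch>: "
    "Recipient address rejected: Domain not found; "
    "from=<sender@externaldomain.ch> to=<user1@domain.ch> "
    "proto=ESMTP helo=<mailserver.externaldomain.ch>"
)


def classify_reject(line):
    """Return code, status and kind for a reject line, or None if no match."""
    match = REJECT_RE.search(line)
    if match is None:
        return None
    code = match.group("code")
    if code.startswith("4"):
        kind = "temporary"   # sender is expected to retry later
    elif code.startswith("5"):
        kind = "permanent"   # sender should give up and bounce
    else:
        kind = "other"
    return {"code": int(code), "dsn": match.group("dsn"), "kind": kind}
```

Calling `classify_reject(SAMPLE)` classifies the line from the first post as a temporary rejection, which matches the explanation above.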

I hope this helps!