[SOLVED] Postscreen different behaviour in PMG cluster

rileonar

Hi,
I just installed a new PMG cluster, and postscreen behaves differently on the two nodes.
pmg0 is the master node and pmg1 is the slave.

On pmg0 the pregreet test is active, so the first time a new IP connects to the server, the usual behaviour takes place:

    Connected to pmg0
    Escape character is '^]'.
    220-pmg0 ESMTP Mail Gateway
    220 pmg0 ESMTP Mail Gateway

So if the client doesn't wait for the second line ("220 ") before talking, the connection correctly fails with a "protocol error".

On pmg1 the pregreet test is NOT active, so every time an IP (even a new one) connects to the server, only the "second" line is output:

    Connected to pmg1
    Escape character is '^]'.
    220 pmg1 ESMTP Mail Gateway
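
The two greetings above can also be told apart mechanically. The following is a minimal sketch that relies only on what the transcripts show: with the pregreet test running, the first reply line starts with "220-", otherwise it starts with "220 ". The function name `greeting_kind` is made up for illustration.

```shell
#!/bin/sh
# Classify an SMTP greeting read from stdin by looking at its first line.
# Prints "teaser" if the first line starts with "220-" (multi-line greeting,
# as seen when the pregreet test is running), "plain" if it starts with
# "220 ", and "unknown" otherwise.
# greeting_kind is a hypothetical helper name, not part of Postfix or PMG.
greeting_kind() {
    first_line=
    # Read only the first line; the trailing CR of the CRLF ending is harmless.
    IFS= read -r first_line || :
    case $first_line in
        220-*) echo teaser ;;
        "220 "*) echo plain ;;
        *) echo unknown ;;
    esac
}
```

Something like `nc -w 10 pmg1 25 </dev/null | greeting_kind` may work for a live check, though whether nc keeps the connection open once stdin reaches end-of-file depends on the netcat variant. Nothing must be sent to the server before the greeting is complete, otherwise the check itself trips the pregreet test.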

The cluster has been configured via the GUI, and the contents of the /etc/postfix directories appear to be the same.

How can I troubleshoot the different behaviour between these cluster nodes?

Any help is greatly appreciated.
 
on a hunch - could it be that you connect to the internal port (defaults to 26) on pmg1, while you connect to the external port (defaults to 25) on pmg0?

postscreen is only used on the external port
 
Hi Stoiko,
thanks for your answer.
We don't use port 26: the cluster nodes are used for incoming mail only, so there must be some other reason for the different behaviour.
 
hmm - please compare (with `diff -u`):
* /etc/pmg/pmg.conf
* /etc/postfix/main.cf
* /etc/postfix/master.cf
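
A minimal sketch of that comparison, assuming the other node's files have already been copied into one flat directory (for example with scp). `compare_node_configs` is a hypothetical helper name, not a PMG tool.

```shell
#!/bin/sh
# Compare this node's config files against copies fetched from the peer node.
# compare_node_configs is a hypothetical helper; it assumes the peer's files
# were copied beforehand into one flat directory.
# Usage: compare_node_configs PEER_DIR LOCAL_FILE...
compare_node_configs() {
    peer_dir=$1
    shift
    differ=0
    for local_file in "$@"; do
        peer_file="$peer_dir/$(basename "$local_file")"
        # diff exits non-zero when the files differ or one of them is missing.
        diff -u "$peer_file" "$local_file" || differ=1
    done
    return "$differ"
}
```

With the peer's copies in /tmp this would be called as `compare_node_configs /tmp /etc/pmg/pmg.conf /etc/postfix/main.cf /etc/postfix/master.cf`. Per-node values such as the hostname are expected to differ, so a non-zero exit status alone does not indicate a problem.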
 
Hi Stoiko,
there seems to be no relevant difference between the config files. The copies in /tmp are the ones taken from pmg1:
    root@pmg0:~# diff /tmp/pmg.conf /etc/pmg/pmg.conf
    root@pmg0:~# diff /tmp/main.cf /etc/postfix/main.cf
    21c21
    < myhostname = pmg1
    ---
    > myhostname = pmg0
    root@pmg0:~# diff /tmp/master.cf /etc/postfix/master.cf
    93c93
    < -o mynetworks=127.0.0.0/8,192.168.77.4
    ---
    > -o mynetworks=127.0.0.0/8,192.168.77.3

Thanks again for your help!
 
hmm - ok then the issue is not there...

from which IPs do you connect to both nodes? (if they are in the trusted networks then postscreen skips the tests iirc)

* make sure you don't have any different NAT rules (on the pmg nodes as well as on any router/firewall in their way to the internet)
* try connecting from pmg0->pmg1 and from pmg1->pmg0 with `nc -v pmg0 25` (or pmg1)
 
The IP I connect from is an external IP, not listed in the trusted networks.
Both pmg0 and pmg1 have public IP addresses, with no NAT at all.
Then I tried to connect using nc, and here is the difference: when connecting to pmg1 from anywhere using telnet the pregreet test is NOT performed, while when connecting using nc the pregreet test IS performed:

    # nc -v pmg1 25
    pmg1 [x.x.x.x] 25 (smtp) open
    220-pmg1.xtsystem.it ESMTP Mail Gateway
    220 pmg1.xtsystem.it ESMTP Mail Gateway
    quit
    221 2.0.0 Bye
    # telnet pmg1 25
    Trying x.x.x.x...
    Connected to pmg1
    Escape character is '^]'.
    220 pmg1 ESMTP Mail Gateway
    quit
    221 2.0.0 Bye
    Connection closed by foreign host.

Even stranger, when connecting to pmg0 from anywhere the pregreet test is NOT performed, with either nc or telnet.
 
Hi Stoiko,
first of all thanks again for your support.
In main.cf postscreen is configured as follows:

postscreen_access_list = permit_mynetworks, cidr:/etc/postfix/postscreen_access

The configuration is the same on both pmg0 and pmg1, and was made from the PMG GUI.

The file /etc/postfix/postscreen_access contains only 4 lines (the same lines on both hosts) and the source IPs are not among the "whitelisted" ones.

Looking at /var/log/mail.log I can't see anything wrong: both servers seem to log "CONNECT", "PASS NEW", "PASS OLD" and "PREGREET" events.
 
I meant the entries that postscreen whitelists automatically for a while, as written in the linked howto - postscreen seems to keep a cache of clients that have already passed its tests ...
this also means that for the connections where you don't see the 220- line you should get a corresponding log line
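
To find those log lines for one client, a small filter along these lines might help. It is only a sketch: `postscreen_events` is a hypothetical helper, and both the log path and the `postfix/postscreen[` tag are assumptions based on a default syslog setup.

```shell
#!/bin/sh
# Print the postscreen log lines that mention a given client IP.
# postscreen_events is a hypothetical helper; the "postfix/postscreen[" tag
# is an assumption based on the default syslog name.
# Usage: postscreen_events LOG_FILE CLIENT_IP
postscreen_events() {
    log_file=$1
    client_ip=$2
    # -F treats the patterns as fixed strings, so the dots and brackets
    # in "[192.0.2.10]:" are matched literally.
    grep -F 'postfix/postscreen[' "$log_file" | grep -F "[$client_ip]:"
}
```

For example, `postscreen_events /var/log/mail.log 192.0.2.10` (the address is a placeholder). A "PASS OLD" line for the client means postscreen found it in its cache and handed the connection over without repeating the tests, which would explain a greeting without the 220- line.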
 
Hi Stoiko,
yes, you're right: the difference in behaviour between pmg0 and pmg1 must be caused by the cache rather than by some difference in configuration.
The Postfix howto describes several cache-related parameters, all starting with postscreen_cache_, but none of them are set in my /etc/postfix/main.cf.
So I assumed Postfix in my setup relies on the default configuration, where the cache database is:
/var/lib/postfix/postscreen_cache.db
I removed this file on both pmg0 and pmg1, then issued a 'postfix reload', and Postfix re-created it with a smaller size.
Then I tried to telnet to port 25 from outside and now both pmg0 and pmg1 behave the same!
Many thanks again for your support, we can consider this problem SOLVED.
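
Rather than guessing the default location, the cache file name can be derived from the configured postscreen_cache_map value. The following is a sketch under stated assumptions: `cache_file_from_map` is a hypothetical helper, and it assumes the usual suffixes (".db" for the Berkeley DB types btree and hash, ".lmdb" for lmdb).

```shell
#!/bin/sh
# Derive the on-disk file name from a postscreen_cache_map value such as
# "btree:/var/lib/postfix/postscreen_cache".
# cache_file_from_map is a hypothetical helper. It assumes the usual
# suffixes: ".db" for Berkeley DB types (btree, hash) and ".lmdb" for lmdb.
cache_file_from_map() {
    map=$1
    map_type=${map%%:*}   # text before the first colon
    map_path=${map#*:}    # text after the first colon
    case $map_type in
        btree|hash) echo "$map_path.db" ;;
        lmdb) echo "$map_path.lmdb" ;;
        *) echo "unsupported map type: $map_type" >&2; return 1 ;;
    esac
}
```

On a node this could be fed with `cache_file_from_map "$(postconf -hx postscreen_cache_map)"`, assuming a Postfix version whose postconf supports `-x` to expand `$data_directory`. How long a passed pregreet test stays cached is governed by postscreen_greet_ttl, so two nodes that have seen a client at different times can legitimately greet it differently.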
 
glad we found the cause of the mismatch ;)


Thanks for coming back and providing feedback

Edit: removed the hint to setting the thread as solved (did not see that this was already done)
 
