[SOLVED] Some nodes do not send mail correctly

linkstat

Hello.

I have a 7-node cluster, with the same Postfix configuration (identical /etc/postfix/main.cf except for the mydestination line, where, among other things, the corresponding hostname is specified).

The e-mail address for root@pam is configured in the web GUI, and I can confirm it from the shell:
Bash:
cat /etc/pve/user.cfg | grep root@pam
user:root@pam:1:0:::my.address@some.mail.com:::
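To script the same check, the address can be extracted by field position. This is only a minimal sketch: it assumes the address is the seventh colon-separated field, as in the line shown above, and `root_pam_email` is a hypothetical helper name.

```shell
# Hypothetical helper: print the e-mail address stored for root@pam.
# Assumption: the address is the 7th colon-separated field of the "user:"
# entry, as in the line quoted above.
root_pam_email() {
    awk -F: '$1 == "user" && $2 == "root@pam" { print $7 }' "${1:-/etc/pve/user.cfg}"
}
```

Called without arguments it reads /etc/pve/user.cfg; a different file can be passed as the first argument.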
Furthermore, if I run the following on each node:
Bash:
echo "Test from PVE Node: $(hostname)" | /usr/bin/pvemailforward
the mails are dispatched correctly.

But if I run this command instead:
Bash:
echo "Another test from PVE Node: $(hostname)" | mail -s "A test from $(hostname)" root

Four of the seven nodes deliver the mail to the configured address. The other three try to send it to root@nodename.localdomain, and in Gmail I get a Delivery Status Notification (Failure) from Mail Delivery Subsystem <mailer-daemon@googlemail.com>.

I thought it might be the Debian alternatives configuration, but on all nodes, the configuration is the same:
Bash:
ls -lh /usr/bin/mail
lrwxrwxrwx 1 root root 22 Jun 16 2015 /usr/bin/mail -> /etc/alternatives/mail

ls -lh /etc/alternatives/mail
lrwxrwxrwx 1 root root 18 Jun 16 2015 /etc/alternatives/mail -> /usr/bin/bsd-mailx

Finally, the .forward file exists in the root directory of all nodes, and the content is the same:
Bash:
cat /root/.forward
|/usr/bin/pvemailforward

I don't know what else to check, so I'm asking here for help.

Thank you!!!
 
On a hunch, the issue might be related to this part of your description:
"except for the mydestination line, where, among other things, the corresponding hostname is specified)."
Do all nodes have correct entries in their /etc/hosts (pointing the name in mydestination to an IP on the host)?

To say more, we'd need the journal from one node where the forwarding works and from one where it doesn't
(starting with the first line from a postfix process).
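One way to trim a log to "everything from the first postfix line onward" is a small awk filter. This is a sketch, not a prescribed procedure: `from_first_postfix_line` is a hypothetical name, and it assumes syslog-style lines in which the process tag looks like ` postfix/pickup[...]`.

```shell
# Hypothetical helper: print the input starting at the first line that was
# logged by a postfix process (tag such as "postfix/pickup[1234]:").
# Reads the files given as arguments, or standard input if none are given.
from_first_postfix_line() {
    awk '/ postfix\// { seen = 1 } seen' "$@"
}
```

For example, `journalctl --since today | from_first_postfix_line` or `from_first_postfix_line /var/log/mail.log`.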
 
Hi Stoiko Ivanov.
In /etc/postfix/main.cf, only the hostname changes from node to node. For example:

PVE Node 1 (sends mail correctly):
Code:
myhostname=node-a.mydom.local
mydestination = $myhostname, node-a.mydom.local, localhost.mydom.local, localhost

PVE Node 2 (does not send emails correctly):
Code:
myhostname=node-b.mydom.local
mydestination = $myhostname, node-b.mydom.local, localhost.mydom.local, localhost

The /etc/hosts files are practically the same on all nodes (except for the specific node name itself). For example:
PVE Node 1:
Code:
127.0.0.1        localhost.localdomain      localhost
10.4.44.1        node-a.mydom.local       node-a        pvelocalhost

# ProxmoxVE Cluster
10.4.44.2        node-b.mydom.local   node-b
10.4.44.3        node-c.mydom.local   node-c
...

PVE Node 2:
Code:
127.0.0.1        localhost.localdomain      localhost
10.4.44.2        node-b.mydom.local       node-b        pvelocalhost

# ProxmoxVE Cluster
10.4.44.1        node-a.mydom.local   node-a
10.4.44.3        node-c.mydom.local   node-c
...

and so on.
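The /etc/hosts question can also be checked mechanically. The sketch below looks a name up in a hosts-format file and prints the address it maps to; `hosts_ip_for` is a hypothetical helper, and it only reads the file, so it does not reflect DNS or other resolver sources (`getent hosts <name>` would show what the resolver actually returns).

```shell
# Hypothetical helper: print the IP address(es) that a hosts-format file
# assigns to a given name (canonical name or alias).
# Usage: hosts_ip_for <name> [file]   (file defaults to /etc/hosts)
hosts_ip_for() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v name="$name" '
        /^[ \t]*#/ { next }                       # skip comment lines
        { for (i = 2; i <= NF; i++) if ($i == name) print $1 }
    ' "$file"
}
```

On each node, the name used in mydestination should come back with an address that belongs to that node.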

Regarding the logs: first, I ran the mail command on each node:
Bash:
echo "Test mail from PVE Node: $(hostname)" | mail -s "PVE Mail Test from node $(hostname)" root
Then I looked at the mail log on each node with cat /var/log/mail.log.
(Ignore the "Network is unreachable" lines on some hosts; they appear because Postfix first tries to connect via IPv6 instead of IPv4.)

node-a (sends mails correctly):
Code:
Apr 21 08:19:14 node-a postfix/pickup[687772]: 268534C0817: uid=0 from=<root>
Apr 21 08:19:14 node-a postfix/cleanup[692401]: 268534C0817: message-id=<20220421111914.268534C0817@node-a.mydom.lan>
Apr 21 08:19:14 node-a postfix/qmgr[3339358]: 268534C0817: from=<root@node-a.mydom.lan>, size=469, nrcpt=1 (queue active)
Apr 21 08:19:14 node-a postfix/pickup[687772]: D3CF64C084F: uid=65534 from=<root>
Apr 21 08:19:14 node-a postfix/cleanup[692401]: D3CF64C084F: message-id=<20220421111914.268534C0817@node-a.mydom.lan>
Apr 21 08:19:14 node-a postfix/qmgr[3339358]: D3CF64C084F: from=<root@node-a.mydom.lan>, size=651, nrcpt=1 (queue active)
Apr 21 08:19:14 node-a postfix/local[692404]: 268534C0817: to=<root@node-a.mydom.lan>, orig_to=<root>, relay=local, delay=0.73, delays=0.03/0.01/0/0.69, dsn=2.0.0, status=sent (delivered to command: /usr/bin/pvemailforward)
Apr 21 08:19:14 node-a postfix/qmgr[3339358]: 268534C0817: removed
Apr 21 08:19:19 node-a postfix/smtp[692413]: D3CF64C084F: to=<my.address@gmail.com>, relay=smtp.gmail.com[64.233.190.109]:587, delay=4.2, delays=0/0.04/1.4/2.7, dsn=2.0.0, status=sent (250 2.0.0 OK  1650539958 t3-20020a4a7603000000b0033a53c11f82sm3704887ooc.20 - gsmtp)
Apr 21 08:19:19 node-a postfix/qmgr[3339358]: D3CF64C084F: removed



node-b (does not send mail correctly):
Code:
Apr 21 08:19:14 node-b postfix/pickup[2045673]: 214F1140B24: uid=0 from=<root>
Apr 21 08:19:14 node-b postfix/cleanup[2047938]: 214F1140B24: message-id=<20220421111914.214F1140B24@node-a.mydom.lan>
Apr 21 08:19:14 node-b postfix/qmgr[416655]: 214F1140B24: from=<root@node-b.mydom.lan>, size=485, nrcpt=1 (queue active)
Apr 21 08:19:18 node-b postfix/smtp[2047940]: 214F1140B24: to=<root@node-b.mydom.lan>, orig_to=<root>, relay=smtp.gmail.com[64.233.190.109]:587, delay=3.9, delays=0.03/0.02/2/1.8, dsn=2.0.0, status=sent (250 2.0.0 OK  1650539957 r35-20020a056870582300b000df0dc42ff5sm1005008oap.0 - gsmtp)
Apr 21 08:19:18 node-b postfix/qmgr[416655]: 214F1140B24: removed

node-c (does not send mail correctly):
Code:
Apr 21 08:19:14 node-c postfix/pickup[94338]: 45B5D9942F: uid=0 from=<root>
Apr 21 08:19:14 node-c postfix/cleanup[197254]: 45B5D9942F: message-id=<20220421111914.45B5D9942F@node-a.mydom.lan>
Apr 21 08:19:14 node-c postfix/qmgr[2807116]: 45B5D9942F: from=<root@node-c.mydom.lan>, size=483, nrcpt=1 (queue active)
Apr 21 08:19:16 node-c postfix/smtp[197257]: 45B5D9942F: to=<root@node-c.mydom.lan>, orig_to=<root>, relay=smtp.gmail.com[64.233.190.109]:587, delay=2.6, delays=0.04/0.01/1.6/1, dsn=2.0.0, status=sent (250 2.0.0 OK  1650539956 n66-20020acabd45000000b002ef6c6992e8sm7212994oif.42 - gsmtp)
Apr 21 08:19:16 node-c postfix/qmgr[2807116]: 45B5D9942F: removed

node-d (does not send mail correctly):
Code:
Apr 21 08:19:14 node-d postfix/pickup[1303050]: 2A0884C05F5: uid=0 from=<root>
Apr 21 08:19:14 node-d postfix/cleanup[1306301]: 2A0884C05F5: message-id=<20220421111914.2A0884C05F5@node-a.mydom.lan>
Apr 21 08:19:14 node-d postfix/qmgr[2990464]: 2A0884C05F5: from=<root@node-d.mydom.lan>, size=473, nrcpt=1 (queue active)
Apr 21 08:19:14 node-d postfix/smtp[1306303]: connect to smtp.gmail.com[2800:3f0:4003:c01::6d]:587: Network is unreachable
Apr 21 08:19:16 node-d postfix/smtp[1306303]: 2A0884C05F5: to=<root@node-d.mydom.lan>, orig_to=<root>, relay=smtp.gmail.com[64.233.190.109]:587, delay=2.1, delays=0.07/0.01/1.2/0.86, dsn=2.0.0, status=sent (250 2.0.0 OK  1650539956 u7-20020a4a85c7000000b0035c12c8be73sm549906ooh.29 - gsmtp)
Apr 21 08:19:16 node-d postfix/qmgr[2990464]: 2A0884C05F5: removed

node-e (sends mails correctly):
Code:
Apr 21 08:19:14 node-e postfix/pickup[758580]: 294F9320E25: uid=0 from=<root>
Apr 21 08:19:14 node-e postfix/cleanup[801867]: 294F9320E25: message-id=<20220421111914.294F9320E25@node-e.mydom.lan>
Apr 21 08:19:14 node-e postfix/qmgr[1606]: 294F9320E25: from=<root@node-e.mydom.lan>, size=481, nrcpt=1 (queue active)
Apr 21 08:19:14 node-e postfix/pickup[758580]: 6F393320E3A: uid=65534 from=<root>
Apr 21 08:19:14 node-e postfix/cleanup[801867]: 6F393320E3A: message-id=<20220421111914.294F9320E25@node-e.mydom.lan>
Apr 21 08:19:14 node-e postfix/qmgr[1606]: 6F393320E3A: from=<root@node-e.mydom.lan>, size=667, nrcpt=1 (queue active)
Apr 21 08:19:14 node-e postfix/local[801869]: 294F9320E25: to=<root@node-e.mydom.lan>, orig_to=<root>, relay=local, delay=0.3, delays=0.01/0/0/0.28, dsn=2.0.0, status=sent (delivered to command: /usr/bin/pvemailforward)
Apr 21 08:19:14 node-e postfix/qmgr[1606]: 294F9320E25: removed
Apr 21 08:19:14 node-e postfix/smtp[801890]: connect to smtp.gmail.com[2800:3f0:4003:c01::6c]:587: Network is unreachable
Apr 21 08:19:17 node-e postfix/smtp[801890]: 6F393320E3A: to=<my.address@gmail.com>, relay=smtp.gmail.com[64.233.190.109]:587, delay=2.7, delays=0/0.03/1.5/1.1, dsn=2.0.0, status=sent (250 2.0.0 OK  1650539957 ga18-20020a056870ee1200b000e602e45cf8sm955005oab.42 - gsmtp)
Apr 21 08:19:17 node-e postfix/qmgr[1606]: 6F393320E3A: removed

node-f (sends mails correctly):
Code:
Apr 21 08:19:14 node-f postfix/pickup[2410053]: 2DEC5240A40: uid=0 from=<root>
Apr 21 08:19:14 node-f postfix/cleanup[2430213]: 2DEC5240A40: message-id=<20220421111914.2DEC5240A40@node-f.mydom.lan>
Apr 21 08:19:14 node-f postfix/qmgr[1707]: 2DEC5240A40: from=<root@node-f.mydom.lan>, size=493, nrcpt=1 (queue active)
Apr 21 08:19:14 node-f postfix/pickup[2410053]: 893AC240AA5: uid=65534 from=<root>
Apr 21 08:19:14 node-f postfix/cleanup[2430213]: 893AC240AA5: message-id=<20220421111914.2DEC5240A40@node-f.mydom.lan>
Apr 21 08:19:14 node-f postfix/qmgr[1707]: 893AC240AA5: from=<root@node-f.mydom.lan>, size=683, nrcpt=1 (queue active)
Apr 21 08:19:14 node-f postfix/local[2430215]: 2DEC5240A40: to=<root@node-f.mydom.lan>, orig_to=<root>, relay=local, delay=0.4, delays=0.03/0.01/0/0.36, dsn=2.0.0, status=sent (delivered to command: /usr/bin/pvemailforward)
Apr 21 08:19:14 node-f postfix/qmgr[1707]: 2DEC5240A40: removed
Apr 21 08:19:14 node-f postfix/smtp[2430219]: connect to smtp.gmail.com[2800:3f0:4003:c01::6c]:587: Network is unreachable
Apr 21 08:19:17 node-f postfix/smtp[2430219]: 893AC240AA5: to=<my.address@gmail.com>, relay=smtp.gmail.com[64.233.190.109]:587, delay=3.2, delays=0.01/0.03/1.5/1.6, dsn=2.0.0, status=sent (250 2.0.0 OK  1650539957 g8-20020a056830160800b0060548e5f69csm5132865otr.2 - gsmtp)
Apr 21 08:19:17 node-f postfix/qmgr[1707]: 893AC240AA5: removed

node-g (sends mails correctly):
Code:
Apr 21 08:19:14 node-g postfix/pickup[1687848]: 2F22B21014: uid=0 from=<root>
Apr 21 08:19:14 node-g postfix/cleanup[1706426]: 2F22B21014: message-id=<20220421111914.2F22B21014@node-g.mydom.lan>
Apr 21 08:19:14 node-g postfix/qmgr[1704]: 2F22B21014: from=<root@node-g.mydom.lan>, size=479, nrcpt=1 (queue active)
Apr 21 08:19:14 node-g postfix/pickup[1687848]: 8165A21029: uid=65534 from=<root>
Apr 21 08:19:14 node-g postfix/cleanup[1706426]: 8165A21029: message-id=<20220421111914.2F22B21014@node-g.mydom.lan>
Apr 21 08:19:14 node-g postfix/qmgr[1704]: 8165A21029: from=<root@node-g.mydom.lan>, size=664, nrcpt=1 (queue active)
Apr 21 08:19:14 node-g postfix/local[1706428]: 2F22B21014: to=<root@node-g.mydom.lan>, orig_to=<root>, relay=local, delay=0.37, delays=0.03/0.01/0/0.33, dsn=2.0.0, status=sent (delivered to command: /usr/bin/pvemailforward)
Apr 21 08:19:14 node-g postfix/qmgr[1704]: 2F22B21014: removed
Apr 21 08:19:14 node-g postfix/smtp[1706432]: connect to smtp.gmail.com[2800:3f0:4003:c01::6c]:587: Network is unreachable
Apr 21 08:19:19 node-g postfix/smtp[1706432]: 8165A21029: to=<my.address@gmail.com>, relay=smtp.gmail.com[64.233.190.109]:587, delay=4.8, delays=0/0.03/1.9/2.9, dsn=2.0.0, status=sent (250 2.0.0 OK  1650539959 t22-20020a4a8256000000b003332a0402f5sm7745240oog.23 - gsmtp)
Apr 21 08:19:19 node-g postfix/qmgr[1704]: 8165A21029: removed

I notice that every node that sends mail correctly has a line that says delivered to command: /usr/bin/pvemailforward. The nodes that do not forward mail correctly never mention pvemailforward, and their message-id refers to node-a instead of their own hostname.
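Both observations can be pulled out of the logs with awk instead of by eye. This is a sketch under assumptions: it expects the syslog layout shown above (hostname in the fourth field), and the function names `root_delivery_path` and `find_msgid_mismatch` are hypothetical.

```shell
# Print "<syslog host> <relay>" for every delivery whose original recipient
# was plain "root". "local" means the local delivery agent handled it (and
# could run the .forward command); anything else went out over SMTP.
root_delivery_path() {
    awk '
        /orig_to=<root>/ {
            relay = $0
            sub(/.*relay=/, "", relay)   # drop everything up to the relay name
            sub(/,.*/, "", relay)        # cut at the next comma-separated field
            sub(/\[.*/, "", relay)       # cut a trailing [ip]:port, if any
            print $4, relay
        }
    ' "$@"
}

# Print "<syslog host> <message-id domain>" for cleanup lines where the first
# label of the message-id domain differs from the host that logged the line.
find_msgid_mismatch() {
    awk '
        /postfix\/cleanup/ && /message-id=</ {
            id = $0
            sub(/.*message-id=<[^@]*@/, "", id)   # keep only the domain part
            sub(/>.*/, "", id)
            split(id, parts, ".")
            if (parts[1] != $4) print $4, id
        }
    ' "$@"
}
```

Run against the excerpts above, the first function would report `local` for node-a and `smtp.gmail.com` for node-b, and the second would flag node-b with `node-a.mydom.lan`.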

I think that's where the issue lies
Thanks for any help you can give me.
 
"I think that's where the issue lies"
Sounds sensible.
Could you please diff the Postfix configs (/etc/postfix/main.cf and /etc/postfix/master.cf) between two nodes (one where it works, one where it does not)?

"Finally, the .forward file exists in the root directory of all nodes, and the content is the same:"
I assume this is still the case, and also that the files have the same permissions, owner, etc.?
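A compact way to compare permissions, owner and content of the .forward files is to print one summary line per node and compare the lines. This is a sketch that assumes GNU coreutils (`stat -c`, `sha256sum`), as found on Debian; `forward_summary` is a hypothetical helper name.

```shell
# Hypothetical helper: one-line summary of a .forward file (octal mode,
# owner:group and a content checksum) so nodes can be compared at a glance.
# Usage: forward_summary [file]   (file defaults to /root/.forward)
forward_summary() {
    f=${1:-/root/.forward}
    printf 'mode=%s owner=%s sha256=%s\n' \
        "$(stat -c '%a' "$f")" \
        "$(stat -c '%U:%G' "$f")" \
        "$(sha256sum "$f" | cut -d' ' -f1)"
}
```

If the line is the same on every node, the files match in mode, ownership and content.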
 
Ok.
The /etc/postfix/main.cf files are identical on the first four nodes (only node-a sends mail correctly; the other three do not). The last three nodes show differences:
Code:
diff node-a/main.cf node-b/main.cf
(nothing)

diff node-a/main.cf node-c/main.cf
(nothing)

diff node-a/main.cf node-d/main.cf
(nothing)

diff node-a/main.cf node-e/main.cf
3c3
< myhostname=node-a.urgencias.local
---
> myhostname=node-e.urgencias.local
16c16
< mydestination = $myhostname, node-a.urgencias.local, localhost.urgencias.local, localhost
---
> mydestination = $myhostname, node-e.urgencias.local, localhost.urgencias.local, localhost
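An empty diff is itself a warning sign here. Each node is expected to carry its own name in myhostname, as in the Node 1 / Node 2 excerpts earlier in the thread, so a main.cf that is byte-identical to node-a's is suspicious. A sketch that flags such copies (`same_as_reference` is a hypothetical helper; the node-a/main.cf style paths follow the diff commands above):

```shell
# Hypothetical helper: report which files are byte-identical to a reference.
# Usage: same_as_reference node-a/main.cf node-b/main.cf node-c/main.cf ...
same_as_reference() {
    ref=$1
    shift
    for f in "$@"; do
        if cmp -s "$ref" "$f"; then
            echo "identical: $f"
        fi
    done
}
```

Any file it lists still contains the reference node's myhostname value.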


The /etc/postfix/master.cf diff:
Code:
diff node-a/master.cf node-b/master.cf
11d10
< smtp inet n - - - - smtpd
29,30c28,30
< pickup fifo n - - 60 1 pickup
< cleanup unix n - - - 0 cleanup
---
> smtp inet n - y - - smtpd
> pickup fifo n - y 60 1 pickup
> cleanup unix n - y - 0 cleanup
33,39c33,39
< tlsmgr unix - - - 1000? 1 tlsmgr
< rewrite unix - - - - - trivial-rewrite
< bounce unix - - - - 0 bounce
< defer unix - - - - 0 bounce
< trace unix - - - - 0 bounce
< verify unix - - - - 1 verify
< flush unix n - - 1000? 0 flush
---
> tlsmgr unix - - y 1000? 1 tlsmgr
> rewrite unix - - y - - trivial-rewrite
> bounce unix - - y - 0 bounce
> defer unix - - y - 0 bounce
> trace unix - - y - 0 bounce
> verify unix - - y - 1 verify
> flush unix n - y 1000? 0 flush
42,43c42
< smtp unix - - - - - smtp
< relay unix - - - - - smtp
---
> smtp unix - - y - - smtp
45,48c44,48
< showq unix n - - - - showq
< error unix - - - - - error
< retry unix - - - - - error
< discard unix - - - - - discard
---
> relay unix - - y - - smtp
> showq unix n - y - - showq
> error unix - - y - - error
> retry unix - - y - - error
> discard unix - - y - - discard
51,53c51,52
< lmtp unix - - - - - lmtp
< anvil unix - - - - 1 anvil
< scache unix - - - - 1 scache
---
> lmtp unix - - y - - lmtp
> anvil unix - - y - 1 anvil
66a66
> scache unix - - y - 1 scache
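Apart from some reordering, the lines in this diff differ only in the fifth column, which in master.cf is the chroot flag (`-` on node-a, `y` on node-b). Since the thread was resolved through the myhostname line, these differences do not appear to be the cause. For reference, a sketch that lists the services explicitly marked chroot=y; `chroot_services` is a hypothetical helper, and it assumes the standard eight-column master.cf layout.

```shell
# Hypothetical helper: list "service/type" for master.cf entries whose
# chroot column (5th field) is explicitly "y".
# Comment lines and continuation lines (leading whitespace) are skipped.
chroot_services() {
    awk '
        /^[ \t]*#/ || /^[ \t]/ || NF < 8 { next }
        $5 == "y" { print $1 "/" $2 }
    ' "$@"
}
```

For example, `chroot_services /etc/postfix/master.cf` on each node.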

On all nodes:
Code:
cat /root/.forward
|/usr/bin/pvemailforward


In the end, I was able to solve the problem by correcting the myhostname=... line on the three nodes that were not sending mail.
This is one of those problems that calls for the Spanish expression "the turtle escaped me".
I still have one question, though: shouldn't the $myhostname variable "cover" the hostname of the corresponding node?
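A likely explanation, going by the diffs and logs above: $myhostname expands to whatever the myhostname= line in main.cf says, so on nodes whose main.cf was a copy of node-a's, mydestination listed node-a's name twice and never the node's own. Mail for root@node-b... was then not treated as local and went to the SMTP relay instead of the local delivery that runs /root/.forward. The sketch below reproduces that check on a main.cf file. It is a simplification: it handles only the plain `$myhostname` form and single-line settings, and `is_local_destination` is a hypothetical helper. On a live system, `postconf myhostname mydestination` shows the values Postfix actually uses.

```shell
# Hypothetical helper: succeed if <fqdn> is listed in mydestination after
# substituting $myhostname with the value set in the same main.cf.
# Simplified: only "$myhostname" is expanded, and hostnames are assumed to
# contain no "/" or "&" characters.
# Usage: is_local_destination <main.cf> <fqdn>
is_local_destination() {
    conf=$1
    fqdn=$2
    myhostname=$(sed -n 's/^myhostname[[:space:]]*=[[:space:]]*//p' "$conf" | tail -n 1)
    mydest=$(sed -n 's/^mydestination[[:space:]]*=[[:space:]]*//p' "$conf" | tail -n 1)
    printf '%s\n' "$mydest" |
        sed "s/\\\$myhostname/$myhostname/g" |
        tr ',' '\n' |
        sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |
        grep -qxF "$fqdn"
}
```

For example, `is_local_destination /etc/postfix/main.cf "$(hostname -f)" || echo "own FQDN is not a local destination"`.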

Anyway, thank you very much for the help, Stoiko Ivanov.
 