451 4.3.0 Error: queue file write error;

@Stoiko Ivanov what is the maximum recommended value for maxspamsize, and what would happen if the maxspamsize value is lower than the message size? Would the message be rejected, or just delivered without scanning?
I am OK with lowering the value if this means that messages will still be delivered fine.

I think the maximum message size on our Exchange send/receive connectors is around 100MB; that is why I think this value is set higher than the default. Thanks again.
 
FYI, I run my PMG on VMware. After installing open-vm-tools on it, the error has occurred less often with the default smtpd_proxy_timeout setting.


I already had open-vm-tools installed, and we are also running on VMware (vSphere Client version 6.7.0.46000).
open-vm-tools is already the newest version (2:11.2.5-2).
VMware Tools: Running, version:11333 (Guest Managed)
 
what is the maximum recommended value for maxspamsize and what would happen if the maxspamsize value is lower than the message size?
Difficult to say in general. As this example shows, SpamAssassin's runtime is (heavily) affected by this setting, but it also depends on the system it runs on (150 seconds vs. 90 seconds).

Will the message be rejected for delivery or just delivered without scanning?
Neither. The message will be truncated at maxspamsize, and only its beginning will be passed to SpamAssassin for analysis. In my experience this works quite well: usually many spam markers are in the headers or at the beginning of the body, not in the 10th Excel attachment of the mail.
From a quick glance at the source, ClamAV scanning (as well as Avast and custom-script scanning) is not affected by this setting.

The one downside I'm aware of is, as mentioned before, that if the message is passed to SpamAssassin truncated, DKIM checks will fail (and thus also cause false positives with DMARC). It would be great if you could verify how this affects the detection rate in your environment.
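A minimal sketch of why truncation breaks DKIM. It models only the body hash (the `bh=` tag) with the "simple" body canonicalization from RFC 6376; header hashing, "relaxed" canonicalization and the `l=` tag are ignored. `truncate_for_spam_scan` is a hypothetical stand-in for the truncation described above (applied to the body directly for simplicity), and the 64 KiB limit is just an example value, not PMG's actual maxspamsize:

```python
import base64
import hashlib


def simple_body_canon(body: bytes) -> bytes:
    """DKIM 'simple' body canonicalization (RFC 6376, section 3.4.3):
    drop empty lines at the end and make sure the body ends in one CRLF."""
    while body.endswith(b"\r\n\r\n"):
        body = body[:-2]
    if not body.endswith(b"\r\n"):
        body += b"\r\n"
    return body


def body_hash(body: bytes) -> str:
    """Base64 SHA-256 digest of the canonical body, i.e. what a verifier compares with bh=."""
    digest = hashlib.sha256(simple_body_canon(body)).digest()
    return base64.b64encode(digest).decode("ascii")


def truncate_for_spam_scan(raw: bytes, maxspamsize: int) -> bytes:
    """Hypothetical model: only the first maxspamsize bytes reach SpamAssassin."""
    return raw[:maxspamsize]


# Fake body: a short text part followed by a large base64-like attachment.
full_body = b"Hello,\r\nplease find the PDF attached.\r\n" + b"QUJD" * 100_000 + b"\r\n"
scanned_body = truncate_for_spam_scan(full_body, 64 * 1024)

print(len(full_body), len(scanned_body))
print(body_hash(full_body) == body_hash(scanned_body))  # False: bh= no longer matches
```

The sender signed the hash of the full body, so any verifier that only sees the truncated copy computes a different hash and the signature fails, regardless of whether the mail is actually spam.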
 
Hi everyone,

I'd like to reactivate this thread. A few months ago I "inherited" this PMG server from user poetry, and as far as I can see, he never solved the problem, or maybe it reoccurred. It happens almost every day, at least once. Sometimes there is only one disconnect and the next attempt succeeds; sometimes the retries continue for a few days.

I was lucky enough to get hold of a PDF file (17MB) that triggers this error.
The scenario is as follows:

1.) I send that attachment from Gmail.

2.) logs:
Oct 25 22:56:36 myMXserver postfix/smtpd[1055]: connect from mail-yw1-f178.google.com[209.85.128.178]
Oct 25 22:56:36 myMXserver postfix/smtpd[1055]: Anonymous TLS connection established from mail-yw1-f178.google.com[209.85.128.178]: TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256
Oct 25 22:56:37 myMXserver postfix/smtpd[1055]: NOQUEUE: client=mail-yw1-f178.google.com[209.85.128.178]
Oct 25 22:58:19 myMXserver postfix/smtpd[1055]: warning: timeout talking to proxy 127.0.0.1:10024
Oct 25 22:58:19 myMXserver postfix/smtpd[1055]: proxy-reject: END-OF-MESSAGE: 451 4.3.0 Error: queue file write error; from=<someone@gmail.com> to=<someone@mydomain.net> proto=ESMTP helo=<mail-yw1-f178.google.com>
Oct 25 22:58:19 myMXserver postfix/smtpd[1055]: disconnect from mail-yw1-f178.google.com[209.85.128.178] ehlo=2 starttls=1 mail=1 rcpt=1 bdat=1/2 quit=1 commands=7/8

3.) A few minutes later, in the logs:
Oct 25 23:01:43 myMXserver postfix/smtpd[1336]: connect from localhost.localdomain[127.0.0.1]
Oct 25 23:01:43 myMXserver postfix/smtpd[1336]: 5ECAD12523D: client=localhost.localdomain[127.0.0.1], orig_client=mail-yw1-f178.google.com[209.85.128.178]
Oct 25 23:01:43 myMXserver postfix/cleanup[1337]: 5ECAD12523D: message-id=<CABrEsR=0VHeBTNiqRQ=-9-oqxFY29BAjo3PfoB3JNvALNJVaSQ@mail.gmail.com>
Oct 25 23:01:44 myMXserver postfix/qmgr[1007]: 5ECAD12523D: from=<someone@gmail.com>, size=25103666, nrcpt=1 (queue active)
Oct 25 23:01:44 myMXserver postfix/smtpd[1336]: disconnect from localhost.localdomain[127.0.0.1] ehlo=1 xforward=1 mail=1 rcpt=1 data=1 commands=5
Oct 25 23:01:44 myMXserver postfix/smtp[1338]: Untrusted TLS connection established to [10.10.10.10]:25: TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)
Oct 25 23:01:46 myMXserver postfix/smtp[1338]: 5ECAD12523D: to=<someone@mydomain.net>, relay=10.10.10.10[10.10.10.10]:25, delay=2.6, delays=0.91/0/0.03/1.7, dsn=2.6.0, status=sent (250 2.6.0 <CABrEsR=0VHeBTNiqRQ=-9-oqxFY29BAjo3PfoB3JNvALNJVaSQ@mail.gmail.com> [InternalId=158965329559560, Hostname=myexchange.local] 25105024 bytes in 1.663, 14740,729 KB/sec Queued mail for delivery)
Oct 25 23:01:46 myMXserver postfix/qmgr[1007]: 5ECAD12523D: removed

4.) Then a notification is sent to postmaster@mydomain.net:
....
In: MAIL FROM:<someone@gmail.com> SIZE=25087327
Out: 250 2.1.0 Ok
In: RCPT TO:<someone@mydomain.net>
Out: 250 2.1.5 Ok
In: BDAT 65536
Out: 250 2.0.0 Ok: 65536 bytes
In: BDAT 25021791 LAST
Out: 451 4.3.0 Error: queue file write error
In: QUIT
Out: 221 2.0.0 Bye

5.) At about the same time, the mail gets delivered.

6.) A few minutes later, CPU usage gets very high, sometimes up to 100%. When I sent that attachment a few times in a row, even RAM usage went from 50% to 100%. Usage stays that high for a few minutes and then goes down. Almost all of the CPU is used by clamd.


7.) After some time (approx. 30 min), the log entries recur, but the mail is not delivered again. However, one user claims he actually got the message redelivered tens of times on the same day and over the next few days, but I can't confirm that.
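The sizes in steps 2 to 4 can be cross-checked with a bit of arithmetic. The SIZE and BDAT figures are copied from the bounce transcript; the assumptions that "17MB" means 17 MiB and that the PDF is base64-encoded with 76-character lines (the maximum RFC 2045 allows) are mine, and `base64_mime_size` is just an illustrative helper:

```python
# Figures copied from the bounce transcript in step 4.
mail_from_size = 25_087_327          # SIZE= parameter announced by Gmail
bdat_chunks = [65_536, 25_021_791]   # the two BDAT chunk sizes

# The chunks add up exactly to the announced size.
assert sum(bdat_chunks) == mail_from_size


def base64_mime_size(raw_bytes: int, line_len: int = 76) -> int:
    """Approximate size of a base64 MIME part: 4 output bytes per 3 input
    bytes, plus a CRLF after every (possibly partial) line of line_len chars."""
    encoded = -(-raw_bytes // 3) * 4       # ceil(raw / 3) * 4
    line_breaks = -(-encoded // line_len)  # ceil(encoded / line_len)
    return encoded + 2 * line_breaks


pdf_bytes = 17 * 1024 * 1024  # assumption: "17MB" = 17 MiB
print(f"encoded PDF: ~{base64_mime_size(pdf_bytes) / 1e6:.1f} MB, "
      f"whole message: {mail_from_size / 1e6:.1f} MB")
```

So base64 alone turns the 17 MiB PDF into roughly 24.4 MB, and the message that has to pass through the proxy is about 25 MB, a lot more than the file size suggests.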


  • I added two cores to the VM and reconfigured the sockets, so now I have 6 cores, 6 sockets and 8GB of RAM. Nothing changed. Even the time from the beginning of the connection to the timeout is still the same: approx. 1 min 45 s.
  • I disabled custom scanning with ESET. The CPU seemed a bit relieved, but I can't say for sure, so I turned it back on.
  • The error is probably related to attachment size, as most of the errors involve attachments larger than 10MB.
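A small sketch to measure that interval directly from the two smtpd log lines in step 2. The comparison value is Postfix's documented default for smtpd_proxy_timeout (100s, see postconf(5)); whether PMG or a local change overrides it on this system is not checked here. Syslog timestamps carry no year, so `PLACEHOLDER_YEAR` is an arbitrary value used only to make the subtraction work:

```python
from datetime import datetime

# Two lines copied from the log in step 2 (same smtpd process, PID 1055).
LOG_LINES = [
    "Oct 25 22:56:37 myMXserver postfix/smtpd[1055]: NOQUEUE: client=mail-yw1-f178.google.com[209.85.128.178]",
    "Oct 25 22:58:19 myMXserver postfix/smtpd[1055]: warning: timeout talking to proxy 127.0.0.1:10024",
]

POSTFIX_DEFAULT_PROXY_TIMEOUT = 100  # seconds, postconf(5) default for smtpd_proxy_timeout
PLACEHOLDER_YEAR = 2000              # syslog omits the year; any year works for a difference


def syslog_time(line: str) -> datetime:
    """Parse the classic 15-character syslog prefix, e.g. 'Oct 25 22:56:37'."""
    return datetime.strptime(f"{PLACEHOLDER_YEAR} {line[:15]}", "%Y %b %d %H:%M:%S")


elapsed = (syslog_time(LOG_LINES[1]) - syslog_time(LOG_LINES[0])).total_seconds()
print(f"{elapsed:.0f} s from NOQUEUE to proxy timeout "
      f"(default smtpd_proxy_timeout: {POSTFIX_DEFAULT_PROXY_TIMEOUT} s)")
```

102 s is just over the 100 s default, which would be consistent with smtpd hitting smtpd_proxy_timeout while waiting for the filter on 127.0.0.1:10024, and would also explain why adding cores did not change the interval. This is an inference from two log lines, not a confirmed diagnosis.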

The next thing I am planning to do is to rearrange the additional antivirus definitions. We have added the Sanesecurity databases (everything except the high-false-positive ones) and SecuriteInfo. The virus hits are quite evenly distributed between the Sanesecurity and SecuriteInfo databases, but I have a few questions about that. How is the order in which these databases are checked determined? Is the first database in the table also checked first? And when does the "custom" scanner (ESET) do its checking?
I am thinking that maybe Sanesecurity shows so many hits because it is checked first, and if I removed it, SecuriteInfo and ESET would still find everything. If so, ClamAV might be able to process everything in time.


Looking forward to your comments!

 
