queue file write error

XN-Matt · Mar 9, 2021

Getting this error in the logs.

Mar 8 22:36:10 xxx postfix/smtpd[19218]: proxy-reject: END-OF-MESSAGE: 451 4.3.0 Error: queue file write error; from=<xxxx@xxxx> to=<xxxx@xxxx> proto=ESMTP helo=<xxxxxxx>

Along with

Mar 8 22:36:10 xxx postfix/smtpd[19218]: warning: timeout talking to proxy 127.0.0.1:10024

There are no other log entries in syslog/mail log which would help and no change to the files outside of what is set in the GUI.

Happening with three emails and both have a number of attachments. All other mail seems to go through without issue.

Is there anyway to debug this further?

(Have checked the one other thread matched on there but turning off bayes/awl makes no difference)

dcsapak · Mar 9, 2021

is your disk/fs ok? anything else in the syslog/journal that might give a hint?

XN-Matt · Mar 9, 2021

Disk is fine. This error is happening on all four of our PMG servers but only for these messages (with attachments).

Every single other message is fine and sometimes it gets through but as it reports back to the sending server the error, it is resent multiple times.

Stoiko Ivanov · Mar 9, 2021

please share the logs around the message from postfix with the error message
I guess pmg-smtp-filter should log an message which might help hunting this down....

XN-Matt · Mar 9, 2021

Code:

Mar  9 10:34:16 xxx postfix/postscreen[349]: CONNECT from [x.x.x.x]:12313 to [x.x.x.x]:25
Mar  9 10:34:16 xxx postfix/postscreen[349]: PASS OLD [x.x.x.x]:12313
Mar  9 10:34:16 xxx postfix/smtpd[612]: connect from xxx.xxx.xxx[x.x.x.x]
Mar  9 10:34:16 xxx pmgpolicy[30521]: reloading configuration Proxmox_ruledb
Mar  9 10:34:16 xxx pmgpolicy[30521]: SPF says pass
Mar  9 10:34:16 xxx postfix/smtpd[612]: NOQUEUE: client=xxx.xxx.xxx[x.x.x.x]
Mar  9 10:34:16 xxx pmg-smtp-filter[940]: Starting "1" children
Mar  9 10:34:16 xxx pmg-smtp-filter[30875]: 2021/03/09-10:34:16 CONNECT TCP Peer: "[127.0.0.1]:51010" Local: "[127.0.0.1]:10024"
Mar  9 10:34:16 xxx pmg-smtp-filter[30875]: 4204F60474F288C100: new mail message-id=<0b9272ff-38d6-88ca-1e42-dbdf5fef6f1c@xxx.xxx.xxx>#012
Mar  9 10:35:47 xxx postfix/smtpd[350]: warning: timeout talking to proxy 127.0.0.1:10024
Mar  9 10:35:47 xxx postfix/smtpd[350]: proxy-reject: END-OF-MESSAGE: 451 4.3.0 Error: queue file write error; from=<them@them.com> to=<me@me.com> proto=ESMTP helo=<xxx.xxx.xxx>
Mar  9 10:35:52 xxx postfix/cleanup[722]: 2779742050: message-id=<20210309103552.2779742050@xxx.xxx.xxx>
Mar  9 10:35:52 xxx postfix/smtpd[350]: disconnect from xxx.xxx.xxx[x.x.x.x] ehlo=2 starttls=1 mail=1 rcpt=1 data=0/1 quit=1 commands=6/7

This is very much the same on all servers and there are no other log entries in any other logs.

Between it entering the filter and rejecting, seems to be 30 seconds. Could it be just timing out to some pre-determined limit?

Stoiko Ivanov · Mar 9, 2021

* try restarting pmg-smtp-filter `systemctl restart pmg-smtp-filter`
* check the running processes (`ps auxwf`)
* check what's listening on 10024 (and 10023, 10025): `ss -tlnap`

I hope this helps!

XN-Matt · Mar 9, 2021

Have already tried restarting and can confirm it is running as other mail is just fine.

Also, these messages sometimes come through without error so it appears they may complete quicker.

Is there anyway to add more logging to the filter to know what is happening as for the ones rejected, we're being re-sent these at regular intervals as they are rejected at SMTP level.

Stoiko Ivanov · Mar 9, 2021

XN-Matt said:
Is there anyway to add more logging to the filter to know what is happening as for the ones rejected, we're being re-sent these at regular intervals as they are rejected at SMTP level.

no - not really - although when pmg-smtp-filter has a problem with a message it usually logs this to the journal...
Just to be on the safe side - the logs you pasted are from the system-journal? (in case the error is happening somewhere else - the journal is probably the place where you get the best overview)

XN-Matt · Mar 9, 2021

The logs were from mail.log. The syslog gives nothing else useful.

Outside what I posted above, syslog shows the following around the same time period:

Code:

Mar  9 10:35:16 xxxx pmg-smtp-filter[940]: starting database maintainance
Mar  9 10:35:16 xxxx pmg-smtp-filter[940]: end database maintainance (17 ms)
Mar  9 10:35:27 xxxx systemd[1]: Started Session 1421 of user root.
Mar  9 10:35:27 xxxx systemd[1]: session-1421.scope: Succeeded.
Mar  9 10:35:27 xxxx systemd[1]: Started Session 1422 of user root.
Mar  9 10:35:28 xxxx systemd[1]: session-1422.scope: Succeeded.
Mar  9 10:35:28 xxxx systemd[1]: Started Session 1423 of user root.
Mar  9 10:35:28 xxxx systemd[1]: session-1423.scope: Succeeded.
Mar  9 10:35:29 xxxx systemd[1]: Started Session 1424 of user root.
Mar  9 10:35:29 xxxx systemd[1]: session-1424.scope: Succeeded.
Mar  9 10:35:29 xxxx pmgmirror[888]: starting cluster syncronization
Mar  9 10:35:30 xxxx pmgmirror[888]: cluster syncronization finished  (0 errors, 0.80 seconds (files 0.49, database 0.31, config 0.00))
Mar  9 10:35:31 xxxx systemd[1]: Started Session 1425 of user root.
Mar  9 10:35:31 xxxx systemd[1]: session-1425.scope: Succeeded.
Mar  9 10:35:31 xxxx systemd[1]: Started Session 1426 of user root.
Mar  9 10:35:31 xxxx systemd[1]: session-1426.scope: Succeeded.

If you merge those two sets together, you'd get what is shown in syslog.

One thing these messages contain are a number of PDFs. One is around 5MB in size and the other 3MB (total). From different senders and different source mail servers.

XN-Matt · Mar 9, 2021

I've taken the attachments from one of the emails, sent them to myself from Gmail and the same problem is happening.

Found some more logs and it appears it is just the time taken to scan. Whilst it is rejecting the message to the sender via 451, it still processes and accepts the message. Leading to the other end constantly resending it.

Code:

Mar  9 16:00:36 xxx postfix/smtpd[18951]: warning: timeout talking to proxy 127.0.0.1:10024
Mar  9 16:00:36 xxx postfix/smtpd[18951]: proxy-reject: END-OF-MESSAGE: 451 4.3.0 Error: queue file write error; from=<from@from.com> to=<me@me.com> proto=ESMTP helo=<mail-lf1-f48.google.com>
Mar  9 16:00:36 xxx postfix/cleanup[19182]: CCFF0204CB: message-id=<20210309160036.CCFF0204CB@xxxx.xxxx.xxxx>
Mar  9 16:00:36 xxx postfix/smtpd[18951]: disconnect from mail-lf1-f48.google.com[209.85.167.48] ehlo=2 starttls=1 mail=1 rcpt=1 bdat=1/2 quit=1 commands=7/8
Mar  9 16:00:36 xxx postfix/qmgr[928]: CCFF0204CB: from=<double-bounce@xxxx.xxxx.xxxx>, size=1360, nrcpt=1 (queue active)
Mar  9 16:00:36 xxx postfix/cleanup[19182]: CF1AB204EC: message-id=<20210309160036.CCFF0204CB@xxxx.xxxx.xxxx>
Mar  9 16:00:36 xxx postfix/qmgr[928]: CF1AB204EC: from=<double-bounce@xxxx.xxxx.xxxx>, size=1503, nrcpt=1 (queue active)
Mar  9 16:00:36 xxx postfix/local[19183]: CCFF0204CB: to=<postmaster@xxxx.xxxx.xxxx>, orig_to=<postmaster>, relay=local, delay=0.02, delays=0.01/0.01/0/0, dsn=2.0.0, status=sent (forwarded as CF1AB204EC)
Mar  9 16:00:36 xxx postfix/qmgr[928]: CCFF0204CB: removed
Mar  9 16:00:38 xxx postfix/smtp[19184]: CF1AB204EC: to=<hostmaster@xxxx.xxxx.xxxx>, orig_to=<postmaster>, relay=incoming.appmail.custserv.net[95.215.174.88]:25, delay=1.2, delays=0/0.02/1.1/0.13, dsn=2.0.0, status=sent (250 OK)
Mar  9 16:00:38 xxx postfix/qmgr[928]: CF1AB204EC: removed
Mar  9 16:00:56 xxx pmg-smtp-filter[929]: starting database maintainance
Mar  9 16:00:56 xxx pmg-smtp-filter[929]: end database maintainance (3 ms)
Mar  9 16:01:31 xxx pmg-smtp-filter[32102]: 61B3560479B4090F49: SA score=2/5 time=4.687 bayes=undefined autolearn=no autolearn_force=no hits=DKIM_ADSP_CUSTOM_MED(0.001),DKIM_INVALID(0.1),DKIM_SIGNED(0.1),FREEMAIL_FROM(0.001),KAM_DMARC_STATUS(1),NML_ADSP_CUSTOM_MED(1.2),RCVD_IN_DNSWL_NONE(-0.0001),RCVD_IN_MSPIKE_H2(-0.1),SPF_HELO_NONE(0.001),SPF_PASS(-0.001),TVD_SPACE_RATIO(0.001),T_FREEMAIL_DOC_PDF(0.01)
Mar  9 16:01:31 xxx postfix/smtpd[19347]: connect from localhost.localdomain[127.0.0.1]
Mar  9 16:01:31 xxx postfix/smtpd[19347]: 9ECDE204CB: client=localhost.localdomain[127.0.0.1], orig_client=mail-lf1-f48.google.com[209.85.167.48]
Mar  9 16:01:31 xxx postfix/cleanup[19182]: 9ECDE204CB: message-id=<CAHLmKBJG1pmR0JRGyj4EGs+ep-krzhsESV-cXEphjpK8To-E=Q@mail.gmail.com>
Mar  9 16:01:31 xxx postfix/qmgr[928]: 9ECDE204CB: from=<from@from.com>, size=5452950, nrcpt=1 (queue active)
Mar  9 16:01:31 xxx pmg-smtp-filter[32102]: 61B3560479B4090F49: accept mail to <me@me.com> (9ECDE204CB) (rule: Quarantine/Mark Spam (Level 2))
Mar  9 16:01:31 xxx postfix/smtpd[19347]: disconnect from localhost.localdomain[127.0.0.1] ehlo=1 xforward=1 mail=1 rcpt=1 data=1 commands=5
Mar  9 16:01:31 xxx pmg-smtp-filter[32102]: 61B3560479B4090F49: processing time: 155.149 seconds (4.687, 150.167, 0)

It seems the smtpd/filter will wait 30 seconds and then causes this issue.

I would say it is a bug because if the timeout occurs, it really needs to either accept the message as it is doing and deliver, or bounce... but not both. So far, I have received these "failed" messages over 30 times and with apparent way to stop them.

Stoiko Ivanov · Mar 9, 2021

XN-Matt said:
Mar 9 16:01:31 xxx pmg-smtp-filter[32102]: 61B3560479B4090F49: processing time: 155.149 seconds (4.687, 150.167, 0)

155 seconds is really too long - the 150 seconds should be the time ClamAV needs for processing the mail (which is something I have not seen until now, short of a zip-bomb (and there were quite a few fixes to clamav for those)

* could you post the output of `pmgversion -v` (to see if your clamav installation is up to date)?
* do you have such a mail - could you share it (you can send it via e-mail to me if preferred)

XN-Matt · Mar 9, 2021

Code:

pmg-api: 6.3-5
pmg-gui: 2.3-1
pve-kernel-5.4: 6.3-6
pve-kernel-helper: 6.3-6
pve-kernel-5.4.101-1-pve: 5.4.101-1
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.73-1-pve: 5.4.73-1
pve-kernel-5.4.30-1-pve: 5.4.30-1
clamav-daemon: 0.102.4+dfsg-0+deb10u1
libarchive-perl: 3.3.3-1
libjs-extjs: 6.0.1-10
libjs-framework7: 4.4.7-1
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-4
libpve-http-server-perl: 3.1-1
libxdgmime-perl: 0.01-5
lvm2: 2.03.02-3
pmg-docs: 6.3-1
pmg-log-tracker: 2.1.8-1
postgresql-11: 11.10-0+deb10u1
proxmox-mini-journalreader: 1.1-1
proxmox-spamassassin: 3.4.4-3
proxmox-widget-toolkit: 2.4-5
pve-firmware: 3.2-2
pve-xtermjs: 4.7.0-3
zfsutils-linux: 2.0.3-pve2

It seems it is only the attachments but I can't share them as they're financial related. I suspect however it is solely the size.

I've tuned down the "max scan size" to less than these emails are and it appears that has at least accepted those without sending a 451 but it's probably not the best solution longer term.

Stoiko Ivanov · Mar 9, 2021

XN-Matt said:
It seems it is only the attachments but I can't share them as they're financial related. I suspect however it is solely the size.

but 3 minutes for 8MB still seem quite a bit off (clamav might not be the fastest, but this sounds too much)

the versions look current - so at least it's not a known issue with files

could you maybe try to scan the attachments with `time clamdscan --verbose <filename>` on the PMG (you need to copy them there via e.g. scp)
(maybe also export the mail as .eml and scan that as well)

XN-Matt · Mar 9, 2021

Sure. I'll come back to you once that testing is done.

XN-Matt · Mar 9, 2021

@Stoiko Ivanov, completed and certainly two of the PDFs took a very long time.

Code:

root@xxxx:/tmp# time clamdscan --verbose *.pdf
/tmp/xxx.pdf: OK 5 sec 230k
/tmp/xxx.pdf: OK 3.3 sec 249k
/tmp/xxx.pdf: OK 0.4 sec 91k
/tmp/xxx.pdf: OK 1.7 sec 128k
/tmp/xxx.pdf: OK 1.7 sec 158k
/tmp/xxx.pdf: OK 156 sec 1.29m
/tmp/xxx.pdf: OK 46 sec 1.83m

----------- SCAN SUMMARY -----------
Infected files: 0
Time: 216.640 sec (3 m 36 s)

real    3m36.644s
user    0m0.000s
sys    0m0.005s

The email itself was very quick but have added each respective file and the time for each on a second forced run.

Stoiko Ivanov · Mar 9, 2021

XN-Matt said:
/tmp/xxx.pdf: OK 156 sec 1.29m

we have our culprit...

* do you have any particular customized clamAV config?
* any particular messages from clamAV in your logs?
* anything particularly interesting about that pdf?

I would like to reproduce that - so that we can find out what would be the best behaviour of pmg-smtp-filter in that situation.

XN-Matt · Mar 9, 2021

Nothing changed in the CAV config apart from lowering the file sized noted earlier.

Nothing in logs.

PDF is just a big file. I can actually send this one to you as it is just a standard doc if that would be of use?

Stoiko Ivanov · Mar 10, 2021

XN-Matt said:
PDF is just a big file. I can actually send this one to you as it is just a standard doc if that would be of use?

if it's fine with you - please (s.ivanov _at_ proxmox.com)

Thanks

XN-Matt · Mar 15, 2021

That should be with you shortly.

In the absence of this though, is there a better way that PMG can deal with big files which could cause a timeout? Sending a fail but then also delivering the message seems a bit flawed.

Stoiko Ivanov · Mar 17, 2021

XN-Matt said:
That should be with you shortly.

Thanks - I received it.

XN-Matt said:
In the absence of this though, is there a better way that PMG can deal with big files which could cause a timeout? Sending a fail but then also delivering the message seems a bit flawed.

I'll try to reproduce the issue here locally and maybe come up with a temporary mitigation if applicable

queue file write error

Well-Known Member

Proxmox Staff Member

Well-Known Member

Proxmox Staff Member

Well-Known Member

Proxmox Staff Member

Well-Known Member

Proxmox Staff Member

Well-Known Member

Well-Known Member

Proxmox Staff Member

Well-Known Member

Proxmox Staff Member

Well-Known Member

Well-Known Member

Proxmox Staff Member

Well-Known Member

Proxmox Staff Member

Well-Known Member

Proxmox Staff Member

We value your privacy