Pipelining Error - Help

For reference, I added an SSD to the VMware host and moved the VM's disk to that SSD; it is the only VM on that disk.
The results were the same.
Fast forward to today: this has now happened 3 days in a row, and each time basically every email (10K+) gets this error before we notice.

There is no disk space issue, and performance looks good: CPU, memory, and disk all look fine.
While every message is getting the error, I stop Postfix.
I wait until everything is quiet and turn it back on, at which point it immediately gets write errors again, right where it left off.
If I reboot, we're good for around 24 hours (and it's happening more consistently).

I read another thread https://forum.proxmox.com/threads/queue-file-write-error.85569/page-2#post-640655
They had some settings they found to work in their situation:
smtpd_proxy_timeout = 600
lmtp_data_done_timeout = 600

I did that yesterday, and today there were the same 12K write errors.

Any ideas or suggestions? This is highly unreliable...
Thanks!
 
Something I noticed while researching further:
Memory usage suddenly started rising; the white area is where I rebooted.

I went back and looked at the week view (also attached), and yesterday, which also had the same issue, shows the same peak.
Could the queue file write error be caused by running out of RAM?
A possible memory leak? It's not as if we received 10K messages all at once.
FYI, I added another 2 GB of RAM; let's see if it quiets down over the next few days.


MGW - 3-7-34 - right before issue.png
 

Attachment: MGW - 3-7-34 - week view.png
Digging a little further: if I overlay swap usage (red) on memory usage, this looks like it could be the issue.
Swap runs out, then memory runs out, then the queue write errors start. At that point, even if you stop Postfix, everything is trashed: you don't get the memory back, and the only way to get things working again is a reboot.

The red overlay is not on the same scale as RAM (the swap size differs from the RAM size), but the spikes happen at the same time, and swap peaks until there is none left.

My guess is that someone is sending a large file to 30+ people at one time. The math doesn't make sense, though: a 20 MB email to 35 recipients shouldn't exhaust all the resources. But, as I said, it's a guess.
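To see exactly when RAM and swap fill up relative to the mail graphs, a timestamped one-line summary from /proc/meminfo could be logged during a spike (for example from cron or a loop). A minimal sketch, assuming a Linux host; the helper name `mem_summary` and its output format are made up for illustration and are not something PMG provides:

```shell
#!/bin/sh
# Print a one-line RAM/swap summary from /proc/meminfo (values there are in kB).
# swap_used is computed as SwapTotal - SwapFree.
# Optional argument: path to a meminfo-style file (defaults to /proc/meminfo).
mem_summary() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         /^SwapTotal:/    { stotal = $2 }
         /^SwapFree:/     { sfree = $2 }
         END { printf "mem_total=%d MB mem_avail=%d MB swap_used=%d MB\n",
                      total / 1024, avail / 1024, (stotal - sfree) / 1024 }' \
        "${1:-/proc/meminfo}"
}

# Example: one timestamped line, suitable for appending to a log file.
echo "$(date '+%H:%M:%S') $(mem_summary)"
```

MemAvailable is used rather than MemFree because it already accounts for reclaimable cache, so a large page cache alone won't look like memory pressure.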

@dcsapak Can you share your thoughts on this idea?

Also, what is the recommended swap-to-memory ratio?

1709880861681.png
 
I had a similar situation on Monday, with the same swap and RAM pattern. I temporarily resolved it by lowering "Max file size" and "Max scan size" in the Virus Detector to the minimum (1MB).

A few months ago I managed to get hold of one attachment that caused problems: a 20 MB .pdf. Nothing special about it, but when ClamAV tried to scan it, it ate all of the server's CPU. As I didn't really have the time or the knowledge to troubleshoot any further, I just lowered "Max scan size" and "Max file size", and then it passed through successfully.
But on Monday we had the same situation, so I set those values to the minimum, and now I have no room to lower them further.

I believe deleting some virus signatures will improve performance. If that doesn't help, I plan to reinstall the server and reconfigure it with rules closer to the defaults...
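One way to confirm that a changed "Max file size"/"Max scan size" actually reached the scanner is to look at the MaxFileSize and MaxScanSize directives in the ClamAV daemon's configuration. This sketch assumes that PMG renders these GUI options into clamd.conf and that the file is at the Debian default path /etc/clamav/clamd.conf; both are assumptions worth checking on your system. `clamd_limits` is a hypothetical helper name:

```shell
#!/bin/sh
# Show the active (uncommented) size-limit directives in clamd.conf.
# Assumption: Debian default location; pass another path as $1 if needed.
clamd_limits() {
    conf=${1:-/etc/clamav/clamd.conf}
    # Match lines like "MaxFileSize 25M", skipping commented-out ones.
    grep -E '^[[:space:]]*(MaxFileSize|MaxScanSize)[[:space:]]' "$conf"
}
```

Usage: after loading the snippet into a shell, run `clamd_limits` or `clamd_limits /path/to/clamd.conf` and compare the values with what the GUI shows.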
 
Thanks for the information, that's also a plausible cause. I will try it and see how it changes the usage.
 
This continues to happen even after the additional memory.
I had already lowered the AV scan size, but I'm not sure that's the issue, because the math doesn't add up.
I do see, however, that the cache does not clear for at least 1 hour after the error.
Any ideas? This is becoming a regular event.
Image showing the cache not clearing.
For reference, you can see that the entire day, with lots of email traffic, used ZERO swap space; but when the email with attachments hits (shown by the peak in both memory and swap), swap never returns to zero.
@dcsapak

11.03.2024_23.50.45_REC.png
 
So what kind of attachment is it? Any chance of uploading it somewhere (if it's not private/confidential, of course)?

Would it be possible for you to monitor the memory usage of the processes, so we get a better picture of which process uses that memory?

It certainly isn't expected behaviour for 'normal'-sized attachments.
 
Hello,

Update:
First, reducing the AV scan size to 10 MB instead of 50 MB did NOT solve the issue.
As you can see from the attached image, it happened again, with the same peaks and, of course, write errors.
As I may have mentioned before, once those emails get the write error, or whenever swap usage spikes, swap is not cleared for an hour. I'm not sure how that affects any of the processes, or why it wouldn't clear once everything had failed.

Second, I added a little more RAM; it just allowed the peak to go a little higher, but it still crashed.
Today I went overboard and bumped swap up to 16 GB; we'll see how that works out.

Lastly, in a 10-minute window 700 emails got write errors. That is a bit of a peak, since the server usually doesn't get that many in 10 minutes.

I can't pinpoint it to a specific email, but around the time it happens, some users send an 11 MB Excel document plus a few images embedded in the email body. At the email level the total looks like 25 MB, with 15-30 recipients.

I simply don't know what is being written to swap and RAM to peak at 10+ GB; the math above simply doesn't add up.

I also added an in/out graph with the mail count.
 

Attachments: 14.03.2024_05.16.18_REC.png, 14.03.2024_05.41.54_REC.png
@dcsapak

Another Update:
I can see the peaks, but because I got carried away with swap space, it looks like we have made it through 2 of those peaks so far.
I'm still concerned, however, that a 25 MB email sent to 20 people consumes 11 GB of memory and 5.4 GB of swap...
This isn't adding up.
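Putting the numbers from this thread side by side makes the gap concrete. This is a back-of-the-envelope sketch under a deliberately pessimistic assumption (every recipient's copy held in memory at the same time), which is not a claim about how Postfix or PMG actually process mail; `worst_case_mb` is a hypothetical helper:

```shell
#!/bin/sh
# Worst case if every recipient's copy of the message sat in memory at once.
worst_case_mb() {
    msg_mb=$1       # size of one message in MB
    recipients=$2   # number of recipients
    echo $(( msg_mb * recipients ))
}

# Observed peak from the graphs: ~11 GB RAM + ~5.4 GB swap, converted to MB
# (integer shell arithmetic, so 5.4 GB is computed as 54 * 1024 / 10).
observed_mb=$(( 11 * 1024 + 54 * 1024 / 10 ))

echo "worst case, 25 MB x 20 recipients: $(worst_case_mb 25 20) MB"
echo "worst case, 20 MB x 35 recipients: $(worst_case_mb 20 35) MB"
echo "observed peak: ${observed_mb} MB"
echo "observed / worst case: $(( observed_mb / $(worst_case_mb 25 20) ))x"
```

Even this pessimistic bound (500 MB) is roughly 33 times smaller than the observed peak. Base64 encoding of attachments adds about a third on top, which doesn't close a gap of that size either, so message size times recipient count alone doesn't seem to explain the usage.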

Besides the huge amount of memory used, why does swap NOT clear, given that the messages are delivered within a minute or so?
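On the swap question: as far as I know, Linux does not move swapped-out pages back into RAM on its own once memory frees up; a page stays in swap until the owning process accesses it again or exits. So swap usage can stay high long after the spike, possibly just because some long-running daemon still has pages out there. To see which processes those are, the VmSwap field in /proc/<pid>/status can be listed per process. A minimal sketch; `swap_by_process` is a hypothetical name, and the optional second argument only exists so it can be pointed at a test directory:

```shell
#!/bin/sh
# List processes holding swap, largest first, using the VmSwap field of
# /proc/<pid>/status (value in kB). Args: [count] [proc root, default /proc]
swap_by_process() {
    count=${1:-15}
    root=${2:-/proc}
    for status in "$root"/[0-9]*/status; do
        # Processes may exit between the glob and the read; ignore such errors.
        awk '/^Name:/   { name = $2 }
             /^Pid:/    { pid = $2 }
             /^VmSwap:/ { swap = $2 }
             END { if (swap > 0) printf "%8d kB %7s %s\n", swap, pid, name }' \
            "$status" 2>/dev/null
    done | sort -rn | head -n "$count"
}

swap_by_process 15
```

Kernel threads have no VmSwap line and are skipped, so an empty result simply means no process currently has anything in swap.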


15.03.2024_17.15.29_REC.png
 
As I said:
Would it be possible for you to monitor the memory usage of the processes, so we get a better picture of which process uses that memory?

We cannot try to fix the memory usage if we don't know which process/part consumes the memory.
 
@dcsapak
I think there are 2 questions here:
First, why would 20 emails with a 25 MB attachment cause this over-the-top usage?
I don't believe the math adds up here.

Secondly, how can we figure out what process is causing this?
This is basically an out-of-the-box install with some added rules, so SpamAssassin and ClamAV are pretty much at their defaults.

Thank you!
 
I think there are 2 questions here:
First, why would 20 emails with a 25 MB attachment cause this over-the-top usage?
I don't believe the math adds up here.
That's what we want to find out, but first we have to know which processes are actually using the memory before we can determine why.
Also, it would still be interesting to know what kind of attachments those are (maybe post/upload one if it's not confidential/private); at least some basic info about file type/content/etc. would be nice.

Secondly, how can we figure out what process is causing this?
This is basically an out-of-the-box install with some added rules, so SpamAssassin and ClamAV are pretty much at their defaults.
E.g. the simplest way is to regularly call something like 'ps avx' and append the output to a file;
for instance, you could write a script that loops

Code:
ps avx >> /tmp/log.txt

every 10 seconds or so
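The loop described above could look like the following sketch. It keeps the `ps avx` command and the /tmp/log.txt path from the post; the timestamp header line, the `run` argument guard, and the file name monitor.sh are assumptions added for illustration:

```shell
#!/bin/sh
# Append one timestamped `ps avx` snapshot to the given log file.
# The "=====" header makes individual snapshots easy to tell apart.
log_snapshot() {
    logfile=$1
    echo "===== $(date '+%Y-%m-%d %H:%M:%S') =====" >> "$logfile"
    ps avx >> "$logfile"
}

# Take a snapshot every INTERVAL seconds until the script is stopped.
monitor() {
    logfile=${1:-/tmp/log.txt}
    interval=${2:-10}
    while true; do
        log_snapshot "$logfile"
        sleep "$interval"
    done
}

# Only start the endless loop when explicitly asked: sh monitor.sh run
if [ "${1:-}" = "run" ]; then
    monitor /tmp/log.txt 10
fi
```

Start it with `nohup sh monitor.sh run &` shortly before the time of day the spikes usually begin, and stop it with `kill` once a spike has passed; at one snapshot every 10 seconds the log grows quickly.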

Also, you could check the journal/syslog for OOM events and post those (they should contain some memory info).
 
I thought I mentioned that the email contains a spreadsheet (11 MB) and a few images embedded in the email.

For the script, do we want to filter anything out of that?
I can see it's a big list of information.
How long do you want me to run this script for?

For the log, are you referring to this?
/run/systemd/journal/syslog
 
For the script, do we want to filter anything out of that?
I can see it's a big list of information.
How long do you want me to run this script for?
Since we don't know what's using the memory, no filtering would be best. As for how long: it is enough if it runs during one of the spikes before the issue. There is no need for any long-term data here.

For the log, are you referring to this?
/run/systemd/journal/syslog
/var/log/syslog, most often.

But if you want to use the journal, you can read it with the command 'journalctl'.
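To pull just the OOM-killer messages out of either source, a small filter like this may help. It is a sketch: the `oom_lines` name is hypothetical, and the patterns cover the usual Linux kernel OOM messages ("... invoked oom-killer: ...", "Out of memory: Killed process ..."), not necessarily every variant:

```shell
#!/bin/sh
# Read log text on stdin and keep only OOM-killer related lines.
oom_lines() {
    grep -i -E 'invoked oom-killer|out of memory|oom-kill:|oom_reaper'
}
```

For example, `journalctl -k -b -1 | oom_lines` shows kernel messages from the previous boot (only available if the journal is stored persistently), which matters here because the box gets rebooted to recover; `oom_lines < /var/log/syslog` searches the syslog file instead.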
 

Attachment: 3-21-2024.jpg
Hi,

thanks for providing this; I'll try to find some time soon to look at it...
 
