Extracttext: error

yanfei

Active Member
Mar 7, 2019
How can I check whether the tesseract configuration in Proxmox Mail Gateway (PMG) is correct? The log keeps showing the warning below, yet tesseract produces output when I run it manually. If I don't specify a language, the output is garbled; when I specify Simplified Chinese, the text is recognized correctly. How do I configure the parameters PMG uses to run tesseract?

Dec 2 09:53:31 sh-pmg pmg-smtp-filter[195600]: WARNING: extracttext: error (1) from /usr/bin/tesseract: Premature end of JPEG file

Test commands:
tesseract wechat.png output
tesseract wechat.png output -l chi_sim
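To check whether the Simplified Chinese language data is actually installed, one option is `tesseract --list-langs`. Below is a minimal sketch that wraps it. It assumes the usual output layout of one header line followed by one language code per line. `has_lang` and `check_langs` are hypothetical helper names.

```shell
#!/bin/sh
# Minimal sketch: check which Tesseract language packs are installed.
# Assumption: `tesseract --list-langs` prints one header line
# ("List of available languages in ...") followed by one code per line.

# has_lang CODE: read `tesseract --list-langs` output on stdin and exit 0
# if CODE appears as an exact line (so chi_sim does not match chi_sim_vert).
has_lang() {
    tail -n +2 | grep -qx -- "$1"
}

# check_langs CODE...: run tesseract once and report each requested code.
# stderr is captured too, in case some versions print the list there
# (assumption, not verified for every version).
check_langs() {
    langs=$(tesseract --list-langs 2>&1) || return 1
    for code in "$@"; do
        if printf '%s\n' "$langs" | has_lang "$code"; then
            echo "$code: installed"
        else
            echo "$code: MISSING"
        fi
    done
}
```

For example, `check_langs eng chi_sim chi_tra` should report all three as installed if the Simplified and Traditional Chinese packs are present.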
I found that the configuration file /etc/mail/spamassassin/v400.pre contains the extracttext settings. I manually changed the default tesseract command to:
extracttext_external tesseract {OMP_THREAD_LIMIT=1} /usr/bin/tesseract -l chi_sim+eng -c page_separator= {} -
Is it possible to do this? I have installed the tesseract language data for both Simplified Chinese and Traditional Chinese.
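To compare the plugin's behaviour with the manual test, it can help to expand the configured `extracttext_external` line into the equivalent shell command. The sketch below is a rough approximation, not the plugin's own parser. It assumes the directive syntax described in its comments: environment settings go in `{KEY=VALUE}`, and `{}` is the file placeholder. `expand_extracttext` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: turn an `extracttext_external` line into the command you could
# run by hand to reproduce what the ExtractText plugin does with a file.
# Assumptions (based on the documented syntax, not the plugin source):
#   - word 1 is the directive, word 2 is the tool name;
#   - `{KEY=VALUE}` words set environment variables for the tool;
#   - the word `{}` is replaced by the path of the attachment;
#   - everything else is the command and its arguments, in order.
# Quoting inside the config line is not handled.

expand_extracttext() {
    line=$1 file=$2
    env_part= cmd_part=
    set -f              # disable globbing while splitting the line into words
    set -- $line
    set +f
    shift 2             # drop "extracttext_external" and the tool name
    for word in "$@"; do
        case $word in
            '{}')       cmd_part="$cmd_part $file" ;;
            '{'*=*'}')  kv=${word#?}; env_part="$env_part ${kv%?}" ;;
            *)          cmd_part="$cmd_part $word" ;;
        esac
    done
    # Print "KEY=VALUE ... command args" without the leading separators.
    printf '%s\n' "${env_part# }${env_part:+ }${cmd_part# }"
}
```

Passing the modified line from this post and `wechat.png` prints `OMP_THREAD_LIMIT=1 /usr/bin/tesseract -l chi_sim+eng -c page_separator= wechat.png -`. Running that by hand shows whether the configured language list recognizes the image. The trailing `-` asks tesseract to write the text to standard output.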
On a hunch: how large is the email (check the size in the journal/mail log), and how large is the Max Spam Size you have set (GUI->Configuration->Spam Detector->Options)? Make sure Max Spam Size is as large as your general mail size. (We'll improve this in one of the next versions, as it seems to cause issues for more users.)
I am using the default value of 262144, and it seems that the messages triggering the error do exceed this size.
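To see which messages actually exceed the limit, you can compare the `size=` value that Postfix's queue manager logs for each message with the Max Spam Size. Below is a sketch that assumes the standard `postfix/qmgr` log line format shown in the code comment. `oversize_mails` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list messages whose size exceeds a limit (e.g. Max Spam Size),
# using the size= field logged by Postfix's queue manager (qmgr), e.g.
#   ... postfix/qmgr[1234]: 4Y1AbC2xyz: from=<a@example.com>, size=345678, nrcpt=1 (queue active)

# oversize_mails LIMIT: read log lines on stdin, print "QUEUE_ID SIZE"
# for every qmgr line whose size is larger than LIMIT bytes.
oversize_mails() {
    awk -v limit="$1" '
        /postfix\/qmgr\[/ && / size=/ {
            qid = ""; size = ""
            for (i = 1; i <= NF; i++) {
                # The queue ID is the word right after "postfix/qmgr[PID]:".
                if ($i ~ /^postfix\/qmgr\[/) { qid = $(i + 1); sub(/:$/, "", qid) }
                # Strip the "size=" prefix and the trailing comma.
                if ($i ~ /^size=/) { size = $i; sub(/^size=/, "", size); sub(/,$/, "", size) }
            }
            if (qid != "" && size != "" && size + 0 > limit + 0) print qid, size
        }'
}
```

For example, `journalctl --since today | oversize_mails 262144` lists today's messages larger than 262144 bytes (256 KiB).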
Set it to your Message Size (GUI->Configuration->Mail Proxy->Options) and see whether it works then.
OK, I'll give it a try.
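After you change the setting in the GUI, you can confirm that Max Spam Size is no longer smaller than Message Size by reading both values from the configuration file. The sketch below assumes they are stored in /etc/pmg/pmg.conf in the layout described in its comments. The helper names and fallback defaults are assumptions.

```shell
#!/bin/sh
# Sketch: compare Max Spam Size with the general Message Size in pmg.conf.
# Assumptions: the file uses "section: NAME" headers followed by indented
# "key value" lines, the keys are "maxspamsize" (section spam) and
# "maxsize" (section mail), and a missing key means the default applies.
# The defaults below (262144 and 10485760 bytes) are assumptions too.

# pmg_conf_value FILE SECTION KEY DEFAULT: print the key's value or DEFAULT.
pmg_conf_value() {
    awk -v sec="$2" -v key="$3" -v def="$4" '
        /^section:/ { cur = $2; next }
        cur == sec && $1 == key { val = $2 }
        END { print (val != "" ? val : def) }
    ' "$1"
}

# check_spamsize [FILE]: return 1 if maxspamsize is below maxsize.
check_spamsize() {
    conf=${1:-/etc/pmg/pmg.conf}
    spam=$(pmg_conf_value "$conf" spam maxspamsize 262144)
    mail=$(pmg_conf_value "$conf" mail maxsize 10485760)
    if [ "$spam" -lt "$mail" ]; then
        echo "maxspamsize ($spam) is smaller than maxsize ($mail)"
        return 1
    fi
    echo "maxspamsize ($spam) is at least maxsize ($mail)"
}
```

Run `check_spamsize` on the PMG host. A non-zero exit status means Max Spam Size is still smaller than Message Size.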
I changed the default configuration in /etc/mail/spamassassin/v400.pre to:
extracttext_external tesseract {OMP_THREAD_LIMIT=1} /usr/bin/tesseract -l chi_sim+eng -c page_separator= {} -
Will that work? I want Chinese text to be recognized.
Please try it first with only the size setting. If that does not work, apply your modification and see whether it helps.