Extracttext: error

yanfei

How can I check whether tesseract is configured correctly in Proxmox Mail Gateway (PMG)? The log keeps showing the warning below, yet running tesseract manually does extract the text. Without a language option the output is garbled; when I specify Simplified Chinese, the text is recognized correctly. How do I configure the parameters that PMG uses when it runs tesseract?

Dec 2 09:53:31 sh-pmg pmg-smtp-filter[195600]: WARNING: extracttext: error (1) from /usr/bin/tesseract: Premature end of JPEG file

Commands used for the manual test:
tesseract wechat.png output
tesseract wechat.png output -l chi_sim
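One way to confirm that the language data is actually installed is to inspect the output of `tesseract --list-langs`. The following is a minimal sketch, not part of PMG: the helper names and the sample output are made up for illustration, and it assumes the header line of the listing ends with a colon.

```python
import subprocess


def parse_list_langs(output):
    """Return the set of language codes found in `tesseract --list-langs` output."""
    langs = set()
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the header line, which ends with a colon.
        if not line or line.endswith(":"):
            continue
        langs.add(line)
    return langs


def missing_languages(required, output):
    """Return the required language codes that do not appear in the listing."""
    return sorted(set(required) - parse_list_langs(output))


def installed_languages(binary="/usr/bin/tesseract"):
    """Ask the real binary; some versions print the listing to stderr."""
    result = subprocess.run(
        [binary, "--list-langs"], capture_output=True, text=True, check=True
    )
    return parse_list_langs(result.stdout + result.stderr)


# Hypothetical listing, used here so the sketch runs without tesseract installed.
SAMPLE = """List of available languages (3):
chi_sim
eng
osd
"""

# chi_tra is absent from the sample, so it is reported as missing.
print(missing_languages(["chi_sim", "chi_tra", "eng"], SAMPLE))
```

On the gateway itself, `installed_languages()` would query the real binary instead of the sample text.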
 
I found that /etc/mail/spamassassin/v400.pre contains the extracttext configuration. I manually changed the default tesseract command to
extracttext_external tesseract {OMP_THREAD_LIMIT=1} /usr/bin/tesseract -l chi_sim+eng -c page_separator= {} -
Is it OK to do this? I have installed the tesseract language data for both Simplified Chinese and Traditional Chinese.
 
Dec 2 09:53:31 sh-pmg pmg-smtp-filter[195600]: WARNING: extracttext: error (1) from /usr/bin/tesseract: Premature end of JPEG file
On a hunch: how large is the email (go by the information in the journal/mail log), and how large is the max spamsize you've set (GUI->Configuration->Spam Detector->Options)? Ensure that max-spamsize is as large as your general mail size. (We'll improve this in one of the next versions, as it seems to cause issues for more users.)
 
I use the default value of 262144, and the messages that trigger the error do seem to exceed that size.
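To illustrate why a size limit could produce "Premature end of JPEG file", here is a rough sketch. It assumes, as the hunch above suggests, that only the first max-spamsize bytes are examined, and it simplifies by cutting raw image bytes rather than a full MIME message. The function names are hypothetical, and the end-marker check is only a heuristic, since some valid JPEG files carry extra bytes after the marker.

```python
JPEG_SOI = b"\xff\xd8"  # start-of-image marker
JPEG_EOI = b"\xff\xd9"  # end-of-image marker

MAX_SPAM_SIZE = 262144  # default value quoted in this thread


def scanned_part(data, limit=MAX_SPAM_SIZE):
    """Model a size limit by keeping only the first `limit` bytes."""
    return data[:limit]


def looks_truncated_jpeg(data):
    """Heuristic: a JPEG that does not end with the end-of-image marker."""
    if not data.startswith(JPEG_SOI):
        raise ValueError("data does not start with a JPEG start-of-image marker")
    return not data.endswith(JPEG_EOI)


# Fake "image": correct markers around filler bytes, larger than the limit.
full_image = JPEG_SOI + b"\x00" * 300000 + JPEG_EOI

print(looks_truncated_jpeg(full_image))                # complete file
print(looks_truncated_jpeg(scanned_part(full_image)))  # cut at the limit
```

If the hunch is right, raising the limit so that the whole message is scanned means the image reaches tesseract intact.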
 
Set it to your Message Size (GUI->Configuration->Mail Proxy->Options) and see if it works then.
 
OK, I'll give it a try.
I modified the default configuration file /etc/mail/spamassassin/v400.pre to
extracttext_external tesseract {OMP_THREAD_LIMIT=1} /usr/bin/tesseract -l chi_sim+eng -c page_separator= {} -
Would that work? I would like Chinese text to be recognized.
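To test the modified command outside the mail filter, the same arguments can be reproduced by hand. This is a minimal sketch with hypothetical helper names: it assumes `{}` in the directive stands for the image file and `{OMP_THREAD_LIMIT=1}` for an environment variable, and it does not read the SpamAssassin configuration itself.

```python
import os
import subprocess


def build_ocr_command(image_path, langs=("chi_sim", "eng"),
                      binary="/usr/bin/tesseract"):
    """Build argv and environment mirroring the modified directive."""
    argv = [
        binary,
        "-l", "+".join(langs),     # e.g. chi_sim+eng
        "-c", "page_separator=",   # empty page separator, as in the directive
        image_path,                # stands in for {}
        "-",                       # write recognized text to stdout
    ]
    env = dict(os.environ, OMP_THREAD_LIMIT="1")
    return argv, env


def run_ocr(image_path, **kwargs):
    """Run tesseract and return (exit code, recognized text, error output)."""
    argv, env = build_ocr_command(image_path, **kwargs)
    result = subprocess.run(argv, env=env, capture_output=True)
    return (
        result.returncode,
        result.stdout.decode("utf-8", "replace"),
        result.stderr.decode("utf-8", "replace"),
    )


argv, env = build_ocr_command("wechat.png")
print(" ".join(argv))
```

Calling `run_ocr("wechat.png")` on the gateway would then show whether this exact invocation recognizes the Chinese text and what exit code it returns.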
 
Please try it with only the size setting first. If that does not work, apply your modification and see if it helps.
 