MX got timeout

plnt

Member
Jan 20, 2022
Hi guys,

We have a long-standing problem with our MX cluster (version 8.2.0): MX1 constantly times out, and so do the other nodes in the cluster.
The web interface is slow and returns timeout errors. API calls that set antispam options on domains, or add and remove domains in the transport table, also end in timeouts. Adding anything via the API is a gamble.
I wanted to increase the PHP pool on the MX servers for the proxy service, but I couldn't get it to work. It only has 3 workers, which seems a bit low to me.

Example of an error from the API:

Bash:
[Thu Jul 10 08:03:03.887018 2025] [proxy_fcgi:error] [pid 2710157:tid 2710167] [remote 194.1.216.65:56741] AH01071: Got error 'PHP message: PHP Fatal error: Uncaught PVE2_Exception: Not logged into Proxmox host. No Login access ticket found or ticket expired. in /var/www/clients/client21/web65/web/class/pmg.php:147\nStack trace:\n#0 /var/www/clients/client21/web65/web/class/pmg.php(543): PMGService->action()\n#1 /var/www/clients/client21/web65/web/email_content_antispam.php(74): PMGService->get()\n#2 {main}\n thrown in /var/www/clients/client21/web65/web/class/pmg.php on line 147', referer: https://ourwebsite.sk/hosting-email

[Thu Jul 10 08:38:58.813208 2025] [proxy_fcgi:error] [pid 2710100:tid 2710108] [remote 195.160.182.66:51563] AH01071: Got error 'PHP message: PHP Fatal error: Uncaught PVE2_Exception: Not logged into Proxmox host. No Login access ticket found or ticket expired. in /var/www/clients/client21/web65/web/class/pmg.php:147\nStack trace:\n#0 /var/www/clients/client21/web65/web/class/pmg.php(543): PMGService->action()\n#1 /var/www/clients/client21/web65/web/email_content_antispam.php(74): PMGService->get()\n#2 {main}\n thrown in /var/www/clients/client21/web65/web/class/pmg.php on line 147', referer: https://ourwebsite.sk/hosting-email

[Thu Jul 10 08:38:58.817329 2025] [proxy_fcgi:error] [pid 1733339:tid 1733347] [remote 176.116.114.29:14378] AH01071: Got error 'PHP message: PHP Fatal error: Uncaught PVE2_Exception: Not logged into Proxmox host. No Login access ticket found or ticket expired. in /var/www/clients/client21/web65/web/class/pmg.php:147\nStack trace:\n#0 /var/www/clients/client21/web65/web/class/pmg.php(543): PMGService->action()\n#1 /var/www/clients/client21/web65/web/email_content_antispam.php(74): PMGService->get()\n#2 {main}\n thrown in /var/www/clients/client21/web65/web/class/pmg.php on line 147', referer: https://ourwebsite.sk/hosting-email

[Thu Jul 10 08:39:28.081610 2025] [proxy_fcgi:error] [pid 1155067:tid 1155147] [remote 176.116.114.29:14438] AH01071: Got error 'PHP message: PHP Fatal error: Uncaught PVE2_Exception: Not logged into Proxmox host. No Login access ticket found or ticket expired. in /var/www/clients/client21/web65/web/class/pmg.php:147\nStack trace:\n#0 /var/www/clients/client21/web65/web/class/pmg.php(543): PMGService->action()\n#1 /var/www/clients/client21/web65/web/email_content_antispam.php(74): PMGService->get()\n#2 {main}\n thrown in /var/www/clients/client21/web65/web/class/pmg.php on line 147', referer: https://ourwebsite.sk/hosting-email

[Thu Jul 10 08:41:15.807996 2025] [proxy_fcgi:error] [pid 2710100:tid 2710107] [remote 195.160.182.66:51563] AH01071: Got error 'PHP message: PHP Fatal error: Uncaught PVE2_Exception: Not logged into Proxmox host. No Login access ticket found or ticket expired. in /var/www/clients/client21/web65/web/class/pmg.php:147\nStack trace:\n#0 /var/www/clients/client21/web65/web/class/pmg.php(543): PMGService->action()\n#1 /var/www/clients/client21/web65/web/email_content_antispam.php(74): PMGService->get()\n#2 {main}\n thrown in /var/www/clients/client21/web65/web/class/pmg.php on line 147', referer: https://ourwebsite.sk/hosting-email
 

Attachments

  • Screenshot 2025-07-15 at 9.14.00 AM.png
  • Screenshot 2025-07-15 at 9.17.55 AM.png
  • Screenshot 2025-07-15 at 9.20.39 AM.png
  • Screenshot 2025-07-15 at 9.26.56 AM.png
  • Screenshot 2025-07-15 at 9.30.36 AM.png

The errors you're seeing, like the "Not logged into Proxmox host" issue, indicate authentication problems with your API calls. Have you checked if there's a session timeout or if the authentication tokens are expiring too quickly?


Regarding your PHP pool configuration, increasing the number of workers can indeed help with handling more concurrent requests. What steps did you take when you tried to increase the PHP pool, and what issues did you encounter?


Also, for your MX cluster timeouts, have you reviewed the system logs to see if there are any resource constraints or network issues causing these timeouts across different nodes?
 
We checked the API call and it seems to be OK. The error message does not reflect what actually happens: the request simply times out. The web interface does the same.

I tried various ways to raise the number of workers, e.g.:

Bash:
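# Check how many workers are currently running
systemctl status pmgproxy | grep worker

# Create a systemd drop-in that sets the environment variables
mkdir -p /etc/systemd/system/pmgproxy.service.d
cat > /etc/systemd/system/pmgproxy.service.d/override.conf <<'EOF'
[Service]
Environment="MAX_WORKERS=8" "MAX_CONN=1500" "MAX_REQUESTS=20000"
EOF

# Reload the unit files and restart the proxy
systemctl daemon-reload
systemctl restart pmgproxy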

But without success. Sometimes the service did not start; other times it started, but with the default parameters.
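A drop-in like this only changes the environment of the process; whether pmgproxy reads MAX_WORKERS at all is not established in this thread. As a minimal sketch, the following at least confirms that systemd picked the drop-in up. `has_env` is a hypothetical helper, and the sample line assumes the usual `Environment=KEY=value ...` output of `systemctl show`.

```shell
#!/bin/sh
# has_env VAR: reads the output of `systemctl show <unit> -p Environment`
# on stdin and succeeds if VAR is present in the unit's environment.
has_env() {
    sed 's/^Environment=//' | tr ' ' '\n' | grep -q "^$1="
}

# Usage on a PMG node (not run here):
#   systemctl cat pmgproxy      # prints the unit file plus any drop-ins
#   systemctl show pmgproxy -p Environment | has_env MAX_WORKERS && echo "drop-in loaded"

# Offline demonstration with an assumed sample of what systemd would report:
echo 'Environment=MAX_WORKERS=8 MAX_CONN=1500 MAX_REQUESTS=20000' | has_env MAX_WORKERS \
    && echo "MAX_WORKERS is set in the unit environment"
```

If the variable is present but the worker count stays at 3, the daemon most likely does not take its worker count from the environment.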

I have not noticed any resource bottlenecks on the servers so far, and the same is true for the network across the whole infrastructure.
 
Hi guys,

Can anyone help? I need to solve this. I don't even know if it's possible to run PMG in such a large cluster with such a large number of domains.
 
  • Could you please check the logs to see why it’s timing out:
Code:
journalctl -u pmgproxy -f
grep 'error' /var/log/pmg/pmgproxy/pmgproxy.log

  • If the problem is too many connections, use NGINX in front of PMG to handle more traffic. Don’t try to change the built-in proxy.
  • Only if nothing else works, you can edit PMG/HTTPServer.pm to raise max_workers, but updates will undo your changes.
 
I don't have an error log there either.
But in
Code:
/var/log/pmgproxy/pmgproxy.log
I have plenty of successful API requests logged, for example:

Code:
::ffff:172.16.8.9 - setup@pmg [01/10/2025:22:31:28 +0200] "DELETE /api2/json/config/ruledb/who/1992/objects/158924 HTTP/1.1" 200 13

Code:
::ffff:172.16.8.9 - setup@pmg [06/10/2025:06:20:30 +0200] "GET /api2/json/config/transport HTTP/1.1" 200 2396297

And so on.
Of course we also use HAProxy in front of MX (for management) but it doesn't matter.
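The transport listing quoted above returns about 2.4 MB, so it may be worth seeing which API calls produce the largest responses. This is a rough sketch that assumes the access-log layout shown in the two lines above (status and byte count as the last two fields); `largest_responses` is a hypothetical helper name, and a large response is not necessarily the cause of the timeouts.

```shell
#!/bin/sh
# List API requests from an access log, largest response first.
# Assumed line layout:
#   client - user [date tz] "METHOD path HTTP/x" status bytes
largest_responses() {
    awk '$NF ~ /^[0-9]+$/ {
        method = $6
        sub(/^"/, "", method)          # strip the opening quote
        print $NF, $(NF - 1), method, $7
    }' "$@" | sort -rn
}

# Example with the two lines quoted above (sample input, not a live log):
largest_responses <<'EOF'
::ffff:172.16.8.9 - setup@pmg [01/10/2025:22:31:28 +0200] "DELETE /api2/json/config/ruledb/who/1992/objects/158924 HTTP/1.1" 200 13
::ffff:172.16.8.9 - setup@pmg [06/10/2025:06:20:30 +0200] "GET /api2/json/config/transport HTTP/1.1" 200 2396297
EOF
```

On a node it could be run as `largest_responses /var/log/pmgproxy/pmgproxy.log | head`.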

There is a bug that adds a 3-second lag to every PMG response, and adding RAM or CPU will certainly not eliminate it. I tested this for a long time: I polled from different servers, at different times, against different MX nodes, even from an MX node to itself over localhost, and the response always takes about 10-50 milliseconds PLUS THREE SECONDS. It doesn't matter whether it is Sunday at midnight, when nobody is active, or Monday morning, when everyone is working on mail. It is never 2.5 seconds or less and never 3.5 seconds or more; the response always takes between 3.01 and 3.05 seconds. The 0.01 to 0.05 seconds depend on the current load, but the 3 seconds are a bug that should be eliminated.

You can reproduce it from anywhere, including directly on the MX (use 127.0.0.1 instead of mx3.webhouse.sk); look at the value of "time_starttransfer":

Code:
curl -sk -o /dev/null -w ' time_namelookup: %{time_namelookup}s time_connect: %{time_connect}s time_appconnect: %{time_appconnect}s time_starttransfer:%{time_starttransfer}s time_total: %{time_total}s ' https://mx3.webhouse.sk:8006/api2/json/version

This is the result:
Code:
time_namelookup: 0.002022s time_connect: 0.004828s time_appconnect: 0.053946s time_starttransfer:3.058493s time_total: 3.058712s
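
To separate the fixed delay from connection setup, the TLS handshake time can be subtracted from the time to first byte. A minimal sketch, assuming the `-w` format used above; `server_wait` is a hypothetical helper that reads one timing line on stdin.

```shell
#!/bin/sh
# Print the time the server needed to send the first byte after the TLS
# handshake finished: time_starttransfer minus time_appconnect.
server_wait() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i == "time_appconnect:") app = $(i + 1)
            if ($i ~ /^time_starttransfer:/) {
                # the format string above has no space after this label
                v = $i
                sub(/^time_starttransfer:/, "", v)
                start = (v != "" ? v : $(i + 1))
            }
        }
        # awk converts "3.058493s" to 3.058493 (leading numeric prefix)
        printf "%.3f\n", start - app
    }'
}

# The result line quoted above:
echo 'time_namelookup: 0.002022s time_connect: 0.004828s time_appconnect: 0.053946s time_starttransfer:3.058493s time_total: 3.058712s' | server_wait
# prints 3.005
```

Feeding it the output of repeated curl runs shows whether the wait stays pinned at about 3 seconds regardless of load.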