Cluster - communication failure (0)

kez

Mar 26, 2023
Hi all,

One of our servers in the PMG cluster is currently down. However, it seems this then breaks the Cluster page in PMG.

Is this normal behavior?

Thanks.
 

Attachments

  • pmg-cluster-down.png
and which node are you connecting to for the web UI?
 
ok, can you post a snippet of the logs from when you're opening the cluster page? (a few minutes before and after would be good)
 
Hi there,

Okay, I just rebooted node 5, clicked the Cluster menu on master node 1, and got the 'communication failure (0)' error.

This is the only log I can see that looks relevant:

Code:
Apr 14 14:25:37 pmg1 systemd[1]: Started Session 102293 of user root.
Apr 14 14:25:37 pmg1 systemd[1]: session-102293.scope: Succeeded.
Apr 14 14:25:38 pmg1 pmgdaemon[3707008]: successful auth for user 'root@pam'
Apr 14 14:25:38 pmg1 pmgmirror[934]: database sync 'pmg5' failed - DBI connect('dbname=Proxmox_ruledb;host=/run/pmgtunnel;port=5;','root',...) failed: could not connect to server: No such file or directory#012#011Is the server running locally and accepting#012#011connections on Unix domain socket "/run/pmgtunnel/.s.PGSQL.5"? at /usr/share/perl5/PMG/DBTools.pm line 66.
Apr 14 14:25:38 pmg1 pmgmirror[934]: cluster synchronization finished  (1 errors, 1.09 seconds (files 0.79, database 0.30, config 0.00))
Apr 14 14:25:42 pmg1 pmgtunnel[601]: restarting crashed tunnel 3759954 195.22.156.3
Apr 14 14:25:43 pmg1 pmgpolicy[916]: starting policy database maintenance (greylist, rbl)
Apr 14 14:25:43 pmg1 pmgpolicy[916]: end policy database maintenance (30 ms, 3 ms)
Apr 14 14:25:46 pmg1 systemd[1]: Started Session 102294 of user root.
 
ok, i can reproduce this, but only for a short while after the node goes offline. after about a minute, the page loads again and shows the other node in state 'error'.
the 'communication failure' only happens if it takes too long to connect (>30s)

does the problem persist on your side?

(i'll send a patch that introduces a timeout to the call)
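
For anyone wondering what a timeout changes here: the sketch below is not the actual PMG patch (PMG is written in Perl), just a minimal Python illustration of the idea. The function name `node_status`, the 5-second default and the 'active'/'error' return values are assumptions made for this example. The point is that a connection attempt to an unreachable node is abandoned after a bounded wait, so the caller can report the node as 'error' instead of hanging until the request itself fails.

```python
import socket


def node_status(host, port, timeout=5.0):
    """Hypothetical helper: report 'active' if a TCP connection to
    (host, port) succeeds within `timeout` seconds, otherwise 'error'."""
    try:
        # create_connection gives up after `timeout` seconds instead of
        # waiting for the operating system's much longer default.
        with socket.create_connection((host, port), timeout=timeout):
            return "active"
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return "error"
```

With a check like this, a node that is down is reported as 'error' after at most `timeout` seconds, well under the roughly 30 seconds after which the web UI shows 'communication failure (0)'.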