Cluster - communication failure (0)

Mar 26, 2023
Hi all,

One of our servers in the PMG cluster is currently down, and this seems to break the Cluster page in PMG.

Is this normal behavior?

Thanks.
 

Attachments

  • pmg-cluster-down.png
And which node are you connecting to for the web UI?
 
OK, can you post a snippet of the logs from when you browse to the Cluster page (a few minutes before and after would be good)?
 
Hi there,

Okay, I just rebooted node 5, clicked the Cluster menu on master node 1, and got the 'communication failure (0)' error.

This is the only log I can see that looks relevant:

Code:
Apr 14 14:25:37 pmg1 systemd[1]: Started Session 102293 of user root.
Apr 14 14:25:37 pmg1 systemd[1]: session-102293.scope: Succeeded.
Apr 14 14:25:38 pmg1 pmgdaemon[3707008]: successful auth for user 'root@pam'
Apr 14 14:25:38 pmg1 pmgmirror[934]: database sync 'pmg5' failed - DBI connect('dbname=Proxmox_ruledb;host=/run/pmgtunnel;port=5;','root',...) failed: could not connect to server: No such file or directory#012#011Is the server running locally and accepting#012#011connections on Unix domain socket "/run/pmgtunnel/.s.PGSQL.5"? at /usr/share/perl5/PMG/DBTools.pm line 66.
Apr 14 14:25:38 pmg1 pmgmirror[934]: cluster synchronization finished  (1 errors, 1.09 seconds (files 0.79, database 0.30, config 0.00))
Apr 14 14:25:42 pmg1 pmgtunnel[601]: restarting crashed tunnel 3759954 195.22.156.3
Apr 14 14:25:43 pmg1 pmgpolicy[916]: starting policy database maintenance (greylist, rbl)
Apr 14 14:25:43 pmg1 pmgpolicy[916]: end policy database maintenance (30 ms, 3 ms)
Apr 14 14:25:46 pmg1 systemd[1]: Started Session 102294 of user root.
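To check by hand whether the tunnel socket named in the pmgmirror error is accepting connections, a small helper like the one below may be useful. It is a minimal sketch, assuming Python 3 is available on the PMG node; the function name `unix_socket_accepts` is hypothetical, and the socket path is copied from the log line above. A `False` result corresponds to the "No such file or directory" / refused / timed-out cases.

```python
import socket


def unix_socket_accepts(path: str, timeout: float = 2.0) -> bool:
    """Return True if a Unix-domain stream socket at `path` accepts a connection.

    Hypothetical diagnostic helper, not part of PMG.
    """
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.settimeout(timeout)  # never hang longer than `timeout` seconds
    try:
        s.connect(path)
        return True
    except OSError:
        # FileNotFoundError (socket file missing), ConnectionRefusedError
        # (nothing listening) and socket.timeout are all OSError subclasses.
        return False
    finally:
        s.close()


if __name__ == "__main__":
    # Path taken from the pmgmirror error for node 'pmg5' in the log above.
    print(unix_socket_accepts("/run/pmgtunnel/.s.PGSQL.5"))
```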
 
OK, I can reproduce this, but only for a short while after the node goes offline. After about a minute, the result shows up (with the other node in state 'error').
The 'communication failure' only happens if connecting takes too long (>30s).

Does the problem persist on your side?

(I'll send a patch that introduces a timeout for the call.)
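The general idea of bounding the call can be sketched as follows: query all nodes concurrently, wait at most a fixed time overall, and report any node that fails or does not answer in time as state 'error' instead of letting the whole request run past the 30s limit. This is a minimal Python sketch under that assumption, not the actual patch or PMG's Perl code; `collect_cluster_status` and the `fetch` callback are hypothetical.

```python
import concurrent.futures as cf
import time
from typing import Callable, Dict, List


def collect_cluster_status(
    nodes: List[str],
    fetch: Callable[[str], dict],
    timeout: float = 5.0,
) -> Dict[str, dict]:
    """Query every node in parallel, waiting at most `timeout` seconds in total.

    Nodes that raise or do not answer in time get state 'error', so one dead
    node cannot make the whole status request fail. (Hypothetical sketch.)
    """
    results: Dict[str, dict] = {}
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(nodes)))
    futures = {node: pool.submit(fetch, node) for node in nodes}
    deadline = time.monotonic() + timeout  # shared deadline for all nodes
    for node, fut in futures.items():
        remaining = max(0.0, deadline - time.monotonic())
        try:
            results[node] = fut.result(timeout=remaining)
        except cf.TimeoutError:
            results[node] = {"state": "error", "reason": "timeout"}
        except Exception as exc:  # connection refused, DNS failure, etc.
            results[node] = {"state": "error", "reason": str(exc)}
    # Return right away; slow fetches finish in the background.
    pool.shutdown(wait=False)
    return results
```

With a shared deadline, the caller waits at most `timeout` seconds no matter how many nodes are unreachable, which keeps the response well under the web UI's limit.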