dashboard connection timeouts -> WARNING: proxy detected vanished client

rnicolaides

Dear community,

We have a 4-node cluster, after removing a failed 5th node last weekend.
I tried to re-add the failed node, but it didn't work, so I removed it completely.

Cluster communication runs over OpenVPN.
Two of the nodes also ran an NFS server, with shared storage defined in Proxmox, but it was not in use at the time.

After a lot of research, here is my problem:

The dashboard is no longer usable, I can't fix it, and I haven't found any information on what exactly the problem is.
Some threads in this forum mention similar problems.

Symptoms

  • VM names are only: no name specified
  • Loading content stops with: timeout connection
  • The only log information I found: WARNING: proxy detected vanished client

Things I tried

  • Emptied the browser cache
  • Restarted services on all nodes (see the sketch below)
    • pveproxy
    • pve-cluster
    • nfs-kernel-server
  • Edited storage.cfg and removed the NFS entries
  • Ensured no backup processes were running
  • echo "" > /var/log/pve/tasks/active
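
For reference, the restarts and the task-list reset per node look roughly like this (a sketch; the pveproxy init script only exists on the PVE 3.x nodes, the 2.3 nodes still serve the GUI via Apache):

Code:
# restart web proxy, cluster filesystem and NFS server (PVE 3.x service names)
for s in pveproxy pve-cluster nfs-kernel-server; do /etc/init.d/$s restart; done
# truncate the active task list that the dashboard polls
echo "" > /var/log/pve/tasks/active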

Questions

Is this still a confirmed bug? (see post94889)

Do you have any hints or tips on what I could try, or how to debug this?


Technical information

pvecm status (quite similar on all nodes)

Code:
Version: 6.2.0
Config Version: 12
Cluster Name: WR-CLUSTER
Cluster Id: 15060
Cluster Member: Yes
Cluster Generation: 12880
Membership state: Cluster-Member
Nodes: 4
Expected votes: 4
Total votes: 4
Node votes: 1
Quorum: 3
Active subsystems: 5
Flags:
Ports Bound: 0
Node name: bay2
Node ID: 4
Multicast addresses: 239.192.58.15
Node addresses: 10.8.8.2
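
Side note on the numbers above: with 4 expected votes at one vote per node, quorum is 4/2 + 1 = 3, so the remaining cluster stays quorate even with one node unreachable.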



pveversion

  • Deleted node: pve-manager/2.3/7946f1f1
  • node2: pve-manager/3.1-21/93bf03d4 (running kernel: 2.6.32-26-pve)
  • node7: pve-manager/2.3/7946f1f1
  • node4: pve-manager/3.2-4/e24a91c1 (running kernel: 2.6.32-30-pve)
  • node3: pve-manager/3.2-4/e24a91c1 (running kernel: 2.6.32-29-pve)




Edit:

I just found out that when I click on Datacenter and then choose a VM located on node2 or node4, I can work with node2 and node4 again. For node3 and node7 the problem remains.

Maybe re-copying pve-ssl.key could fix it, like in post76741?
Although there is no diff between the keys, and no such error in the logs.

But I am not sure and don't want to risk it without confirmation.
 
I fixed it with the following on every node.
And phew :) without downtime or VM restarts...
(No HA enabled.)

Code:
for n in pve-cluster cman pvedaemon pvestatd pve-manager; do /etc/init.d/$n restart; done
fusermount -zu /etc/pve
/etc/init.d/pve-cluster restart
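
For context: fusermount -zu does a lazy unmount of the pmxcfs FUSE filesystem at /etc/pve, detaching it even while it is still busy, so that the final pve-cluster restart can mount it cleanly again.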


Now node7 is out of the cluster (red), but all other nodes are green and respond as usual.
All VMs on node7 are still running, so I will reinstall it in the next days.

Any chance to re-add it without reinstalling?
 
Hello rnicolaides,

Now node7 is out of the cluster (red), but all other nodes are green and respond as usual.
All VMs on node7 are still running, so I will reinstall it in the next days.

Any chance to re-add it without reinstalling?

Node7 now runs isolated from the others?

But you never removed node7 from the cluster - or did you?

What do

Code:
clustat
pvecm status
pvecm nodes

show now on all nodes?
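
To collect everything in one go, a loop like this works (hostnames are assumed, adjust them to yours):

Code:
for h in node2 node3 node4 node7; do echo "== $h =="; ssh root@$h 'clustat; pvecm status; pvecm nodes'; done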

Maybe it shows something suspicious ....

E.g.: are all the IP addresses correct and working?

Kind regards

Mr.Holmes
 
Hello Mr.Holmes,

Thank you for replying.

node2

Code:
root@node2:# pvecm nodes
Node  Sts   Inc   Joined               Name
   2   X  12904                        bay7
   3   M  12924   2014-08-06 15:35:52  bay4
   4   M  12904   2014-08-06 15:25:41  bay2
   5   M  12904   2014-08-06 15:25:41  bay3
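
(In the Sts column, M marks an active member and X a node the cluster cannot reach; node ID 2, bay7, is the missing one.)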

Code:
root@node2:# clustat
Cluster Status for WR-CLUSTER @ Thu Aug  7 10:21:56 2014
Member Status: Quorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 node7                                                                2 Offline
 node4                                                                3 Online
 node2                                                                4 Online, Local
 node3                                                                5 Online


node7

Code:
 root@node7:# clustat
Cluster Status for WR-CLUSTER @ Thu Aug  7 10:15:52 2014
Member Status: Inquorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 node7                                                                2 Offline
 node4                                                                3 Offline
 node2                                                                4 Offline
 node3                                                                5 Offline

pvecm status and pvecm nodes give the following error:

Code:
cman_tool: Error getting extra info: Node is not yet a cluster member


  • All IPs are up and working.
  • Multicast is allowed.

Code:
root@node7:# asmping 224.0.2.1 10.8.8.2
pinging 10.8.8.2 from 10.8.8.7
  unicast from 10.8.8.2, seq=1 dist=0 time=7.183 ms
multicast from 10.8.8.2, seq=1 dist=0 time=47.878 ms
  unicast from 10.8.8.2, seq=2 dist=0 time=3.017 ms
multicast from 10.8.8.2, seq=2 dist=0 time=3.935 ms
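
Both the unicast and the multicast replies from 10.8.8.2 come back, so basic multicast between node7 and node2 works, at least for this test.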

This error is in node7 syslog:

Code:
 corosync[413440]:   [TOTEM ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.

Firewall seems to be ok.
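
For completeness: corosync's totem traffic uses UDP ports 5404/5405, so the rules and the live traffic can be inspected with something like this (tun0 as the OpenVPN interface is an assumption):

Code:
# list firewall rules with packet counters
iptables -L -n -v
# watch for totem traffic on the VPN interface
tcpdump -ni tun0 udp port 5404 or udp port 5405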


Today I realized that logging in to the dashboard is no longer possible, even with root@pam. It shows no error, just the login screen again.


Thanks again,
Ruben Nicolaides
 
Hello Ruben,

node7

Code:
 root@node7:# clustat
Cluster Status for WR-CLUSTER @ Thu Aug  7 10:15:52 2014
Member Status: Inquorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 node7                                                                2 Offline
 node4                                                                3 Offline
 node2                                                                4 Offline
 node3                                                                5 Offline

pvecm status and pvecm nodes give the following error:

Code:
cman_tool: Error getting extra info: Node is not yet a cluster member

The above sounds like a contradiction. I guess something in node7's cluster configuration is damaged. Since I am just a cluster user and not an expert on the internals, I cannot say whether there is a chance to repair it. Look at what https://pve.proxmox.com/wiki/Proxmox_VE_2.0_Cluster suggests in the section "Remove a cluster node":

If for whatever reason you would like to make that server join the same cluster again, you have to

  • reinstall pve on it from scratch
  • as a new node
  • and then regularly join it, as said in the previous section.
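
After the reinstall, the regular join is run on the fresh node against one of the surviving members, e.g. (10.8.8.2, bay2's address from your pvecm output, used as an example):

Code:
pvecm add 10.8.8.2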

The other problem

Today I realized that logging in to the dashboard is no longer possible, even with root@pam. It shows no error, just the login screen again.

I'd try to solve it as described here (even though it did not succeed in that particular case):

http://forum.proxmox.com/threads/19046-can-t-Login-with-GUI-but-works-with-Telnet?p=97842#1

All the best!

Mr.Holmes
 
