[SOLVED] Cluster Connection Errors, "Error Connection error 596" "Communication Failure (0)"

qgrasso
Jul 31, 2013

Hi,

I'm having strange issues on my new Proxmox cluster of 3 nodes.

Everything was going fine earlier today, and now I'm randomly getting these errors in the web UI:

"Error Connection error 596" & "Communication Failure (0)" at different times,

I generally access the cluster via node 1's IP

I get the errors above when I click on the tabs of other hosts while connected via host 1, so they seem to show up mostly when accessing across hosts.
For example, when I'm connected to host 1's IP and open the VM or server details of host 2 or 3, I get these errors, and the same happens if I'm connected to host 2 or 3 and access one of the other hosts.

I've had a ping running on each of the hosts to ensure no packets are being dropped between the VM hosts, and they are all local to each other (same network).

I found on the forums that running "pvecm updatecerts" resolved it for someone else, but no luck here.

I've also tried rebooting the cluster, with the same issues.

pvecm status
Version: 6.2.0
Config Version: 3
Cluster Name: dc01
Cluster Id: 1341
Cluster Member: Yes
Cluster Generation: 96
Membership state: Cluster-Member
Nodes: 3
Expected votes: 3
Total votes: 3
Node votes: 1
Quorum: 2
Active subsystems: 5
Flags:
Ports Bound: 0
Node name: pvmm01
Node ID: 1
Multicast addresses: 239.192.5.66
Node addresses: 10.90.0.10


# pvecm status
Version: 6.2.0
Config Version: 3
Cluster Name: dc01
Cluster Id: 1341
Cluster Member: Yes
Cluster Generation: 96
Membership state: Cluster-Member
Nodes: 3
Expected votes: 3
Total votes: 3
Node votes: 1
Quorum: 2
Active subsystems: 5
Flags:
Ports Bound: 0
Node name: pvmm02
Node ID: 2
Multicast addresses: 239.192.5.66
Node addresses: 10.90.0.11

pvecm status
Version: 6.2.0
Config Version: 3
Cluster Name: dc01
Cluster Id: 1341
Cluster Member: Yes
Cluster Generation: 96
Membership state: Cluster-Member
Nodes: 3
Expected votes: 3
Total votes: 3
Node votes: 1
Quorum: 2
Active subsystems: 5
Flags:
Ports Bound: 0
Node name: pvmm03
Node ID: 3
Multicast addresses: 239.192.5.66
Node addresses: 10.90.0.12
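
Note on the three outputs above: all nodes report 3 expected votes and a quorum of 2, so cluster membership itself looks healthy at this point. Assuming cman's default majority rule (quorum is more than half of the expected votes), the arithmetic can be sketched as below. quorum_votes is a hypothetical helper written for this note, not a Proxmox or cman command.

```shell
# Hypothetical helper: majority quorum for a given number of expected votes.
# Assumes one vote per node and the default "more than half" rule.
quorum_votes() {
    expected=$1
    # Integer division, then add one: 3 -> 2, 1 -> 1
    echo $(( expected / 2 + 1 ))
}

quorum_votes 3   # three-node cluster, prints 2
quorum_votes 1   # single-node cluster, prints 1
```

Both values match the "Quorum:" lines that pvecm status reports in this thread for a three-node and a single-node cluster.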


Any ideas?

Cheers,
Quenten
 
Re: Cluster Connection Errors, "Error Connection error 596" "Communication Failure (0

*bump*

Sorry to do this, however it's holding my project up.

I also tried a fresh install of all 3 nodes and an apt-get dist-upgrade, and the same issues are still happening.
 
Re: Cluster Connection Errors, "Error Connection error 596" "Communication Failure (0

Is pve-cluster service running on all nodes?
 
Re: Cluster Connection Errors, "Error Connection error 596" "Communication Failure (0

Hi,
perhaps the switch sometimes has trouble with multicast?

See here, also for multicast ping tests: http://pve.proxmox.com/wiki/Multicast_notes

Udo
 
Re: Cluster Connection Errors, "Error Connection error 596" "Communication Failure (0

Thanks for the tip. I ended up changing to UDP earlier today and unfortunately that didn't make any difference. :confused:

I'm going to try a clean install again and see if that helps, I guess.

Quenten
 
Re: Cluster Connection Errors, "Error Connection error 596" "Communication Failure (0

Update,

I've completed a clean install of proxmox on my 3 nodes,

I've also done an apt-get update && apt-get dist-upgrade on all 3 nodes, and updated /etc/hosts so that all 3 nodes exist and pvelocalhost is set on each node to that node's own entry.
I ran some speed tests using iperf between the nodes and all worked well.
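
A quick way to double-check the /etc/hosts step is to confirm that every node name has an entry in the file. The sketch below is only an illustration: missing_hosts is a hypothetical helper (not part of Proxmox), and the node names pvmm01 to pvmm03 are the ones shown in the pvecm output earlier in the thread.

```shell
# Hypothetical helper: missing_hosts FILE NAME...
# Prints every NAME that has no entry in the hosts-style FILE.
missing_hosts() {
    hosts_file=$1
    shift
    for name in "$@"; do
        # Skip comment lines, then look for the name in any field after the
        # address column; stop scanning a line at an inline comment.
        if ! awk -v n="$name" '
            $1 ~ /^#/ { next }
            {
                for (i = 2; i <= NF; i++) {
                    if ($i ~ /^#/) break
                    if ($i == n) found = 1
                }
            }
            END { exit !found }
        ' "$hosts_file"; then
            echo "$name"
        fi
    done
}

# Example call (node names assumed from this thread):
# missing_hosts /etc/hosts pvmm01 pvmm02 pvmm03
```

If the example call prints nothing on a node, all three names are present in that node's /etc/hosts. It only checks that the names exist, not that the addresses are the right ones.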

Rebooted all the nodes and created my new cluster on node 1.

Modified the cluster to use UDP unicast (udpu) as per:


<cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu"/>


  • activate via GUI
  • add all nodes you want to join in /etc/hosts and reboot
  • before you add a node, make sure you add all other nodes in /etc/hosts
As per http://pve.proxmox.com/wiki/Fencing#General_HowTo_for_editing_the_cluster.conf

* I followed this carefully, and also noticed a syntax error with the "/>", which I resolved by removing the "/".
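
Before activating the change, it may help to confirm that the edited file really carries the new transport setting. The following is a minimal sketch, not an official procedure: uses_udpu and config_version are hypothetical helpers, and the path /etc/pve/cluster.conf.new is the one named in the wiki steps quoted above. The pattern accepts both the self-closing "/>" form and the plain ">" form of the cman element.

```shell
# Hypothetical helper: succeeds if the <cman> element in FILE
# carries transport="udpu".
uses_udpu() {
    grep -Eq '<cman[^>]*transport="udpu"' "$1"
}

# Hypothetical helper: prints the config_version attribute of <cluster>.
config_version() {
    sed -n 's/.*<cluster[^>]*config_version="\([0-9][0-9]*\)".*/\1/p' "$1" | head -n 1
}

# Example calls (path assumed from the wiki steps):
# uses_udpu /etc/pve/cluster.conf.new && echo "udpu is set"
# config_version /etc/pve/cluster.conf.new
```

These helpers only do a text match, so they do not replace a real validation of the XML. As far as I recall, the linked how-to also asks for config_version to be increased on every edit, which is worth confirming against the wiki.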


After updating the files above, I went into the web GUI, activated the change and restarted this server.


Once the reboot was completed I started joining the other servers to the cluster, and I got this on node 2:

root@pvmm02:~#pvecm add 10.90.0.10
The authenticity of host '10.90.0.10 (10.90.0.10)' can't be established.
ECDSA key fingerprint is 05:61:57:c8:20:61:f3:36:0d:38:87:0d:4f:be:15:bb.
Are you sure you want to continue connecting (yes/no)? yes
root@10.90.0.10's password:
copy corosync auth key
stopping pve-cluster service
Stopping pve cluster filesystem: pve-cluster.
backup old database
Starting pve cluster filesystem : pve-clustercan't create shared ssh key database '/etc/pve/priv/authorized_keys'
.
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... [ OK ]
Waiting for quorum... Timed-out waiting for cluster
[FAILED]
waiting for quorum...

And it's just stuck on this.


The only differences between my lab cluster and this setup are:

Packages,

The following packages will be upgraded:
bind9-host dnsutils gnupg gpgv libbind9-80 libdns88 libgcrypt11 libisc84 libisccc80 libisccfg82 liblwres80


Hardware,
Intel 10Gbit NICs
Force 10 S4810 switches
Similar compute CPUs, hardware etc.
All servers running the latest firmware, BIOS etc.

From node 1:
root@pvmm01:~# pvecm status
Version: 6.2.0
Config Version: 3
Cluster Name: dc01
Cluster Id: 1341
Cluster Member: Yes
Cluster Generation: 1776
Membership state: Cluster-Member
Nodes: 1
Expected votes: 1
Total votes: 1
Node votes: 1
Quorum: 1
Active subsystems: 5
Flags:
Ports Bound: 0
Node name: pvmm01
Node ID: 1
Multicast addresses: 255.255.255.255
Node addresses: 10.90.0.10


My network interfaces: http://pastie.org/private/ksrfmfkza06xyr27n3paza

Any ideas?

Regards,
Quenten
 
Re: Cluster Connection Errors, "Error Connection error 596" "Communication Failure (0

So another quick update.

On all nodes I've run:

iptables -A INPUT -m addrtype --dst-type MULTICAST -j ACCEPT
iptables -A INPUT -p udp -m state --state NEW -m multiport --destination-ports 5404,5405 -j ACCEPT

On nodes 2 and 3 I ran:

service cman stop
service cman start
pvecm add 10.90.0.10 -force

This allowed them to join the cluster, like so:

root@pvmm02:~# pvecm add 10.90.0.10 -force
node pvmm02 already defined
copy corosync auth key
stopping pve-cluster service
Stopping pve cluster filesystem: pve-cluster.
backup old database
Starting pve cluster filesystem : pve-cluster.
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... [ OK ]
Waiting for quorum... [ OK ]
Starting fenced... [ OK ]
Starting dlm_controld... [ OK ]
Tuning DLM kernel config... [ OK ]
Unfencing self... [ OK ]
generating node certificates
merge known_hosts file
restart services
Restarting PVE Daemon: pvedaemon.
Restarting PVE API Proxy Server: pveproxy.
successfully added node 'pvmm02' to cluster.

Status now from node 1:

root@pvmm01:/var/log/cluster# pvecm status
Version: 6.2.0
Config Version: 3
Cluster Name: dc01
Cluster Id: 1341
Cluster Member: Yes
Cluster Generation: 2196
Membership state: Cluster-Member
Nodes: 3
Expected votes: 3
Total votes: 3
Node votes: 1
Quorum: 2
Active subsystems: 5
Flags:
Ports Bound: 0
Node name: pvmm01
Node ID: 1
Multicast addresses: 255.255.255.255
Node addresses: 10.90.0.10


Next I'll reboot each of the nodes, spin up some VMs and see if I get the UI communication errors again.

Cheers,
Quenten
 
Re: Cluster Connection Errors, "Error Connection error 596" "Communication Failure (0

Update,

So it's been about 6 hours. I have spun up a few VMs on each host and all are still running OK.

I'll see how it goes over next few days.

Cheers,
Quenten
 
Re: Cluster Connection Errors, "Error Connection error 596" "Communication Failure (0

Update,

Hi all, it's been a while now. A clean install of all the nodes, changing from multicast to UDP unicast (transport="udpu") before joining the additional nodes, and then joining the nodes to the new cluster seems to have resolved my issues.

Hopefully this may help someone in future.

Cheers,
Quenten
 
Re: Cluster Connection Errors, "Error Connection error 596" "Communication Failure (0

Hi,
Could you please help me with this:
Waiting for quorum... Timed-out waiting for cluster
[FAILED]

I have done the configuration for unicast in /etc/pve/cluster.conf.new and validated the file:

  • add the new transport="udpu" in /etc/pve/cluster.conf.new

<cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu"/>



  • activate via GUI
  • add all nodes you want to join in /etc/hosts and reboot
  • before you add a node, make sure you add all other nodes in /etc/hosts

I have made the necessary changes for unicast.
Please guide me:
how do I activate via the GUI, and which IP should be added to /etc/hosts, the public or the private IP?

Awaiting your kind response...
Regards,
Nasim
 
