Quorum timeout when adding a node to a cluster (unicast)

canelli

New Member
Jul 23, 2013
I have a two-node cluster (node names pm0 and pm1) with unicast enabled (transport="udpu"):

Code:
<?xml version="1.0"?>
<cluster name="alidays-cluster" config_version="15">

  <cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu">
  </cman>

  <clusternodes>
    <clusternode name="pm1" votes="1" nodeid="1"/>
    <clusternode name="pm0" votes="1" nodeid="2"/>
  </clusternodes>

</cluster>

pveversion:
Code:
root@pm1:~# pveversion
pve-manager/2.3/7946f1f1

I set up a new host (pm2) with a fresh installation of PVE. When adding this node to the cluster I got the error "Waiting for quorum... Timed-out waiting for cluster":
Code:
root@pm2:/var/log# pvecm add pm1
copy corosync auth key
stopping pve-cluster service
Stopping pve cluster filesystem: pve-cluster.
backup old database
Starting pve cluster filesystem : pve-clustercan't create shared ssh key database '/etc/pve/priv/authorized_keys'
.
Starting cluster:
   Checking if cluster has been disabled at boot... [  OK  ]
   Checking Network Manager... [  OK  ]
   Global setup... [  OK  ]
   Loading kernel modules... [  OK  ]
   Mounting configfs... [  OK  ]
   Starting cman... [  OK  ]
   Waiting for quorum... Timed-out waiting for cluster
[FAILED]
waiting for quorum...

The new cluster configuration on pm1:
Code:
<?xml version="1.0"?>
<cluster name="alidays-cluster" config_version="16">

  <cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu">
  </cman>

  <clusternodes>
    <clusternode name="pm1" votes="1" nodeid="1"/>
    <clusternode name="pm0" votes="1" nodeid="2"/>
    <clusternode name="pm2" votes="1" nodeid="3"/>
  </clusternodes>

</cluster>
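(As a side note, a quick way to sanity-check the edited file is something like the sketch below; xmllint is part of libxml2, ccs_config_validate is cman's own checker and may or may not be installed, and the path is the one used elsewhere in this thread; on PVE the authoritative copy may live under /etc/pve instead.)
Code:
# run on the node where the config was edited
xmllint --noout /etc/cluster/cluster.conf        # complains if the XML is not well-formed
grep config_version /etc/cluster/cluster.conf    # confirm the version was bumped (16 here)
ccs_config_validate                              # cman's schema check, if installed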

The new node seems to have been added to the cluster, but when cman starts it fails to join:
Code:
Jul 23 15:23:08 corosync [MAIN  ] Successfully configured openais services to load
Jul 23 15:23:08 corosync [TOTEM ] Initializing transport (UDP/IP Unicast).
Jul 23 15:23:08 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Jul 23 15:23:08 corosync [TOTEM ] The network interface [192.168.169.15] is now up.
Jul 23 15:23:08 corosync [QUORUM] Using quorum provider quorum_cman
Jul 23 15:23:08 corosync [SERV  ] Service engine loaded: corosync cluster quorum service v0.1
Jul 23 15:23:08 corosync [CMAN  ] CMAN 1352871249 (built Nov 14 2012 06:34:12) started
Jul 23 15:23:08 corosync [SERV  ] Service engine loaded: corosync CMAN membership service 2.90
Jul 23 15:23:08 corosync [SERV  ] Service engine loaded: openais cluster membership service B.01.01
Jul 23 15:23:08 corosync [SERV  ] Service engine loaded: openais event service B.01.01
Jul 23 15:23:08 corosync [SERV  ] Service engine loaded: openais checkpoint service B.01.01
Jul 23 15:23:08 corosync [SERV  ] Service engine loaded: openais message service B.03.01
Jul 23 15:23:08 corosync [SERV  ] Service engine loaded: openais distributed locking service B.03.01
Jul 23 15:23:08 corosync [SERV  ] Service engine loaded: openais timer service A.01.01
Jul 23 15:23:08 corosync [SERV  ] Service engine loaded: corosync extended virtual synchrony service
Jul 23 15:23:08 corosync [SERV  ] Service engine loaded: corosync configuration service
Jul 23 15:23:08 corosync [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01
Jul 23 15:23:08 corosync [SERV  ] Service engine loaded: corosync cluster config database access v1.01
Jul 23 15:23:08 corosync [SERV  ] Service engine loaded: corosync profile loading service
Jul 23 15:23:08 corosync [QUORUM] Using quorum provider quorum_cman
Jul 23 15:23:08 corosync [SERV  ] Service engine loaded: corosync cluster quorum service v0.1
Jul 23 15:23:08 corosync [MAIN  ] Compatibility mode set to whitetank.  Using V1 and V2 of the synchronization engine.
Jul 23 15:23:08 corosync [TOTEM ] adding new UDPU member {192.168.169.10}
Jul 23 15:23:08 corosync [TOTEM ] adding new UDPU member {192.168.169.5}
Jul 23 15:23:08 corosync [TOTEM ] adding new UDPU member {192.168.169.15}
Jul 23 15:23:08 corosync [CLM   ] CLM CONFIGURATION CHANGE
Jul 23 15:23:08 corosync [CLM   ] New Configuration:
Jul 23 15:23:08 corosync [CLM   ] Members Left:
Jul 23 15:23:08 corosync [CLM   ] Members Joined:
Jul 23 15:23:08 corosync [CLM   ] CLM CONFIGURATION CHANGE
Jul 23 15:23:08 corosync [CLM   ] New Configuration:
Jul 23 15:23:08 corosync [CLM   ]       r(0) ip(192.168.169.15)
Jul 23 15:23:08 corosync [CLM   ] Members Left:
Jul 23 15:23:08 corosync [CLM   ] Members Joined:
Jul 23 15:23:08 corosync [CLM   ]       r(0) ip(192.168.169.15)
Jul 23 15:23:08 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jul 23 15:23:08 corosync [QUORUM] Members[1]: 3
Jul 23 15:23:08 corosync [QUORUM] Members[1]: 3
Jul 23 15:23:08 corosync [CPG   ] chosen downlist: sender r(0) ip(192.168.169.15) ; members(old:0 left:0)
Jul 23 15:23:08 corosync [MAIN  ] Completed service synchronization, ready to provide service.
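A useful sanity check at this point (just a sketch, using the standard cman/PVE tools) is to look at the membership from one of the original nodes, to see whether pm2 ever shows up there or whether it only forms its own single-member ring:
Code:
# run on pm0 or pm1 while pm2 is stuck waiting for quorum
pvecm status      # expected votes, quorum state and member count as the old nodes see them
pvecm nodes       # node list; check whether pm2 appears here at all
cman_tool nodes   # the same membership information straight from cman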
 
Hi Tom

I added the entries in /etc/hosts on all nodes (the two existing cluster members and the new host).
I rebooted the new host, but not the entire cluster (it's a production environment!)

Claudio
 
Hi Tom
I solved the error.
First of all: I couldn't reboot the two live hosts, because this is a production environment with more than 10 VMs running, so I only restarted the cluster manager on the existing cluster nodes (pm0 and pm1), using puttycs to execute the same commands on both at the same time.
Then I added the new node to the cluster and everything worked fine.

For further documentation, these are the steps I used:

First, reset the cluster to its original state:
a) remove the previously added new node from the cluster, on one of the original nodes
Code:
root@pm0:~#pvecm delnode pm2
b) stop the cluster manager on the new node
Code:
root@pm2:~#service cman stop
root@pm2:~#service pve-cluster stop
c) delete the cluster definition on the new node (pm2)
Code:
root@pm2:~#rm /etc/cluster/cluster.conf
root@pm2:~#rm -r /var/lib/pve-cluster/*
d) reboot the new node
e) restart the cluster manager on both original nodes, launching the commands at the same time with puttycs
Code:
root@pm0:~#service cman stop
root@pm0:~#service pve-cluster stop
root@pm0:~#service pve-cluster start
root@pm0:~#service cman start
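To confirm that the two remaining nodes re-formed a quorate cluster after the simultaneous restart, something like this can be run on either of them (a sketch; the exact output will differ):
Code:
# run on pm0 or pm1 once cman is back up
cman_tool status   # look at the Nodes / Expected votes / Quorum lines
pvecm status       # membership and votes from the PVE side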


At this point I have a running cluster with two nodes (pm0 and pm1) and a standalone node pm2.
Now:
a) check that on all nodes /etc/hosts has an entry for each node (the original two and the new one)
on pm0
Code:
root@pm0:~#cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
10.168.169.5    pm0.local           pm0  pvelocalhost
10.168.169.10   pm1.local           pm1
10.168.169.15   pm2.local           pm2
on pm2
Code:
root@pm2:~#cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
10.168.169.5    pm0.local           pm0  
10.168.169.10   pm1.local           pm1
10.168.169.15   pm2.local           pm2 pvelocalhost

b) if you modified /etc/hosts on the cluster nodes, restart the cluster manager on the cluster (see the previous step)
c) if you modified /etc/hosts on the new node, reboot the node
d) from the new node, join the cluster
Code:
root@pm2:~#pvecm  add  pm1
The join completed successfully.
e) to check the system, reboot the new node (pm2). Everything works fine.
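A few checks that can be run on pm2 after the reboot to confirm the join (a sketch; only the node names are specific to this thread):
Code:
# run on pm2 after the reboot
pvecm status            # pm2 should now see a quorate three-node cluster
pvecm nodes             # membership list including pm0, pm1 and pm2
ls /etc/pve/nodes/      # the shared pmxcfs mount should show one directory per node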

Claudio
 
Thanks for the feedback!
 
Hi All,

I know this is an old post, but I wanted to add a comment here, as I'd been struggling for about a week before I figured this out.

When adding a new Proxmox host to an existing cluster I kept getting "waiting for quorum". I knew multicast worked, as I had tested it with omping, but I didn't realize that using jumbo frames on the interface used to contact the cluster node somehow prevents this from working. I removed jumbo frames from the interface in question and set it back to the default MTU of 1500. After that I could join the cluster without issue.
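In case it helps, this is roughly how the MTU can be checked and put back to 1500 (a sketch; vmbr0 is just a placeholder for whatever interface carries the cluster traffic):
Code:
# check the current MTU of the cluster-facing interface
ip link show vmbr0 | grep mtu

# set it back to the default for the running system
ip link set dev vmbr0 mtu 1500

# to make it persistent, add or adjust an "mtu 1500" line for that
# interface in /etc/network/interfaces and reload networking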

Hope this helps someone.
Garith Dugmore
 
Enough people have quorum issues, corosync issues, joining issues, etc. that I think these troubleshooting steps should be documented on the wiki.

Oh, wait, I have write access to the wiki. Um. Yeah. OK, I'll start writing something up :-(.

FWIW, I just discovered that on a 1 Gbit/s network I'm unable to add a 5th cluster node when using UDP unicast. Deleting all the cluster data (per above) and then re-joining after disabling IGMP on my switch works completely.
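For anyone else debugging this kind of join failure: multicast between all nodes can be tested with omping before and after changing the switch settings (run the same command simultaneously on every node; the node names below are just the ones from this thread):
Code:
# run at the same time on every cluster node; each instance reports
# unicast and multicast loss towards the listed peers
omping pm0 pm1 pm2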