quorum timeout adding node to cluster unicast

canelli

New Member
Jul 23, 2013
I have a two-node cluster (node names pm0 and pm1) with unicast enabled (transport="udpu"):

Code:
<?xml version="1.0"?>
<cluster name="alidays-cluster" config_version="15">

  <cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu">
  </cman>

  <clusternodes>
  <clusternode name="pm1" votes="1" nodeid="1"/>
  <clusternode name="pm0" votes="1" nodeid="2"/></clusternodes>

</cluster>

pveversion :
Code:
root@pm1:~# pveversion
pve-manager/2.3/7946f1f1
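Before adding the new node, it can help to confirm the membership and quorum state that the existing nodes currently see; a minimal check (a sketch using the standard pvecm tool):
Code:
# overall cluster state, including quorum and expected votes
root@pm1:~# pvecm status
# list of member nodes as seen by this node
root@pm1:~# pvecm nodes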

I set up a new host (pm2) with a fresh installation of PVE. When adding this node to the cluster I got the error "Waiting for quorum... Timed-out waiting for cluster":
Code:
root@pm2:/var/log# pvecm add pm1
copy corosync auth key
stopping pve-cluster service
Stopping pve cluster filesystem: pve-cluster.
backup old database
Starting pve cluster filesystem : pve-clustercan't create shared ssh key database '/etc/pve/priv/authorized_keys'
.
Starting cluster:
   Checking if cluster has been disabled at boot... [  OK  ]
   Checking Network Manager... [  OK  ]
   Global setup... [  OK  ]
   Loading kernel modules... [  OK  ]
   Mounting configfs... [  OK  ]
   Starting cman... [  OK  ]
   Waiting for quorum... Timed-out waiting for cluster
[FAILED]
waiting for quorum...

The new cluster configuration on pm1:
Code:
<?xml version="1.0"?>
<cluster name="alidays-cluster" config_version="16">

  <cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu">
  </cman>

  <clusternodes>
  <clusternode name="pm1" votes="1" nodeid="1"/>
  <clusternode name="pm0" votes="1" nodeid="2"/><clusternode name="pm2" votes="1" nodeid="3"/></clusternode
s>

</cluster>
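The config_version was bumped to 16 and pm2 was added, but with transport="udpu" the already-running corosync on pm0/pm1 does not seem to pick up the new member automatically, which appears to be why the cluster manager restart described further down was needed. Whether the running cluster has actually loaded the new config can be checked like this (a sketch; cman_tool ships with cman):
Code:
# the running config version should match the config_version in cluster.conf (16 here)
root@pm1:~# cman_tool status | grep -i version
# members the running cluster currently knows about
root@pm1:~# cman_tool nodes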

The new node seems to be added to the cluster, but when cman starts it fails to join the cluster:
Code:
Jul 23 15:23:08 corosync [MAIN  ] Successfully configured openais services to load
Jul 23 15:23:08 corosync [TOTEM ] Initializing transport (UDP/IP Unicast).
Jul 23 15:23:08 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Jul 23 15:23:08 corosync [TOTEM ] The network interface [192.168.169.15] is now up.
Jul 23 15:23:08 corosync [QUORUM] Using quorum provider quorum_cman
Jul 23 15:23:08 corosync [SERV  ] Service engine loaded: corosync cluster quorum service v0.1
Jul 23 15:23:08 corosync [CMAN  ] CMAN 1352871249 (built Nov 14 2012 06:34:12) started
Jul 23 15:23:08 corosync [SERV  ] Service engine loaded: corosync CMAN membership service 2.90
Jul 23 15:23:08 corosync [SERV  ] Service engine loaded: openais cluster membership service B.01.01
Jul 23 15:23:08 corosync [SERV  ] Service engine loaded: openais event service B.01.01
Jul 23 15:23:08 corosync [SERV  ] Service engine loaded: openais checkpoint service B.01.01
Jul 23 15:23:08 corosync [SERV  ] Service engine loaded: openais message service B.03.01
Jul 23 15:23:08 corosync [SERV  ] Service engine loaded: openais distributed locking service B.03.01
Jul 23 15:23:08 corosync [SERV  ] Service engine loaded: openais timer service A.01.01
Jul 23 15:23:08 corosync [SERV  ] Service engine loaded: corosync extended virtual synchrony service
Jul 23 15:23:08 corosync [SERV  ] Service engine loaded: corosync configuration service
Jul 23 15:23:08 corosync [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01
Jul 23 15:23:08 corosync [SERV  ] Service engine loaded: corosync cluster config database access v1.01
Jul 23 15:23:08 corosync [SERV  ] Service engine loaded: corosync profile loading service
Jul 23 15:23:08 corosync [QUORUM] Using quorum provider quorum_cman
Jul 23 15:23:08 corosync [SERV  ] Service engine loaded: corosync cluster quorum service v0.1
Jul 23 15:23:08 corosync [MAIN  ] Compatibility mode set to whitetank.  Using V1 and V2 of the synchronization engine.
Jul 23 15:23:08 corosync [TOTEM ] adding new UDPU member {192.168.169.10}
Jul 23 15:23:08 corosync [TOTEM ] adding new UDPU member {192.168.169.5}
Jul 23 15:23:08 corosync [TOTEM ] adding new UDPU member {192.168.169.15}
Jul 23 15:23:08 corosync [CLM   ] CLM CONFIGURATION CHANGE
Jul 23 15:23:08 corosync [CLM   ] New Configuration:
Jul 23 15:23:08 corosync [CLM   ] Members Left:
Jul 23 15:23:08 corosync [CLM   ] Members Joined:
Jul 23 15:23:08 corosync [CLM   ] CLM CONFIGURATION CHANGE
Jul 23 15:23:08 corosync [CLM   ] New Configuration:
Jul 23 15:23:08 corosync [CLM   ]       r(0) ip(192.168.169.15)
Jul 23 15:23:08 corosync [CLM   ] Members Left:
Jul 23 15:23:08 corosync [CLM   ] Members Joined:
Jul 23 15:23:08 corosync [CLM   ]       r(0) ip(192.168.169.15)
Jul 23 15:23:08 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jul 23 15:23:08 corosync [QUORUM] Members[1]: 3
Jul 23 15:23:08 corosync [QUORUM] Members[1]: 3
Jul 23 15:23:08 corosync [CPG   ] chosen downlist: sender r(0) ip(192.168.169.15) ; members(old:0 left:0)
Jul 23 15:23:08 corosync [MAIN  ] Completed service synchronization, ready to provide service.
 
Hi Tom

I added the entries in /etc/hosts on all nodes (the two existing cluster members and the new host).
I rebooted the new host but not the entire cluster (it's a production environment!).

Claudio
 
Hi Tom
I solved the error.
First of all: I couldn't reboot the two live hosts because it's a production environment with more than 10 VMs running, so I only restarted the cluster manager on both cluster nodes (pm0 and pm1), using PuTTYCS to execute the same commands on both at the same time.
Then I added the new node to the cluster and everything worked fine.

For further documentation, these are the steps I used:

First, reset the cluster to its original state:
a) remove the previously added new node from the cluster, on one of the original nodes
Code:
root@pm0:~#pvecm delnode pm2
b) stop the cluster manager on the new node
Code:
root@pm2:~#service cman stop
root@pm2:~#service pve-cluster stop
c) delete the cluster definition on the new node (pm2)
Code:
root@pm2:~#rm /etc/cluster/cluster.conf
root@pm2:~#rm -r /var/lib/pve-cluster/*
d) reboot the new node
e) restart the cluster manager on both original nodes, launching the commands at the same time with PuTTYCS (a quorum check after the restart is sketched below)
Code:
root@pm0:~#service cman stop
root@pm0:~#service pve-cluster stop
root@pm0:~#service pve-cluster start
root@pm0:~#service cman start
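After the simultaneous restart, it's worth confirming that the two original nodes are quorate again and that the pve-cluster filesystem is back before going on; a minimal check (sketch):
Code:
# the quorum line should show the cluster as quorate again
root@pm0:~# pvecm status | grep -i quorum
# /etc/pve should be mounted again by pve-cluster (pmxcfs)
root@pm0:~# mount | grep /etc/pve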


At this point I have a running cluster with 2 nodes (pm0 and pm1) and a standalone node pm2.
Now:
a) check that on all nodes /etc/hosts has an entry for each node (the original two and the new one); a quick way to verify this is sketched after these steps
on pm0
Code:
root@pm0:~#cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
10.168.169.5    pm0.local           pm0  pvelocalhost
10.168.169.10   pm1.local           pm1
10.168.169.15   pm2.local           pm2
on pm2
Code:
root@pm2:~#cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
10.168.169.5    pm0.local           pm0  
10.168.169.10   pm1.local           pm1
10.168.169.15   pm2.local           pm2 pvelocalhost

b) if you modified /etc/hosts on the cluster nodes, restart the cluster manager on them (see the previous step)
c) if you modified /etc/hosts on the new node, reboot it
d) from the new node, join the cluster
Code:
root@pm2:~#pvecm  add  pm1
The join completed successfully.
e) to check the system, reboot the new node (pm2). Everything works fine.
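As a quick way to verify step a) above, name resolution for every node name can be compared across all hosts; each name should resolve to the same address everywhere (a sketch using getent):
Code:
# run this on every node and compare the results
root@pm2:~# getent hosts pm0 pm1 pm2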

Claudio
 
thanks for feedback!
 
Hi All,

I know this is an old post but wanted to add a comment here as I've been struggling for about a week until I figured this out.

When adding a new Proxmox host to an existing cluster I kept getting "waiting for quorum". I knew multicast worked because I had tested it with omping, but I didn't realize that using jumbo frames on the interface you contact the cluster node on somehow prevents this from working. I removed jumbo frames from the interface in question and set it back to the default 1500 MTU. After that I could join the cluster without issue.
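For reference, checking and resetting the MTU can look something like this (a sketch; vmbr0 is just an example interface name, and the change only persists if it is also made permanent in /etc/network/interfaces):
Code:
# show the current MTU of the interface/bridge used for cluster traffic
root@pm2:~# ip link show vmbr0
# set it back to the default 1500 for now
root@pm2:~# ip link set vmbr0 mtu 1500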

Hope this helps someone.
Garith Dugmore
 
Enough people have quorum issues, corosync issues, joining issues, etc. that I think these troubleshooting steps should be documented on the wiki.

Oh, wait, I have write access to the wiki. Um. Yeah. OK, I'll start writing something up :-(.

FWIW, I just discovered that on a 1Gbit/sec network, I'm unable to add a 5th cluster node when using UDP unicast. Deleting all the cluster data (per above) and then re-joining after disabling IGMP on my switch worked completely.
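For anyone testing this themselves: the omping check mentioned earlier in the thread is a simple way to confirm multicast really works between all nodes. A sketch (run the same command on every node at roughly the same time, with your own node names):
Code:
# each node should report ~0% loss for both unicast and multicast
root@pm0:~# omping -c 600 -i 1 -q pm0 pm1 pm2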
 
