Making a new cluster: Waiting for quorum... Timed-out waiting for cluster

bread-baker

I did two fresh installs of 2.0 beta3. Following http://pve.proxmox.com/wiki/Proxmox_VE_2.0_Cluster, I attempted to create a cluster.

From the 1st node:

pvecm create fbcmain
Code:
Generating public/private rsa key pair.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
e4:8e:a0:45:64:c6:fd:3a:59:69:e3:23:82:0e:3c:e9 root@fbc186
The key's randomart image is:
+--[ RSA 2048]----+
|   .+.           |
|   +. .          |
|    .  ...       |
|   .   o*        |
|. ..o  *S.       |
|.+.o..=oo        |
|.oo  ..o..       |
| E.              |
|                 |
+-----------------+
Restarting pve cluster filesystem: pve-cluster.
Starting cluster: 
   Checking if cluster has been disabled at boot... [  OK  ]
   Checking Network Manager... [  OK  ]
   Global setup... [  OK  ]
   Loading kernel modules... [  OK  ]
   Mounting configfs... [  OK  ]
   Starting cman... [  OK  ]
   Waiting for quorum... [  OK  ]
   Starting fenced... [  OK  ]
   Starting dlm_controld... [  OK  ]
   Unfencing self... [  OK  ]
   Joining fence domain... [  OK  ]
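For reference, before adding the second node it can help to confirm that the first node is actually quorate on its own. A minimal check, assuming pvecm status is available in this beta (it essentially wraps cman_tool):
Code:
# on fbc186: the cluster/quorum state as pvecm reports it
pvecm status
# the underlying cman view, if you prefer the raw tools
cman_tool status
cman_tool nodes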

From the 2nd node:
Code:
root@fbc50 ~ # pvecm add  10.100.100.186
Generating public/private rsa key pair.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
00:98:31:76:45:e6:83:07:a4:35:dc:9c:42:e2:3d:69 root@fbc50
The key's randomart image is:
+--[ RSA 2048]----+
|  =BO=+.         |
| ooBoO+          |
|  o E.=          |
|   . o o         |
|        S        |
|                 |
|                 |
|                 |
|                 |
+-----------------+
The authenticity of host '10.100.100.186 (10.100.100.186)' can't be established.
RSA key fingerprint is 9e:7a:ad:f5:29:f9:ee:fe:02:e3:50:b7:8a:7a:57:1c.
Are you sure you want to continue connecting (yes/no)? yes
root@10.100.100.186's password: 
copy corosync auth key
stopping pve-cluster service
Stopping pve cluster filesystem: pve-cluster.
backup old database
Starting pve cluster filesystem : pve-cluster.
Starting cluster: 
   Checking if cluster has been disabled at boot... [  OK  ]
   Checking Network Manager... [  OK  ]
   Global setup... [  OK  ]
   Loading kernel modules... [  OK  ]
   Mounting configfs... [  OK  ]
   Starting cman... [  OK  ]
   Waiting for quorum... Timed-out waiting for cluster
[FAILED]
cluster not ready - no quorum?


From the 2nd node, is the command 'pvecm add 10.100.100.186' correct?
 
On fbc50 I had previously run ssh-copy-id fbc186. Maybe that caused the issue?

So I tried to delete the fbc50 node.
On fbc186:
Code:
root@fbc186 ~/.ssh # pvecm nodes
Node  Sts   Inc   Joined               Name
   1   M      4   2011-12-02 13:41:26  fbc186
   2   X      8                        fbc50
root@fbc186 ~/.ssh # pvecm delnode fbc50

cluster not ready - no quorum?


The wiki says to delete the node with:
Code:
pvecm delnode fbc50
But you'd expect this to no longer show 'fbc50':
Code:
pvecm nodes
Node  Sts   Inc   Joined               Name
   1   M      4   2011-12-02 13:41:26  fbc186
   2   X      8                        fbc50
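(For what it's worth, the delnode failure is the same quorum check again: with fbc50 down, the single remaining node no longer holds a majority of the two configured votes. A possible workaround, assuming pvecm expected / cman_tool expected behave here as in stock cman, is to temporarily lower the expected votes so the surviving node regains quorum before deleting:)
Code:
# hypothetical workaround on the surviving node (fbc186)
pvecm expected 1        # or: cman_tool expected -e 1
pvecm delnode fbc50     # should no longer complain about quorum
pvecm nodes             # verify fbc50 is gone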



I'm reinstalling fbc50 and will use another IP address and hostname.
 
So I reinstalled it as fbc192, but I still get 'cluster not ready - no quorum?':
Code:
root@fbc192:~# pvecm add  10.100.100.186
root@10.100.100.186's password: 
unable to copy ssh ID
root@fbc192:~# pvecm add  10.100.100.186
root@10.100.100.186's password: 
copy corosync auth key
stopping pve-cluster service
Stopping pve cluster filesystem: pve-cluster.
backup old database
Starting pve cluster filesystem : pve-cluster.
Starting cluster: 
   Checking if cluster has been disabled at boot... [  OK  ]
   Checking Network Manager... [  OK  ]
   Global setup... [  OK  ]
   Loading kernel modules... [  OK  ]
   Mounting configfs... [  OK  ]
   Starting cman... [  OK  ]
   Waiting for quorum... Timed-out waiting for cluster
[FAILED]
cluster not ready - no quorum?

But that made it possible to remove fbc50 from the cluster. So it seems at least 3 nodes are needed before one can be removed:
Code:
root@fbc186 /etc/pve # pvecm nodes
Node  Sts   Inc   Joined               Name
   1   M      4   2011-12-02 13:41:26  fbc186
   2   X      8                        fbc50
   3   X      0                        fbc192
root@fbc186 /etc/pve # pvecm delnode fbc50
root@fbc186 /etc/pve # pvecm nodes
Node  Sts   Inc   Joined               Name
   1   M      4   2011-12-02 13:41:26  fbc186
   3   X      0                        fbc192

But I'm still left with:
Code:
   Waiting for quorum... Timed-out waiting for cluster
[FAILED]
cluster not ready - no quorum?

What am I doing wrong?

I'll read more while waiting for an answer.
I'm sure the answer will make all this look stupid, but it is better to look foolish and have this learning process in the open, as it may help someone else.
 
OK, I noticed there may have been a DNS issue, so I reinstalled on both nodes.

I ended up with the same issue on the 2nd node:
Code:
   Waiting for quorum... Timed-out waiting for cluster
[FAILED]
cluster not ready - no quorum?

Note that the folder 'nodes/' is missing:
Code:
drwxr-x---  2 root www-data    0 Dec 31  1969 ./
drwxr-xr-x 81 root root     4096 Dec  2 16:16 ../
-r--r-----  1 root www-data  279 Dec  2 16:16 cluster.conf
-r--r-----  1 root www-data  153 Dec 31  1969 .clusterlog
-rw-r-----  1 root www-data    2 Dec 31  1969 .debug
lr-xr-x---  1 root www-data    0 Dec 31  1969 local -> nodes/fbc186
-r--r-----  1 root www-data  225 Dec 31  1969 .members
lr-xr-x---  1 root www-data    0 Dec 31  1969 openvz -> nodes/fbc186/openvz
lr-xr-x---  1 root www-data    0 Dec 31  1969 qemu-server -> nodes/fbc186/qemu-server
-r--r-----  1 root www-data  198 Dec 31  1969 .rrd
-r--r-----  1 root www-data  229 Dec 31  1969 .version
-r--r-----  1 root www-data   18 Dec 31  1969 .vmlist

I have a good cluster at another site. The .members file from there:
Code:
{
"nodename": "fbc10",
"version": 4,
"cluster": { "name": "fbcandover", "version": 4, "nodes": 2, "quorate": 1 },
"nodelist": {
  "fbc158": { "id": 1, "online": 1, "ip": "10.100.100.158"},
  "fbc10": { "id": 2, "online": 1, "ip": "10.100.100.10"}
  }
}

And here, on both nodes, there is a missing IP address:
Code:
root@fbc192 /etc/pve # cat .members
{
"nodename": "fbc192",
"version": 4,
"cluster": { "name": "fbc", "version": 2, "nodes": 2, "quorate": 1 },
"nodelist": {
  "fbc192": { "id": 1, "online": 0, "ip": "10.100.100.192"},
  "fbc186": { "id": 2, "online": 0}
  }
}
And the other node:
Code:
root@fbc186 /etc/pve # cat .members
{
"nodename": "fbc186",
"version": 3,
"cluster": { "name": "fbc", "version": 2, "nodes": 2, "quorate": 0 },
"nodelist": {
  "fbc192": { "id": 1, "online": 0},
  "fbc186": { "id": 2, "online": 1, "ip": "10.100.100.186"}
  }
}

I'll try adding the IP manually...
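(Before editing anything by hand, it may be worth checking what cman itself reports for the node addresses; the .members file is only a read-only view of that state. A quick check, assuming cman_tool is present as on a stock PVE 2.0 install:)
Code:
# list cluster members together with the addresses cman has for them
cman_tool nodes -a
# overall membership and quorum state
cman_tool status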
 
Maybe your switch does not allow IP multicast traffic? What switch do you use exactly (vendor, model)?
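One way to test this, assuming the omping package can be installed on both nodes, is to run it on both nodes at the same time and watch whether the multicast replies come back (the IPs below are the ones from this thread):
Code:
# run on BOTH nodes simultaneously
omping 10.100.100.186 10.100.100.192
# each node should report unicast and multicast responses from the other;
# if multicast shows 100% loss, the switch or IGMP snooping is blocking it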
 
fbc1 is attached to a Netgear GS108T, which goes to a Netgear FSM7328S.

fbc186 is attached to the FSM7328S.
 
OK, I'll put them on the same switch, and will use an unmanaged switch for all the cluster nodes [with a separate NIC for cluster traffic]. I read somewhere that managed switches are not good for Sheepdog etc.; I'll find that link later if someone needs the reference.

On Thursday I'll try to create the cluster.
 
On my 1st attempt, I plugged both nodes into the same managed switch and got the same 'Waiting for quorum... Timed-out waiting for cluster'.
So I plugged both into an unmanaged switch, rebooted the new node, and the cluster worked.

For the 3rd node, I put it on the unmanaged switch, added it to the cluster, and it joined right away.


I do not have a ton of network expertise, so I probably have something set wrong in the Netgear FSM7328S.



Also, here is a link discussing managed vs. unmanaged switches in a cluster: http://community.spiceworks.com/topic/96916-what-kind-of-switch-are-you-using-in-your-iscsi-san

So for the cluster we will use unmanaged switches.
 
You can't edit those files - they are read-only.

How can I change the IP addresses after creating the cluster with "pvecm create"?

I have two network interfaces and I want to move the cluster communication from the 1st to the 2nd card.
 
Dear Gerhild, please take a look at http://pve.proxmox.com/wiki/Proxmox_VE_2.0_Cluster

Make sure that each Proxmox VE node is installed with the final hostname and IP configuration. Changing the hostname and IP is not possible after cluster creation.
Could you document which IP is used for cluster creation in the case of several IPs?
It seems that the configuration files are /etc/hosts and /etc/hostname:
Code:
grep `cat /etc/hostname` /etc/hosts | awk '{ print $1 }'
It would be nice to have a separate configuration file instead.
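In other words, the address used for the cluster seems to be whatever the node's own hostname resolves to in /etc/hosts. A sketch of the kind of entry that would steer cluster traffic to a particular interface (the address and names here are made up for illustration):
Code:
# /etc/hosts on the node: point the hostname at the IP of the NIC
# you want the cluster to use
10.100.100.50   fbc50.example.local   fbc50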



So, I came to my next question: how do I delete a whole cluster setup?
I mean, there is
Code:
pvecm create <clustername>
Will there be something like
Code:
pvecm purge <clustername>
?
 
This is what helped me:
1. Node 1 - create the cluster as in the wiki.
2. Run
Code:
echo 0 > /proc/sys/net/ipv4/icmp_echo_ignore_broadcasts
on both nodes so that broadcast/multicast pings are answered (maybe), then run
Code:
netstat -g
IPv6/IPv4 Group Memberships
Interface       RefCnt Group
--------------- ------ ---------------------
...
vmbr0           1      239.192.171.223
...
on the first node to view the multicast groups of the interfaces and spot the unusual group. That is our cluster multicast group.

3. Node 2 - join the cluster as usual:
Code:
waiting for quorum...OK
generating node certificates
merge known_hosts file
restart services
Restarting PVE Daemon: pvedaemon.
Restarting web server: apache2 ... waiting .
successfully added node 'pas1' to cluster.

P.S. To check that multicast is working, just ping the multicast group address after joining the second node:
Code:
ping -c 2 239.192.171.223
PING 239.192.171.223 (239.192.171.223) 56(84) bytes of data.
64 bytes from 192.168.7.223: icmp_req=1 ttl=64 time=0.029 ms
64 bytes from 192.168.7.222: icmp_req=1 ttl=64 time=0.126 ms (DUP!)
64 bytes from 192.168.7.223: icmp_req=2 ttl=64 time=0.021 ms

--- 239.192.171.223 ping statistics ---
2 packets transmitted, 2 received, +1 duplicates, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.021/0.058/0.126/0.048 ms
 
Sir,
I have created a new cluster and here are the contents of my cluster.conf.new:

Code:
<?xml version="1.0"?>
<cluster name="master3" config_version="3">

  <cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu">
  </cman>

  <clusternodes>
    <clusternode name="node3" votes="1" nodeid="1"/>
    <clusternode name="node1" votes="1" nodeid="2"/>
  </clusternodes>

</cluster>

I'm trying to configure a 2-node cluster with unicast (multicast is not supported on my network), and the following error comes up:
Code:
Starting cman... [ OK ]
Waiting for quorum... Timed-out waiting for cluster
[FAILED]

Please help.
Nasim
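(A note on the timeout above: with only two nodes, cman normally needs both votes before it becomes quorate, so the second node can hang at "Waiting for quorum" even when udpu itself works. Below is a sketch of the <cman> element commonly used for two-node cman clusters; two_node/expected_votes are standard cman attributes, but treat this as an assumption to verify against the Proxmox wiki, and remember that cluster.conf edits need config_version bumped before activation.)
Code:
<!-- hypothetical two-node variant of the <cman> element above -->
<cman keyfile="/var/lib/pve-cluster/corosync.authkey"
      transport="udpu" two_node="1" expected_votes="1">
</cman>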
 
will there be something like
Code:
pvecm purge <clustername>
?

See the following guide; it is a good way to reset the cluster or one of its nodes: http://undefinederror.org/how-to-reset-cluster-configuration-in-proxmox-2/ (works for Proxmox 3 as well)
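In rough outline, the reset described there comes down to stopping the cluster services, removing the cluster configuration and the pmxcfs database, and starting pve-cluster again in local mode. A sketch assuming the paths of a stock PVE 2.x/3.x install; this is destructive, so read the linked guide and back up /var/lib/pve-cluster (it holds the /etc/pve contents) first:
Code:
# destructive: wipes the cluster configuration on this node
/etc/init.d/pve-cluster stop
/etc/init.d/cman stop
rm /etc/cluster/cluster.conf
rm -rf /var/lib/pve-cluster/*
/etc/init.d/pve-cluster start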

PS. I really dislike how Proxmox is pushing subscriptions with the new enterprise repo... pushing people into subscribing is not the right way. I was probably going to get a subscription sometime in the future; now I will just look for an alternative to Proxmox. CoreOS looks very promising, especially since I mainly care about containers.
 
