TASK ERROR: cluster not ready - no quorum?

raj

Hiya,

I have set up a cluster, currently with 2 boxes; the 3rd box is ready but has not been added yet.

When I reboot the second node, I get an error saying: TASK ERROR: cluster not ready - no quorum?

But in the syslog I have something else:

Jul 21 12:17:38 slave01 corosync[1516]: [CLM ] CLM CONFIGURATION CHANGE
Jul 21 12:17:38 slave01 corosync[1516]: [CLM ] New Configuration:
Jul 21 12:17:38 slave01 corosync[1516]: [CLM ] #011r(0) ip(192.168.0.12)
Jul 21 12:17:38 slave01 corosync[1516]: [CLM ] Members Left:
Jul 21 12:17:38 slave01 corosync[1516]: [CLM ] Members Joined:
Jul 21 12:17:38 slave01 corosync[1516]: [CLM ] CLM CONFIGURATION CHANGE
Jul 21 12:17:38 slave01 corosync[1516]: [CLM ] New Configuration:
Jul 21 12:17:38 slave01 corosync[1516]: [CLM ] #011r(0) ip(192.168.0.12)
Jul 21 12:17:38 slave01 corosync[1516]: [CLM ] Members Left:
Jul 21 12:17:38 slave01 corosync[1516]: [CLM ] Members Joined:
Jul 21 12:17:38 slave01 corosync[1516]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jul 21 12:17:38 slave01 corosync[1516]: [CPG ] chosen downlist: sender r(0) ip(192.168.0.12) ; members(old:1 left:0)
Jul 21 12:17:38 slave01 corosync[1516]: [MAIN ] Completed service synchronization, ready to provide service.
Jul 21 12:17:38 slave01 kernel: svc: failed to register lockdv1 RPC service (errno 97).
Jul 21 12:17:38 slave01 task UPID:slave01:000006E6:00001872:500A8FC8:startall::root@pam:: cluster not ready - no quorum?
Jul 21 12:17:38 slave01 pvesh: <root@pam> end task UPID:slave01:000006E6:00001872:500A8FC8:startall::root@pam: cluster not ready - no quorum?
Jul 21 12:17:41 slave01 corosync[1516]: [CLM ] CLM CONFIGURATION CHANGE
Jul 21 12:17:41 slave01 corosync[1516]: [CLM ] New Configuration:
Jul 21 12:17:41 slave01 corosync[1516]: [CLM ] #011r(0) ip(192.168.0.12)
Jul 21 12:17:41 slave01 corosync[1516]: [CLM ] Members Left:
Jul 21 12:17:41 slave01 corosync[1516]: [CLM ] Members Joined:
Jul 21 12:17:41 slave01 corosync[1516]: [CLM ] CLM CONFIGURATION CHANGE
Jul 21 12:17:41 slave01 corosync[1516]: [CLM ] New Configuration:
Jul 21 12:17:41 slave01 corosync[1516]: [CLM ] #011r(0) ip(192.168.0.12)
Jul 21 12:17:41 slave01 corosync[1516]: [CLM ] Members Left:
Jul 21 12:17:41 slave01 corosync[1516]: [CLM ] Members Joined:
Jul 21 12:17:41 slave01 corosync[1516]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jul 21 12:17:41 slave01 corosync[1516]: [CPG ] chosen downlist: sender r(0) ip(192.168.0.12) ; members(old:1 left:0)
Jul 21 12:17:41 slave01 corosync[1516]: [MAIN ] Completed service synchronization, ready to provide service.
Jul 21 12:17:44 slave01 corosync[1516]: [CLM ] CLM CONFIGURATION CHANGE
Jul 21 12:17:44 slave01 corosync[1516]: [CLM ] New Configuration:
Jul 21 12:17:44 slave01 corosync[1516]: [CLM ] #011r(0) ip(192.168.0.12)
Jul 21 12:17:44 slave01 corosync[1516]: [CLM ] Members Left:
Jul 21 12:17:44 slave01 corosync[1516]: [CLM ] Members Joined:
Jul 21 12:17:44 slave01 corosync[1516]: [CLM ] CLM CONFIGURATION CHANGE



Multicast is enabled on the switch.
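
This is how I double-checked that multicast actually flows between the two nodes (I am assuming omping is the right tool here; it has to be started on every node at the same time, and the IPs are my two current nodes):

# run simultaneously on each node; it reports unicast and multicast packet loss
omping 192.168.0.11 192.168.0.12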

Can anyone advise, as I do not want to put the 3rd server in if there is a configuration issue.

Cheers,

Raj
 
Multicast is running and this is what I have in the logs:

Jul 22 21:47:02 master corosync[1557]: [CLM ] CLM CONFIGURATION CHANGE
Jul 22 21:47:02 master corosync[1557]: [CLM ] New Configuration:
Jul 22 21:47:02 master corosync[1557]: [CLM ] #011r(0) ip(192.168.0.11)
Jul 22 21:47:02 master corosync[1557]: [CLM ] #011r(0) ip(192.168.0.12)
Jul 22 21:47:02 master corosync[1557]: [CLM ] #011r(0) ip(192.168.0.13)
Jul 22 21:47:02 master corosync[1557]: [CLM ] Members Left:
Jul 22 21:47:02 master corosync[1557]: [CLM ] Members Joined:
Jul 22 21:47:02 master corosync[1557]: [CLM ] #011r(0) ip(192.168.0.12)
Jul 22 21:47:02 master corosync[1557]: [CLM ] #011r(0) ip(192.168.0.13)
Jul 22 21:47:02 master corosync[1557]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jul 22 21:47:02 master corosync[1557]: [CMAN ] quorum regained, resuming activity
Jul 22 21:47:02 master corosync[1557]: [QUORUM] This node is within the primary component and will provide service.
Jul 22 21:47:02 master corosync[1557]: [QUORUM] Members[2]: 1 2
Jul 22 21:47:02 master corosync[1557]: [QUORUM] Members[2]: 1 2
Jul 22 21:47:02 master pmxcfs[1435]: [status] notice: node has quorum
Jul 22 21:47:02 master corosync[1557]: [QUORUM] Members[3]: 1 2 3
Jul 22 21:47:02 master corosync[1557]: [QUORUM] Members[3]: 1 2 3
Jul 22 21:47:02 master pmxcfs[1435]: [dcdb] notice: members: 1/1435, 2/1418
Jul 22 21:47:02 master pmxcfs[1435]: [dcdb] notice: starting data syncronisation



This seems to happen when a node is rebooted.

Is it that when you reboot a node that is in the cluster, it takes time to come back?
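
While the node was coming back up I kept an eye on the membership with something along these lines (just polling pvecm, nothing clever):

# refresh the quorum/membership view every 2 seconds
watch -n 2 pvecm status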

Cheers,

Raj
 
I rebooted all 3 members of the cluster, and now I am getting the following from one node if I try to connect to node 2:

connection error 500: Can`t connect to 195.49.147.103:8006(connect: Connection timed out)

Also on node 1 I am getting the following error:

Jul 29 01:42:22 master pmxcfs[1435]: [status] crit: cpg_send_message failed: 9
Jul 29 01:42:22 master pmxcfs[1435]: [status] crit: cpg_send_message failed: 9

But nodes 2 and 3 are connecting fine.

When I do a pvecm status from node 1:

root@master:/var/log/cluster# pvecm status
Version: 6.2.0
Config Version: 3
Cluster Name: clustername
Cluster Id: 34657
Cluster Member: Yes
Cluster Generation: 420
Membership state: Cluster-Member
Nodes: 2
Expected votes: 3
Total votes: 2
Node votes: 1
Quorum: 2
Active subsystems: 5
Flags:
Ports Bound: 0
Node name: master
Node ID: 1
Multicast addresses: 239.192.135.232
Node addresses: 192.168.0.11


Node 2's pvecm status is:

root@slave01:~# pvecm status
Version: 6.2.0
Config Version: 3
Cluster Name: clustername
Cluster Id: 34657
Cluster Member: Yes
Cluster Generation: 436
Membership state: Cluster-Member
Nodes: 2
Expected votes: 3
Total votes: 2
Node votes: 1
Quorum: 2
Active subsystems: 5
Flags:
Ports Bound: 0
Node name: slave01
Node ID: 2
Multicast addresses: 239.192.135.232
Node addresses: 192.168.0.12


And node 3's pvecm status is:

root@slave02:~# pvecm status
Version: 6.2.0
Config Version: 3
Cluster Name: clustername
Cluster Id: 34657
Cluster Member: Yes
Cluster Generation: 420
Membership state: Cluster-Member
Nodes: 2
Expected votes: 3
Total votes: 2
Node votes: 1
Quorum: 2
Active subsystems: 5
Flags:
Ports Bound: 0
Node name: slave02
Node ID: 3
Multicast addresses: 239.192.135.232
Node addresses: 192.168.0.13
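
One thing I notice in the output above is that node 1 and node 3 report Cluster Generation 420 while node 2 reports 436, and every node only sees 2 of the 3 members. To compare which members each node actually thinks are up, pvecm has a nodes sub-command that can be run on each box:

# lists the node IDs, status and names as this node sees them
pvecm nodes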

Cheers,

Raj
 
OK, I got the cluster back online by doing the following:

I restarted the following services:
1. nodemanager
2. pvecluster
3. cman

on all three boxes and now they are all online.
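
In case the exact commands help anyone, restarting the cluster services from the command line looks roughly like this on a 2.x box (init script names assumed, so double-check /etc/init.d if yours differ):

# restart the cluster stack, then the Proxmox daemon, on each node
/etc/init.d/cman restart
/etc/init.d/pve-cluster restart
/etc/init.d/pvedaemon restart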


But I would still like to know one thing:


But that kills the autostart of the VMs: the startall task runs before the services that provide quorum have come up, so it fails.

Is there a way to force the VMs to start after quorum has been established, please?
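
Something like the following is what I have in mind as a stop-gap, run from rc.local or a cron @reboot entry; VMID 100 is just a placeholder and the retry loop is only a sketch:

#!/bin/sh
# hypothetical workaround: keep retrying until the cluster has quorum
# and qm start succeeds; 100 is a placeholder for the real VMID
until qm start 100; do
    sleep 10
done

That way the VM only comes up once qm start stops failing with the no-quorum error, but if there is a built-in way to delay the autostart until quorum I would rather use that.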


Cheers,

Raj