TASK ERROR: cluster not ready - no quorum?

raj

Hiya,

I have set up a cluster, currently with two boxes; the third box is ready but has not been added yet.

When I reboot the second node, I get an error saying: TASK ERROR: cluster not ready - no quorum?
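(For context: with two nodes and one vote each, the node that stays up cannot reach quorum on its own, so this error is expected whenever the other node is down. If you ever need to keep working on the surviving node anyway, the expected vote count can be lowered temporarily -- assuming your pvecm version has the expected subcommand; undo it as soon as the other node is back:

pvecm expected 1
)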

But in the syslog I have something else:

Jul 21 12:17:38 slave01 corosync[1516]: [CLM ] CLM CONFIGURATION CHANGE
Jul 21 12:17:38 slave01 corosync[1516]: [CLM ] New Configuration:
Jul 21 12:17:38 slave01 corosync[1516]: [CLM ] #011r(0) ip(192.168.0.12)
Jul 21 12:17:38 slave01 corosync[1516]: [CLM ] Members Left:
Jul 21 12:17:38 slave01 corosync[1516]: [CLM ] Members Joined:
Jul 21 12:17:38 slave01 corosync[1516]: [CLM ] CLM CONFIGURATION CHANGE
Jul 21 12:17:38 slave01 corosync[1516]: [CLM ] New Configuration:
Jul 21 12:17:38 slave01 corosync[1516]: [CLM ] #011r(0) ip(192.168.0.12)
Jul 21 12:17:38 slave01 corosync[1516]: [CLM ] Members Left:
Jul 21 12:17:38 slave01 corosync[1516]: [CLM ] Members Joined:
Jul 21 12:17:38 slave01 corosync[1516]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jul 21 12:17:38 slave01 corosync[1516]: [CPG ] chosen downlist: sender r(0) ip(192.168.0.12) ; members(old:1 left:0)
Jul 21 12:17:38 slave01 corosync[1516]: [MAIN ] Completed service synchronization, ready to provide service.
Jul 21 12:17:38 slave01 kernel: svc: failed to register lockdv1 RPC service (errno 97).
Jul 21 12:17:38 slave01 task UPID:slave01:000006E6:00001872:500A8FC8:startall::root@pam:: cluster not ready - no quorum?
Jul 21 12:17:38 slave01 pvesh: <root@pam> end task UPID:slave01:000006E6:00001872:500A8FC8:startall::root@pam: cluster not ready - no quorum?
Jul 21 12:17:41 slave01 corosync[1516]: [CLM ] CLM CONFIGURATION CHANGE
Jul 21 12:17:41 slave01 corosync[1516]: [CLM ] New Configuration:
Jul 21 12:17:41 slave01 corosync[1516]: [CLM ] #011r(0) ip(192.168.0.12)
Jul 21 12:17:41 slave01 corosync[1516]: [CLM ] Members Left:
Jul 21 12:17:41 slave01 corosync[1516]: [CLM ] Members Joined:
Jul 21 12:17:41 slave01 corosync[1516]: [CLM ] CLM CONFIGURATION CHANGE
Jul 21 12:17:41 slave01 corosync[1516]: [CLM ] New Configuration:
Jul 21 12:17:41 slave01 corosync[1516]: [CLM ] #011r(0) ip(192.168.0.12)
Jul 21 12:17:41 slave01 corosync[1516]: [CLM ] Members Left:
Jul 21 12:17:41 slave01 corosync[1516]: [CLM ] Members Joined:
Jul 21 12:17:41 slave01 corosync[1516]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jul 21 12:17:41 slave01 corosync[1516]: [CPG ] chosen downlist: sender r(0) ip(192.168.0.12) ; members(old:1 left:0)
Jul 21 12:17:41 slave01 corosync[1516]: [MAIN ] Completed service synchronization, ready to provide service.
Jul 21 12:17:44 slave01 corosync[1516]: [CLM ] CLM CONFIGURATION CHANGE
Jul 21 12:17:44 slave01 corosync[1516]: [CLM ] New Configuration:
Jul 21 12:17:44 slave01 corosync[1516]: [CLM ] #011r(0) ip(192.168.0.12)
Jul 21 12:17:44 slave01 corosync[1516]: [CLM ] Members Left:
Jul 21 12:17:44 slave01 corosync[1516]: [CLM ] Members Joined:
Jul 21 12:17:44 slave01 corosync[1516]: [CLM ] CLM CONFIGURATION CHANGE



Multicast is enabled on the switch.
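(For anyone wanting to double-check that: multicast can be verified end to end with omping, run at the same time on every node -- assuming omping is installed; the IPs are the cluster addresses used in this thread:

omping -c 600 -i 1 -q 192.168.0.11 192.168.0.12 192.168.0.13

If the multicast loss reported on every node stays at 0%, the switch side is fine.)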

Can anyone advise, as I do not want to add the third server if there is a configuration issue?

Cheers,

Raj
 
Multicast is running, and this is what I have in the logs:

Jul 22 21:47:02 master corosync[1557]: [CLM ] CLM CONFIGURATION CHANGE
Jul 22 21:47:02 master corosync[1557]: [CLM ] New Configuration:
Jul 22 21:47:02 master corosync[1557]: [CLM ] #011r(0) ip(192.168.0.11)
Jul 22 21:47:02 master corosync[1557]: [CLM ] #011r(0) ip(192.168.0.12)
Jul 22 21:47:02 master corosync[1557]: [CLM ] #011r(0) ip(192.168.0.13)
Jul 22 21:47:02 master corosync[1557]: [CLM ] Members Left:
Jul 22 21:47:02 master corosync[1557]: [CLM ] Members Joined:
Jul 22 21:47:02 master corosync[1557]: [CLM ] #011r(0) ip(192.168.0.12)
Jul 22 21:47:02 master corosync[1557]: [CLM ] #011r(0) ip(192.168.0.13)
Jul 22 21:47:02 master corosync[1557]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jul 22 21:47:02 master corosync[1557]: [CMAN ] quorum regained, resuming activity
Jul 22 21:47:02 master corosync[1557]: [QUORUM] This node is within the primary component and will provide service.
Jul 22 21:47:02 master corosync[1557]: [QUORUM] Members[2]: 1 2
Jul 22 21:47:02 master corosync[1557]: [QUORUM] Members[2]: 1 2
Jul 22 21:47:02 master pmxcfs[1435]: [status] notice: node has quorum
Jul 22 21:47:02 master corosync[1557]: [QUORUM] Members[3]: 1 2 3
Jul 22 21:47:02 master corosync[1557]: [QUORUM] Members[3]: 1 2 3
Jul 22 21:47:02 master pmxcfs[1435]: [dcdb] notice: members: 1/1435, 2/1418
Jul 22 21:47:02 master pmxcfs[1435]: [dcdb] notice: starting data syncronisation



This seems to happen when a node is rebooted.

Is it that, when you reboot a node that is in the cluster, it takes time for it to come back and rejoin?
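(While a node is coming back, membership can be watched from another node with something like this -- just a convenience, nothing Proxmox-specific:

watch -n 5 'pvecm status | grep -E "Nodes:|Total votes:|Quorum:"'
)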

Cheers,

Raj
 
But that kills the autostart of the VMs, because they try to start before the quorum services have come up.

Is there a way to force the VMs to start only after quorum has been established, please?
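(Not an official mechanism, just a rough sketch of a workaround: a small script that polls pvecm status until the cluster is quorate and only then starts the VM. The VMID 100 and the path /root/start-after-quorum.sh are placeholders, and the vote parsing assumes the pvecm status output format shown further down in this thread.

#!/bin/sh
# /root/start-after-quorum.sh (placeholder path)
# Wait until the cluster is quorate, then start one VM on this node.
VMID=100   # placeholder VMID

while true; do
    TOTAL=$(pvecm status 2>/dev/null | awk '/^Total votes:/ {print $3}')
    QUORUM=$(pvecm status 2>/dev/null | awk '/^Quorum:/ {print $2}')
    if [ -n "$TOTAL" ] && [ -n "$QUORUM" ] && [ "$TOTAL" -ge "$QUORUM" ]; then
        break   # quorate: total votes have reached the quorum threshold
    fi
    sleep 5
done

qm start "$VMID"
)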


Cheers,

Raj
 
I rebooted all 3 members of the cluster, and now I am getting the following from one node when I try to connect to node 2:

connection error 500: Can`t connect to 195.49.147.103:8006(connect: Connection timed out)
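(A generic first check for that timeout, nothing Proxmox-specific: see whether anything is listening on port 8006 on node 2, e.g.

netstat -tlnp | grep 8006
)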

Also on node 1 I am getting the following error:

Jul 29 01:42:22 master pmxcfs[1435]: [status] crit: cpg_send_message failed: 9
Jul 29 01:42:22 master pmxcfs[1435]: [status] crit: cpg_send_message failed: 9

But nodes 2 and 3 are connecting fine.

When I do a pvecm status on node 1:

root@master:/var/log/cluster# pvecm status
Version: 6.2.0
Config Version: 3
Cluster Name: clustername
Cluster Id: 34657
Cluster Member: Yes
Cluster Generation: 420
Membership state: Cluster-Member
Nodes: 2
Expected votes: 3
Total votes: 2
Node votes: 1
Quorum: 2
Active subsystems: 5
Flags:
Ports Bound: 0
Node name: master
Node ID: 1
Multicast addresses: 239.192.135.232
Node addresses: 192.168.0.11


Node 2's pvecm status is:

root@slave01:~# pvecm status
Version: 6.2.0
Config Version: 3
Cluster Name: clustername
Cluster Id: 34657
Cluster Member: Yes
Cluster Generation: 436
Membership state: Cluster-Member
Nodes: 2
Expected votes: 3
Total votes: 2
Node votes: 1
Quorum: 2
Active subsystems: 5
Flags:
Ports Bound: 0
Node name: slave01
Node ID: 2
Multicast addresses: 239.192.135.232
Node addresses: 192.168.0.12


And node 3's pvecm status is:


root@slave02:~# pvecm status
Version: 6.2.0
Config Version: 3
Cluster Name: clustername
Cluster Id: 34657
Cluster Member: Yes
Cluster Generation: 420
Membership state: Cluster-Member
Nodes: 2
Expected votes: 3
Total votes: 2
Node votes: 1
Quorum: 2
Active subsystems: 5
Flags:
Ports Bound: 0
Node name: slave02
Node ID: 3
Multicast addresses: 239.192.135.232
Node addresses: 192.168.0.13
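(Side note for anyone comparing these outputs: every node reports Nodes: 2 and Total votes: 2 against Expected votes: 3, and node 2 sits at a different Cluster Generation (436, versus 420 on nodes 1 and 3), which suggests the three boxes do not all see each other. Running pvecm nodes on each box lists exactly which members that box currently sees, which makes the split easier to spot:

pvecm nodes
)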

Cheers,

Raj
 
OK, I got the cluster back online by doing the following:

I restarted these services on all three boxes, and now they are all online:
1. nodemanager
2. pvecluster
3. cman
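(For anyone searching later: on PVE 2.x these are init scripts, so the equivalent commands would be roughly the following. The service names are my best guess at what the list above refers to -- in particular I am reading "nodemanager" as rgmanager, which only exists when HA is configured, and "pvecluster" as pve-cluster:

/etc/init.d/rgmanager restart    # "nodemanager" above -- only present with HA configured
/etc/init.d/pve-cluster restart  # pmxcfs ("pvecluster" above)
/etc/init.d/cman restart
)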


But I would still like to know about this one:

Rebooting kills the autostart of the VMs, because they try to start before the quorum services have come up.

Is there a way to force the VMs to start only after quorum has been established, please?
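(One way to wire that up, again only a sketch: disable the built-in autostart for the affected VM and start it from the wait-for-quorum script above via cron's @reboot. 100 is a placeholder VMID.

qm set 100 -onboot 0
# then in root's crontab (crontab -e):
@reboot /root/start-after-quorum.sh
)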


Cheers,

Raj
 
