No quorum

Lamarus

Sep 18, 2017
Hi, I have three servers in colocation and I'm trying to set up a cluster. The colocation administrators say that multicast is enabled on the network and that they checked it. But the command omping -c 10000 -i 0.001 -F -q 10.22.16.5 10.22.16.6 prints "waiting for response msg" the whole time it is running. What can I check on the servers, or is there no multicast on the network?
 
Did you run the omping command on all nodes simultaneously?
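For reference, omping only starts reporting once its peers answer, so it has to be started on every node at roughly the same time. A minimal sketch, assuming the third node is 10.22.16.7 (as the results later in this thread suggest):
Code:
# run this same command on all three nodes at the same time
omping -c 10000 -i 0.001 -F -q 10.22.16.5 10.22.16.6 10.22.16.7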
 
Did you run the omping command on all nodes simultaneously?
No, but OK, I ran omping on all nodes and tested the connection. So why doesn't the cluster work? Before the test I ran just two commands: pvecm create on the first server, and pvecm add on another server. On the first server, where I ran pvecm create, syslog shows many messages like this:

Sep 18 17:27:33 pve pmxcfs[2919]: [status] crit: cpg_send_message failed: 9
Sep 18 17:27:33 pve pmxcfs[2919]: [status] crit: cpg_send_message failed: 9
Sep 18 17:27:33 pve pmxcfs[2919]: [status] crit: cpg_send_message failed: 9
Sep 18 17:27:34 pve pmxcfs[2919]: [dcdb] notice: cpg_join retry 40710
Sep 18 17:27:35 pve pmxcfs[2919]: [dcdb] notice: cpg_join retry 40720
 
There are a few steps you have to follow before creating the cluster:
1. Check the network configuration; all nodes need to be able to ping each other.
2. Add all node names to the /etc/hosts file and check that every hostname resolves on every host (see the example sketch after this list).
3. Check the firewall rules.
4. Check multicast with the command omping -c 10000 -i 0.001 -F -q 10.22.16.5 10.22.16.6 10.22.16.7 on all nodes simultaneously.
5. Create the cluster with the command pvecm create clustername.
6. Check your cluster with pvecm status before adding the other host servers to the cluster. If it shows a quorum with 1 vote, it means you have created a cluster. You can also check the logs with journalctl -f.
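An example /etc/hosts layout for step 2 (a sketch only; the hostnames are hypothetical, the IPs are the ones used in this thread):
Code:
10.22.16.5   pve1.example.local pve1
10.22.16.6   pve2.example.local pve2
10.22.16.7   pve3.example.local pve3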
 
There are a few steps you have to follow before creating the cluster:
1. Check the network configuration; all nodes need to be able to ping each other.
2. Add all node names to the /etc/hosts file and check that every hostname resolves on every host.
3. Check the firewall rules.
4. Check multicast with the command omping -c 10000 -i 0.001 -F -q 10.22.16.5 10.22.16.6 10.22.16.7 on all nodes simultaneously.
5. Create the cluster with the command pvecm create clustername.
6. Check your cluster with pvecm status before adding the other host servers to the cluster. If it shows a quorum with 1 vote, it means you have created a cluster. You can also check the logs with journalctl -f.

1. Checked, OK.
2. Checked, OK. But I added all node names to /etc/hosts after creating the cluster. Is that critical?
3. No iptables rules on the servers.
4. Checked, OK, loss is now < 1%.
6. pvecm status:

Version: 6.2.0
Config Version: 1
Cluster Name: prox-cluster
Cluster Id: 59572
Cluster Member: Yes
Cluster Generation: 44
Membership state: Cluster-Member
Nodes: 1
Expected votes: 1
Total votes: 1
Node votes: 1
Quorum: 1
Active subsystems: 5
Flags:
Ports Bound:
Node name: pve
Node ID: 1
Multicast addresses: x
Node addresses: 10.22.16.5

Now my problem is no quorum.
 
Can you please run the pvecm expected 1 command and then share the output of pvecm status?
Also share the journalctl -f output.
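A minimal sketch of that sequence (assuming journalctl is available; on plain Debian 7 the syslog is the fallback):
Code:
pvecm expected 1   # lower the expected vote count so a single node regains quorum
pvecm status
journalctl -f      # or: tail -f /var/log/syslog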
Version: 6.2.0
Config Version: 1
Cluster Name: prox-cluster
Cluster Id: 59572
Cluster Member: Yes
Cluster Generation: 44
Membership state: Cluster-Member
Nodes: 1
Expected votes: 1
Total votes: 1
Node votes: 1
Quorum: 1
Active subsystems: 5
Flags:
Ports Bound:
Node name: pve
Node ID: 1
Multicast addresses: x
Node addresses: 10.22.16.5

>>also share journalctl -f output

I have Debian 7.8 (so no journalctl), but maybe the syslog will help:

Sep 19 11:08:42 pve pmxcfs[2919]: [status] crit: cpg_send_message failed: 6
Sep 19 11:08:42 pve pvestatd[3442]: status update time (30.147 seconds)
Sep 19 11:08:43 pve pmxcfs[2919]: [status] notice: cpg_send_message retry 10
Sep 19 11:08:44 pve pmxcfs[2919]: [status] notice: cpg_send_message retry 20
Sep 19 11:08:45 pve pmxcfs[2919]: [status] notice: cpg_send_message retry 30
Sep 19 11:08:46 pve pmxcfs[2919]: [status] notice: cpg_send_message retry 40
Sep 19 11:08:47 pve pmxcfs[2919]: [status] notice: cpg_send_message retry 50
Sep 19 11:08:48 pve pmxcfs[2919]: [status] notice: cpg_send_message retry 60
Sep 19 11:08:49 pve pmxcfs[2919]: [status] notice: cpg_send_message retry 70
Sep 19 11:08:50 pve pmxcfs[2919]: [status] notice: cpg_send_message retry 80
Sep 19 11:08:51 pve pmxcfs[2919]: [status] notice: cpg_send_message retry 90
Sep 19 11:08:52 pve pmxcfs[2919]: [status] notice: cpg_send_message retry 100
Sep 19 11:08:52 pve pmxcfs[2919]: [status] notice: cpg_send_message retried 100 times
Sep 19 11:08:52 pve pmxcfs[2919]: [status] crit: cpg_send_message failed: 6
Sep 19 11:08:53 pve pmxcfs[2919]: [status] notice: cpg_send_message retry 10
Sep 19 11:08:54 pve pmxcfs[2919]: [status] notice: cpg_send_message retry 20
Sep 19 11:08:55 pve pmxcfs[2919]: [status] notice: cpg_send_message retry 30
Sep 19 11:08:56 pve pmxcfs[2919]: [status] notice: cpg_send_message retry 40
Sep 19 11:08:57 pve pmxcfs[2919]: [status] notice: cpg_send_message retry 50
Sep 19 11:08:58 pve pmxcfs[2919]: [status] notice: cpg_send_message retry 60

and tail -100 /var/log/messages
root@pve:~# tail -100 /var/log/messages
Sep 18 16:19:27 pve corosync[3077]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Sep 18 16:19:27 pve corosync[3077]: [CMAN ] CMAN 1364188437 (built Mar 25 2013 06:14:01) started
Sep 18 16:19:27 pve corosync[3077]: [SERV ] Service engine loaded: corosync CMAN membership service 2.90
Sep 18 16:19:27 pve corosync[3077]: [SERV ] Service engine loaded: openais cluster membership service B.01.01
Sep 18 16:19:27 pve corosync[3077]: [SERV ] Service engine loaded: openais event service B.01.01
Sep 18 16:19:27 pve corosync[3077]: [SERV ] Service engine loaded: openais checkpoint service B.01.01
Sep 18 16:19:27 pve corosync[3077]: [SERV ] Service engine loaded: openais message service B.03.01
Sep 18 16:19:27 pve corosync[3077]: [SERV ] Service engine loaded: openais distributed locking service B.03.01
Sep 18 16:19:27 pve corosync[3077]: [SERV ] Service engine loaded: openais timer service A.01.01
Sep 18 16:19:27 pve corosync[3077]: [SERV ] Service engine loaded: corosync extended virtual synchrony service
Sep 18 16:19:27 pve corosync[3077]: [SERV ] Service engine loaded: corosync configuration service
Sep 18 16:19:27 pve corosync[3077]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
Sep 18 16:19:27 pve corosync[3077]: [SERV ] Service engine loaded: corosync cluster config database access v1.01
Sep 18 16:19:27 pve corosync[3077]: [SERV ] Service engine loaded: corosync profile loading service
Sep 18 16:19:27 pve corosync[3077]: [QUORUM] Using quorum provider quorum_cman
Sep 18 16:19:27 pve corosync[3077]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Sep 18 16:19:27 pve corosync[3077]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
Sep 18 16:19:27 pve corosync[3077]: [CLM ] CLM CONFIGURATION CHANGE
Sep 18 16:19:27 pve corosync[3077]: [CLM ] New Configuration:
Sep 18 16:19:27 pve corosync[3077]: [CLM ] Members Left:
Sep 18 16:19:27 pve corosync[3077]: [CLM ] Members Joined:
Sep 18 16:19:27 pve corosync[3077]: [CLM ] CLM CONFIGURATION CHANGE
Sep 18 16:19:27 pve corosync[3077]: [CLM ] New Configuration:
Sep 18 16:19:27 pve corosync[3077]: [CLM ] #011r(0) ip(10.22.16.5)
Sep 18 16:19:27 pve corosync[3077]: [CLM ] Members Left:
Sep 18 16:19:27 pve corosync[3077]: [CLM ] Members Joined:
Sep 18 16:19:27 pve corosync[3077]: [CLM ] #011r(0) ip(10.22.16.5)
Sep 18 16:19:27 pve corosync[3077]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Sep 18 16:19:27 pve corosync[3077]: [CMAN ] quorum regained, resuming activity
Sep 18 16:19:27 pve corosync[3077]: [QUORUM] This node is within the primary component and will provide service.
Sep 18 16:19:27 pve corosync[3077]: [QUORUM] Members[1]: 1
Sep 18 16:19:27 pve corosync[3077]: [QUORUM] Members[1]: 1
Sep 18 16:19:27 pve corosync[3077]: [CPG ] chosen downlist: sender r(0) ip(10.22.16.5) ; members(old:0 left:0)
Sep 18 16:19:27 pve corosync[3077]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 18 16:19:27 pve corosync[3077]: [CLM ] CLM CONFIGURATION CHANGE
Sep 18 16:19:27 pve corosync[3077]: [CLM ] New Configuration:
Sep 18 16:19:27 pve corosync[3077]: [CLM ] #011r(0) ip(10.22.16.5)
Sep 18 16:19:27 pve corosync[3077]: [CLM ] Members Left:
Sep 18 16:19:27 pve corosync[3077]: [CLM ] Members Joined:
Sep 18 16:19:27 pve corosync[3077]: [CLM ] CLM CONFIGURATION CHANGE
Sep 18 16:19:27 pve corosync[3077]: [CLM ] New Configuration:
Sep 18 16:19:27 pve corosync[3077]: [CLM ] #011r(0) ip(10.22.16.5)
Sep 18 16:19:27 pve corosync[3077]: [CLM ] #011r(0) ip(10.22.16.5)
Sep 18 16:19:27 pve corosync[3077]: [CLM ] Members Left:
Sep 18 16:19:27 pve corosync[3077]: [CLM ] Members Joined:
Sep 18 16:19:27 pve corosync[3077]: [CLM ] #011r(0) ip(10.22.16.5)
Sep 18 16:19:27 pve corosync[3077]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Sep 18 16:19:31 pve fenced[3135]: fenced 1364188437 started
Sep 18 16:19:31 pve dlm_controld[3151]: dlm_controld 1364188437 started
Sep 18 16:19:41 pve kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
Sep 18 16:19:41 pve kernel: Netfilter messages via NETLINK v0.30.
Sep 18 16:19:41 pve kernel: kvm: VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL does not work properly. Using workaround
Sep 18 16:19:41 pve kernel: tun: Universal TUN/TAP device driver, 1.6
Sep 18 16:19:41 pve kernel: tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
Sep 18 16:19:41 pve kernel: ip6_tables: (C) 2000-2006 Netfilter Core Team
Sep 18 16:19:41 pve kernel: Enabling conntracks and NAT for ve0
Sep 18 16:19:41 pve kernel: nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
Sep 18 16:19:41 pve kernel: ploop_dev: module loaded
Sep 18 16:19:42 pve kernel: ip_set: protocol 6
Sep 18 16:19:45 pve pvesh: <root@pam> starting task UPID:pve:00000D8C:00000C36:59BF9DC1:startall::root@pam:
Sep 18 17:27:43 pve kernel: vmbr0: port 1(eth0) entering disabled state
Sep 18 17:27:46 pve kernel: bnx2 0000:03:00.0: eth0: NIC Copper Link is Up, 100 Mbps full duplex
Sep 18 17:27:46 pve kernel: vmbr0: port 1(eth0) entering forwarding state
Sep 18 17:27:52 pve corosync[3077]: [TOTEM ] A processor failed, forming new configuration.
Sep 18 17:27:54 pve corosync[3077]: [CLM ] CLM CONFIGURATION CHANGE
Sep 18 17:27:54 pve corosync[3077]: [CLM ] New Configuration:
Sep 18 17:27:54 pve corosync[3077]: [CLM ] #011r(0) ip(10.22.16.5)
Sep 18 17:27:54 pve corosync[3077]: [CLM ] Members Left:
Sep 18 17:27:54 pve corosync[3077]: [CLM ] #011r(0) ip(10.22.16.5)
Sep 18 17:27:54 pve corosync[3077]: [CLM ] Members Joined:
Sep 18 17:27:54 pve corosync[3077]: [CMAN ] quorum lost, blocking activity
Sep 18 17:27:54 pve corosync[3077]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
Sep 18 17:27:54 pve corosync[3077]: [QUORUM] Members[0]:
Sep 18 17:27:54 pve corosync[3077]: [CLM ] CLM CONFIGURATION CHANGE
Sep 18 17:27:54 pve corosync[3077]: [CLM ] New Configuration:
Sep 18 17:27:54 pve corosync[3077]: [CLM ] #011r(0) ip(10.22.16.5)
Sep 18 17:27:54 pve corosync[3077]: [CLM ] Members Left:
Sep 18 17:27:54 pve corosync[3077]: [CLM ] Members Joined:
Sep 18 17:27:54 pve corosync[3077]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Sep 18 17:27:54 pve corosync[3077]: [CMAN ] quorum regained, resuming activity
Sep 18 17:27:54 pve corosync[3077]: [QUORUM] This node is within the primary component and will provide service.
Sep 18 17:27:54 pve corosync[3077]: [QUORUM] Members[1]: 1
Sep 18 17:27:54 pve corosync[3077]: [QUORUM] Members[1]: 1
Sep 18 17:27:54 pve corosync[3077]: [CPG ] chosen downlist: sender r(0) ip(10.22.16.5) ; members(old:1 left:0)
Sep 18 17:27:54 pve corosync[3077]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 18 17:27:59 pve corosync[3077]: [CLM ] CLM CONFIGURATION CHANGE
Sep 18 17:27:59 pve corosync[3077]: [CLM ] New Configuration:
Sep 18 17:27:59 pve corosync[3077]: [CLM ] #011r(0) ip(10.22.16.5)
Sep 18 17:27:59 pve corosync[3077]: [CLM ] Members Left:
Sep 18 17:27:59 pve corosync[3077]: [CLM ] Members Joined:
Sep 18 17:27:59 pve corosync[3077]: [CLM ] CLM CONFIGURATION CHANGE
Sep 18 17:27:59 pve corosync[3077]: [CLM ] New Configuration:
Sep 18 17:27:59 pve corosync[3077]: [CLM ] #011r(0) ip(10.22.16.5)
Sep 18 17:27:59 pve corosync[3077]: [CLM ] #011r(0) ip(10.22.16.5)
Sep 18 17:27:59 pve corosync[3077]: [CLM ] Members Left:
Sep 18 17:27:59 pve corosync[3077]: [CLM ] Members Joined:
Sep 18 17:27:59 pve corosync[3077]: [CLM ] #011r(0) ip(10.22.16.5)
Sep 18 17:27:59 pve corosync[3077]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Sep 19 06:25:05 pve rsyslogd: [origin software="rsyslogd" swVersion="5.8.11" x-pid="2775" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
Sep 19 11:06:31 pve corosync[3077]: [QUORUM] Members[1]: 1
 
Can you please post the result of the omping?
node1
10.22.16.6 : unicast, xmt/rcv/%loss = 10000/10000/0%, min/avg/max/std-dev = 0.108/0.191/0.667/0.045
10.22.16.6 : multicast, xmt/rcv/%loss = 10000/10000/0%, min/avg/max/std-dev = 0.112/0.196/0.650/0.042
10.22.16.7 : unicast, xmt/rcv/%loss = 9484/9484/0%, min/avg/max/std-dev = 0.113/0.196/0.921/0.048
10.22.16.7 : multicast, xmt/rcv/%loss = 9484/9484/0%, min/avg/max/std-dev = 0.115/0.205/0.933/0.046
node2
10.22.16.5 : unicast, xmt/rcv/%loss = 10000/10000/0%, min/avg/max/std-dev = 0.107/0.188/0.510/0.042
10.22.16.5 : multicast, xmt/rcv/%loss = 10000/10000/0%, min/avg/max/std-dev = 0.120/0.203/0.526/0.042
10.22.16.7 : unicast, xmt/rcv/%loss = 9654/9654/0%, min/avg/max/std-dev = 0.108/0.184/0.881/0.043
10.22.16.7 : multicast, xmt/rcv/%loss = 9654/9654/0%, min/avg/max/std-dev = 0.121/0.198/0.892/0.043
node3
10.22.16.5 : unicast, xmt/rcv/%loss = 10000/10000/0%, min/avg/max/std-dev = 0.102/0.193/1.840/0.045
10.22.16.5 : multicast, xmt/rcv/%loss = 10000/10000/0%, min/avg/max/std-dev = 0.113/0.200/1.833/0.043
10.22.16.6 : unicast, xmt/rcv/%loss = 10000/10000/0%, min/avg/max/std-dev = 0.111/0.182/1.236/0.039
10.22.16.6 : multicast, xmt/rcv/%loss = 10000/10000/0%, min/avg/max/std-dev = 0.120/0.194/1.219/0.040
 
I suspect that IGMP snooping is active on the switch but no querier is active; use a longer-running check to confirm this.
On all three nodes execute:
Code:
omping -c 600 -i 1 -q NODE1-IP NODE2-IP ...

This check runs for ~10 minutes, as the default IGMP snooping interval after which multicast messages from an unregistered multicast group member are dropped is 5 minutes.
The check you already ran was a quicker one to confirm that multicast may work at all.
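If the longer check does show losses after ~5 minutes, one common workaround is to let the Linux bridge act as IGMP querier itself (a sketch only, assuming the PVE bridge is vmbr0 as in the logs above and that the kernel supports this knob; the cleaner fix is to enable a querier or disable snooping on the switch):
Code:
# let the bridge act as IGMP querier
echo 1 > /sys/class/net/vmbr0/bridge/multicast_querier
# to make it persistent, a post-up line can be added to the vmbr0 stanza in /etc/network/interfaces:
#   post-up echo 1 > /sys/class/net/vmbr0/bridge/multicast_querier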
 
Oh, and if you're on Debian 7.8, which means Proxmox VE 3.x, and you're now trying to create a new cluster, I highly recommend using PVE 5 if you can start fresh.
PVE 3.4 is EOL.
 
My goal is to update from PVE 3.4 with Ceph and many VMs to 5.

I.e., the opposite of a fresh start.
OK, then let's see if we can get the cluster communication working for now; that has to work with PVE 3.4 just as it does with PVE 5.
 
I suspect that IGMP snooping is active on the switch but no querier is active; use a longer-running check to confirm this.
On all three nodes execute:
Code:
omping -c 600 -i 1 -q NODE1-IP NODE2-IP ...

This check runs for ~10 minutes, as the default IGMP snooping interval after which multicast messages from an unregistered multicast group member are dropped is 5 minutes.
The check you already ran was a quicker one to confirm that multicast may work at all.

node1
10.22.16.51 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.138/0.304/0.548/0.064
10.22.16.51 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.151/0.316/0.542/0.068
10.22.16.52 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.126/0.280/0.541/0.062
10.22.16.52 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.134/0.293/0.565/0.066
node2
10.22.16.50 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.149/0.276/0.389/0.056
10.22.16.50 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.158/0.293/0.400/0.058
10.22.16.52 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.122/0.276/0.448/0.061
10.22.16.52 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.133/0.288/0.468/0.059
node3
10.22.16.50 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.169/0.347/0.410/0.022
10.22.16.50 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.189/0.351/0.399/0.022
10.22.16.51 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.179/0.323/0.399/0.018
10.22.16.51 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.201/0.344/0.388/0.016

The IPs haven't changed; I just wrote them in a format that is convenient for me.
 
OK, looks good. I also revisited your logs; corosync does not complain about communication.
Can you post the cluster configs from /etc/pve/cluster.conf (redact public IPs)?
Maybe from all three nodes?
 
OK, looks good. I also revisited your logs; corosync does not complain about communication.
Can you post the cluster configs from /etc/pve/cluster.conf (redact public IPs)?
Maybe from all three nodes?

node1
root@pve:~# cat /etc/pve/cluster.conf
<?xml version="1.0"?>
<cluster name="prox-cluster" config_version="1">

<cman keyfile="/var/lib/pve-cluster/corosync.authkey">
</cman>

<clusternodes>
<clusternode name="pve" votes="1" nodeid="1"/>
</clusternodes>

</cluster>

node2
root@pve:~# cat /etc/pve/cluster.conf
<?xml version="1.0"?>
<cluster name="prox-cluster" config_version="1">

<cman keyfile="/var/lib/pve-cluster/corosync.authkey">
</cman>

<clusternodes>
<clusternode name="pve" votes="1" nodeid="1"/>
</clusternodes>

</cluster>

node3
root@pve:~# cat /etc/pve/cluster.conf
cat: /etc/pve/cluster.conf: No such file or directory

Hmm, I restarted all the nodes some time ago and didn't change anything else, just tried to resolve the no-quorum problem. This node now says this for the pvecm status command:
root@pve:~# pvecm status
cman_tool: Cannot open connection to cman, is it running ?

and /etc/init.d/cman status gives nothing

I also remembered that when node2 and node3 were added to the cluster, the message "waiting for quorum" was repeated many times.
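A possible next step for node3 (a sketch only, assuming the standard PVE 3.x init scripts; check the output of each step before proceeding):
Code:
/etc/init.d/pve-cluster restart   # restart pmxcfs
/etc/init.d/cman start            # start corosync/cman again
pvecm status                      # verify the node rejoins and check quorum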
 
